The data size savings are almost certainly not worth the probable 20-40% increase in CPU usage in most cases, no?
I think it should be optional: those with extremely large indices who want to save some space (which hardly seems necessary these days) could enable it, and those who want to maximize performance could leave it off.

-----Original Message-----
From: Bernhard Messer [mailto:[EMAIL PROTECTED]
Sent: Monday, August 30, 2004 4:41 PM
To: [EMAIL PROTECTED]
Subject: Binary fields and data compression

Hi developers,

A few months ago, there was a very interesting discussion about field compression and the possibility of storing binary field values within a Lucene document. On this topic, Drew Farris came up with a patch to add the necessary functionality. I ran all the necessary tests on his implementation and didn't find a single problem. So Drew's original implementation could now be enhanced to compress the binary field data (maybe even the text fields, if they are stored only) before writing to disk.

I made some simple statistical measurements using the java.util.zip package for data compression. With compression enabled, we could save about 40% of the data when compressing plain text files of 1KB to 4KB in size.

If there is still some interest, we could first try to update the patch, because it's outdated due to several changes within the Fields class. After that, compression could be added to the updated version of the patch.

Sounds good to me. What do you think?

Best regards,
Bernhard

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
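
For anyone who wants to check the space-versus-CPU trade-off discussed in this thread, here is a minimal sketch of such a measurement using java.util.zip. It is not the code from Drew's patch or from Bernhard's tests; the class name FieldCompressionSketch, the helper methods compress and decompress, and the sample text are all hypothetical, chosen only for illustration. The sample is repetitive, so it will compress better than real documents; substitute your own 1KB to 4KB text files to get meaningful numbers.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper, not part of Lucene or of the patch discussed above.
public class FieldCompressionSketch {

    // Compress a stored field value with DEFLATE at the highest compression level.
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream(input.length);
        byte[] buffer = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Restore the original bytes; rejects truncated or corrupt input.
    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream(compressed.length * 2);
        byte[] buffer = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported input");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt compressed data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Assumed sample: roughly 2KB of repetitive plain text.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 40; i++) {
            sb.append("Stored field values can be compressed before writing. ");
        }
        byte[] original = sb.toString().getBytes(StandardCharsets.UTF_8);

        long start = System.nanoTime();
        byte[] packed = compress(original);
        byte[] restored = decompress(packed);
        long elapsedMicros = (System.nanoTime() - start) / 1000;

        double saved = 100.0 * (original.length - packed.length) / original.length;
        System.out.println("original bytes:   " + original.length);
        System.out.println("compressed bytes: " + packed.length);
        System.out.println("space saved (%):  " + saved);
        System.out.println("round trip (us):  " + elapsedMicros);
        System.out.println("restored length:  " + restored.length);
    }
}
```

A single timed round trip like this is only a rough indication; a fair CPU comparison would repeat the measurement many times after JVM warm-up and compare indexing and retrieval with and without compression.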