The savings in data size are almost certainly not worth the probable 20-40%
increase in CPU usage in most cases, no?

I think it should be optional: enabled by those who have extremely large
indices and want to save some space (which seems unnecessary these days), and
left off by those who want to maximize performance.


-----Original Message-----
From: Bernhard Messer [mailto:[EMAIL PROTECTED]
Sent: Monday, August 30, 2004 4:41 PM
To: [EMAIL PROTECTED]
Subject: Binary fields and data compression


hi developers,

a few months ago, there was a very interesting discussion about field
compression and the possibility of storing binary field values within a
Lucene document. On this topic, Drew Farris came up with a
patch to add the necessary functionality. I ran all the necessary tests
on his implementation and didn't find one problem. So the original
implementation from Drew could now be enhanced to compress the binary
field data (maybe even the text fields if they are stored only) before
writing to disk. I made some simple statistical measurements using the
java.util.zip package for data compression. With it enabled, we could save
about 40% of the data size when compressing plain text files ranging from
1KB to 4KB. If there is still some interest, we could first try to update the
patch, because it's outdated due to several changes within the Fields
class. After finishing that, compression could be added to the updated
version of the patch.
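
A minimal sketch of how such a measurement could be made with java.util.zip
is shown here. It is not the code from Drew's patch and not the test that
produced the 40% figure; the class name FieldCompressionSketch, the helper
methods compress and decompress, and the sample text are all hypothetical.
The sample is a repeated sentence, so it will compress far better than real
documents would. The timing is a single rough reading, not a benchmark.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper class, for illustration only.
public class FieldCompressionSketch {

    // Compress a stored field value with DEFLATE (java.util.zip.Deflater).
    public static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Restore the original bytes from a value produced by compress().
    public static byte[] decompress(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        while (!inflater.finished()) {
            int n = inflater.inflate(buffer);
            // Guard against looping forever on truncated input.
            if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                inflater.end();
                throw new DataFormatException("truncated or unsupported input");
            }
            out.write(buffer, 0, n);
        }
        inflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // Build a sample of roughly 2KB of plain text (assumed test data).
        StringBuilder sb = new StringBuilder();
        while (sb.length() < 2048) {
            sb.append("Stored field values can be compressed before writing. ");
        }
        byte[] original = sb.toString().getBytes(StandardCharsets.UTF_8);

        long start = System.nanoTime();
        byte[] packed = compress(original);
        long compressNanos = System.nanoTime() - start;

        byte[] restored = decompress(packed);

        System.out.println("original bytes:   " + original.length);
        System.out.println("compressed bytes: " + packed.length);
        System.out.println("saved:            "
                + (100.0 * (original.length - packed.length) / original.length) + "%");
        System.out.println("compress time:    " + (compressNanos / 1000) + " us");
        System.out.println("round trip ok:    " + Arrays.equals(original, restored));
    }
}
```

Running the same loop over a directory of real 1KB to 4KB text files, at
different Deflater levels, would give a more useful picture of both the
space saved and the CPU cost.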

sounds good to me, what do you think?

best regards
Bernhard




---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

