Re: Lucene Compression

eks dev Wed, 02 Apr 2008 07:10:46 -0700

the example you have sent is too small for the type of compression implemented 
in lucene. The problem is that you have to store decoding symbol table , header 
...* for each* document you compress.  
The best you can do for this would be to use some compressor with static 
decoding table (some entropy codec, huffman or so) and maybe some static 
dictionary compressor. As Grant said, the best way to do it would be to 
compress your field before storing it into lucene


----- Original Message ----
> From: Grant Ingersoll <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Wednesday, 2 April, 2008 2:09:37 PM
> Subject: Re: Lucene Compression
> 
> It's generally considered best practice to compress things first in  
> your app and then add them as a binary field.   That being said, I  
> don't see why that would blow up on it's own.  Have you tried  
> compressing it outside of Lucene to see what happens?  If you can  
> reproduce it as a test case for Lucene, that would be great.
> 
>  From FieldsWriter, Lucene's compression code looks like:
> private final byte[] compress (byte[] input) {
> 
>        // Create the compressor with highest level of compression
>        Deflater compressor = new Deflater();
>        compressor.setLevel(Deflater.BEST_COMPRESSION);
> 
>        // Give the compressor the data to compress
>        compressor.setInput(input);
>        compressor.finish();
> 
>        /*
>         * Create an expandable byte array to hold the compressed data.
>         * You cannot use an array that's the same size as the orginal  
> because
>         * there is no guarantee that the compressed data will be  
> smaller than
>         * the uncompressed data.
>         */
>        ByteArrayOutputStream bos = new  
> ByteArrayOutputStream(input.length);
> 
>        // Compress the data
>        byte[] buf = new byte[1024];
>        while (!compressor.finished()) {
>          int count = compressor.deflate(buf);
>          bos.write(buf, 0, count);
>        }
> 
>        compressor.end();
> 
>        // Get the compressed data
>        return bos.toByteArray();
>      }
> 
> 
> There is an interesting comment in that code about how the compressed  
> data won't necessarily be smaller, so maybe you have entered the  
> compression twilight zone.
> 
> HTH
> -Grant
> 
> 
> On Apr 2, 2008, at 12:51 AM, Sebastin wrote:
> 
> >
> > Hi All,
> >       is there any possibility to create compression store for the
> > following types of string in lucene index store?
> >
> >
> > String str = "II0264.D05|00022745|ABCDE|03/01/2008 00:23:12|00035|
> > 9840836588| 129382152520| 04F4243B600408|04F4243B600408|
> > |11919898456123|354943011025810L| "CPTBS2I"| "ABCD3E"|11| 
> > 1234510003243219I|"
> >
> >
> > I try to store these fields as Field.Store.COMPRESSION  but it  
> > exceeds the
> > original size of the file?
> >
> >
> > -- 
> > View this message in context: 
> http://www.nabble.com/Lucene-Compression-tp16442112p16442112.html
> > Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
> >
> 
> --------------------------
> Grant Ingersoll
> http://www.lucenebootcamp.com
> Next Training: April 7, 2008 at ApacheCon Europe in Amsterdam
> 
> Lucene Helpful Hints:
> http://wiki.apache..org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
> 
> 
> 
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 




      __________________________________________________________
Sent from Yahoo! Mail.
A Smarter Inbox http://uk.docs.yahoo.com/nowyoucan.html


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: Lucene Compression

Reply via email to