I don't understand why compressed fields aren't just handled externally in the Document class: just add compress/uncompress methods. That way, all Lucene needs to understand is binary fields, and none of these problems arise during merging or initial indexing.
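To make the suggestion concrete, here is a minimal sketch of what such application-side handling could look like. The FieldCompressor helper and its method names are hypothetical, not part of Lucene; the application would compress a value before storing it as a plain binary field and uncompress it after retrieval, so the index layer only ever sees opaque bytes:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper: the application compresses field values itself
// and stores the result as an ordinary binary field. Lucene never
// needs to know about compression, and merging just copies bytes.
public class FieldCompressor {

    // Compress raw bytes at the caller-chosen level (0-9, or
    // Deflater.DEFAULT_COMPRESSION).
    public static byte[] compress(byte[] input, int level) throws IOException {
        Deflater deflater = new Deflater(level);
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DeflaterOutputStream dos = new DeflaterOutputStream(bos, deflater);
        dos.write(input);
        dos.close(); // finishes the deflate stream
        deflater.end();
        return bos.toByteArray();
    }

    // Inflate bytes previously produced by compress().
    public static byte[] uncompress(byte[] input) throws IOException {
        InflaterInputStream iis =
            new InflaterInputStream(new ByteArrayInputStream(input));
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        int n;
        while ((n = iis.read(buf)) > 0) {
            bos.write(buf, 0, n);
        }
        return bos.toByteArray();
    }
}
```

With this approach the compression level is entirely the application's choice, per call, with no Lucene API change at all.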

On Aug 11, 2006, at 12:18 AM, Michael Busch (JIRA) wrote:

[ http://issues.apache.org/jira/browse/LUCENE-648?page=comments#action_12427421 ]

Michael Busch commented on LUCENE-648:
--------------------------------------

I think the compression level is only one part of the performance problem. Another drawback of the current implementation is how compressed fields are merged: the FieldsReader uncompresses the fields, the SegmentMerger concatenates them, and the FieldsWriter compresses the data again. The uncompress/compress steps are completely unnecessary and cause large overhead. Before a document is written to disk, the data of its fields is in fact compressed twice: first when the DocumentWriter writes the single-document segment to the RAMDirectory, and second when the SegmentMerger merges the segments inside the RAMDirectory to write the merged segment to disk.

Please check out Jira issue LUCENE-629 (http://issues.apache.org/jira/browse/LUCENE-629), where I recently posted a patch that fixes this problem and increases indexing speed significantly. I also included some performance test results that quantify the improvement. Mike, it would be great if you could also try the patched version for your tests with the compression level.

Allow changing of ZIP compression level for compressed fields
-------------------------------------------------------------

                Key: LUCENE-648
                URL: http://issues.apache.org/jira/browse/LUCENE-648
            Project: Lucene - Java
         Issue Type: Improvement
         Components: Index
   Affects Versions: 2.0.0, 1.9, 2.0.1, 2.1
           Reporter: Michael McCandless
           Priority: Minor

In response to this thread:
      http://www.gossamer-threads.com/lists/lucene/java-user/38810
I think we should allow changing the compression level used in the call to java.util.zip.Deflator in FieldsWriter.java. Right now it's hardwired to "best":
      compressor.setLevel(Deflater.BEST_COMPRESSION);
Unfortunately, this can apparently cause the zip library to take a very long time (10 minutes for 4.5 MB in the above thread), so people may want to change this setting. One approach would be to read the default from a Java system property, but it seems that recently (pre 2.0, I think) there was an effort to not rely on Java system properties (many were removed). A second approach would be to add static methods (and a static class attribute) to set the compression level globally. A third approach would be in the document.Field class, e.g. a setCompressLevel/getCompressLevel pair; but then every time a document is created with this field you'd have to call setCompressLevel, since Lucene doesn't have a global field schema (like Solr).
Any other ideas / preferences for any of these methods?
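For illustration, the second approach (a global static setter) could look roughly like the sketch below. The FieldsWriterConfig class, its field, and its methods are hypothetical, not existing Lucene API; FieldsWriter would then call getCompressionLevel() instead of hardcoding Deflater.BEST_COMPRESSION:

```java
import java.util.zip.Deflater;

// Hypothetical sketch of a global compression-level setting
// (approach two above); not part of any Lucene release.
public class FieldsWriterConfig {

    // Defaults to the current hardwired behavior.
    private static int compressionLevel = Deflater.BEST_COMPRESSION;

    // Accepts Deflater.DEFAULT_COMPRESSION (-1) or 0..9.
    public static void setCompressionLevel(int level) {
        if (level < Deflater.DEFAULT_COMPRESSION
                || level > Deflater.BEST_COMPRESSION) {
            throw new IllegalArgumentException(
                "compression level must be -1 (default) or 0-9: " + level);
        }
        compressionLevel = level;
    }

    public static int getCompressionLevel() {
        return compressionLevel;
    }
}
```

FieldsWriter would then use `compressor.setLevel(FieldsWriterConfig.getCompressionLevel());` in place of the hardwired call. The obvious drawback of a static setting is that it applies to every index in the JVM at once.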

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


