I don't understand why the compressed fields are not just handled
externally in the Document class - just add uncompress/compress
methods. This way all Lucene needs to understand is binary fields,
and you don't have any of these problems during merging or initial
indexing.
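A minimal sketch of what "handled externally" could look like, assuming the application compresses a value itself and hands Lucene only the resulting bytes as an ordinary binary field. The class and method names (ExternalFieldCompression, compress, uncompress) are hypothetical and not part of Lucene; only java.util.zip is used.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper, not a Lucene class: compression happens before the
// value ever reaches the index, so the index only sees opaque bytes.
final class ExternalFieldCompression {

    // Compress with a caller-chosen level (e.g. Deflater.BEST_SPEED).
    static byte[] compress(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib resources
        }
    }

    // Reverse of compress(); rejects truncated or invalid input.
    static byte[] uncompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported compressed data");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid zlib data", e);
        } finally {
            inflater.end();
        }
    }
}
```

With something like this, the caller picks the compression level per value, and the stored bytes can be copied around during merging without being touched.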
On Aug 11, 2006, at 12:18 AM, Michael Busch (JIRA) wrote:
[ http://issues.apache.org/jira/browse/LUCENE-648?page=comments#action_12427421 ]
Michael Busch commented on LUCENE-648:
--------------------------------------
I think the compression level is only one part of the performance
problem. Another drawback of the current implementation is how
compressed fields are being merged: the FieldsReader uncompresses
the fields, the SegmentMerger concatenates them and the
FieldsWriter compresses the data again. The uncompress/compress
steps are completely unnecessary and result in a large overhead.
Before a document is written to disk, its field data is even
compressed twice: first when the DocumentWriter writes the
single-document segment to the RAMDirectory, and again when the
SegmentMerger merges the segments inside the RAMDirectory to write
the merged segment to disk.
Please check out JIRA issue 629
(http://issues.apache.org/jira/browse/LUCENE-629), where I
recently posted a patch that fixes this
problem and increases the indexing speed significantly. I also
included some performance test results which quantify the
improvement. Mike, it would be great if you could also try out the
patched version for your tests with the compression level.
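To illustrate the redundancy described above, here is a small sketch that models a segment as a list of per-document compressed blobs. This is not the patch from LUCENE-629 and does not use Lucene's FieldsReader, SegmentMerger or FieldsWriter; the class CompressedMergeSketch and its methods are hypothetical and only contrast the two merge strategies.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical model: a "segment" is a list of per-document compressed blobs.
final class CompressedMergeSketch {

    static byte[] deflate(byte[] raw) {
        Deflater d = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            d.setInput(raw);
            d.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!d.finished()) {
                out.write(buf, 0, d.deflate(buf));
            }
            return out.toByteArray();
        } finally {
            d.end();
        }
    }

    static byte[] inflate(byte[] packed) {
        Inflater i = new Inflater();
        try {
            i.setInput(packed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!i.finished()) {
                int n = i.inflate(buf);
                if (n == 0 && (i.needsInput() || i.needsDictionary())) {
                    throw new IllegalArgumentException("truncated compressed data");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid zlib data", e);
        } finally {
            i.end();
        }
    }

    // The costly path: every blob is inflated and then deflated again.
    static List<byte[]> mergeWithRecompression(List<byte[]> a, List<byte[]> b) {
        List<byte[]> merged = new ArrayList<>(a.size() + b.size());
        for (byte[] blob : a) merged.add(deflate(inflate(blob)));
        for (byte[] blob : b) merged.add(deflate(inflate(blob)));
        return merged;
    }

    // The cheap path: compressed bytes are copied as-is, no zlib work at all.
    static List<byte[]> mergeRaw(List<byte[]> a, List<byte[]> b) {
        List<byte[]> merged = new ArrayList<>(a.size() + b.size());
        merged.addAll(a);
        merged.addAll(b);
        return merged;
    }
}
```

Both strategies yield blobs that inflate to the same field values; the raw merge simply skips the zlib work, which is where the overhead described above comes from.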
Allow changing of ZIP compression level for compressed fields
-------------------------------------------------------------
Key: LUCENE-648
URL: http://issues.apache.org/jira/browse/LUCENE-648
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Affects Versions: 2.0.0, 1.9, 2.0.1, 2.1
Reporter: Michael McCandless
Priority: Minor
In response to this thread:
http://www.gossamer-threads.com/lists/lucene/java-user/38810
I think we should allow changing the compression level used in the
call to java.util.zip.Deflater in FieldsWriter.java. Right now
it's hardwired to "best":
    compressor.setLevel(Deflater.BEST_COMPRESSION);
Unfortunately, this can apparently cause the zip library to take a
very long time (10 minutes for 4.5 MB in the above thread) and so
people may want to change this setting.
One approach would be to read the default from a Java system
property, but it seems that recently (pre-2.0, I think) there was
an effort to stop relying on Java system properties (many were
removed).
A second approach would be to add static methods (and a static class
attribute) to set the compression level globally.
A third approach would be to put it in the document.Field class, e.g.
setCompressLevel/getCompressLevel. But then every time a document
is created with this field you'd have to call setCompressLevel,
since Lucene doesn't have a global Field schema (like Solr).
Any other ideas / preferences for any of these approaches?
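For discussion, a rough sketch of the second approach (a static, process-wide setting). Everything here is hypothetical: CompressionSettings, setCompressionLevel, getCompressionLevel and newDeflater are illustrative names and do not exist in Lucene. The default mirrors the currently hardwired Deflater.BEST_COMPRESSION.

```java
import java.util.zip.Deflater;

// Hypothetical holder for a global compression level; not a Lucene class.
final class CompressionSettings {

    // Same default as the value currently hardwired in FieldsWriter.
    private static volatile int level = Deflater.BEST_COMPRESSION;

    private CompressionSettings() {}

    // Accepts 0..9 or Deflater.DEFAULT_COMPRESSION (-1), as java.util.zip does.
    static void setCompressionLevel(int newLevel) {
        boolean valid = newLevel == Deflater.DEFAULT_COMPRESSION
                || (newLevel >= Deflater.NO_COMPRESSION
                    && newLevel <= Deflater.BEST_COMPRESSION);
        if (!valid) {
            throw new IllegalArgumentException("invalid compression level: " + newLevel);
        }
        level = newLevel;
    }

    static int getCompressionLevel() {
        return level;
    }

    // A writer would call this instead of hardwiring the level itself.
    static Deflater newDeflater() {
        return new Deflater(level);
    }
}
```

The trade-off is the usual one for static settings: a single call such as CompressionSettings.setCompressionLevel(Deflater.BEST_SPEED) affects every index in the JVM, whereas the per-Field approach is more flexible but has to be repeated for every document.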
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the
administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]