On Aug 16, 2006, at 8:32 AM, Nicolas Lalevée wrote:

Hi,

In the issue, you wrote that "This way the indexing level just stores opaque binary fields, and then Document handles compress/uncompressing as needed."

I have looked into the Lucene code, and it seems to me that it is Field that should take care of compressing/uncompressing, and that FieldsReader and FieldsWriter should only see binary data.
Or do you mean that compression should be completely external to Lucene?


I believe the consensus is it should be done externally.
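
To make that concrete, here is a minimal sketch of what "external" means (the class and method names are just for illustration, and it assumes the Field constructor that takes a byte[] for binary stored fields): the application compresses before indexing, and Lucene only ever sees opaque bytes.

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

public class ExternalCompression {
  // Compress a value before it ever reaches Lucene; the index stores only
  // the opaque bytes, and the application inflates them on retrieval
  // (e.g. with java.util.zip.InflaterInputStream).
  public static byte[] deflate(byte[] raw) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    DeflaterOutputStream out =
        new DeflaterOutputStream(bos, new Deflater(Deflater.BEST_COMPRESSION));
    out.write(raw);
    out.close(); // finishes the deflater so all compressed bytes are flushed
    return bos.toByteArray();
  }
}

// At index time, something like:
//   doc.add(new Field("body", ExternalCompression.deflate(bytes), Field.Store.YES));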

In fact, at the end of the other thread, "Flexible index format / Payloads Cont'd", I was discussing how to customize the way data are stored. So I have looked deeper into the code, and I think I have found a way to do so. And since you can change the way the data are stored, you can also define the compression level, or plug in your own compression algorithm. I will show you a patch, but I have modified so much code during my several tries that I first need to remove the unnecessary changes. To describe it shortly (a rough sketch follows the list):
- I have provided a way to supply your own FieldsReader and FieldsWriter (via a factory). To create an IndexReader, you have to provide that factory; the current API just uses a default one.
- I have moved the code of FieldsReader and FieldsWriter that does the actual field data reading and writing into a new class, FieldData. The FieldsReader instantiates a FieldData, does a fieldData.read(input), and then does a new Field(fieldData, ...). The FieldsWriter does a field.getFieldData().write(output);
- So, by extending FieldsReader, you can provide your own implementation of FieldData, and thus control how the data are stored and read.
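
Roughly, the shape is something like this (only a sketch of my work in progress; the factory name and signatures are provisional, and it assumes FieldsReader/FieldsWriter are made public):

import java.io.IOException;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.IndexInput;
import org.apache.lucene.store.IndexOutput;

// Provisional factory: an IndexReader/IndexWriter would be handed one of
// these instead of constructing FieldsReader/FieldsWriter directly.
public interface FieldsStreamFactory {
  FieldsReader newFieldsReader(Directory dir, String segment) throws IOException;
  FieldsWriter newFieldsWriter(Directory dir, String segment) throws IOException;
}

// FieldData owns the on-disk representation of one field's stored data.
public class FieldData {
  public void read(IndexInput input) throws IOException {
    // default: decode the field exactly as the current file format does
  }
  public void write(IndexOutput output) throws IOException {
    // default: encode the field exactly as the current file format does
  }
}

A custom FieldData subclass could then, for example, deflate in write() and inflate in read(), which is exactly where a pluggable compression level or algorithm fits.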
The tests pass successfully, but I have an issue with that design. One thing that I think is important is that with the current design, you can read an index in an old format and just do a writer.addIndexes() into a new format. With the new design you cannot, because the writer will use the FieldData.write provided by the reader.
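
For reference, the old-to-new conversion that works today is just this (a minimal sketch against the Lucene 2.0 API; the paths are placeholders):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class ConvertIndex {
  public static void main(String[] args) throws Exception {
    // placeholder paths: an existing old-format index and a fresh target
    Directory oldDir = FSDirectory.getDirectory("/path/to/old-index", false);
    Directory newDir = FSDirectory.getDirectory("/path/to/new-index", true);

    IndexWriter writer = new IndexWriter(newDir, new StandardAnalyzer(), true);
    writer.addIndexes(new Directory[] { oldDir }); // rewrites segments in the writer's format
    writer.close();
  }
}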
To be continued...

I would love to see this patch. I think one could make a pretty good argument for this kind of implementation being done "cleanly"; that is, it shouldn't necessarily involve reworking the internals, but instead could represent the foundation for a new, codec-based indexing mechanism (with an implementation that can read/write the existing file format).



cheers,
Nicolas



--------------------------
Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
335 Hinds Hall
Syracuse, NY 13244
http://www.cnlp.org

Voice: 315-443-5484
Skype: grant_ingersoll
Fax: 315-443-6886



