As our software goes through its lifecycle, we sometimes have to alter existing Lucene indexes. The way I have done that in the past is to open the existing index for reading, read each Document, modify it and write that Document to a new index. At the end of the process, I delete the old index and rename the new index to the old name.
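For context, here is a minimal sketch of that rewrite loop in Lucene 4.10 style. It is illustrative only: the directory names are made up, a KeywordAnalyzer is passed purely because IndexWriterConfig requires an Analyzer instance (we do no real analysis), and it ignores deleted documents (a liveDocs check would be needed if the old index contained deletions).

```java
import java.io.File;

import org.apache.lucene.analysis.core.KeywordAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class RewriteIndex {
    public static void main(String[] args) throws Exception {
        Directory oldDir = FSDirectory.open(new File("index"));
        Directory newDir = FSDirectory.open(new File("index.new"));

        // No real analysis is wanted; KeywordAnalyzer is only here because
        // IndexWriterConfig requires an Analyzer instance.
        IndexWriterConfig cfg =
                new IndexWriterConfig(Version.LUCENE_4_10_4, new KeywordAnalyzer());

        try (IndexReader reader = DirectoryReader.open(oldDir);
             IndexWriter writer = new IndexWriter(newDir, cfg)) {
            for (int i = 0; i < reader.maxDoc(); i++) {
                // NOTE: this also copies deleted documents; a liveDocs check
                // would be needed if the old index contains deletions.
                Document doc = reader.document(i); // stored values only
                // ... modify the document here ...
                writer.addDocument(doc);
            }
        }
        // Afterwards: delete the old directory on disk and rename index.new to index.
    }
}
```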
I do no tokenizing and use no analyzers. I recently upgraded from Lucene 3.x to 4.10.4, and now I have the following problem. Suppose an existing document has 10 fields and there is one I have to modify. I remove that field, re-add it with the new settings, and then add the Document in its entirety to the new index. I run into the following problems:

* Exceptions are thrown for fields I don't even touch, because their FieldType has 'tokenized' set to true and that fails when no analyzer is in use. 'tokenized' comes back as true even though I had it set to false when I originally added the field to the original index!
* LongFields come back with 'indexed' set to false even though they were indexed in the original index! This makes the new index unsearchable on those fields and hence unusable.
* I can't even alter 'indexed' on these LongFields, because for some reason the FieldType instance comes back frozen from the IndexReader, and once frozen it can't be changed. Even if I create a new FieldType, there is no way to swap the FieldType of an existing Field.

It seems the returned FieldType contents are more or less arbitrary. I did see in the Javadoc of IndexReader.document() that field metadata is not returned and that, in fact, a new kind of object such as 'StoredField' ought to be returned so there is no pretense of any metadata being present. I thought perhaps I could use FieldInfos instead, but that class returns the same bogus metadata. What is the purpose of FieldInfos if the info it returns is bogus? Am I misunderstanding something here? As it stands this is not very usable.

What can I do to work around this? Is this a Lucene bug? An oversight?
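One workaround I can think of, sketched below with made-up field names ("id", "timestamp", "payload"), is to stop re-adding the IndexableField objects that come back from IndexReader.document() and instead rebuild every field from its stored value using the concrete field classes whose index-time settings I already know. I would rather not hard-code the schema like this, but it does sidestep the frozen/incorrect FieldType problem:

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.LongField;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.StringField;

final class FieldRebuilder {
    /** Rebuilds a document from its stored values with explicitly chosen field types. */
    static Document rebuild(Document oldDoc) {
        Document newDoc = new Document();

        // Indexed, stored, non-tokenized string field (hypothetical name "id").
        newDoc.add(new StringField("id", oldDoc.get("id"), Field.Store.YES));

        // Numeric field: LongField is indexed by construction in 4.x.
        long ts = oldDoc.getField("timestamp").numericValue().longValue();
        newDoc.add(new LongField("timestamp", ts, Field.Store.YES));

        // Stored-only field; there is no index-time metadata to reconstruct.
        newDoc.add(new StoredField("payload", oldDoc.get("payload")));

        return newDoc;
    }
}
```

In the copy loop, writer.addDocument(FieldRebuilder.rebuild(reader.document(i))) would then replace adding the reader's document directly. Is there a better way that does not require hard-coding every field?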