Hi Albert,

Unfortunately, you have fallen into a common and sneaky Lucene trap.

The problem happens because you loaded a Document from the index's stored fields (the one you previously indexed) and then tried to modify that one and re-index.

Lucene does not guarantee that this will work, because Lucene does not store all information necessary to precisely reconstruct the original document you had indexed. The Document you loaded from the index is subtly different from the one you had previously indexed. In particular, your custom FIELD_TYPE details were lost.

To sidestep this tar pit you must fully reconstruct the document yourself each time you add it to the index.

Mike McCandless

http://blog.mikemccandless.com

On Mon, Jun 29, 2020 at 9:56 AM Albert MacSweeny <albert.macswe...@profium.com> wrote:

> Hi,
>
> I'm upgrading a project to lucene 8.5.2 which had been using 3.0.0.
>
> Some tests are failing with a strange issue. The gist of it is, we create
> fields that need position and offset information. Inserting one field works
> ok, but then searching for the document and adding another value for the
> same field results in the following exception
>
> java.lang.IllegalArgumentException: all instances of a given field name must have the same term vectors settings (storeTermVectorPositions changed for field="f1")
>     at org.apache.lucene.index.TermVectorsConsumerPerField.start(TermVectorsConsumerPerField.java:166)
>     at org.apache.lucene.index.TermsHashPerField.start(TermsHashPerField.java:294)
>     at org.apache.lucene.index.FreqProxTermsWriterPerField.start(FreqProxTermsWriterPerField.java:72)
>     at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:810)
>     at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:442)
>     at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:406)
>     at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:250)
>     at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:495)
>     at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1594)
>     at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1213)
>     at com.profium.sir.LuceneTest.writeDoc(LuceneTest.java:66)
>     at com.profium.sir.LuceneTest.testLucene(LuceneTest.java:58)
>
> This is happening even though the exact same FieldType object is being
> used in the field each time, and it is frozen.
>
> I've isolated the problem to the following code snippet which reproduces
> it:
>
> import java.io.IOException;
> import java.nio.file.Path;
>
> import org.apache.lucene.analysis.en.EnglishAnalyzer;
> import org.apache.lucene.document.Document;
> import org.apache.lucene.document.Field;
> import org.apache.lucene.document.FieldType;
> import org.apache.lucene.index.DirectoryReader;
> import org.apache.lucene.index.IndexOptions;
> import org.apache.lucene.index.IndexWriter;
> import org.apache.lucene.index.IndexWriterConfig;
> import org.apache.lucene.search.IndexSearcher;
> import org.apache.lucene.store.Directory;
> import org.apache.lucene.store.MMapDirectory;
>
> public class LuceneTest {
>
>     private static FieldType FIELD_TYPE = new FieldType();
>
>     static {
>         FIELD_TYPE.setStored(true);
>         FIELD_TYPE.setTokenized(true);
>         FIELD_TYPE.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
>         FIELD_TYPE.setStoreTermVectors(true);
>         FIELD_TYPE.setStoreTermVectorPayloads(true);
>         FIELD_TYPE.setStoreTermVectorPositions(true);
>         FIELD_TYPE.setStoreTermVectorOffsets(true);
>         FIELD_TYPE.freeze();
>     }
>
>     public static void main(String[] args) throws IOException {
>         testLucene();
>     }
>
>     public static void testLucene() throws IOException {
>         Document doc = new Document();
>         doc.add(new Field("f1", "foo", FIELD_TYPE));
>         writeDoc(doc);
>         IndexSearcher searcher = new IndexSearcher(DirectoryReader.open(getDirectory()));
>         doc = searcher.doc(0);
>
>         doc.add(new Field("f1", "bar", FIELD_TYPE));
>         writeDoc(doc);
>     }
>
>     private static void writeDoc(Document doc) throws IOException {
>         Directory directory = getDirectory();
>         IndexWriterConfig conf = new IndexWriterConfig(new EnglishAnalyzer());
>         IndexWriter writer = new IndexWriter(directory, conf);
>         writer.addDocument(doc);
>         writer.flush();
>         writer.close();
>     }
>
>     private static Directory getDirectory() throws IOException {
>         return new MMapDirectory(Path.of("lucenttest"));
>     }
> }
>
> Experimenting shows that if the following three properties are not set on
> the FieldType, the exception is no longer thrown, but removing them breaks
> functionality we have that depends on the position and offset info.
>
> FIELD_TYPE.setStoreTermVectorPayloads(true);
> FIELD_TYPE.setStoreTermVectorPositions(true);
> FIELD_TYPE.setStoreTermVectorOffsets(true);
>
> Perhaps I'm doing something I shouldn't be, thanks in advance for any help!
>
> Regards,
> Albert
>
> Albert MacSweeny
> Profium, Lars Sonckin kaari 12, 02600 Espoo, Finland
> Internet: http://www.profium.com
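
For illustration, the advice in the reply at the top (reconstruct the document yourself instead of re-adding the one loaded from the index) might look roughly like the sketch below. It is only a sketch under stated assumptions, not a tested drop-in fix: the class name RebuildDocumentSketch and the helper rebuild() are hypothetical, it uses an in-memory ByteBuffersDirectory and StandardAnalyzer rather than the MMapDirectory and EnglishAnalyzer from the repro, and it assumes every value you need is available as a stored field. If some values are not stored, they would have to come from your own source data.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexOptions;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class RebuildDocumentSketch {

    // Same settings as the FIELD_TYPE in the repro above.
    private static final FieldType FIELD_TYPE = new FieldType();

    static {
        FIELD_TYPE.setStored(true);
        FIELD_TYPE.setTokenized(true);
        FIELD_TYPE.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
        FIELD_TYPE.setStoreTermVectors(true);
        FIELD_TYPE.setStoreTermVectorPayloads(true);
        FIELD_TYPE.setStoreTermVectorPositions(true);
        FIELD_TYPE.setStoreTermVectorOffsets(true);
        FIELD_TYPE.freeze();
    }

    /**
     * Hypothetical helper: copies only the stored string values of one field
     * into brand-new Field instances that carry the intended FIELD_TYPE.
     * The Field objects of the loaded Document are never reused.
     */
    static Document rebuild(Document loaded, String fieldName) {
        Document fresh = new Document();
        for (String value : loaded.getValues(fieldName)) {
            fresh.add(new Field(fieldName, value, FIELD_TYPE));
        }
        return fresh;
    }

    /** Indexes a document, reloads it, rebuilds it with an extra value, and returns the document count. */
    static int run() throws IOException {
        try (Directory dir = new ByteBuffersDirectory()) {
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
                Document doc = new Document();
                doc.add(new Field("f1", "foo", FIELD_TYPE));
                writer.addDocument(doc);
            }

            Document loaded;
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                loaded = reader.document(0);
            }

            // Rebuild from stored values, then add the new value with the same FIELD_TYPE.
            Document fresh = rebuild(loaded, "f1");
            fresh.add(new Field("f1", "bar", FIELD_TYPE));

            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
                writer.addDocument(fresh);
            }

            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                return reader.numDocs();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("documents in index: " + run());
    }
}
```

Note that, like the repro, this uses addDocument, so the originally indexed document stays in the index next to the rebuilt one. If the intent is to replace it, IndexWriter.updateDocument with a Term on a unique key field would be the usual route, assuming the documents carry such a key.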