Hi Mike,

Thanks for the quick reply. I guess this approach worked OK in version 3.0.0, since the project I'm working on relied on it? I know it was a long time ago, so maybe you don't remember :)

I'm worried on my side that reconstructing a full doc in this situation might have a high performance cost so I'd like to avoid it if I can (or I might not have the original value of all fields available). Do you think it would work to just reconstruct the values for the field being modified, or am I likely to just run into more issues by modifying a loaded Document?

Regards,
Albert

> From: "Michael McCandless" <luc...@mikemccandless.com>
> To: "java-user" <java-user@lucene.apache.org>, "albert macsweeny" <albert.macswe...@profium.com>
> Sent: Monday, 29 June, 2020 15:23:43
> Subject: Re: Adding fields with same field type complains that they have different term vector settings
>
> Hi Albert,
>
> Unfortunately, you have fallen into a common and sneaky Lucene trap.
>
> The problem happens because you loaded a Document from the index's stored fields (the one you previously indexed) and then tried to modify that one and re-index.
>
> Lucene does not guarantee that this will work, because Lucene does not store all information necessary to precisely reconstruct the original document you had indexed.
>
> The Document you loaded from the index is subtly different from the one you had previously indexed. In particular, your custom FIELD_TYPE details were lost.
>
> To sidestep this tar pit you must fully reconstruct the document yourself each time you add it to the index.
>
> Mike McCandless
> http://blog.mikemccandless.com
>
> On Mon, Jun 29, 2020 at 9:56 AM Albert MacSweeny <albert.macswe...@profium.com> wrote:
>
>> Hi,
>>
>> I'm upgrading a project to lucene 8.5.2 which had been using 3.0.0.
>>
>> Some tests are failing with a strange issue. The gist of it is, we create fields that need position and offset information.
>> Inserting one field works ok, but then searching for the document and adding another value for the same field results in the following exception
>>
>> java.lang.IllegalArgumentException: all instances of a given field name must have the same term vectors settings (storeTermVectorPositions changed for field="f1")
>>     at org.apache.lucene.index.TermVectorsConsumerPerField.start(TermVectorsConsumerPerField.java:166)
>>     at org.apache.lucene.index.TermsHashPerField.start(TermsHashPerField.java:294)
>>     at org.apache.lucene.index.FreqProxTermsWriterPerField.start(FreqProxTermsWriterPerField.java:72)
>>     at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:810)
>>     at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:442)
>>     at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:406)
>>     at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:250)
>>     at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:495)
>>     at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1594)
>>     at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1213)
>>     at com.profium.sir.LuceneTest.writeDoc(LuceneTest.java:66)
>>     at com.profium.sir.LuceneTest.testLucene(LuceneTest.java:58)
>>
>> This is happening even though the exact same FieldType object is being used in the field each time, and it is frozen.
>>
>> I've isolated the problem to the following code snippet which reproduces it:
>>
>> import java.io.IOException;
>> import java.nio.file.Path;
>>
>> import org.apache.lucene.analysis.en.EnglishAnalyzer;
>> import org.apache.lucene.document.Document;
>> import org.apache.lucene.document.Field;
>> import org.apache.lucene.document.FieldType;
>> import org.apache.lucene.index.DirectoryReader;
>> import org.apache.lucene.index.IndexOptions;
>> import org.apache.lucene.index.IndexWriter;
>> import org.apache.lucene.index.IndexWriterConfig;
>> import org.apache.lucene.search.IndexSearcher;
>> import org.apache.lucene.store.Directory;
>> import org.apache.lucene.store.MMapDirectory;
>>
>> public class LuceneTest {
>>
>>     private static FieldType FIELD_TYPE = new FieldType();
>>     static {
>>         FIELD_TYPE.setStored(true);
>>         FIELD_TYPE.setTokenized(true);
>>         FIELD_TYPE.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
>>         FIELD_TYPE.setStoreTermVectors(true);
>>         FIELD_TYPE.setStoreTermVectorPayloads(true);
>>         FIELD_TYPE.setStoreTermVectorPositions(true);
>>         FIELD_TYPE.setStoreTermVectorOffsets(true);
>>         FIELD_TYPE.freeze();
>>     }
>>
>>     public static void main(String[] args) throws IOException {
>>         testLucene();
>>     }
>>
>>     public static void testLucene() throws IOException {
>>         Document doc = new Document();
>>         doc.add(new Field("f1", "foo", FIELD_TYPE));
>>         writeDoc(doc);
>>
>>         IndexSearcher searcher = new IndexSearcher(DirectoryReader.open(getDirectory()));
>>         doc = searcher.doc(0);
>>         doc.add(new Field("f1", "bar", FIELD_TYPE));
>>         writeDoc(doc);
>>     }
>>
>>     private static void writeDoc(Document doc) throws IOException {
>>         Directory directory = getDirectory();
>>         IndexWriterConfig conf = new IndexWriterConfig(new EnglishAnalyzer());
>>         IndexWriter writer = new IndexWriter(directory, conf);
>>         writer.addDocument(doc);
>>         writer.flush();
>>         writer.close();
>>     }
>>
>>     private static Directory getDirectory() throws IOException {
>>         return new MMapDirectory(Path.of("lucenttest"));
>>     }
>> }
>>
>> Experimenting shows that if the following three properties are not set on the FieldType, the exception is no longer thrown, but removing them breaks functionality we have that depends on the position and offset info.
>>
>> FIELD_TYPE.setStoreTermVectorPayloads(true);
>> FIELD_TYPE.setStoreTermVectorPositions(true);
>> FIELD_TYPE.setStoreTermVectorOffsets(true);
>>
>> Perhaps I'm doing something I shouldn't be, thanks in advance for any help!
>>
>> Regards,
>> Albert
>>
>> Albert MacSweeny
>> Profium, Lars Sonckin kaari 12, 02600 Espoo, Finland
>> Tel. +358 (0)9 855 98 000 Mob. +353 (0)87 664 2560
>> Internet: http://www.profium.com
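
For illustration, the advice in the reply above (rebuild the document yourself instead of re-indexing the loaded one) could look roughly like the sketch below. It is not code from the thread. The class name RebuildDocSketch, the rebuild helper and the name-to-FieldType map are hypothetical, and it assumes Lucene 8.x, that every field is a stored string field, and that the application knows which FieldType each field name should use. It swaps in StandardAnalyzer and an in-memory ByteBuffersDirectory only to keep the example self-contained.

```java
import java.io.IOException;
import java.util.Map;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexOptions;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexableField;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class RebuildDocSketch {

    // Same settings as FIELD_TYPE in the original snippet.
    static final FieldType FIELD_TYPE = new FieldType();
    static {
        FIELD_TYPE.setStored(true);
        FIELD_TYPE.setTokenized(true);
        FIELD_TYPE.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
        FIELD_TYPE.setStoreTermVectors(true);
        FIELD_TYPE.setStoreTermVectorPayloads(true);
        FIELD_TYPE.setStoreTermVectorPositions(true);
        FIELD_TYPE.setStoreTermVectorOffsets(true);
        FIELD_TYPE.freeze();
    }

    // Hypothetical helper: copy the stored string values of a loaded Document
    // into brand-new Field instances that carry the intended FieldType.
    static Document rebuild(Document loaded, Map<String, FieldType> typesByName) {
        Document fresh = new Document();
        for (IndexableField f : loaded.getFields()) {
            String value = f.stringValue();
            FieldType type = typesByName.get(f.name());
            if (value == null || type == null) {
                // Assumption of this sketch: only known, stored string fields.
                throw new IllegalStateException("cannot rebuild field " + f.name());
            }
            fresh.add(new Field(f.name(), value, type));
        }
        return fresh;
    }

    static void write(Directory dir, Document doc) throws IOException {
        IndexWriterConfig conf = new IndexWriterConfig(new StandardAnalyzer());
        try (IndexWriter writer = new IndexWriter(dir, conf)) {
            writer.addDocument(doc);
        }
    }

    public static void main(String[] args) throws IOException {
        Directory dir = new ByteBuffersDirectory();

        Document doc = new Document();
        doc.add(new Field("f1", "foo", FIELD_TYPE));
        write(dir, doc);

        Document loaded;
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            loaded = reader.document(0);
        }

        // Rebuild first, then add the new value; every "f1" instance now
        // shares the same term vector settings.
        Document fresh = rebuild(loaded, Map.of("f1", FIELD_TYPE));
        fresh.add(new Field("f1", "bar", FIELD_TYPE));
        write(dir, fresh);
    }
}
```

Like the original snippet, this adds a second document and leaves the first one in the index. Replacing the old document would need something like IndexWriter.updateDocument with a unique ID term, which the original example does not have. Fields that were indexed but not stored cannot be recovered this way, so their values would have to come from the original data source.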