Hi Mike,

Thanks for the quick reply. I assume this approach worked in version 3.0.0,
since the project I'm working on relied on it? I know that was a long time ago,
so maybe you don't remember :)

On my side, I'm worried that reconstructing the full document in this situation
could be expensive, so I'd like to avoid it if I can (and I might not have the
original values of all fields available). Do you think it would work to
reconstruct only the values of the field being modified, or am I likely to run
into more issues by modifying a loaded Document?
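
To make the question concrete, a partial reconstruction could look roughly like
the sketch below. It has not been run against a real index. The class and method
names (PartialRebuild, withExtraValue) are made up for illustration, and it
assumes every value of the field is a stored string. Only the named field is
re-created with the custom FieldType; all other fields keep whatever type they
were given when the Document was loaded, which is exactly the part I'm unsure
about.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.IndexOptions;

// Hypothetical helper, not part of Lucene.
final class PartialRebuild {

    // Same settings as FIELD_TYPE in the reproduction snippet.
    static final FieldType FIELD_TYPE = new FieldType();

    static {
        FIELD_TYPE.setStored(true);
        FIELD_TYPE.setTokenized(true);
        FIELD_TYPE.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
        FIELD_TYPE.setStoreTermVectors(true);
        FIELD_TYPE.setStoreTermVectorPayloads(true);
        FIELD_TYPE.setStoreTermVectorPositions(true);
        FIELD_TYPE.setStoreTermVectorOffsets(true);
        FIELD_TYPE.freeze();
    }

    /**
     * Replaces every instance of the named field in a loaded Document with a
     * fresh Field that uses FIELD_TYPE, then appends the new value.
     * Fields with other names are left untouched (assumption: they are safe
     * to re-index as loaded, which may not hold).
     */
    static Document withExtraValue(Document loaded, String name, String newValue) {
        // Stored string values of the field, in document order.
        String[] oldValues = loaded.getValues(name);

        // Drop the loaded instances, whose field type is only an approximation.
        loaded.removeFields(name);

        // Re-add the old values and the new one with the original field type.
        for (String value : oldValues) {
            loaded.add(new Field(name, value, FIELD_TYPE));
        }
        loaded.add(new Field(name, newValue, FIELD_TYPE));
        return loaded;
    }
}
```

In the reproduction code this would replace the direct doc.add(...) call on the
loaded document, i.e. writeDoc(PartialRebuild.withExtraValue(doc, "f1", "bar")).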

Regards, 
Albert 

> From: "Michael McCandless" <luc...@mikemccandless.com>
> To: "java-user" <java-user@lucene.apache.org>, "albert macsweeny"
> <albert.macswe...@profium.com>
> Sent: Monday, 29 June, 2020 15:23:43
> Subject: Re: Adding fields with same field type complains that they have
> different term vector settings

> Hi Albert,

> Unfortunately, you have fallen into a common and sneaky Lucene trap.
> The problem happens because you loaded a Document from the index's stored
> fields (the one you previously indexed) and then tried to modify that one
> and re-index.

> Lucene does not guarantee that this will work, because Lucene does not store
> all information necessary to precisely reconstruct the original document you
> had indexed.

> The Document you loaded from the index is subtly different from the one you
> had previously indexed. In particular, your custom FIELD_TYPE details were
> lost.

> To sidestep this tar pit you must fully reconstruct the document yourself each
> time you add it to the index.

> Mike McCandless

> http://blog.mikemccandless.com

> On Mon, Jun 29, 2020 at 9:56 AM Albert MacSweeny
> <albert.macswe...@profium.com> wrote:

>> Hi,

>> I'm upgrading a project from Lucene 3.0.0 to 8.5.2.

>> Some tests are failing with a strange issue. The gist of it is, we create
>> fields that need position and offset information. Inserting one field works
>> ok, but then searching for the document and adding another value for the
>> same field results in the following exception:

>> java.lang.IllegalArgumentException: all instances of a given field name must
>> have the same term vectors settings (storeTermVectorPositions changed for
>> field="f1")
>>   at org.apache.lucene.index.TermVectorsConsumerPerField.start(TermVectorsConsumerPerField.java:166)
>>   at org.apache.lucene.index.TermsHashPerField.start(TermsHashPerField.java:294)
>>   at org.apache.lucene.index.FreqProxTermsWriterPerField.start(FreqProxTermsWriterPerField.java:72)
>>   at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:810)
>>   at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:442)
>>   at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:406)
>>   at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:250)
>>   at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:495)
>>   at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1594)
>>   at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1213)
>>   at com.profium.sir.LuceneTest.writeDoc(LuceneTest.java:66)
>>   at com.profium.sir.LuceneTest.testLucene(LuceneTest.java:58)

>> This is happening even though the exact same FieldType object is being used
>> in the field each time, and it is frozen.

>> I've isolated the problem to the following code snippet which reproduces it:

>> import java.io.IOException;
>> import java.nio.file.Path;

>> import org.apache.lucene.analysis.en.EnglishAnalyzer;
>> import org.apache.lucene.document.Document;
>> import org.apache.lucene.document.Field;
>> import org.apache.lucene.document.FieldType;
>> import org.apache.lucene.index.DirectoryReader;
>> import org.apache.lucene.index.IndexOptions;
>> import org.apache.lucene.index.IndexWriter;
>> import org.apache.lucene.index.IndexWriterConfig;
>> import org.apache.lucene.search.IndexSearcher;
>> import org.apache.lucene.store.Directory;
>> import org.apache.lucene.store.MMapDirectory;

>> public class LuceneTest {

>>     private static FieldType FIELD_TYPE = new FieldType();

>>     static {
>>         FIELD_TYPE.setStored(true);
>>         FIELD_TYPE.setTokenized(true);
>>         FIELD_TYPE.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
>>         FIELD_TYPE.setStoreTermVectors(true);
>>         FIELD_TYPE.setStoreTermVectorPayloads(true);
>>         FIELD_TYPE.setStoreTermVectorPositions(true);
>>         FIELD_TYPE.setStoreTermVectorOffsets(true);
>>         FIELD_TYPE.freeze();
>>     }

>>     public static void main(String[] args) throws IOException {
>>         testLucene();
>>     }

>>     public static void testLucene() throws IOException {
>>         Document doc = new Document();
>>         doc.add(new Field("f1", "foo", FIELD_TYPE));
>>         writeDoc(doc);
>>         IndexSearcher searcher = new IndexSearcher(DirectoryReader.open(getDirectory()));
>>         doc = searcher.doc(0);

>>         doc.add(new Field("f1", "bar", FIELD_TYPE));
>>         writeDoc(doc);
>>     }

>>     private static void writeDoc(Document doc) throws IOException {
>>         Directory directory = getDirectory();
>>         IndexWriterConfig conf = new IndexWriterConfig(new EnglishAnalyzer());
>>         IndexWriter writer = new IndexWriter(directory, conf);
>>         writer.addDocument(doc);
>>         writer.flush();
>>         writer.close();
>>     }

>>     private static Directory getDirectory() throws IOException {
>>         return new MMapDirectory(Path.of("lucenttest"));
>>     }
>> }

>> Experimenting shows that if the following three properties are not set on the
>> FieldType, the exception is no longer thrown, but removing them breaks
>> functionality we have that depends on the position and offset info.

>> FIELD_TYPE.setStoreTermVectorPayloads(true);
>> FIELD_TYPE.setStoreTermVectorPositions(true);
>> FIELD_TYPE.setStoreTermVectorOffsets(true);

>> Perhaps I'm doing something I shouldn't be. Thanks in advance for any help!

>> Regards,
>> Albert

>> Albert MacSweeny
>> Profium, Lars Sonckin kaari 12, 02600 Espoo, Finland
>> Tel. +358 (0)9 855 98 000 Mob. +353 (0)87 664 2560
>> Internet: http://www.profium.com

