Alan,

Thank you for taking the time to look at this. Nice catch. I was using a copy of org.apache.lucene.index.TestPayloads.
I made the necessary changes to the analyzer to address this: I now use a non-mocked tokenizer and a new filter that creates a random payload (see attached). So documents one and two will have the same token but different payloads.

I extended the testing to these versions:

<!--5.1.0-->
<!--5.5.5-->
<!--6.3.0-->
<!--7.7.2-->
<!--8.3.1-->

The result is the same: SimpleTextCodec passes the test, but these codecs do not:

//import org.apache.lucene.codecs.lucene50.Lucene50Codec;
//import org.apache.lucene.codecs.lucene54.Lucene54Codec;
//import org.apache.lucene.codecs.lucene62.Lucene62Codec;
//import org.apache.lucene.codecs.lucene70.Lucene70Codec;
//import org.apache.lucene.codecs.lucene80.Lucene80Codec;

The issue also shows up on a running Solr 6.3.0 instance.

Thanks,

Ivan

On Friday, February 28, 2020, 03:09:15 AM PST, Alan Woodward <romseyg...@gmail.com> wrote:

Your TokenStreamComponents object is getting re-used, so only the first PayloadData object gets referenced by the PayloadFilter.

> On 28 Feb 2020, at 06:55, Ivan Provalov <iprov...@yahoo.com.INVALID> wrote:
>
> I tested these versions and I can reproduce for each one:
>
> v6.3.0
> v7.7.2
> v8.3.1
>
> <dependencies>
>   <dependency>
>     <groupId>org.apache.lucene</groupId>
>     <artifactId>lucene-test-framework</artifactId>
>     <version>8.3.1</version>
>   </dependency>
> </dependencies>
>
> For 8.3.1, you will need to change MultiFields.getTermPositionsEnum(...) to
> MultiTerms.getTermPostingsEnum(...).
>
>
> On Thursday, February 27, 2020, 09:45:32 PM PST, Ivan Provalov
> <iprov...@yahoo.com.invalid> wrote:
>
> I noticed a weird payload behavior with Solr 6.3.0. After writing the
> Lucene62Codec specific unit test (attached) I think there could be a bug
> which allows for the same term payloads to be written into another document's
> same term payload (or the second payload for the second document being
> skipped).
>
> For comparison, I added SimpleTextCodec which doesn't behave this way.
> Should I open a JIRA for this?
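
For anyone following the thread, here is a rough, self-contained illustration of the reuse pitfall Alan describes. It does not use the Lucene API: PayloadReuseSketch, CapturingFilter and PerStreamFilter are hypothetical names made up for this sketch, and the payloads are plain strings. It only models the idea that a filter built once and cached keeps the first payload object, whereas a filter that picks up its payload on each reset sees a new one per document. It says nothing about the codec behavior reported above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

// Simplified model of analysis-component reuse. NOT the Lucene API:
// all class and method names here are hypothetical.
final class PayloadReuseSketch {

    /** Filter that is handed its payload once, at construction time. */
    static final class CapturingFilter {
        private final String payload;

        CapturingFilter(String payload) {
            this.payload = payload;
        }

        String emit(String token) {
            return token + "|" + payload;
        }
    }

    /** Filter that fetches a fresh payload every time a new stream starts. */
    static final class PerStreamFilter {
        private final Supplier<String> payloads;
        private String current;

        PerStreamFilter(Supplier<String> payloads) {
            this.payloads = payloads;
        }

        void reset() {
            current = payloads.get();
        }

        String emit(String token) {
            return token + "|" + current;
        }
    }

    /** The filter is created for the first document and then reused as-is. */
    static List<String> indexWithCapturedPayload(List<String> docPayloads, String token) {
        CapturingFilter cached = null;
        List<String> out = new ArrayList<>();
        for (String payload : docPayloads) {
            if (cached == null) {
                cached = new CapturingFilter(payload); // only the first payload is ever seen
            }
            out.add(cached.emit(token));
        }
        return out;
    }

    /** The filter is still reused, but it picks up a new payload on each reset. */
    static List<String> indexWithPerStreamPayload(List<String> docPayloads, String token) {
        Iterator<String> source = docPayloads.iterator();
        PerStreamFilter cached = new PerStreamFilter(source::next);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < docPayloads.size(); i++) {
            cached.reset();
            out.add(cached.emit(token));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> payloads = Arrays.asList("p1", "p2");
        System.out.println(indexWithCapturedPayload(payloads, "term"));
        System.out.println(indexWithPerStreamPayload(payloads, "term"));
    }
}
```

In this model the first method gives both documents the payload of document one, while the second gives each document its own payload, which is the effect the revised analyzer is meant to achieve.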
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
Attachment: TestPayloads.java