Alan, 

Thank you for taking the time to look at this.  Nice catch.  I was using a 
copy of org.apache.lucene.index.TestPayloads.  

I made the necessary changes to the analyzer to address this: it now uses a 
non-mocked tokenizer and a new filter that creates a random payload (see 
attached).  So documents one and two will have the same token, but different 
payloads.  

I extended the testing to these versions:
5.1.0
5.5.5
6.3.0
7.7.2
8.3.1

Same result: SimpleTextCodec passes the test, but these codecs do not:

//import org.apache.lucene.codecs.lucene50.Lucene50Codec;
//import org.apache.lucene.codecs.lucene54.Lucene54Codec;
//import org.apache.lucene.codecs.lucene62.Lucene62Codec;
//import org.apache.lucene.codecs.lucene70.Lucene70Codec;
//import org.apache.lucene.codecs.lucene80.Lucene80Codec;

This is also an issue on a running 6.3.0 Solr instance.  

Thanks,

Ivan

On Friday, February 28, 2020, 03:09:15 AM PST, Alan Woodward 
<romseyg...@gmail.com> wrote: 

Your TokenStreamComponents object is getting re-used, so only the first 
PayloadData object gets referenced by the PayloadFilter.
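
To illustrate the reuse pitfall described above, here is a minimal sketch in 
plain Java.  It does not use the Lucene API: CachingAnalyzer and 
PayloadStamper are hypothetical stand-ins for an analyzer that caches its 
components per field and for a payload filter.  The assumption is only that 
the components are built once and then reused for later documents.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Supplier;

class ReuseSketch {
    /** Hypothetical stand-in for a filter that attaches a payload to a token. */
    static final class PayloadStamper {
        private final Supplier<String> payloadSource;

        PayloadStamper(Supplier<String> payloadSource) {
            this.payloadSource = payloadSource;
        }

        String stamp(String token) {
            return token + "|" + payloadSource.get();
        }
    }

    /** Hypothetical stand-in for an analyzer that builds components once per field. */
    static final class CachingAnalyzer {
        private final Map<String, PayloadStamper> cache = new HashMap<>();
        private final Function<String, PayloadStamper> factory;

        CachingAnalyzer(Function<String, PayloadStamper> factory) {
            this.factory = factory;
        }

        String analyze(String field, String token) {
            // The factory runs only on the first call for a field; later calls reuse the result.
            return cache.computeIfAbsent(field, factory).stamp(token);
        }
    }

    /** Analyzes the same token for two "documents" whose payloads differ. */
    static String[] run(boolean captureAtConstruction) {
        String[] current = {"p1"}; // payload intended for the document being indexed
        CachingAnalyzer analyzer = new CachingAnalyzer(field -> {
            if (captureAtConstruction) {
                String captured = current[0]; // frozen when the components are built
                return new PayloadStamper(() -> captured);
            }
            return new PayloadStamper(() -> current[0]); // looked up for every token
        });
        String doc1 = analyzer.analyze("f", "term");
        current[0] = "p2";
        String doc2 = analyzer.analyze("f", "term");
        return new String[] {doc1, doc2};
    }

    public static void main(String[] args) {
        System.out.println(String.join(", ", run(true)));  // term|p1, term|p1
        System.out.println(String.join(", ", run(false))); // term|p1, term|p2
    }
}
```

In the first run the second document ends up with the first document's 
payload because the payload was captured when the cached components were 
created.  Reading the payload at token time, as in the second run, avoids 
that.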

> On 28 Feb 2020, at 06:55, Ivan Provalov <iprov...@yahoo.com.INVALID> wrote:
> 
> I tested these versions and can reproduce the issue on each one: 
> 
> v6.3.0 
> v7.7.2 
> v8.3.1
> 
> <dependencies>
>    <dependency>
>        <groupId>org.apache.lucene</groupId>
>        <artifactId>lucene-test-framework</artifactId>
>        <version>8.3.1</version>
>    </dependency>
> </dependencies>
> 
> For 8.3.1, you will need to change MultiFields.getTermPositionsEnum(...) to 
> MultiTerms.getTermPostingsEnum(...).
> 
> 
> On Thursday, February 27, 2020, 09:45:32 PM PST, Ivan Provalov 
> <iprov...@yahoo.com.invalid> wrote: 
> 
> 
> 
> 
> 
> I noticed odd payload behavior with Solr 6.3.0.  After writing a 
> Lucene62Codec-specific unit test (attached), I think there could be a bug 
> where a term's payload from one document is written as the payload of the 
> same term in another document (or the second document's payload is 
> skipped).  
> 
> For comparison, I added SimpleTextCodec which doesn't behave this way.  
> Should I open a JIRA for this? 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org

Attachment: TestPayloads.java
Description: Binary data