[jira] [Updated] (LUCENE-6680) BlendedInfixSuggester dedup bug

Arcadius Ahouansou (JIRA) Wed, 15 Jul 2015 08:24:24 -0700

     [ 
https://issues.apache.org/jira/browse/LUCENE-6680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Arcadius Ahouansou updated LUCENE-6680:
---------------------------------------
    Description: 
I expect the following test to pass, but it fails on the latest release, 
Lucene 5.2.1: 

{code:title=FailingTest.java|borderStyle=solid}
public void testBlendedInfixSuggesterDedupsOnWeightTitleAndPayload() throws Exception {

    // Only the payload is different
    Input[] inputDocuments = new Input[]{
        new Input("lend me your ear", 7, new BytesRef("uid1")),
        new Input("lend me your ear", 7, new BytesRef("uid2")),
    };

    Analyzer a = new MockAnalyzer(random(), MockTokenizer.WHITESPACE, false);
    BlendedInfixSuggester suggester = new BlendedInfixSuggester(newDirectory(), a, a,
        AnalyzingInfixSuggester.DEFAULT_MIN_PREFIX_CHARS,
        BlendedInfixSuggester.BlenderType.POSITION_RECIPROCAL, 10, false);

    InputArrayIterator inputArrayIterator = new InputArrayIterator(inputDocuments);
    suggester.build(inputArrayIterator);

    List<Lookup.LookupResult> results =
        suggester.lookup(TestUtil.stringToCharSequence("ear", random()), 10, true, true);

    suggester.close();
    a.close();

    assertEquals(2, results.size());
}

{code}

This test fails because BlendedInfixSuggester internally stores its results 
in a TreeSet whose Comparator considers only text and weight, so results that 
differ only in their payload are collapsed into one.

[~mikemccand], the idea here is that if two ingested documents have the same 
title and weight but different payloads, then they are two different things, 
and folding them into a single document would mean losing the payload 
information.
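
The collapse described above can be reproduced without Lucene. The following is a minimal sketch, not the suggester's actual code: Suggestion is a hypothetical stand-in for Lookup.LookupResult (with a String payload instead of a BytesRef), and both comparators are illustrative. It shows that a TreeSet treats two elements as duplicates whenever its Comparator returns 0, and that adding the payload as a tie-breaker keeps both entries. Whether the attached patch takes this approach is not stated here.

```java
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical stand-in for Lookup.LookupResult; not a Lucene class.
final class Suggestion {
    final String key;
    final long weight;
    final String payload; // Lucene uses a BytesRef; a String keeps the sketch short

    Suggestion(String key, long weight, String payload) {
        this.key = key;
        this.weight = weight;
        this.payload = payload;
    }
}

class TreeSetDedupSketch {
    // Orders by weight (descending), then by text. The payload is ignored,
    // so entries that differ only in payload compare as equal.
    static final Comparator<Suggestion> TEXT_AND_WEIGHT = (a, b) -> {
        int c = Long.compare(b.weight, a.weight);
        return c != 0 ? c : a.key.compareTo(b.key);
    };

    // Same ordering, with the payload as a final tie-breaker.
    static final Comparator<Suggestion> WITH_PAYLOAD = (a, b) -> {
        int c = TEXT_AND_WEIGHT.compare(a, b);
        return c != 0 ? c : a.payload.compareTo(b.payload);
    };

    // Adds the two inputs from the failing test and reports how many survive.
    static int distinctCount(Comparator<Suggestion> order) {
        TreeSet<Suggestion> results = new TreeSet<>(order);
        results.add(new Suggestion("lend me your ear", 7, "uid1"));
        results.add(new Suggestion("lend me your ear", 7, "uid2"));
        return results.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctCount(TEXT_AND_WEIGHT)); // prints 1
        System.out.println(distinctCount(WITH_PAYLOAD));    // prints 2
    }
}
```

With the text+weight ordering the second add() is silently rejected, which matches the single result the failing test observes.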


> BlendedInfixSuggester dedup bug
> -------------------------------
>
>                 Key: LUCENE-6680
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6680
>             Project: Lucene - Core
>          Issue Type: Bug
>    Affects Versions: 5.2.1
>            Reporter: Arcadius Ahouansou
>         Attachments: LUCENE-6680.patch



