[ http://issues.apache.org/jira/browse/LUCENE-627?page=comments#action_12420954 ]

Yonik Seeley commented on LUCENE-627:
-------------------------------------

>>The original token stream is a valid one though right?
> I don't think so, see below...

Ah, right... I constructed the wrong one first.  I wanted pod and ipod in the 
same position... so the token stream looks like "i" ("pod"|"ipod") "foo".
This token stream is correct, I believe, but the same problem still occurs.

A workaround is to swap the order in which the "pod" and "ipod" tokens appear,
but any such workaround seems like it belongs in the highlighter rather than
in code external to it.


  public void testOverlapAnalyzer2() throws Exception
  {

    String s = "iPod foo";
    // the token stream for the string above:
    TokenStream ts = new TokenStream() {
      Iterator iter;
      {
        List lst = new ArrayList();
        Token t;
        t = new Token("i",0,1);
        lst.add(t);
        t = new Token("pod",1,4);
        lst.add(t);
        t = new Token("ipod",0,4);
        t.setPositionIncrement(0);   // pod and ipod occupy the same token position
        lst.add(t);
        t = new Token("foo",5,8);
        lst.add(t);
        iter = lst.iterator();
      }
      public Token next() throws IOException {
        return iter.hasNext() ? (Token)iter.next() : null;
      }
    };

    String srchkey = "foo";

    QueryParser parser=new QueryParser("text",new WhitespaceAnalyzer());
    Query query = parser.parse(srchkey);

    Highlighter highlighter = new Highlighter(new QueryScorer(query));

    // Get the 3 best fragments and separate them with "..."
    String result = highlighter.getBestFragments(ts, s, 3, "...");
    String expectedResult="iPod <B>foo</B>";
    assertEquals(expectedResult,result);
  }
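
For reference, here is a minimal sketch of the token layout used in the test
above and of the reordering workaround. It does not use Lucene at all: Tok,
positions() and swapWithNext() are hypothetical names made up for this
illustration, and reading "swap the order" as "emit ipod first and stack pod
on it with increment 0" is an assumption. It only shows that the swap keeps
both tokens at the same position; it says nothing about how the highlighter
handles either ordering.

```java
import java.util.ArrayList;
import java.util.List;

public class OverlapSketch {

    // Hypothetical stand-in for a Lucene Token; not the real class.
    static final class Tok {
        final String text;
        final int start, end;   // character offsets into "iPod foo"
        final int posInc;       // position increment relative to the previous token

        Tok(String text, int start, int end, int posInc) {
            this.text = text;
            this.start = start;
            this.end = end;
            this.posInc = posInc;
        }
    }

    // The stream from the test above: "i" ("pod"|"ipod") "foo".
    static List<Tok> original() {
        List<Tok> lst = new ArrayList<Tok>();
        lst.add(new Tok("i", 0, 1, 1));
        lst.add(new Tok("pod", 1, 4, 1));
        lst.add(new Tok("ipod", 0, 4, 0));   // increment 0: same position as "pod"
        lst.add(new Tok("foo", 5, 8, 1));
        return lst;
    }

    // Absolute positions are the running sum of the increments,
    // so a first token with increment 1 lands at position 0.
    static int[] positions(List<Tok> toks) {
        int[] pos = new int[toks.size()];
        int p = -1;
        for (int k = 0; k < toks.size(); k++) {
            p += toks.get(k).posInc;
            pos[k] = p;
        }
        return pos;
    }

    // Swap token i with its successor when the successor is stacked on it
    // (increment 0), exchanging the increments so positions do not change.
    static List<Tok> swapWithNext(List<Tok> toks, int i) {
        List<Tok> out = new ArrayList<Tok>(toks);
        Tok a = toks.get(i);
        Tok b = toks.get(i + 1);
        if (b.posInc != 0) {
            return out;   // not stacked: leave the order alone
        }
        out.set(i, new Tok(b.text, b.start, b.end, a.posInc));
        out.set(i + 1, new Tok(a.text, a.start, a.end, 0));
        return out;
    }

    public static void main(String[] args) {
        List<Tok> swapped = swapWithNext(original(), 1);
        int[] pos = positions(swapped);
        for (int k = 0; k < swapped.size(); k++) {
            Tok t = swapped.get(k);
            System.out.println(t.text + " pos=" + pos[k]
                + " offsets=" + t.start + "-" + t.end);
        }
    }
}
```

In both orderings the positions come out as 0, 1, 1, 2; only the order in
which the two stacked tokens are emitted differs.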

> highlighter problems with overlapping tokens
> --------------------------------------------
>
>          Key: LUCENE-627
>          URL: http://issues.apache.org/jira/browse/LUCENE-627
>      Project: Lucene - Java
>         Type: Bug
>   Components: Other
>     Versions: 2.0.1
>     Reporter: Yonik Seeley
>
> The Lucene highlighter has problems when overlapping tokens are generated.
> For example, if analysis of iPod generates the tokens "i", "pod", "ipod"
> (with pod and ipod in the same position),
> then the highlighter will output this as iipod, regardless of whether any of
> those tokens are highlighted.
> Discovered via http://issues.apache.org/jira/browse/SOLR-24


