That's a multibyte charset issue; I had the same problem. If you're using
the default analyzer (not the UTF-8 analyzer), you need to convert the text
you want to highlight to ASCII//TRANSLIT if it contains any non-ASCII
characters, just as the analyzer does.

I had to do the following to get it to work with non-ASCII characters
(assuming $content is UTF-8):

$query->highlightMatches(iconv('UTF-8', 'ASCII//TRANSLIT', $content));

Otherwise the token offsets as returned by the analyzer won't match the
offsets in the text you're highlighting.
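
To make the offset mismatch concrete, here is a minimal sketch in plain PHP.
It does not use Zend_Search_Lucene at all, and the sample string is made up;
it only assumes that one side counts characters while the other counts bytes,
which is the general mechanism behind shifted highlights like " tes".

```php
<?php
// Illustration only: byte offsets vs. character offsets in a UTF-8 string.
// Not Zend_Search_Lucene code; $content and $needle are hypothetical samples.

$content = "K\xC3\xA4se test";   // "Käse test"; the umlaut takes two bytes in UTF-8
$needle  = "test";

// strpos() counts bytes, iconv_strpos() counts characters.
$byteOffset = strpos($content, $needle);
$charOffset = iconv_strpos($content, $needle, 0, 'UTF-8');

// Every multibyte character before the match widens the gap by its extra bytes.
$shift = $byteOffset - $charOffset;

printf("byte offset: %d, character offset: %d, shift: %d\n",
    $byteOffset, $charOffset, $shift);

// Using the character offset against the raw bytes starts one position early.
$wrong = substr($content, $charOffset, strlen($needle));
printf("highlighted span: \"%s\"\n", $wrong);
```

Once the text has been reduced to single-byte characters, byte and character
positions coincide, which is presumably why the iconv() call above lines the
offsets up again.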

This seems to be a bug either in highlightMatches() or in the documentation,
which fails to mention this.

best regards,
 
Stefan Oestreicher
 
-----Original Message-----
From: Jordan Moore [mailto:[EMAIL PROTECTED] 
Sent: Monday, March 10, 2008 23:53
To: [EMAIL PROTECTED]
Cc: Bradley Holt; [email protected]
Subject: Re: Re: [fw-general] Lucene Highlighting

I'm parsing the query and highlighting the response output based on that
query.

$query = Zend_Search_Lucene_Search_QueryParser::parse($query);

$this->getResponse()->setBody(
    $query->highlightMatches(
        $this->getResponse()->getBody()
    )
);

In some cases, I'm seeing things like " tes" or "est " being highlighted
instead of "test".

On Mon, Mar 10, 2008 at 3:43 PM, Markus Fischer <[EMAIL PROTECTED]> wrote:
> How are you actually doing your highlight?
>
>  I think I remember there was an off-by-one bug in some deep code in the
>  1.0.* release. We haven't had the problem since we moved to 1.5*, which
>  we did anyway because of the wildcard search.
>
>  HTH
>  - Markus
>
>  -- Original Message --
>  Subject:        Re: [fw-general] Lucene Highlighting
>  From:    "Jordan Moore" <[EMAIL PROTECTED]>
>  Date:          March 10, 2008 21:17
>
>
>
>  No, it hasn't, and I'm pretty sure that highlighting doesn't use the  
> index anyway... you just parse a query and give it some HTML to  
> highlight.
>
>  On Mon, Mar 10, 2008 at 2:09 PM, Bradley Holt  
> <[EMAIL PROTECTED]> wrote:
>  > I have yet to use Lucene so my answer will probably sound completely
>  > ignorant. With that disclaimer, is it possible that your document has
>  > changed since the last time you created your index?
>  >
>  >
>  >
>  >  On Mon, Mar 10, 2008 at 4:54 PM, Jordan Moore <[EMAIL PROTECTED]> wrote:
>  > > Has anyone ever had the problem of highlighting being off by a few
>  > > characters?
>  > >
>  > > --
>  > > Jordan Moore
>  > >
>  >
>  >
>  >
>  > --
>  > Bradley Holt
>  > [EMAIL PROTECTED]
>  >
>  >
>
>
>
>  --
>  Jordan Moore
>
>
>



--
Jordan Moore
