That's a multibyte charset issue; I had the same problem. If you're using
the default analyzer (not the UTF-8 analyzer), you need to convert the text
you want to highlight to ASCII//TRANSLIT if it contains any non-ASCII
characters, because the analyzer does the same conversion.
I had to do the following to get it to work with non-ASCII characters
(assuming $content is UTF-8 encoded):
$query->highlightMatches(iconv('UTF-8', 'ASCII//TRANSLIT', $content));
Otherwise the token offsets as returned by the analyzer won't match the
offsets in the text you're highlighting.
This seems to be a bug either in highlightMatches() or in the documentation,
which fails to mention this.
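To illustrate the offset mismatch described above, here is a minimal sketch in plain PHP. It does not use Zend_Search_Lucene at all; the sample strings are made up, and the ASCII version is written by hand as a stand-in for what the analyzer might see, since the actual iconv() //TRANSLIT output depends on platform and locale.

```php
<?php
// Hypothetical sample text; "\xC3\xBC" is the two-byte UTF-8 encoding of "ü".
$utf8 = "f\xC3\xBCr test";

// Hand-written stand-in (an assumption) for the transliterated text.
// Real iconv('UTF-8', 'ASCII//TRANSLIT', ...) output varies by platform.
$ascii = "fur test";

// Byte offset of the token in the ASCII text.
$offset = strpos($ascii, 'test');
$length = strlen('test');

// Applying that offset to the original UTF-8 bytes lands one byte early.
$wrong = substr($utf8, $offset, $length);

// Applying it to the text the offset was computed from is consistent.
$right = substr($ascii, $offset, $length);

var_dump($offset, $wrong, $right);
```

With one two-byte character in front of the match, the first cut comes out as " tes", which is the kind of shifted highlight reported further down in this thread.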
best regards,
Stefan Oestreicher
-----Original Message-----
From: Jordan Moore [mailto:[EMAIL PROTECTED]
Sent: Monday, March 10, 2008 23:53
To: [EMAIL PROTECTED]
Cc: Bradley Holt; [EMAIL PROTECTED]
Subject: Re: Re: [fw-general] Lucene Highlighting
I'm parsing the query and highlighting the response output based on that
query.
$query = Zend_Search_Lucene_Search_QueryParser::parse($query);
$this->getResponse()->setBody(
    $query->highlightMatches(
        $this->getResponse()->getBody()
    )
);
In some cases, I'm seeing things like " tes" or "est " being highlighted
instead of "test".
On Mon, Mar 10, 2008 at 3:43 PM, Markus Fischer <[EMAIL PROTECTED]> wrote:
> How are you actually doing your highlight?
>
> I think I remember there was an off-by-one bug in some deep code in the
> 1.0.* release. We haven't had the problem since we moved to 1.5.*, which
> we did anyway because of the wildcard search.
>
> HTH
> - Markus
>
> -- Original Message --
> Subject: Re: [fw-general] Lucene Highlighting
> From: "Jordan Moore" <[EMAIL PROTECTED]>
> Date: March 10, 2008, 21:17
>
>
>
> No, it hasn't, and I'm pretty sure that highlighting doesn't use the
> index anyway... you just parse a query and give it some HTML to
> highlight.
>
> On Mon, Mar 10, 2008 at 2:09 PM, Bradley Holt
> <[EMAIL PROTECTED]> wrote:
> > I have yet to use Lucene so my answer will probably sound completely
> > ignorant. With that disclaimer, is it possible that your document has
> > changed since the last time you created your index?
> >
> >
> >
> > On Mon, Mar 10, 2008 at 4:54 PM, Jordan Moore
> > <[EMAIL PROTECTED]> wrote:
> > > Has anyone ever had the problem of highlighting being off by a few
> > > characters?
> > >
> > > --
> > > Jordan Moore
> > >
> >
> >
> >
> > --
> > Bradley Holt
> > [EMAIL PROTECTED]
> >
> >
>
>
>
> --
> Jordan Moore
>
>
>
--
Jordan Moore