Hi Koji,

I had the same problem. This is because n-gram analysis of CJK text produces different tokens than single-character analysis does.
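To make the difference concrete, here is a minimal sketch in plain Java. It is not Lucene's code: the class and method names are made up for illustration, and the overlapping two-character scheme is only an assumption about what n-gram analysis of CJK text roughly looks like. As in Koji's sample, alphabet characters stand in for CJK characters.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Lucene.
public class BigramSketch {

    // Overlapping two-character tokens: "ABCD" -> [AB, BC, CD].
    static List<String> bigrams(String text) {
        List<String> tokens = new ArrayList<>();
        if (text.length() == 1) {
            // A lone character cannot form a pair, so emit it as-is.
            tokens.add(text);
            return tokens;
        }
        for (int i = 0; i + 1 < text.length(); i++) {
            tokens.add(text.substring(i, i + 2));
        }
        return tokens;
    }

    // One token per character: "ABCD" -> [A, B, C, D].
    static List<String> singleChars(String text) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            tokens.add(text.substring(i, i + 1));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "ABCD"; // stand-in for four CJK characters
        System.out.println(bigrams(text));     // [AB, BC, CD]
        System.out.println(singleChars(text)); // [A, B, C, D]
    }
}
```

The two token lists share no terms, so a query analyzed one way will not match tokens produced the other way.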
My workaround is to use CJKHighlighter and CJKHighlightAnalyzer from the sandbox.

--
Chris Lu
------------
Lucene Search RAD on Any Database
http://www.dbsight.net

On 9/5/05, Koji Sekiguchi <[EMAIL PROTECTED]> wrote:
> Hi again,
>
> I'm using highlighter to highlight terms in Japanese text,
> but I cannot get preferable output.
>
> If I use StandardAnalyzer or SnowballAnalyzer w/ English,
> getBestFragment() returns preferable outputs:
>
> Sample: (SnowballAnalyzer)
> Text: A meeting will be held in the City Hall
> TokenStream:
> [a][meet][will][be][held][in][the][citi][hall]
> Query Text: meet
> Output: A <B>meeting</B> will be held in the City Hall
>
> But if I use JapaneseAnalyzer, which is most popular Analyzer
> in Japan to get TokenStream from Japanese text, to highlight
> Japanese text with Highlighter, whole text is highlighted:
>
> Sample: (JapaneseAnalyzer)
> Text: AMeetingWillBeHeldInTheCityHall
> TokenStream:
> [A][Meeting][Will][Be][Held][In][The][City][Hall]
> Query Text: Meeting
> Output: <B>AMeetingWillBeHeldInTheCityHall</B>
>
> Please note that I use alphabet to show the Text at second sample
> because most users in this mailing list can read it, but in reality,
> I used Japanese characters for the Text. And you'll see that
> JapaneseAnalyzer, which uses Japanese dictionary on background to
> extract tokens from text stream, can recognize tokens and produce
> TokenStream. But highlighter.getBestFragment() highlighted whole text.
>
> Do I need to implement Fragmenter to highlight tokens correctly
> for Japanese text?
>
> Thanks in advance,
>
> Koji