I was able to check the Token byte offsets.
The system outputs
aaa:0:3
bbb:0:3
ccc:4:7
The offset is reset to 0 at BBB.
Is this a problem in the Analyzer, or in the Tokenizer?
----- Original Message -----
From: "mark harwood" <markharw...@yahoo.co.uk>
To: <java-user@lucene.apache.org>
Sent: Thursday, July 02, 2009 12:55 AM
Subject: Re: Highligheter fails using JapaneseAnalyzer
How should I verify it?
Make sure the Token.startOffset and endOffset properties of Tokens produced
by your TokenStream correctly define the location of Token.termBuffer in the
original text.
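
The check described above can be sketched without Lucene on the classpath. This is a minimal illustration, not code from the thread: OffsetCheck, matches and mismatches are hypothetical names, and the "term:start:end" format is taken from the output reported at the top of this thread. It assumes terms contain no ':' and that the analyzer may lowercase terms, so the comparison ignores case. With Lucene 2.4 the triples would presumably come from iterating the analyzer's TokenStream and reading each Token's term and its startOffset() and endOffset().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: checks "term:start:end" triples
// against the original text.
class OffsetCheck {

    // True when text[start, end) equals the term, ignoring case
    // (assumption: the analyzer may have lowercased the term).
    static boolean matches(String text, String term, int start, int end) {
        if (start < 0 || end > text.length() || start > end) {
            return false;
        }
        return text.substring(start, end).equalsIgnoreCase(term);
    }

    // Returns the triples whose offsets do not point at the term in the text.
    // Assumes the term itself contains no ':' character.
    static List<String> mismatches(String text, String[] triples) {
        List<String> bad = new ArrayList<>();
        for (String triple : triples) {
            String[] parts = triple.split(":");
            int start = Integer.parseInt(parts[1]);
            int end = Integer.parseInt(parts[2]);
            if (!matches(text, parts[0], start, end)) {
                bad.add(triple);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        String contents = "AAA :BBB CCC";
        String[] reported = { "aaa:0:3", "bbb:0:3", "ccc:4:7" };
        for (String triple : mismatches(contents, reported)) {
            System.out.println("offset mismatch: " + triple);
        }
    }
}
```

Run against the reported output, this flags bbb:0:3 and ccc:4:7. In "AAA :BBB CCC" the term BBB sits at 5:8 and CCC at 9:12, so the reported offsets look as if they were counted from the text after the ':'.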
----- Original Message ----
From: k.sayama <sake-gin...@nifty.com>
To: java-user@lucene.apache.org
Sent: Wednesday, 1 July, 2009 16:13:17
Subject: Re: Highligheter fails using JapaneseAnalyzer
Sorry,
I cannot verify the Token byte offsets produced by JapaneseAnalyzer.
How should I verify it?
----- Original Message -----
From: "mark harwood" <markharw...@yahoo.co.uk>
To: <java-user@lucene.apache.org>
Sent: Wednesday, July 01, 2009 11:31 PM
Subject: Re: Highligheter fails using JapaneseAnalyzer
Can you verify the Token byte offsets produced by this particular analyzer
are correct?
----- Original Message ----
From: k.sayama <sake-gin...@nifty.com>
To: java-user@lucene.apache.org
Sent: Wednesday, 1 July, 2009 15:22:37
Subject: Re: Highligheter fails using JapaneseAnalyzer
Hi,
I tested with SimpleAnalyzer, StandardAnalyzer, and CJKAnalyzer,
but the problem did not happen with any of them.
I think the problem is in JapaneseAnalyzer.
Can this problem be solved?
Does the same thing happen when you use SimpleAnalyzer, or
StandardAnalyzer?
I have a sneaking suspicion that the : in your contents string is what's
causing your issue here, as : is a reserved character that denotes a
field specification. But I could be wrong.
Try swapping analyzers. If you no longer have the same issue with
Simple, try Standard. Assuming the same problem shows up there, I think
you might need to do something about the :.
Matt
k.sayama wrote:
Hello.
I've tried to highlight a string using Highlighter (2.4.1) and
JapaneseAnalyzer,
but the following code extract shows the problem:
String F = "f";
String CONTENTS = "AAA :BBB CCC";
JapaneseAnalyzer analyzer = new JapaneseAnalyzer();
QueryParser qp = new QueryParser( F, analyzer );
Query query = qp.parse( "BBB" );
Highlighter h = new Highlighter( new QueryScorer( query, F ) );
System.out.println( h.getBestFragment( analyzer, F, CONTENTS ) );
The system outputs
<B>AAA</B> :BBB CCC
When you change CONTENTS to "AAA _BBB CCC"
the system outputs
AAA _<B>BBB</B> CCC
Are there any problems?
Thanks in advance
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
--
Matthew Hall
Software Engineer
Mouse Genome Informatics
mh...@informatics.jax.org
(207) 288-6012