On Jul 27, 2005, at 7:26 PM, Indu Abeyaratna wrote:
I have a field indexed as a keyword, and two records:
"J400-C-V1-S10-T1" and "J400-C-V-S10-T1".

When I search for "J400-C-V1-S10-T1", it returns the matching record,
but when I search for "J400-C-V-S10-T1" it doesn't return the matching
one.

Further, I found that "J400-C-V-S10-T1" is incorrectly tokenised into
"J400-C" and "V-S10-T1", but nothing like that happens to
"J400-C-V1-S10-T1". This happens when there is a combination like
"?-?-", and it gets tokenised into "?" and "?-".

I attached a test case for further clarification.
I am using StandardAnalyzer and the query parser.

Is this a bug in Lucene or JavaCC? Or am I missing something here?
Any suggestions for getting around this?

It's not a "bug" per se, but rather just how StandardAnalyzer
works. StandardAnalyzer is a general-purpose text analyzer, and
cannot reasonably deal with this issue and also deal with the much
more common scenario of "hyphenated-text" that should be split into
separate tokens.
As Otis said, this is really the job for a custom analyzer.
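
To make the behaviour discussed above concrete, here is a minimal,
Lucene-free sketch in Java. It models, as an assumption, the rule in the
StandardTokenizer grammar of this era under which hyphen-joined segments
stay together as one token only when every other segment contains a
digit. The class and method names are hypothetical. The real tokenizer is
generated by JavaCC, handles other punctuation as well, and the analyzer
lowercases its tokens; none of that is modelled here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of how hyphenated input gets grouped.
// This is NOT Lucene code; it only illustrates the "every other segment
// must contain a digit" rule assumed in the text above.
public class HyphenTokenModel {

    // True if the segment contains at least one digit.
    static boolean hasDigit(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (Character.isDigit(s.charAt(i))) return true;
        }
        return false;
    }

    // Number of segments in the longest run starting at `start` in which
    // every other segment has a digit. Returns 1 when no run of two or
    // more segments qualifies, so the segment becomes a token on its own.
    static int longestRun(String[] segs, int start) {
        int best = 1;
        // parity 0: segments at even offsets need a digit; parity 1: odd offsets.
        for (int parity = 0; parity < 2; parity++) {
            int len = 0;
            while (start + len < segs.length
                    && (len % 2 != parity || hasDigit(segs[start + len]))) {
                len++;
            }
            if (len >= 2) best = Math.max(best, len);
        }
        return best;
    }

    // Greedily groups hyphen-separated segments using the longest run.
    static List<String> tokenize(String text) {
        String[] segs = text.split("-");
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < segs.length) {
            int n = longestRun(segs, i);
            tokens.add(String.join("-", Arrays.copyOfRange(segs, i, i + n)));
            i += n;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("J400-C-V1-S10-T1")); // [J400-C-V1-S10-T1]
        System.out.println(tokenize("J400-C-V-S10-T1"));  // [J400-C, V-S10-T1]
        System.out.println(tokenize("hyphenated-text"));  // [hyphenated, text]
    }
}
```

Under this model the digit-free segment "V" is what breaks the run, which
matches the split reported in the question. A custom analyzer for this
field would avoid the problem by not splitting on hyphens at all, so that
the query text stays a single token just like the indexed keyword.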
Erik