On 21 Nov 2005, at 19:39, [EMAIL PROTECTED] wrote:
These are the results for the StandardTokenizer:
   input              output token       output type
1. 1.2                1.2                <HOST>
2. 1.2.               1.2                <HOST>
3. a.b                a.b                <HOST>
4. a.b.               a.b.               <ACRONYM>
5. www.apache.org     www.apache.org     <HOST>
6. www.apache.org.    www.apache.org.    <ACRONYM>
Number 6 should still be a <HOST> type, shouldn't it? This
causes problems for the StandardFilter. Why is it saying it's an
<ACRONYM>?
Because its grammar is imperfect?!
The trailing '.' is throwing it off from what you expect. We'd
certainly welcome fixes to StandardTokenizer.jj in this regard.
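
To illustrate how a trailing '.' can flip the type, here is a rough
sketch. It assumes the two rules behave roughly like "letters '.'
(letters '.')+" for <ACRONYM> and "alphanumerics ('.' alphanumerics)+"
for <HOST>, and that the longest match wins. The class name, the regular
expressions, and the "<NONE>" result are made up for this example and
are simplifications, not the actual contents of StandardTokenizer.jj.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenTypeSketch {
    // Assumed, simplified stand-ins for the grammar rules (ASCII only).
    static final Pattern ACRONYM = Pattern.compile("[A-Za-z]+\\.(?:[A-Za-z]+\\.)+");
    static final Pattern HOST = Pattern.compile("[A-Za-z0-9]+(?:\\.[A-Za-z0-9]+)+");

    // Length of the match anchored at the start of the input, or 0 if none.
    static int prefixLength(Pattern p, String s) {
        Matcher m = p.matcher(s);
        return m.lookingAt() ? m.end() : 0;
    }

    // Pick the rule with the longer match. The two patterns cannot tie on a
    // non-empty match, because one ends in '.' and the other does not.
    static String classify(String input) {
        int acronym = prefixLength(ACRONYM, input);
        int host = prefixLength(HOST, input);
        if (acronym == 0 && host == 0) {
            return "<NONE>"; // hypothetical placeholder: neither rule matched
        }
        return acronym > host ? "<ACRONYM>" : "<HOST>";
    }

    public static void main(String[] args) {
        String[] inputs = {"1.2", "1.2.", "a.b", "a.b.", "www.apache.org", "www.apache.org."};
        for (String input : inputs) {
            System.out.println(input + " - " + classify(input));
        }
    }
}
```

Under these assumptions, "www.apache.org." is consumed whole by the
acronym-style rule, while the host-style rule stops before the final
'.', so the longer match wins. "1.2." stays a <HOST> because the
acronym-style rule requires letters. A real fix would have to be made in
the grammar itself.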
Erik