On 21 Nov 2005, at 19:39, [EMAIL PROTECTED] wrote:
These are the results for the StandardTokenizer:
   input            - output token     - output type
1. 1.2              - 1.2              - <HOST>
2. 1.2.             - 1.2              - <HOST>
3. a.b              - a.b              - <HOST>
4. a.b.             - a.b.             - <ACRONYM>
5. www.apache.org   - www.apache.org   - <HOST>
6. www.apache.org.  - www.apache.org.  - <ACRONYM>

Number 6 should still be a <HOST> type, shouldn't it?  This
causes problems for the StandardFilter. Why is it saying it's an <ACRONYM>?

Because its grammar is imperfect?!

The trailing '.' is throwing it off from what you expect. We'd certainly welcome fixes to StandardTokenizer.jj in this regard.
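
One plausible way to picture this, as a rough Python sketch: if the lexer prefers the longest match, an acronym-style rule (letters each followed by a dot) will beat a host-style rule whenever the input ends in '.'. The two patterns below are simplified assumptions made for illustration, not the rules as written in StandardTokenizer.jj, and classify() is a hypothetical helper.

```python
import re

# Simplified stand-ins for two token rules. These are assumptions for
# illustration only, not copied from StandardTokenizer.jj.
ALPHA = r"[A-Za-z]+"
ALPHANUM = r"[A-Za-z0-9]+"
RULES = [
    # letters, a dot, then one or more "letters + dot" groups
    ("<ACRONYM>", re.compile(rf"{ALPHA}\.(?:{ALPHA}\.)+")),
    # alphanumerics separated by dots, with no trailing dot
    ("<HOST>", re.compile(rf"{ALPHANUM}(?:\.{ALPHANUM})+")),
]


def classify(text):
    """Return (token, type) for the longest match at the start of text.

    On a tie in length, the rule listed first wins. Returns None if no
    rule matches.
    """
    best = None
    for name, pattern in RULES:
        m = pattern.match(text)
        if m and (best is None or len(m.group()) > len(best[0])):
            best = (m.group(), name)
    return best


for sample in ["1.2", "1.2.", "a.b", "a.b.", "www.apache.org", "www.apache.org."]:
    print(sample, "->", classify(sample))
```

Under these assumed rules, "www.apache.org." matches the acronym pattern in full (15 characters) while the host pattern stops before the final dot (14 characters), so the longer acronym match wins. "1.2." never reaches the acronym pattern because that pattern accepts letters only.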

        Erik


