On 9 Aug 2007, at 16:36, Donna L Gresh wrote:
Is there a good way to handle the following scenario?
I have certain terms with embedded periods that I want to leave
intact (not split at the periods). For example, in my application a
particular skill might be SAP.FIN (SAP financial), and it should not
be split into SAP and FIN. Is there a way to specify a list of terms
such as these which should not be split?

Updating the standard analyzer BNF to allow terms with punctuation is
not a big deal. If there is a list of terms you want to allow, you
would handle them in a TokenFilter. See StandardTokenizer and
StandardFilter.
You might save a couple of clock ticks by implementing a BNF rule
rather than a filter, though.
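
To illustrate the list-based approach, here is a minimal sketch in plain
Java. It deliberately does not use the Lucene API: the class
ProtectedTermSplitter and its tokenize method are hypothetical names made
up for this example, and splitting on whitespace and periods is only a
stand-in for what the real tokenizer does. The idea is just that a term
found in the protected list passes through whole, while everything else
is split at the periods.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Lucene: keeps listed terms intact
// and splits every other whitespace-delimited chunk at its periods.
public class ProtectedTermSplitter {

    static List<String> tokenize(String text, Set<String> protectedTerms) {
        List<String> tokens = new ArrayList<>();
        for (String chunk : text.trim().split("\\s+")) {
            if (chunk.isEmpty()) {
                continue; // blank input yields a single empty chunk
            }
            if (protectedTerms.contains(chunk)) {
                // Exact match against the list: emit the term unsplit.
                tokens.add(chunk);
            } else {
                // Not protected: split at periods, dropping empty pieces.
                for (String part : chunk.split("\\.")) {
                    if (!part.isEmpty()) {
                        tokens.add(part);
                    }
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> protectedTerms = new HashSet<>(Arrays.asList("SAP.FIN"));
        // prints [SAP.FIN, and, foo, bar]
        System.out.println(tokenize("SAP.FIN and foo.bar", protectedTerms));
    }
}
```

The match here is exact and case-sensitive, so "SAP.FIN." with a trailing
period or "sap.fin" in lower case would still be split. In a real analyzer
the same lookup would live inside the filter (or be replaced by a grammar
rule, as mentioned above).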
--
karl