> If we change StandardTokenizer in this way then we risk breaking all > the applications that currently use it and depend on its current > behaviour.
My personal issue with the StandardTokenizer is that it splits off single letter prefixes, as in 't-shirt'. A query for 't-shirt' therefore also returns documents with 't. miller's shirt'. I can't imagine how this behavior could ever be considered useful or depended upon, but I may be wrong (perhaps someone has an example where it does make sense). -- Eric Jain --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
