On Tue, Jan 31, 2012 at 01:19:38PM -0500, Desilets, Alain wrote:
> I was wondering if there was a way to tokenize the string into individual
> characters instead, and whether that is advisable from a performance point
> of view.

You can experiment with changing the 'pattern' argument to RegexTokenizer#new
to be '.' or '\\S'.  It will definitely be worse from a performance
standpoint, as matching a URL will now require a PhraseQuery with one term for
each letter rather than one term for each component matching \w+ in the URL,
and these terms will exist in virtually every document.
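To see why, here is a plain-regex sketch (not Lucy's API, just Python's `re` module) of how the two patterns split a URL. The URL is a made-up example; the patterns mirror the `\w+` default and the `\S` per-character alternative mentioned above:

```python
import re

url = "http://example.com/path/page.html"

# Word-style pattern: one token per run of \w characters.
word_tokens = re.findall(r"\w+", url)

# Per-character pattern: one token per non-whitespace character.
char_tokens = re.findall(r"\S", url)

print(word_tokens)       # 6 tokens: http, example, com, path, page, html
print(len(char_tokens))  # 33 tokens, one per character
```

A phrase match against this URL needs one positional term per token, so the per-character scheme multiplies the work several-fold, and single-letter terms like "e" or "t" occur in nearly every document.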

Marvin Humphrey