I have a Lucy index with one field called URL. I would like to do substring searches on this field, for example, find all the records whose URL includes http://www.somewhere.com/abc/ (i.e. all the URLs which are part of the abc directory on that site).
Is there a way to do this? I guess I could always treat the field as a tokenized string:

---
my $string_tokenizer = Lucy::Analysis::RegexTokenizer->new(
    pattern => '\w+',
);
my $analyzer = Lucy::Analysis::PolyAnalyzer->new(
    analyzers => [$string_tokenizer],
);
---

But then I would probably have to do some post-search processing to make sure that the URLs of the retrieved records actually DO fit the pattern, and that there are no differences in the non-word characters that were stripped out by the indexer.

I was wondering if there is a way to tokenize the string into individual characters instead, and whether that is advisable from a performance point of view.

Thx.

Alain Désilets
Agent de recherche | Research Officer
Institut de technologie de l'information | Institute for Information Technology
Conseil national de recherches du Canada | National Research Council of Canada
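For what it's worth, here is a rough sketch of what such a character-level analyzer might look like, using the same RegexTokenizer/PolyAnalyzer classes as above. The `'\S'` pattern (one token per non-whitespace character) is my own guess at how to do it, and I have not benchmarked it:

```perl
use strict;
use warnings;
use Lucy::Analysis::RegexTokenizer;
use Lucy::Analysis::PolyAnalyzer;

# Hypothetical character-level tokenizer: every non-whitespace
# character of the URL becomes its own token, so a substring
# could then be matched as a phrase over consecutive characters.
my $char_tokenizer = Lucy::Analysis::RegexTokenizer->new(
    pattern => '\S',
);
my $analyzer = Lucy::Analysis::PolyAnalyzer->new(
    analyzers => [$char_tokenizer],
);
```

The obvious trade-off is index size and query cost: with one posting per character, a substring search turns into a long phrase match, which is why I am unsure whether this is advisable performance-wise.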
