I have a Lucy index with one field called URL.

I would like to do substring searches on this field, for example, find all the 
records whose URL includes http://www.somewhere.com/abc/ (i.e. all the URLs 
which are part of the abc directory on that site).

Is there a way to do this?

I guess I could always treat the field as a tokenized string:


---
my $string_tokenizer = Lucy::Analysis::RegexTokenizer->new( pattern => '\w+' );
my $analyzer = Lucy::Analysis::PolyAnalyzer->new(
    analyzers => [$string_tokenizer],
);
---

But then I would probably have to do some post-search processing to make sure 
that the URLs of the retrieved records actually DO fit the pattern, and that 
there are no differences in the non-word characters that were stripped out by 
the indexer.
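A minimal sketch of that post-search check, assuming the raw URL is stored
alongside the tokenized field so it can be read back from each hit (the hit
URLs below are invented for illustration):

```perl
use strict;
use warnings;

# Hypothetical post-search filter: keep only hits whose stored URL
# really starts with the wanted prefix, regardless of which non-word
# characters the tokenizer stripped during indexing.
my $prefix   = 'http://www.somewhere.com/abc/';
my @hit_urls = (
    'http://www.somewhere.com/abc/page1.html',
    'http://www.somewhere.com/abcde/other.html',
);
my @verified = grep { index( $_, $prefix ) == 0 } @hit_urls;
# @verified now holds only the first URL
```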

I was wondering if there was a way to tokenize the string into individual 
characters instead, and whether that is advisable from a performance point of 
view.
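For what it's worth, a one-character pattern seems like it would do that
tokenization; the commented Lucy constructor is my untested guess at the
wiring, and the runnable part below only demonstrates the pattern itself with
a plain Perl regex on a sample URL:

```perl
use strict;
use warnings;

# Guess (untested): the same pattern handed to
#   Lucy::Analysis::RegexTokenizer->new( pattern => '.' )
# should emit one token per character of the field.
# Demonstrating what that pattern matches, using plain Perl:
my $url    = 'http://a/b';
my @tokens = $url =~ /./gs;    # one match per character
# @tokens is ('h','t','t','p',':','/','/','a','/','b')
```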

Thx.

Alain Désilets
Agent de recherche | Research Officer 
Institut de technologie de l'information | Institute for Information Technology 
Conseil national de recherches du Canada | National Research Council of Canada
