The corpus I need to index contains strings like

Please see number 72/111,222 for more ...

I would like my users to be able to perform successful searches on terms 
like:

72/111,222
72/111222 or
72111222

At first, the solution appears easy. Set

allow_numbers: true

in htdig.conf. Doing so reveals a problem, however: htdig refuses to index 
the target string (72/111,222) as a single entity. Whatever combination of 
configuration directives I use (see below), htdig always splits 72/111,222 
into two terms, 72111 and 222. [At least, I believe this is what is 
happening: I can successfully search on 72111 and on 222.] In other words, 
htdig recognizes that I want the numbers in the corpus indexed, but it 
insists that strings like 72/111,222 are two separate numbers.
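
To make the suspected behaviour concrete, here is a toy model of the word 
splitting in Python. It is not htdig's code, only a sketch under the 
assumption (inferred from the search results above) that "/" is dropped 
without ending a word while "," ends the word. The function name tokenize 
and the parameter strip_chars are made up for this illustration.

```python
def tokenize(text, strip_chars="/"):
    """Toy word splitter (hypothetical, not htdig's implementation).

    Alphanumeric characters build up a word, characters in strip_chars
    are dropped without ending the word, and anything else ends the word.
    """
    words, current = [], []
    for ch in text:
        if ch.isalnum():
            current.append(ch)
        elif ch in strip_chars:
            continue  # dropped, the word keeps growing
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


# Assumed current behaviour: "/" is stripped, "," splits the word.
print(tokenize("Please see number 72/111,222 for more ..."))

# Behaviour I am after: "," is stripped as well.
print(tokenize("Please see number 72/111,222 for more ...", strip_chars="/,"))
```

Under this model the first call yields 72111 and 222 as separate words, 
and the second yields the single word 72111222.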

I have tried these config directives:

valid_punctuation: ,
extra_word_characters: ,

in every combination. Unfortunately, I still can't get htdig to index 
72/111,222 as the single entry 72111222.

At the very worst, if my users can't perform all three types of searches 
(72/111,222, 72/111222 and 72111222), I would accept it if they could at 
least succeed with the last.
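
If only the last form can be made searchable, the other two could in 
principle be reduced to it before the query reaches the search engine. The 
following Python sketch shows that reduction. It assumes a wrapper of my 
own sits in front of the search form (normalize_query is a hypothetical 
name, not part of htdig), and it only helps once the index actually holds 
72111222 as one term.

```python
import re

# A run of digit groups joined by "/" or ",", e.g. 72/111,222 or 72/111222.
_NUMBER = re.compile(r"\d+(?:[/,]\d+)+")


def normalize_query(query):
    """Strip "/" and "," from inside number-like terms (hypothetical helper)."""
    return _NUMBER.sub(lambda m: re.sub(r"[/,]", "", m.group(0)), query)


for q in ("72/111,222", "72/111222", "72111222"):
    print(q, "->", normalize_query(q))
```

All three inputs come out as 72111222, and other words in the query are 
left untouched.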

I also ran some limited experiments with locale: en_GB to see whether I 
could get the comma treated as a decimal separator, but without success: 
htdig still insists on parsing 72/111,222 as two words.

Your thoughts would be appreciated.

I am using htdig 3.1.6 on Solaris 2.7.




_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
