I have a special situation where the corpus to be indexed contains strings like

    Please see number 72/111,222 for more ...

I would like my users to be able to search successfully on any of these terms:

    72/111,222
    72/111222
    72111222

At first the solution appears easy: set allow_numbers: true in htdig.conf. Doing this, however, reveals a problem. htdig refuses to index the target string (72/111,222) as a single entity. No matter what combination of conf directives I use (see below), htdig always splits 72/111,222 into two terms: 72111 and 222. [I should note that this is what I believe is happening: I can successfully search on 72111 and I can successfully search on 222.] In other words, htdig recognizes that I want the numbers in the corpus indexed, but it insists that a string like 72/111,222 is two separate numbers.

I have tried these config directives, in all the permutations:

    valid_punctuation: ,
    extra_word_characters: ,

Unfortunately, I can't get htdig to index 72/111,222 as a single entry, 72111222.

At the very least, if my users can't perform all three types of searches (72/111,222, 72/111222, 72111222), I could accept it if only the last one succeeded.

I also ran some limited experiments with locale: en_GB to see whether the comma could be treated as a decimal separator, but still with no positive result. htdig still insists on parsing 72/111,222 as two words.

Your thoughts would be appreciated.

htdig 3.1.6
Solaris 2.7
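P.S. To make the behaviour I am after concrete, here is a minimal Python sketch of the normalization I would like the indexer (and ideally the search form) to apply. The function name normalize_number is hypothetical and is not part of htdig; the sketch only illustrates the desired mapping, assuming that "/" and "," should be dropped when they sit between two digits.

```python
import re

# Hypothetical helper, NOT part of htdig: it only illustrates the mapping
# I would like to end up with in the index.
def normalize_number(term: str) -> str:
    # Remove "/" and "," only when they have a digit on both sides,
    # so ordinary punctuation in running text is left alone.
    return re.sub(r"(?<=\d)[/,](?=\d)", "", term)

if __name__ == "__main__":
    for query in ("72/111,222", "72/111222", "72111222"):
        print(query, "->", normalize_number(query))
```

With a mapping like this, all three spellings would collapse to the single term 72111222, which is the entry I hoped htdig would create.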

