According to Paul Smith:
> I have a special situation where my corpus to be indexed contains strings 
> like
> 
> Please see number 72/111,222 for more ...
> 
> I would like my users to be able to perform successful searches on terms 
> like:
> 
> 72/111,222
> 72/111222 or
> 72111222
> 
> At first, the solution appears easy. Set
> 
> allow_numbers: true
> 
> in htdig.conf. Doing this, however, reveals a problem: htdig refuses to 
> index the target string (72/111,222) as a single entity. That is, no matter 
> what combination of conf directives I use (see next), htdig always indexes 
> 72/111,222 into two terms: one is 72111 and another is 222. [I should note, 
> I believe this is what is happening...I can successfully search on 72111 and 
> I can successfully search on 222.] That is, htdig recognizes that I want to 
> index the numbers in the corpus, but it insists that strings like 72/111,222 
> are two separate numbers.
> 
> I have tried these config directives:
> 
> valid_punctuation: ,
> extra_word_characters: ,
> 
> in all the permutations. Unfortunately, I can't get htdig to index 
> 72/111,222 as a single entry: 72111222
> 
> At the very worst, if my users can't perform all three types of searches 
> (72/111,222 72/111222 72111222), I would accept if they would succeed on the 
> last.
> 
> I did try some limited locale: en_GB experiments to see if I could make the 
> comma treated as a decimal, but still no positive result. htdig still 
> insists on parsing 72/111,222 as two words.
> 
> Your thoughts would be appreciated.

What you need is to set allow_numbers to true, and make sure that both "/"
and "," are in valid_punctuation, but neither is in extra_word_characters.
That way, 72/111,222 can be searched as 72/111,222 or 72111,222 or
72/111222 or 72111222.  Note, however, that because of htdig's handling
of valid_punctuation, the number 72/111,222 will not only go into the
index as 72111222, but it will also go in as 72111, 111222, 72 (if
minimum_word_length is 2), 111 and 222.  So, searches for parts of one
of these compound numbers will still yield a match.
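
To make the expansion described above concrete, here is a rough Python
sketch. It is only a model built from the list of terms in the paragraph
above, not htdig's actual code: the function names index_terms and
normalize_query are made up for illustration, and the default
minimum_word_length of 3 is an assumption you should check against your
own configuration.

```python
def split_on_punctuation(token, valid_punctuation):
    """Split a token into its word-character runs at each valid punctuation mark."""
    parts, current = [], ""
    for ch in token:
        if ch in valid_punctuation:
            if current:
                parts.append(current)
            current = ""
        else:
            current += ch
    if current:
        parts.append(current)
    return parts


def index_terms(token, valid_punctuation="/,", minimum_word_length=3):
    """Hypothetical model: every contiguous run of parts, joined without punctuation."""
    parts = split_on_punctuation(token, valid_punctuation)
    terms = set()
    for i in range(len(parts)):
        for j in range(i + 1, len(parts) + 1):
            term = "".join(parts[i:j])
            # Terms shorter than minimum_word_length are dropped.
            if len(term) >= minimum_word_length:
                terms.add(term)
    return terms


def normalize_query(query, valid_punctuation="/,"):
    """Hypothetical model: strip valid punctuation from a search term."""
    return "".join(ch for ch in query if ch not in valid_punctuation)


if __name__ == "__main__":
    print(sorted(index_terms("72/111,222", minimum_word_length=2)))
    for q in ("72/111,222", "72111,222", "72/111222", "72111222"):
        print(q, "->", normalize_query(q))
```

In this model all four spellings of the search term reduce to 72111222,
which is one of the indexed terms, and a search for a fragment such as
111222 also matches because that fragment is indexed separately.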

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)

