Thanks for your response, Gilles. I think my biggest mistake was not reading *all* the documentation. I only read part of it and didn't realize I'd missed some of the instructions in other parts of the FAQ. If I can get everything working I'll write you an English version of what I did.
I've been working (initially) from Question 4.10. The detailed instructions are at:
http://www.quartier-rural.org/dl/elucu/htdig-vf/lisezmoi.html

I'm using the "kit de francisation" 1.05 (the ht://dig web site has 1.03). I have ht://dig 3.1.6 installed with the cookies patch. I have two different "language" config files that include the main config file with my header sizes and pagination options and other generic config things. I run htdig using either en.conf or fr.conf. Each of those then includes ccfdp.conf which is the generic conf file.

> Are you sure you haven't applied a patch to ht://Dig that would force it
> to strip out accented characters?

I think I am. I would like people to be able to enter a word either with or without the accent and find the right word. I thought that's what this paragraph meant, but now that I see the output I understand it's not what I wanted:

If you're running version 3.1.6 of ht://Dig, you may also be interested in the accents fuzzy match algorithm in the search_algorithm attribute, which lets you treat accented and unaccented letters as equivalent in words. Note that if you use the accents algorithm, you need to rebuild the accents database each time you update your word database, using "htfuzzy accents". This command isn't in the default rundig script, so you may want to add it there. The accents fuzzy match algorithm is also in the 3.2 beta releases. There are also the boolean_keywords and boolean_syntax_errors attributes in 3.1.6 for changing other language-specific messages in htsearch.

> 3.1.6 and 3.2 betas, and the other was a hack that mapped all ISO-8859-1
> (Latin 1) accented letters to their unaccented counterparts. If you've
> applied that latter patch,
> ftp://ftp.ccsf.org/htdig-patches/3.1.5/accents.zip, to any ht://Dig
> version, then that would explain the problem.

I'm confused about the difference between these two patches. I have 3.1.6 installed and am pretty sure I ran htfuzzy accents.
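As a rough illustration of the distinction in question, here is a small Python sketch. It is not ht://Dig's actual code: the function names (fold_accents, build_accent_map, fuzzy_search) and the sample words are made up for the example, and it only assumes what the quoted text says, namely that the fuzzy algorithm treats accented and unaccented letters as equivalent at search time, while the other patch maps accented letters to their unaccented counterparts.

```python
import unicodedata
from collections import defaultdict


def fold_accents(word: str) -> str:
    """Return the word with accent marks removed (e.g. 'é' -> 'e', 'ç' -> 'c')."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_accent_map(indexed_words):
    """Fuzzy-style lookup table: folded key -> every indexed spelling that folds to it."""
    table = defaultdict(set)
    for word in indexed_words:
        table[fold_accents(word)].add(word)
    return table


def fuzzy_search(query, table):
    """Look the query up by its folded form, so accents in the query do not matter."""
    return sorted(table.get(fold_accents(query), set()))


# Fuzzy-style: the word index keeps its accents; only the lookup key is folded.
index = ["français", "francais", "école"]
table = build_accent_map(index)
print(fuzzy_search("francais", table))
print(fuzzy_search("français", table))

# Hack-style: the text itself is folded, so the accents are gone from the output.
print(fold_accents("école française"))
```

In the first approach both spellings of the query reach the same entries and the stored words keep their accents, but the lookup table has to be rebuilt whenever the word list changes, which parallels the need to rerun "htfuzzy accents" after each update. In the second approach the accents are lost from the text itself.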
Which of the two patches allows users to search for "francais" OR "français" and get matches for both? (I.e. one word, two matches, not the boolean OR.)

> In an earlier e-mail, you mentioned that you "also grabbed the language
> pack (for lack of a better term) from the web site." What pack are you
> referring to specifically?

http://www.quartier-rural.org/dl/elucu/htdig-vf/htdig-fr-1.0.5.tar.gz

> It would be helpful to know the actual file
> name and the location from which you got it. Also, which version of
> ht://Dig are you running, and what patches, if any, have been applied.
> See http://www.htdig.org/FAQ.html#q5.33

I used the instructions in 5.33. I am running 3.1.6 and have the cookies patch installed. The patch is from here:
ftp://ftp.ccsf.org/htdig-patches/3.1.6/cookies.gz.0

>> Did you work from the language package on the web site? I will go
>> through it again and try to understand what steps I missed. I've just
>> found one mistake. I was using fr_FR but I only have fr in
>> /usr/share/locale. Found that tidbit in:
>> http://htdig.org/FAQ.html#q4.13
>> I also noticed that I don't have an LC_CTYPE in my
>> /usr/share/locale/fr folder. I'll bug the list again once I've figured
>> out how to fix this situation. I believe it involves "installing"
>> fr_CA instead of the generic fr.
>
> Also have a look in /usr/lib/locale, as some systems (namely Linux
> systems using more recent versions of glibc) put locale definitions
> there. Any French locale that has the LC_CTYPE file should do, as any
> national variations of a language shouldn't affect the character set
> used.

The locale is listed when I do locale -a. However, I cannot find an LC_CTYPE anywhere on the system (for any language). I am running the Debian woody distro (stable). (My home machine is woody unstable and does not have LC_CTYPE files either. I've emailed the debian-user email list for help on this.)
The woody stable server has fr_CA as the system language but I still can't find the LC_CTYPE file.

> Based on what you've reported, it doesn't sound like a locale problem to
> me. Usually, if the locale you pick doesn't support accented letters,
> these letters are treated as punctuation, causing words to be split up
> wherever an accented character appears in a word, but the accented
> characters still show up in the excerpts. You just can't search for
> accented words because these words aren't put in the database. However,
> you reported that the accents are stripped from the results page. Am I
> misunderstanding you in interpreting this as meaning that accented
> letters are replaced with their unaccented counterparts, for example,
> that "é" appears as "e"? Or do you mean they disappear altogether?

Correct, I meant the accent is stripped, leaving an unaccented equivalent. "é" becomes "e" in my search results page and I cannot find words if I spell them with the "é". I think you've identified my problem as being the htfuzzy accents.

My current problem is that the search engine is spiralling out of control, i.e. it won't stop crawling. I have to read the log file tonight to see if it's a problem with the cookie patch, or a problem with our URL parameters.

Thanks again for your help, it's much appreciated!

--
Emma Jane Hogbin
Xtrinsic

