After patching release 3.2.0b3 to compile on Mac OS X Server 1.2r3 (patch
below), I found that rundig does not index words containing German
umlauts encoded as plain HTML character entities.

Although I have seen many discussions about locales, I don't think the
locale is the real problem. The system is set up correctly (in this case
for German), and even htsearch seems to behave "German".

Example: assume you have an HTML file containing the word "B&uuml;ro"
(i.e. "Büro" written with an HTML character entity). Run rundig and then
search for "Büro" with htsearch.

"htsearch" correctly interprets umlauts like ä, ö, ü, etc. and encodes them
as their HTML entities "&auml;", "&ouml;", "&uuml;", etc. You can see this by
typing the search phrase "Büro": htsearch returns a page whose HTML source
reads "No matches were found for 'B&uuml;ro'".
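The encoding step described above can be sketched in C as follows. This is a minimal sketch, assuming the query arrives as ISO-8859-1 bytes and mapping only the German umlauts and ß; `encode_umlauts` is a hypothetical helper, not code taken from htsearch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not htsearch's code): copy `in` to `out`, replacing
   ISO-8859-1 umlaut bytes and sharp s with named HTML entities.
   Returns 0 on success, -1 if `out` (outsz bytes) is too small. */
int encode_umlauts(const char *in, char *out, size_t outsz)
{
    size_t n = 0;

    if (outsz == 0)
        return -1;
    for (; *in != '\0'; in++) {
        const char *ent = NULL;

        switch ((unsigned char)*in) {
        case 0xE4: ent = "&auml;";  break;   /* a umlaut */
        case 0xF6: ent = "&ouml;";  break;   /* o umlaut */
        case 0xFC: ent = "&uuml;";  break;   /* u umlaut */
        case 0xC4: ent = "&Auml;";  break;   /* A umlaut */
        case 0xD6: ent = "&Ouml;";  break;   /* O umlaut */
        case 0xDC: ent = "&Uuml;";  break;   /* U umlaut */
        case 0xDF: ent = "&szlig;"; break;   /* sharp s */
        default:   break;                    /* copy byte unchanged */
        }

        size_t len = (ent != NULL) ? strlen(ent) : 1;
        if (n + len + 1 > outsz)             /* keep room for the NUL */
            return -1;
        if (ent != NULL)
            memcpy(out + n, ent, len);
        else
            out[n] = *in;
        n += len;
    }
    out[n] = '\0';
    return 0;
}
```

With this mapping, the ISO-8859-1 string `"B\xFCro"` becomes `"B&uuml;ro"`, which matches the form shown in the "No matches" page.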

When I look at the ASCII word list produced by "htdig -t ...", it contains
no words with umlauts at all, even though most of the documents being
indexed contain words encoded as HTML character entities.

So I think the problem is related to indexing (htdig?) rather than to
locales.
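If the indexer is the culprit, a patch would presumably have to turn entities back into characters before words are split and stored. A minimal sketch of such a decoding step, assuming ISO-8859-1 output and handling only the umlaut and ß entities; `decode_umlaut_entities` and its table are hypothetical and not taken from htdig's source.

```c
#include <string.h>

/* Hypothetical table: named HTML entity -> ISO-8859-1 byte. */
static const struct {
    const char *name;
    unsigned char byte;
} umlaut_entities[] = {
    { "&auml;", 0xE4 }, { "&ouml;", 0xF6 }, { "&uuml;", 0xFC },
    { "&Auml;", 0xC4 }, { "&Ouml;", 0xD6 }, { "&Uuml;", 0xDC },
    { "&szlig;", 0xDF },
};

/* Decode the entities above in place. Every entity is longer than the
   single byte that replaces it, so the string can only shrink. Unknown
   entities such as "&amp;" are copied through unchanged. */
void decode_umlaut_entities(char *s)
{
    char *w = s;

    while (*s != '\0') {
        int matched = 0;

        if (*s == '&') {
            size_t i;
            for (i = 0; i < sizeof umlaut_entities / sizeof umlaut_entities[0]; i++) {
                size_t len = strlen(umlaut_entities[i].name);
                if (strncmp(s, umlaut_entities[i].name, len) == 0) {
                    *w++ = (char)umlaut_entities[i].byte;
                    s += len;
                    matched = 1;
                    break;
                }
            }
        }
        if (!matched)
            *w++ = *s++;
    }
    *w = '\0';
}
```

After decoding, "B&uuml;ro" is stored as the same ISO-8859-1 word a locale-aware search for "Büro" would look up; a real fix would also need to cover the remaining Latin-1 entities and numeric references like "&#252;".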

Has anyone else run into this?
Any hints for further patches?



----------------------------
Patch for htdig 3.2.0b3 / Mac OS X Server 1.2r3

1. Edit the file htlib/mktime.c
2. Change the debug section (lines 48-55) to read:

#if DEBUG
# include <stdio.h>
# if STDC_HEADERS
#  include <stdlib.h>
# endif
/* Make it work even if the system's libc has its own mktime routine. */
/* # define mktime my_mktime      (moved below, outside DEBUG) */
#endif /* DEBUG */

/* As the comment above says: on Mac OS X Server 1.2r3 the system libc.a
   already contains mktime, so the rename must be unconditional (outside
   the DEBUG section) to keep this file's mktime from clashing with it. */
# define mktime my_mktime

3. Now htdig should compile without further changes.
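The patch works through a plain preprocessor rename. A minimal standalone sketch of that technique follows; the function body is a hypothetical toy (it treats the time as UTC) and not the real implementation in htlib/mktime.c, included only so the renamed function compiles and can be called.

```c
#include <time.h>

/* The rename from the patch: every later `mktime` token in this file is
   rewritten to `my_mktime`, so the definition below is compiled under
   that name and cannot collide with a mktime already in the system libc. */
#define mktime my_mktime

/* Hypothetical toy body, NOT htdig's implementation: treats *tp as UTC,
   ignores tm_isdst and assumes tm_mon is already in 0..11. Uses a
   standard days-from-civil calculation. */
time_t mktime(struct tm *tp)
{
    long y = tp->tm_year + 1900L;
    long m = tp->tm_mon + 1;                  /* 1..12 */
    long era, yoe, doy, doe, days;

    if (m <= 2)                               /* count Jan/Feb in previous year */
        y -= 1;
    era = (y >= 0 ? y : y - 399) / 400;
    yoe = y - era * 400;                      /* year of era, 0..399 */
    doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + tp->tm_mday - 1;
    doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    days = era * 146097 + doe - 719468;       /* days since 1970-01-01 */

    return (time_t)days * 86400
         + tp->tm_hour * 3600L + tp->tm_min * 60L + tp->tm_sec;
}
```

Inside this file, both `mktime(&t)` and `my_mktime(&t)` call the local definition; other object files that do not see the macro still link against the libc version, which is why moving the `#define` out of the DEBUG section avoids the clash.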


_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
