Hello all,
I am from Hong Kong and I am new to htdig, but I would like to
implement a system with it. However, I wonder whether htdig supports
East Asian character sets.
The background is as follows. Unlike English, French, Spanish, German,
Latin, Greek and so on, the East Asian languages, or more specifically
Chinese, Japanese and Korean (CJK), use no space characters to separate
words. In Western languages, a word is whatever sits between two
delimiters such as punctuation marks or spaces. In the CJK character
sets, however, every character is a word, which is why there are tens of
thousands of characters in those sets. Because every character carries
its own meaning, characters can simply be put together in a chunk to
form a sentence, and no separators are needed.
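
To make the difference concrete, here is a small sketch in Python of the
kind of word splitting I have in mind. It is only an illustration and is
not htdig code: the function names is_cjk and tokenize are made up for
this example, and the code point ranges are my own assumption about which
characters should count as one-character words. Runs of other letters
and digits are kept together as ordinary words, and everything else is
treated as a delimiter.

```python
# Illustrative sketch only; is_cjk and tokenize are hypothetical helpers,
# not part of htdig.

def is_cjk(ch):
    """Return True if ch falls in one of the assumed CJK ranges."""
    cp = ord(ch)
    return (
        0x4E00 <= cp <= 0x9FFF      # CJK Unified Ideographs
        or 0x3400 <= cp <= 0x4DBF   # CJK Unified Ideographs Extension A
        or 0x3040 <= cp <= 0x30FF   # Hiragana and Katakana
        or 0xAC00 <= cp <= 0xD7AF   # Hangul syllables
    )

def tokenize(text):
    """Split text into index terms.

    Each CJK character becomes its own term. Runs of other
    letters/digits form one term. Anything else is a delimiter.
    """
    tokens = []
    word = []  # characters of the non-CJK word being collected

    def flush():
        if word:
            tokens.append("".join(word))
            word.clear()

    for ch in text:
        if is_cjk(ch):
            flush()              # a CJK character ends any pending word
            tokens.append(ch)    # and is itself a complete term
        elif ch.isalnum():
            word.append(ch)
        else:
            flush()              # space or punctuation: word boundary
    flush()
    return tokens

print(tokenize("htdig 支持中文"))
```

With this rule a query can match at any character position in a CJK
document. The cost is that single characters are very frequent terms, so
a real patch might prefer overlapping two-character terms or a
dictionary-based splitter instead.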
I have found that htdig could be a great package for implementing my
system. However, I need to search East Asian characters. Is htdig
capable of parsing CJK documents? If not, is it only the parser that
needs to be rewritten? If htdig cannot do this yet, I would like to get
help from all of you to write and contribute a patch for htdig.
Thanks a lot.
Adrian
_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html