Hello all,

I am from Hong Kong and new to htdig, but I would like to 
implement a system with it.  However, I wonder whether htdig supports 
East Asian character sets.
The story is as follows.  Unlike English, French, Spanish, German, 
Latin, Greek and so on, the East Asian languages, or more specifically 
Chinese, Japanese and Korean (CJK), use no space characters to separate 
words.  In Western languages, a word is whatever sits between two 
delimiters such as punctuation marks or spaces.  But in the CJK character 
sets, every character is a word in itself, which is why there are tens 
of thousands of characters in those character sets.  Because every 
character carries its own meaning, simply putting characters together in 
a chunk forms a sentence, and no separators are needed.
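
To make the difference concrete, here is a rough sketch in Python (not 
htdig code) of what I mean.  The names tokenize, is_cjk and CJK_RANGES are 
made up for illustration, and the code point ranges are an assumption that 
covers only the most common blocks.  It treats each CJK character as its 
own word and splits everything else on spaces and punctuation.

```python
# Illustrative sketch only; this is not how htdig parses documents.
# Assumed, non-exhaustive code point ranges for CJK text.
CJK_RANGES = (
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0x3040, 0x30FF),  # Hiragana and Katakana
    (0xAC00, 0xD7AF),  # Hangul Syllables
)


def is_cjk(ch):
    """Return True if ch falls inside one of the assumed CJK ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in CJK_RANGES)


def tokenize(text):
    """Split text into words: one token per CJK character,
    delimiter-separated runs for everything else."""
    tokens = []
    word = []  # characters of the non-CJK word being collected

    def flush():
        if word:
            tokens.append("".join(word))
            word.clear()

    for ch in text:
        if is_cjk(ch):
            flush()             # a CJK character ends any pending word
            tokens.append(ch)   # and is a word by itself
        elif ch.isalnum():
            word.append(ch)     # keep building the Western-style word
        else:
            flush()             # space or punctuation acts as a delimiter
    flush()
    return tokens


if __name__ == "__main__":
    print("我愛香港".split())             # whitespace splitting finds one "word"
    print(tokenize("htdig 搜尋引擎 test"))
```

Splitting on whitespace alone returns a whole Chinese sentence as a single 
word, which is why a parser that only looks for delimiters cannot index it 
usefully.
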
Now, I have found that htdig could be a great package for implementing my 
system.  However, I need to be able to search East Asian characters.  Is 
htdig capable of parsing CJK documents?  If not, is it just the 
parser that needs to be rewritten?  If htdig cannot do this yet, I would 
like to get help from all of you to write and contribute a patch for htdig.
Thanks a lot.

Adrian


_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html