Nikita V. Borodikhin
Tue, 06 Apr 2004 04:39:28 -0700
Hello, everybody! ASPSeek has a problem parsing UTF-8 pages: it fails to decode some characters, treats them as zeros, and stops parsing the page. Here is an example where parsing stops at the Unicode 'EM DASH' character (U+2014): http://forum.aspseek.org/index.php?t=msg&th=1080&start=0&
The actual problem is in include/ucharset.h, where the UTF-8 to Unicode conversion does not conform to the Unicode standard. I have attached patches for both the latest release (1.2.10) and the CVS version. Nikita
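For readers without the patches at hand, here is a rough sketch of what a standard-conforming UTF-8 to Unicode conversion looks like. It is not the code from include/ucharset.h or from the attached patches; the function name decode_utf8 and its signature are made up for illustration only.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper, NOT ASPSeek's actual API.
// Decodes one UTF-8 sequence starting at s, with len bytes available.
// On success, stores the code point in cp and returns the number of
// bytes consumed (1..4). Returns 0 if the input is malformed.
std::size_t decode_utf8(const unsigned char* s, std::size_t len, std::uint32_t& cp)
{
    if (len == 0) return 0;

    const unsigned char b0 = s[0];
    std::size_t n = 0;        // total length of the sequence
    std::uint32_t min = 0;    // smallest code point allowed for this length

    if (b0 < 0x80) {                      // 0xxxxxxx: plain ASCII
        cp = b0;
        return 1;
    } else if ((b0 & 0xE0) == 0xC0) {     // 110xxxxx: 2-byte sequence
        n = 2; cp = b0 & 0x1F; min = 0x80;
    } else if ((b0 & 0xF0) == 0xE0) {     // 1110xxxx: 3-byte sequence
        n = 3; cp = b0 & 0x0F; min = 0x800;
    } else if ((b0 & 0xF8) == 0xF0) {     // 11110xxx: 4-byte sequence
        n = 4; cp = b0 & 0x07; min = 0x10000;
    } else {
        return 0;                         // stray continuation or invalid lead byte
    }

    if (len < n) return 0;                // truncated sequence

    for (std::size_t i = 1; i < n; ++i) {
        if ((s[i] & 0xC0) != 0x80) return 0;   // must be 10xxxxxx
        cp = (cp << 6) | (s[i] & 0x3F);
    }

    // Reject overlong encodings, UTF-16 surrogates, and values past U+10FFFF.
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;

    return n;
}
```

With this sketch, the three bytes E2 80 94 (the UTF-8 encoding of EM DASH) decode to U+2014 and the function reports 3 bytes consumed, so a parser built on it would continue past the character instead of stopping.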
aspseek-1.2.10-utf_fix.patch
Description: Binary data
aspseek-cvs-utf_fix.patch
Description: Binary data