
[aseek-devel] [Patch] Improved UTF-8 conformance

Nikita V. Borodikhin
Tue, 06 Apr 2004 04:39:28 -0700

Hello, everybody!

ASPSeek has problems parsing UTF-8 pages: it does not understand some
characters, treats them as zeros, and stops parsing the page.
Here is an example where page parsing ends at the Unicode 'EM DASH' symbol:
http://forum.aspseek.org/index.php?t=msg&th=1080&start=0&;

The actual problem is in include/ucharset.h, where the UTF-8 to Unicode
conversion algorithm does not conform to the Unicode standard. I have attached
patches for both the latest release (1.2.10) and the CVS version.
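
To illustrate what conformance means here, below is a minimal sketch of a
decoder that follows the Unicode rules for UTF-8. It is not the code from the
attached patches or from include/ucharset.h; the function name
utf8_decode_one and its signature are assumptions made only for this example.
It decodes a single sequence and rejects truncated input, overlong forms,
surrogates, and values above U+10FFFF.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, NOT taken from include/ucharset.h or the patches.
// Decodes one UTF-8 sequence starting at s, with n bytes available.
// On success stores the code point in *out and returns the number of
// bytes consumed (1..4). Returns 0 on malformed or truncated input.
inline std::size_t utf8_decode_one(const unsigned char* s, std::size_t n,
                                   std::uint32_t* out)
{
    assert(s != nullptr && out != nullptr);
    if (n == 0) return 0;

    const unsigned char b0 = s[0];
    std::size_t len = 0;     // total length of the sequence in bytes
    std::uint32_t cp = 0;    // code point being assembled
    std::uint32_t min = 0;   // smallest code point allowed for this length

    if (b0 < 0x80) { *out = b0; return 1; }                   // ASCII
    else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; min = 0x80; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; min = 0x800; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; min = 0x10000; }
    else return 0;           // stray continuation byte or invalid lead byte

    if (n < len) return 0;   // sequence is cut off

    // Every following byte must look like 10xxxxxx and adds 6 bits.
    for (std::size_t i = 1; i < len; ++i) {
        if ((s[i] & 0xC0) != 0x80) return 0;
        cp = (cp << 6) | (s[i] & 0x3F);
    }

    if (cp < min) return 0;                        // overlong encoding
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;    // UTF-16 surrogate
    if (cp > 0x10FFFF) return 0;                   // outside Unicode range

    *out = cp;
    return len;
}
```

With this sketch, the three bytes E2 80 94 decode to U+2014 (EM DASH), the
character at which parsing stopped in the example above.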

Nikita

Attachment: aspseek-1.2.10-utf_fix.patch
Description: Binary data

Attachment: aspseek-cvs-utf_fix.patch
Description: Binary data
