Re: I18n and l10n

Motoharu Kubo Tue, 17 Jan 2006 16:30:21 -0800

Justin Mason wrote:
>>>(1) rewrite from BODY to RAWBODY as Matsuda-san says.
>>>(2) invent NBODY (or something else) alongside BODY.  NBODY contains
>>>     a normalized and tokenized version of the body.  I once thought of
>>>     this idea but did not propose it, because of the BODY problems I
>>>     mentioned above and because the overhead of running nbody tests
>>>     would increase.
>>
>>I want (2), for the sake of rule compatibility.
> 
> 
> +1, agreed.


I talked with Matsuda-san, and I now also agree with the NBODY idea,
because compatibility with the existing ruleset is extremely important.

I wrote earlier that I did not want charset normalization and related
features to be optional, but I have changed my position.  They should be
a compile-time option or an SA configuration option, because UTF-8-aware
regexes will cost performance, and not all SA users want this feature.
In exchange, I want NBODY, and Bayes operating on normalized and
tokenized text, to be fully UTF-8 aware.
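To make the NBODY idea more concrete, here is a minimal Python sketch, assuming NBODY is built by decoding each text part from its declared charset, applying Unicode NFKC normalization (which folds fullwidth Latin and halfwidth katakana into their canonical forms), and joining word tokens with single spaces. The names normalize_body, tokenize and nbody_test and the `enabled` flag are hypothetical illustrations, not SpamAssassin APIs. Real Japanese tokenization would need a morphological analyzer rather than the simple `\w+` regex used here.

```python
import re
import unicodedata
from typing import List, Optional


def normalize_body(raw: bytes, charset: str) -> str:
    """Hypothetical helper: decode a raw body part and NFKC-normalize it."""
    text = raw.decode(charset, errors="replace")
    # NFKC maps e.g. fullwidth "ＳＰＡＭ" -> "SPAM", halfwidth "ｶﾀｶﾅ" -> "カタカナ"
    return unicodedata.normalize("NFKC", text)


def tokenize(text: str) -> List[str]:
    """Naive tokenizer (assumption): runs of Unicode word characters."""
    return re.findall(r"\w+", text)


def nbody_test(raw: bytes, charset: str, pattern: str,
               enabled: bool = True) -> Optional[bool]:
    """Hypothetical NBODY rule check.

    `enabled` stands in for the compile-time / configuration option:
    when it is off, no decoding or Unicode regex work is done at all,
    and existing BODY rules are unaffected either way.
    """
    if not enabled:
        return None
    nbody = " ".join(tokenize(normalize_body(raw, charset)))
    return re.search(pattern, nbody) is not None
```

Because the option gates the whole decode-normalize-tokenize path, sites that do not need it pay no UTF-8 regex cost, while sites that enable it can write one rule that matches fullwidth and ASCII spellings alike.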

-- 
Motoharu Kubo
[EMAIL PROTECTED]