Justin Mason wrote:
>>>(1) rewrite from BODY to RAWBODY as Matsuda-san says.
>>>(2) invent NBODY (or something else) apart from BODY. NBODY contains
>>>    normalized and tokenized version of body. I once thought of this
>>>    idea but did not propose because BODY has problems I mentioned
>>>    above and overhead of executing nbody_test increases.
>>
>>I want (2), for the reason of compatibility of rules.
>
>
> +1, agreed.

I talked to Matsuda-san, and I now also agree with the NBODY idea, because
compatibility with existing rulesets is extremely important.

I wrote earlier that I did not like charset normalization and related
features being optional, but I have changed my position. It should be a
compile-time option or an SA option, because UTF-8 aware regexes will cause
a performance loss, and not all SA users want this feature.

Instead, I want NBODY and Bayes, working on normalized and tokenized text,
to be fully UTF-8 aware.

-- 
Motoharu Kubo
[EMAIL PROTECTED]
