Hi,

I took a quick look at the source code (tokenizer.c, config.h). The
delimiters for both headers and body seem to be a list of
non-alphabetic ASCII characters:

" .,;:\"/\\[]}{=+_()<>|&\n\t\r@-*~`?#$%^"

or a subset of those, depending on the tokenizer. So there is no
support for non-ASCII token delimiters.
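
To illustrate what that means in practice, here is a minimal sketch in
Python. Only the delimiter string is taken from the source; the function
name split_on_delimiters and the splitting loop are my own assumptions
for illustration and are not dspam's actual tokenizer.c logic (which is
C and works on bytes rather than Python strings).

```python
# Delimiter string as quoted above. Everything else in this block is a
# hypothetical illustration, not code taken from dspam.
DELIMITERS = " .,;:\"/\\[]}{=+_()<>|&\n\t\r@-*~`?#$%^"


def split_on_delimiters(text, delimiters=DELIMITERS):
    """Split text into tokens at every character found in delimiters."""
    tokens = []
    current = []
    for ch in text:
        if ch in delimiters:
            # A delimiter ends the current token; empty tokens are skipped.
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


# English text breaks at spaces and ASCII punctuation.
english_tokens = split_on_delimiters("Buy cheap watches, now")

# Chinese text with a full-width comma contains none of the ASCII
# delimiters, so the whole string comes back as a single token.
chinese_tokens = split_on_delimiters("立即购买，便宜手表")
```

With a delimiter list like this, text written without ASCII spaces or
punctuation is never broken up, so a whole phrase ends up as one token.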


Tom

On 14-03-14 13:03, rl...@cirruscomputing.com wrote:
> The problem is more complex. Some charsets are used without the
> space char so where do you break the string into tokens? If kana and
> katakana are present then I have been told that you should break at
> transitions between charsets. Chinese chars could each be
> considered a token. Stevan, please tell more.
> 
> Though most of my mail is in English, charset support would be
> useful because some correspondents have Cantonese in the sig. 
> Cheers-- Rick
> 
> ML mail <mlnos...@yahoo.com> wrote:
> 
> Hi Tom,
> 
> Thanks for your feedback...
> 
> @Stevan, any input regarding support of dspam of various charsets?
> 
> Regards ML
> 
> 
> On Thursday, March 13, 2014 9:53 PM, Tom Hendrikx
> <t...@whyscream.net> wrote:
> On 12-03-14 14:29, ML mail wrote:
>> Hello,
> 
>> Two questions:
> 
>> How well does dspam perform with more "exotic" foreign languages
>> such as Arabic, Chinese, etc.?
> 
> As long as dspam can break up the strings into tokens, it should be
> able to do something with it. I don't know if the charset actually
> has any effect, maybe that's more a question for Stevan...
> 
> 
>> And how does dspam work for fighting spam mails which
>> include their content in pictures (JPEG, PNG, etc.)?
> 
> Dspam does nothing with pictures. Extracting text from images (OCR)
> is a completely different task, which dspam has no support for. You
> could look into various projects with support for OCR.
> 
> Tom
> 
> 
> ------------------------------------------------------------------------------
> Learn Graph Databases - Download FREE O'Reilly Book
> "Graph Databases" is the definitive new guide to graph databases
> and their applications. Written by three acclaimed leaders in the
> field, this first edition is now available. Download your free book
> today! http://p.sf.net/sfu/13534_NeoTech
> _______________________________________________
> Dspam-user mailing list
> Dspam-user@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dspam-user
> 

