Hi,

Antonello Provenzano wrote:
>
> The fact is the port of CharDet to C# is made from Java starting
> point: if you've checked the original JCharDet is quite outdated also
> (latest release was 3 years ago).

Yes, that's true. But the Python port is a lot newer and includes the
universal chardet (more below).

>
> I haven't tried yet, but I believe the current version should work for
> detection of character encodings, since the encoding table is not
> changed since that time.

The current code works (I have it working in a project of mine,
http://sublib.sf.net), but it is no longer complete. If you look at
http://www.mozilla.org/projects/intl/chardet.html, the new "universal
charset detector" (whose code is at
http://lxr.mozilla.org/seamonkey/source/extensions/universalchardet/)
includes more encodings. Just to name a few:

ISO-8859-2
ISO-8859-5
ISO-8859-7
windows-1250
windows-1251
windows-1253

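As a rough illustration of why the Central European code pages in the list above matter, here is a minimal Python sketch that tries a few candidate encodings on Polish text and keeps the one that yields the most Polish letters. This is not how Mozilla's detector works (it relies on statistical models); the function name `guess_encoding`, the candidate list, and the letter-counting score are assumptions made up for this example, and only Python's standard codecs are used.

```python
# Naive illustration only: score each candidate decoding by how many
# Polish diacritic letters it produces. The real universal charset
# detector is far more sophisticated than this.
POLISH_LETTERS = set("ąćęłńóśźżĄĆĘŁŃÓŚŹŻ")


def guess_encoding(data, candidates=("iso8859_2", "cp1250", "latin_1")):
    """Return the candidate codec name whose decoding of `data`
    contains the most Polish letters (hypothetical helper)."""
    best_name, best_score = None, -1
    for name in candidates:
        try:
            text = data.decode(name)
        except UnicodeDecodeError:
            # Some bytes are undefined in this code page; skip it.
            continue
        score = sum(ch in POLISH_LETTERS for ch in text)
        if score > best_score:
            # Ties keep the earlier candidate.
            best_name, best_score = name, score
    return best_name


sample = "zażółć gęślą jaźń"
print(guess_encoding(sample.encode("cp1250")))
print(guess_encoding(sample.encode("iso8859_2")))
```

ISO-8859-2 and windows-1250 place letters such as "ą", "ś" and "ź" at different byte values, so a detector that knows neither code page cannot tell the two apart, and text decoded with the wrong one shows wrong characters.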
I've received feedback from Polish users, for instance, where the
auto-detection fails and they have to manually select the encoding for
things to work.

Best regards,

>
>
> On 3/10/07, Pedro Castro <[EMAIL PROTECTED]> wrote:
>> Hi,
>>
>> This comes first as a question: is there currently a way to autodetect
>> encodings in text files / strings?
>>
>> I realize there isn't, so would like ask if someone's interested on
>> going forward with this. Mozilla has a great detector, written in C,
>> which has been ported to other languages, like Java
>> (http://jchardet.sourceforge.net/) and Python
>> (http://chardet.feedparser.org/) for instance. A port exists in C# but
>> is very outdated
>> (http://www.conceptdevelopment.net/Localization/NCharDet/).
>>
>> This library would be of great help to many applications, mostly those
>> working with files in different encodings, but basically any
>> application reading plain-text files.
>>
>> --
>> Pedro Castro
>> http://www.pedrocastro.org
>> _______________________________________________
>> Mono-list maillist - [email protected]
>> http://lists.ximian.com/mailman/listinfo/mono-list
>>
>

--
Pedro Castro
http://www.pedrocastro.org
