Another battle-tested piece of code would be Mozilla's sniffer, if external libraries and its license suit you.
This document is out of date, but it explains the ideas:

http://www.mozilla.org/projects/intl/detectorsrc.html

On Apr 26, 2011, at 3:39 PM, John Pannell <[email protected]> wrote:

> Hi Laurent-
>
> I have an app that collects a lot of text off the web; my string creation
> algorithm is something like the following:
>
> 1. Attempt to create an NSString with NSUTF8StringEncoding.
> 2. If the string is nil, attempt to create the string using the encoding
> returned from the server.
> 3. If string is still nil, ask the Text Encoding Conversion Manager to sniff
> out the encoding from the data.
> 3a. This returns an array of likely encodings. For each item in the
> array:
> 3b. Attempt to create a string with the encoding.
>
> There was a little too much code associated with this to copy/paste into
> email, but I'd be happy to share... I have a wrapper object for the needed
> interaction with the Text Encoding Conversion Manager. Some more about it:
>
> http://developer.apple.com/library/mac/#documentation/Carbon/Reference/Text_Encodin_sion_Manager/Reference/reference.html%23//apple_ref/doc/uid/TP30000123
>
> Hope this helps!
>
> John
>
>
> John Pannell
> http://www.positivespinmedia.com
>
> On Apr 26, 2011, at 12:53 PM, Nick Zitzmann wrote:
>
>>
>> On Apr 26, 2011, at 12:49 PM, Laurent Daudelin wrote:
>>
>>>> TextEdit's encoding guesser just uses the built-in NSAttributedString
>>>> method -initWithURL:options:documentAttributes:error:, which will guess
>>>> the file's encoding when opening it. But it has been mentioned that
>>>> heuristics are not infallible, and this method's heuristics are no
>>>> exception. It does a good job overall, but I've found that it usually
>>>> misinterprets UTF-8 format text.
>>>
>>> Yes, I know that all the guess jobs can fail. I was starting to be excited
>>> when started reading your reply but if it usually misinterprets UTF-8,
>>> that's a pretty significant problem...
>>
>> That was a long time ago, so it may have been fixed. But if it's still
>> happening, then one workaround would be to try and open the file as UTF-8
>> first, and if that fails, then fall back on the above method. The UTF-8
>> parser often returns nil on text that is not in UTF-8 format IIRC.
>>
>
> _______________________________________________
>
> Cocoa-dev mailing list ([email protected])
>
> Please do not post admin requests or moderator comments to the list.
> Contact the moderators at cocoa-dev-admins(at)lists.apple.com
>
> Help/Unsubscribe/Update your Subscription:
> http://lists.apple.com/mailman/options/cocoa-dev/lordpixel%40mac.com
>
> This email sent to [email protected]
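
For what it's worth, the fallback order described in this thread (strict UTF-8 first, then the encoding the server declared, then a list of guessed candidates) can be sketched in a few lines. This is only an illustration in Python, not Cocoa, and not anyone's actual code from the thread: the function name `decode_with_fallback` and its parameters are made up, and `candidates` just stands in for whatever a sniffer such as the Text Encoding Conversion Manager or Mozilla's detector would return.

```python
from typing import Iterable, Optional, Tuple


def decode_with_fallback(
    data: bytes,
    declared: Optional[str] = None,
    candidates: Iterable[str] = (),
) -> Optional[Tuple[str, str]]:
    """Return (text, encoding_used), or None if every attempt fails.

    Hypothetical helper: `declared` is the encoding named by the server,
    `candidates` is an ordered list of guesses from some external sniffer.
    """
    # Step 1: always try strict UTF-8 first; it rejects most non-UTF-8 data.
    attempts = ["utf-8"]
    # Step 2: then the encoding the server claimed, if any.
    if declared:
        attempts.append(declared)
    # Step 3: finally the sniffer's guesses, most likely first.
    attempts.extend(candidates)

    for name in attempts:
        try:
            # bytes.decode is strict by default, so bad input raises.
            return data.decode(name), name
        except (UnicodeDecodeError, LookupError):
            # LookupError covers encoding names Python does not know.
            continue
    return None


if __name__ == "__main__":
    # Latin-1 bytes are not valid UTF-8, so the declared encoding is used.
    print(decode_with_fallback(b"caf\xe9", declared="iso-8859-1"))
```

One thing to keep in mind with this ordering: single-byte encodings such as ISO-8859-1 accept any byte sequence, so once one of them is reached the loop always "succeeds", whether or not the guess was right. That is why the strict UTF-8 attempt goes first.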
