Hi

I wrote an app that converts Word files into a simpler format. It first converts from .doc to HTML by scripting Word's "Save as Web Page" command, then uses NSXMLDocument to extract the parts I need. I'm finding that there is no good option for the character encoding of the saved HTML (this is set in Word): Word uses some custom tags to embed special characters like bullets, and reading the result as UTF-8 chokes on them.

My basic process is to:
- Use AppleScript to tell Word to convert from .doc to HTML and save as UTF-8
- Read the resulting file into an NSString with NSUTF8StringEncoding
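
To make the read step concrete, here is a minimal Swift sketch, assuming the file is loaded as raw bytes first so that a failed decode can be detected and a fallback tried. The helper name decodeHTML and the list of fallback encodings are illustrative assumptions, not something Word or Cocoa prescribes.

```swift
import Foundation

// Hypothetical helper: try each candidate encoding in order and return
// the first one that decodes the bytes without error.
func decodeHTML(_ data: Data,
                candidates: [String.Encoding] = [.utf8, .macOSRoman])
    -> (text: String, encoding: String.Encoding)? {
    for encoding in candidates {
        // String(data:encoding:) returns nil when the bytes are not valid
        // in the given encoding (for example, malformed UTF-8).
        if let text = String(data: data, encoding: encoding) {
            return (text, encoding)
        }
    }
    return nil
}
```

Knowing which encoding actually succeeded is useful in itself: if the UTF-8 attempt fails, the file Word wrote is probably not UTF-8 after all.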

I've tried saving the HTML from Word as Latin-1 (NSISOLatin1StringEncoding), but many important characters such as double quotes, apostrophes, and dashes come through as capital O's with various diacritical marks.
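
The capital-O pattern is at least consistent with Mac OS Roman bytes being read as Latin-1. The Swift sketch below only illustrates that byte mapping; it is an assumption about the cause, not a confirmed diagnosis of what Word writes, and the function name misread is made up for the example.

```swift
import Foundation

// Encode typographic punctuation as Mac OS Roman, then decode the same
// bytes as ISO Latin-1 to see what a mismatched read produces.
func misread(_ text: String) -> String? {
    guard let bytes = text.data(using: .macOSRoman) else { return nil }
    return String(data: bytes, encoding: .isoLatin1)
}

// Curly double quotes, then curly single quotes (U+201C, U+201D, U+2018, U+2019).
let original = "\u{201C}\u{201D}\u{2018}\u{2019}"
print(misread(original) ?? "not representable in Mac OS Roman")
// prints "ÒÓÔÕ"
```

If that matches what shows up in the string, reading the file with NSMacOSRomanStringEncoding may be worth a try.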

I'm not really sure how to proceed, as there doesn't seem to be a single encoding usable by NSString that will both translate the quotes and let me access Word's "special" characters. Does anyone have any ideas how I can read the HTML and treat it as a mostly normal character string without resorting to a custom binary character translation class?

Thanks for any help
_______________________________________________

Cocoa-dev mailing list ([email protected])
