I'm a little surprised that no one else mentioned it, but are you sure that you 
actually want to strip the characters?

As Sixten Otto said:

> For what it's worth, another common cause of problems with stuff
> pasted from Word (at least on the web), is Word docs that contain
> characters from the Windows-1252 character set that are invalid UTF-8
> byte sequences. Most commonly, 0x80-0x9F, which is the range where
> Windows-1252 differs from ISO-Latin-1.


0x80 to 0x9F in codepage 1252 includes the Euro sign, the bullet (Option-8 on
the Mac), the en dash, the em dash and the curly "smart" quotes, i.e. characters
you will find even in plain English text.
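To make that concrete, here is a small Python sketch. Python is only used because its built-in `cp1252` codec makes the mapping easy to show; in Cocoa the corresponding encoding constant is `NSWindowsCP1252StringEncoding`. It decodes a few bytes from the 0x80-0x9F range as Windows-1252 and checks that the same single bytes are not valid UTF-8. The helper names `describe_cp1252` and `is_valid_utf8` are hypothetical.

```python
# A few bytes from the 0x80-0x9F range of Windows-1252 (codepage 1252).
samples = {
    0x80: "euro sign",
    0x95: "bullet",
    0x96: "en dash",
    0x97: "em dash",
    0x93: "left double quotation mark",
    0x94: "right double quotation mark",
}


def describe_cp1252(byte_value):
    """Decode one byte as Windows-1252 and return (character, code point)."""
    ch = bytes([byte_value]).decode("cp1252")
    return ch, f"U+{ord(ch):04X}"


def is_valid_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


for value, name in samples.items():
    ch, code_point = describe_cp1252(value)
    # A lone byte in 0x80-0xBF is a UTF-8 continuation byte, so it is invalid.
    print(f"0x{value:02X} -> {ch} ({code_point}, {name}); "
          f"valid UTF-8 on its own: {is_valid_utf8(bytes([value]))}")
```

Every line printed reports `valid UTF-8 on its own: False`, which is exactly why these characters show up as garbage (or get rejected) when Word text is treated as UTF-8.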

(Reference: http://msdn.microsoft.com/en-us/goglobal/cc305145.aspx)

These can all be represented in Unicode, but you'd have to run the text through
a converter, which raises the question: how do you know the encoding of what
was pasted in?
Generally you don't, but if you see byte values in this range you can make a
probabilistic guess.
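One common guessing strategy, sketched here in Python and assuming the pasted text reaches you as raw bytes, is to try a strict UTF-8 decode first and fall back to Windows-1252 only if that fails. Text containing 1252 bytes in the 0x80-0x9F range rarely happens to form valid UTF-8, so a clean UTF-8 decode is a reasonable signal. The function `decode_pasted_bytes` is a hypothetical helper, not an existing API.

```python
def decode_pasted_bytes(data: bytes) -> tuple[str, str]:
    """Guess the encoding of pasted bytes: strict UTF-8 first, then Windows-1252.

    Hypothetical helper for illustration. Returns (text, encoding_used).
    """
    try:
        # A clean strict decode is good evidence the bytes really are UTF-8.
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        pass
    # Fall back to Windows-1252. Python's cp1252 codec leaves 0x81, 0x8D,
    # 0x8F, 0x90 and 0x9D undefined, so replace those instead of raising.
    return data.decode("cp1252", errors="replace"), "cp1252"


word_bytes = b"Price: \x805 \x96 \x93approx.\x94"
text, encoding = decode_pasted_bytes(word_bytes)
print(encoding, text)
```

This is still a guess: bytes that happen to be valid UTF-8 but were meant as something else will be misread, so the heuristic reduces the problem rather than solving it.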

My only point is that you may end up annoying users by silently dropping part of
what they tried to paste in. This may or may not be acceptable in your case.
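For contrast, here is what stripping does to the same kind of text. This Python sketch assumes "strip" means discarding everything outside 7-bit ASCII (an assumption, since the original question isn't quoted here); `strip_non_ascii` is a hypothetical helper.

```python
def strip_non_ascii(text: str) -> str:
    """The 'strip it' approach: drop every character outside 7-bit ASCII."""
    return text.encode("ascii", errors="ignore").decode("ascii")


pasted = "Price: \u20ac5 \u2013 \u201capprox.\u201d"
# The euro sign, the en dash and both curly quotes are silently lost.
print(strip_non_ascii(pasted))
```

The result, `Price: 5  approx.`, no longer says what the user pasted: the currency is gone.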

AndyT (lordpixel - the cat who walks through walls)
A little bigger on the inside

        (see you later space cowboy, you can't take the sky from me)



_______________________________________________

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)