On 10/17/2007 03:23 PM, Graham TerMarsch wrote:
> The app accepts form data from the user, runs it through Data::FormValidator
> to validate it, then stuffs it into our PostgreSQL database. We're expecting
> users are going to cut/paste from MS-Word and as a result we're going to have
> to deal with MS "smart quotes".
>
> [snip]
>
> What I'd like the solution to do is:
> a) provide me a means of encoding/marking the data so that I can insert it
> into our Pg database without it throwing an error,
> b) allow viewing of the data to look the same as it did when it was entered.

There have been several good replies on this. Using UTF-8 throughout the tool
chain is a Good Idea. It can be painful to move there though, especially if you
have legacy (non db) data in a variety of encodings, as I did in a recent
project (some HTML docs with multiple encodings in the same file! -- a wonder
they ever rendered at all).

However, the specific case you're talking about is non-ASCII punctuation. I have
found it sane to transliterate non-ASCII punctuation (smart quotes, etc) to
ASCII just to normalize it, since if/when you edit the data later, you may not
have the 'smart' editor magically taking care of toggling the quotes for you
(changing leading to trailing, etc.).

So my processing is similar to Michael's with one addition:

* convert all incoming data to UTF-8.

* selectively transliterate data to normalize punctuation (I have a 'hotlist'
  of troublesome characters, similar to the MT plugin someone else mentioned).

* save in db as UTF-8.

You might want to look at my Search::Tools project on CPAN, particularly the
Search::Tools::UTF8 and Search::Tools::Transliterate classes.

good luck.

pek

-- 
Peter Karman . [EMAIL PROTECTED] . http://peknet.com/
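
P.S. For anyone who wants to see the shape of the three steps above, here is a minimal sketch. It is written in Python purely for illustration and does not use the Search::Tools API. Everything named in it is an assumption: the function name `normalize_input` is hypothetical, the incoming bytes are assumed to be Windows-1252 (typical for text pasted from MS-Word, but your forms may differ), and the hotlist entries are just a starting set, not my actual list.

```python
# Hypothetical 'hotlist' of troublesome punctuation and its ASCII stand-ins.
# These entries are illustrative; extend the table to suit your own data.
PUNCT_HOTLIST = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # horizontal ellipsis
}

# str.maketrans accepts single-character keys mapped to replacement strings.
_PUNCT_TABLE = str.maketrans(PUNCT_HOTLIST)


def normalize_input(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Decode incoming bytes, normalize hotlist punctuation, return UTF-8.

    source_encoding is an assumption; pass whatever your forms really send.
    """
    # Step 1: get from raw bytes to Unicode text.
    text = raw.decode(source_encoding)
    # Step 2: transliterate only the hotlist characters; all else is kept.
    text = text.translate(_PUNCT_TABLE)
    # Step 3: hand back UTF-8 bytes, ready to be stored in the database.
    return text.encode("utf-8")
```

Only characters on the hotlist are touched, so legitimate non-ASCII text such as accented letters passes through unchanged and is stored as UTF-8.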
