Thanks Malcolm,

On Dec 4, 6:12 pm, Malcolm Tredinnick <[EMAIL PROTECTED]> wrote:
> Now you might well be able to have this happen automatically using the
> "unicode" option to MySQLdb -- it knows how to convert between various
> server-side encodings and Python unicode. So look at that parameter to
> the connect() call. It's fairly well done in MySQLdb (it and PostgreSQL
> were almost trivial to make work when we added Unicode support to
> Django).

I actually had that set up already. I'm trying to look at it a little
more closely. Here's a dpaste of a SQL call and a few columns. Look at
the "fdescr" column output... it's showing the string is unicode but it
has some characters in it like \x95 and \x92.

http://dpaste.com/96601/

> Alternatively, if you're getting bytestrings back, run them through a
> decode() call:
>
>     data = original_data.decode('cp1252')

I tried this at the bottom of the above dpaste just to see... I know
I'm not getting bytestrings back. So I also tried it without the
unicode=True flag to connect, and it produces different output than
above:

>>> row['fdescr'].decode('cp1252')
u'Lefty Kreh is one of the most experienced, well-prepared, and thoughtful anglers in the world. In <i>101 Fly-Fishing Tips</i>, he shares this wealth of experience with a variety of common-sense solutions to the problems that anglers face. Included are tips on:<br /> \u2022how to pacify a fish<br /> \u2022which hook-sharpening tools to use and when<br /> \u2022how to take a rod apart when it\u2019s stuck<br /> \u2022what to do when a fish runs under your boat<br /> \u2022how to dry waders and find leaks<br /> \u2022why long hat brims affect casting accuracy<br /> \u2022and much more<br /><br />Sure to improve a fly fisher\u2019s success, comfort, and enjoyment while on the water. A must for any angler.<br /><br /><b>ABOUT THE AUTHOR</b><br />Lefty Kreh is an internationally known and respected master in the field of fly fishing, and the author of numerous articles and books on the subject. He lives in Maryland.'

Now instead of \x95 I get \u2022 (which is a bullet).

From here I'm not sure what the best way to proceed is... do I want the
\u2022 version instead, in which case, should I not pass in
unicode=True and manually decode each column?

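A minimal sketch of what the two outputs above suggest, assuming (this is a guess from the \x95 and \x92 in the first output, not something confirmed in this thread) that the connection is decoding cp1252 bytes as latin-1. The names repair_cp1252 and C1_RUN are made up for illustration, and the code is written for Python 3, where str is what this thread calls unicode:

```python
import re

# U+0080..U+009F are C1 control codes. In text that was really cp1252 they
# are usually mis-decoded punctuation (bullets, curly quotes), so they are
# the only characters this sketch touches.
C1_RUN = re.compile('[\x80-\x9f]+')


def repair_cp1252(text):
    """Re-decode stray C1 characters as cp1252; leave everything else alone.

    Hypothetical helper: it assumes the C1 characters came from cp1252
    bytes that were decoded as latin-1.
    """
    def fix(match):
        # latin-1 maps code points 0-255 straight back to the same bytes.
        raw = match.group(0).encode('latin-1')
        # A few bytes (0x81, 0x8d, 0x8f, 0x90, 0x9d) are undefined in
        # cp1252; errors='replace' turns those into U+FFFD instead of raising.
        return raw.decode('cp1252', errors='replace')
    return C1_RUN.sub(fix, text)


# Simulate the suspected problem: cp1252 bytes decoded as latin-1.
original = '\u2022how to take a rod apart when it\u2019s stuck'
mangled = original.encode('cp1252').decode('latin-1')

assert '\x95' in mangled and '\x92' in mangled
assert repair_cp1252(mangled) == original
```

If that assumption holds, repairing the already-decoded strings this way and decoding the raw bytestrings with 'cp1252' should give the same \u2022 and \u2019 result, so either route would do for an export.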
I'm partly thinking that since this is a one-time operation (actually,
one we'll repeat several times until we're ready to switch over to the
new site), I could scan for any "\x" characters and manually replace
them. There are likely only a handful, as in the above. But how does one
scan for and replace these so the output is correct?

> Just for laughs, though, try running "file" on the csv file you generate
> and make sure it, at least, detects that it is a UTF-16 file.

It actually tells me nothing...

> file export.csv
export.csv:

Thanks,
Rob