I will try this tonight and see what happens (can't restart the site during the day).
Gustaf, you may well be correct that it's bad data, but this wasn't happening on the old system (or at least that's what they tell me) so I'm hoping to fix it with configuration. Brian, I used pg_dump/restore. I have been using the "custom" dump format for a while, which unfortunately doesn't let me look at the data the way a text dump does, but I would have thought that a binary dump might be less prone to data corruption than a text one. Perhaps not, or maybe this was unavoidable. I don't know of any PG tools to fix the character set; I haven't run across anything like that in Google yet. But Torben sent me an email suggesting I try the Linux utility recode, which seems to be found here: http://recode.progiciels-bpi.ca/index.html. If nothing else works, I can try running this on a text dump and see if it fixes the bad data. thanks, janine On Dec 2, 2010, at 2:08 AM, Bernhard van Woerden wrote: > Just found the same as Russell, it started life as a single byte but needs > some conversion to match it with the correct unicode character which can then > be stored in utf-8. > > What happens if you ask AOLserver to output iso-8859-1 does it convert 0xc2 > 0x92 to 0x92 ? > ns_param OutputCharset iso-8859-1 > > On 2 December 2010 07:39, Janine Ohmer <[email protected]> wrote: > On Dec 1, 2010, at 11:16 AM, Bernhard van Woerden wrote: > >> Can you check the byte sequence of the string in the db that's causing a >> problem. > > Sort of... the column in question is of type text, and the only function I > can find that will convert from text to bytea is decode, not encode. So > here's what I did: > > select decode(answer_3, 'escape') from public_places where > public_place_id=1012; > > The interesting part of the result is "Park\302\222s", which is supposed to > be "Park's". > > I then ran > > select encode('\\302\\222':bytea, 'hex'); > > and got "c292". > > Google tells me that this is the same as U+0092, which is defined simply as > <control>, or "private use 2". > > This doesn't seem very helpful, but it's possible that my initial assumptions > on how to do this were wrong and I'm really just trying to look up garbage > data. :) > > Bernhard, does this give you any clues? > > thanks, > > janine > >> >> select encode(column_name::bytea,'hex') from ... >> or use get_byte to isolate >> >> >> On 1 December 2010 18:44, Janine Ohmer <[email protected]> wrote: >> Hi Brian, >> >> The old system isn't accessible anymore (something went very wrong and even >> though it's RAID and all, fsck ended up removing a bunch of files and making >> the system unbootable). But from looking at the files in etc from the >> backups I can deduce that LANG was set to the same thing as it is on the new >> system, en_US.UTF-8. >> >> I had tried setting these: >> >> ns_param HackContentType 1 >> ns_param URLCharset "utf-8" >> ns_param OutputCharset "utf-8" >> ns_param HttpOpenCharset "utf-8" >> >> Which did not work. I will try adding DefaultCharset just in case that is >> the key, but I'm not holding my breath... >> >> There is an environment variable I can set, PGCLIENTENCODING, which will let >> me specify what encoding to use for the client, so I can add that to >> nsd-postgres. But I'm not sure what to set it to - everything's already in >> UTF8, so there shouldn't *be* any encoding issues (famous last words, I know >> :). >> >> I can't restart the site during the day (very picky client) so will not be >> able to try anything until late afternoon. >> >> thanks, >> >> janine >> >> On Dec 1, 2010, at 3:32 AM, Fenton, Brian wrote: >> >> > Hi Janine >> > >> > this may not be the answer but it can't do any harm. You should consider >> > adding these to the AOLserver tcl file: >> > ns_param HackContentType 1 >> > ns_param DefaultCharset utf-8 >> > ns_param HttpOpenCharset utf-8 >> > ns_param OutputCharset utf-8 >> > ns_param URLCharset utf-8 >> > >> > Also, rule out any OS differences by checking your locale, LANG etc. Does >> > Postgres have an equivalent to Oracle's NLS_LANG? Hopefully you won't have >> > to dive into codepages and all that stuff! >> > >> > best wishes >> > Brian Fenton >> > >> > >> > ________________________________________ >> > From: AOLserver Discussion [[email protected]] On Behalf Of >> > Janine Ohmer [[email protected]] >> > Sent: 01 December 2010 05:19 >> > To: [email protected] >> > Subject: [AOLSERVER] Charset differences between 3.3+ad13 and 4.0.10? >> > >> > Me again... still working out the last details on those sites I had to >> > move. >> > >> > We're having some issues with characters like apostrophes and dashes >> > either disappearing (Safari) or showing up as garbage characters (Firefox). >> > >> > I'm using the same version of Postgres and the same codebase. The >> > databases on both systems use the UNICODE encoding. The main thing that's >> > different is the AOLserver version. There isn't anything about charset in >> > either the old or the new config file. >> > >> > Has anyone done this conversion (in recent memory, that is :) and knows >> > what the problem might be? >> > >> > thanks, >> > >> > janine >> > >> > --- >> > Janine Ohmer (formerly Sisk) >> > President/CEO of furfly, LLC >> > 503-693-6407 >> > >> > >> > -- >> > AOLserver - http://www.aolserver.com/ >> > >> > To Remove yourself from this list, simply send an email to >> > <[email protected]> with the >> > body of "SIGNOFF AOLSERVER" in the email message. You can leave the >> > Subject: field of your email blank. >> > >> > >> > -- >> > AOLserver - http://www.aolserver.com/ >> > >> > To Remove yourself from this list, simply send an email to >> > <[email protected]> with the >> > body of "SIGNOFF AOLSERVER" in the email message. You can leave the >> > Subject: field of your email blank. >> >> --- >> Janine Ohmer (formerly Sisk) >> President/CEO of furfly, LLC >> 503-693-6407 >> >> >> -- >> AOLserver - http://www.aolserver.com/ >> >> To Remove yourself from this list, simply send an email to >> <[email protected]> with the >> body of "SIGNOFF AOLSERVER" in the email message. You can leave the Subject: >> field of your email blank. >> >> >> -- >> AOLserver - http://www.aolserver.com/ >> >> >> To Remove yourself from this list, simply send an email to >> <[email protected]> with the >> body of "SIGNOFF AOLSERVER" in the email message. You can leave the Subject: >> field of your email blank. >> > > --- > Janine Ohmer (formerly Sisk) > President/CEO of furfly, LLC > 503-693-6407 > > > > > > -- > AOLserver - http://www.aolserver.com/ > > > To Remove yourself from this list, simply send an email to > <[email protected]> with the > body of "SIGNOFF AOLSERVER" in the email message. You can leave the Subject: > field of your email blank. > > > > > -- > AOLserver - http://www.aolserver.com/ > > > To Remove yourself from this list, simply send an email to > <[email protected]> with the > body of "SIGNOFF AOLSERVER" in the email message. You can leave the Subject: > field of your email blank. > --- Janine Ohmer (formerly Sisk) President/CEO of furfly, LLC 503-693-6407 -- AOLserver - http://www.aolserver.com/ To Remove yourself from this list, simply send an email to <[email protected]> with the body of "SIGNOFF AOLSERVER" in the email message. You can leave the Subject: field of your email blank.
