I will try this tonight and see what happens (can't restart the site during the 
day).

Gustaf, you may well be correct that it's bad data, but this wasn't happening 
on the old system (or at least that's what they tell me) so I'm hoping to fix 
it with configuration.

Brian, I used pg_dump/restore.  I have been using the "custom" dump format for 
a while, which unfortunately doesn't let me look at the data the way a text 
dump does, but I would have thought that a binary dump might be less prone to 
data corruption than a text one.  Perhaps not, or maybe this was unavoidable. 

I don't know of any PG tools to fix the character set;  I haven't run across 
anything like that in Google yet.  But Torben sent me an email suggesting I try 
the Linux utility recode, which seems to be found here:  
http://recode.progiciels-bpi.ca/index.html.  If nothing else works, I can try 
running this on a text dump and see if it fixes the bad data.

thanks,

janine

On Dec 2, 2010, at 2:08 AM, Bernhard van Woerden wrote:

> Just found the same as Russell, it started life as a single byte but needs 
> some conversion to match it with the correct unicode character which can then 
> be stored in utf-8.
> 
> What happens if you ask AOLserver to output iso-8859-1 does it convert 0xc2 
> 0x92 to 0x92 ?
> ns_param  OutputCharset     iso-8859-1
> 
> On 2 December 2010 07:39, Janine Ohmer <[email protected]> wrote:
> On Dec 1, 2010, at 11:16 AM, Bernhard van Woerden wrote:
> 
>> Can you check the byte sequence of the string in the db that's causing a 
>> problem.
> 
> Sort of... the column in question is of type text, and the only function I 
> can find that will convert from text to bytea is decode, not encode.  So 
> here's what I did:
> 
> select decode(answer_3, 'escape') from public_places where 
> public_place_id=1012;
> 
> The interesting part of the result is  "Park\302\222s", which is supposed to 
> be "Park's".
> 
> I then ran 
> 
> select encode('\\302\\222':bytea, 'hex');
> 
> and got "c292".
> 
> Google tells me that this is the same as U+0092, which is defined simply as 
> <control>, or "private use 2".
> 
> This doesn't seem very helpful, but it's possible that my initial assumptions 
> on how to do this were wrong and I'm really just trying to look up garbage 
> data. :)
> 
> Bernhard, does this give you any clues?
> 
> thanks,
> 
> janine
> 
>> 
>> select encode(column_name::bytea,'hex') from ...
>> or use get_byte to isolate 
>> 
>> 
>> On 1 December 2010 18:44, Janine Ohmer <[email protected]> wrote:
>> Hi Brian,
>> 
>> The old system isn't accessible anymore (something went very wrong and even 
>> though it's RAID and all, fsck ended up removing a bunch of files and making 
>> the system unbootable).  But from looking at the files in etc from the 
>> backups I can deduce that LANG was set to the same thing as it is on the new 
>> system, en_US.UTF-8.
>> 
>> I had tried setting these:
>> 
>>        ns_param        HackContentType 1
>>        ns_param        URLCharset      "utf-8"
>>        ns_param        OutputCharset   "utf-8"
>>        ns_param        HttpOpenCharset "utf-8"
>> 
>> Which did not work.  I will try adding DefaultCharset just in case that is 
>> the key, but I'm not holding my breath...
>> 
>> There is an environment variable I can set, PGCLIENTENCODING, which will let 
>> me specify what encoding to use for the client, so I can add that to 
>> nsd-postgres.  But I'm not sure what to set it to - everything's already in 
>> UTF8, so there shouldn't *be* any encoding issues (famous last words, I know 
>> :).
>> 
>> I can't restart the site during the day (very picky client) so will not be 
>> able to try anything until late afternoon.
>> 
>> thanks,
>> 
>> janine
>> 
>> On Dec 1, 2010, at 3:32 AM, Fenton, Brian wrote:
>> 
>> > Hi Janine
>> >
>> > this may not be the answer but it can't do any harm. You should consider 
>> > adding these to the AOLserver tcl file:
>> > ns_param  HackContentType    1
>> > ns_param  DefaultCharset    utf-8
>> > ns_param  HttpOpenCharset    utf-8
>> > ns_param  OutputCharset      utf-8
>> > ns_param  URLCharset        utf-8
>> >
>> > Also, rule out any OS differences by checking your locale, LANG etc. Does 
>> > Postgres have an equivalent to Oracle's NLS_LANG? Hopefully you won't have 
>> > to dive into codepages and all that stuff!
>> >
>> > best wishes
>> > Brian Fenton
>> >
>> >
>> > ________________________________________
>> > From: AOLserver Discussion [[email protected]] On Behalf Of 
>> > Janine Ohmer [[email protected]]
>> > Sent: 01 December 2010 05:19
>> > To: [email protected]
>> > Subject: [AOLSERVER] Charset differences between 3.3+ad13 and 4.0.10?
>> >
>> > Me again... still working out the last details on those sites I had to 
>> > move.
>> >
>> > We're having some issues with characters like apostrophes and dashes 
>> > either disappearing (Safari) or showing up as garbage characters (Firefox).
>> >
>> > I'm using the same version of Postgres and the same codebase.  The 
>> > databases on both systems use the UNICODE encoding. The main thing that's 
>> > different is the AOLserver version.  There isn't anything about charset in 
>> > either the old or the new config file.
>> >
>> > Has anyone done this conversion (in recent memory, that is :) and knows 
>> > what the problem might be?
>> >
>> > thanks,
>> >
>> > janine
>> >
>> > ---
>> > Janine Ohmer (formerly Sisk)
>> > President/CEO of furfly, LLC
>> > 503-693-6407
>> >
>> >
>> > --
>> > AOLserver - http://www.aolserver.com/
>> >
>> > To Remove yourself from this list, simply send an email to 
>> > <[email protected]> with the
>> > body of "SIGNOFF AOLSERVER" in the email message. You can leave the 
>> > Subject: field of your email blank.
>> >
>> >
>> > --
>> > AOLserver - http://www.aolserver.com/
>> >
>> > To Remove yourself from this list, simply send an email to 
>> > <[email protected]> with the
>> > body of "SIGNOFF AOLSERVER" in the email message. You can leave the 
>> > Subject: field of your email blank.
>> 
>> ---
>> Janine Ohmer (formerly Sisk)
>> President/CEO of furfly, LLC
>> 503-693-6407
>> 
>> 
>> --
>> AOLserver - http://www.aolserver.com/
>> 
>> To Remove yourself from this list, simply send an email to 
>> <[email protected]> with the
>> body of "SIGNOFF AOLSERVER" in the email message. You can leave the Subject: 
>> field of your email blank.
>> 
>> 
>> --
>> AOLserver - http://www.aolserver.com/
>> 
>> 
>> To Remove yourself from this list, simply send an email to 
>> <[email protected]> with the
>> body of "SIGNOFF AOLSERVER" in the email message. You can leave the Subject: 
>> field of your email blank.
>> 
> 
> ---
> Janine Ohmer (formerly Sisk)
> President/CEO of furfly, LLC
> 503-693-6407
> 
> 
> 
> 
> 
> --
> AOLserver - http://www.aolserver.com/
> 
> 
> To Remove yourself from this list, simply send an email to 
> <[email protected]> with the
> body of "SIGNOFF AOLSERVER" in the email message. You can leave the Subject: 
> field of your email blank.
> 
> 
> 
> 
> --
> AOLserver - http://www.aolserver.com/
> 
> 
> To Remove yourself from this list, simply send an email to 
> <[email protected]> with the
> body of "SIGNOFF AOLSERVER" in the email message. You can leave the Subject: 
> field of your email blank.
> 

---
Janine Ohmer (formerly Sisk)
President/CEO of furfly, LLC
503-693-6407






--
AOLserver - http://www.aolserver.com/

To Remove yourself from this list, simply send an email to 
<[email protected]> with the
body of "SIGNOFF AOLSERVER" in the email message. You can leave the Subject: 
field of your email blank.

Reply via email to