Re: [GENERAL] How to find freak UTF-8 character?

2011-10-04 Thread Daniele Varrazzo
On Sat, Oct 1, 2011 at 10:16 PM, Leif Biberg Kristensen l...@solumslekt.org wrote: Yes I know that this is a perfectly legal UTF-8 character. It crept into my database as a result of a copy-and-paste job from a web site. The point is that it doesn't have a counterpart in ISO-8859-1 to which I

Re: [GENERAL] How to find freak UTF-8 character?

2011-10-03 Thread Andrew Sullivan
On Sat, Oct 01, 2011 at 11:16:06PM +0200, Leif Biberg Kristensen wrote: But thank you for the idea, I think that I will strip out at least any &lrm; entities from text entered into the database. If you're getting LRM, you might want to check for ZWJ and ZWNJ code points too. They're nasty
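
For the archives, a minimal sketch of the check suggested above, done on the client side in Python rather than in the database. It assumes that "invisible" means Unicode general category Cf (format characters), which covers LRM, ZWJ and ZWNJ; the function name and the sample string are made up for illustration.

```python
import unicodedata


def find_format_chars(text):
    """Return (index, code point, name) for every category-Cf character in text."""
    hits = []
    for index, char in enumerate(text):
        # Cf = "Other, format": LRM, RLM, ZWJ, ZWNJ and similar invisible marks.
        if unicodedata.category(char) == "Cf":
            hits.append((index, f"U+{ord(char):04X}", unicodedata.name(char, "UNKNOWN")))
    return hits


# Hypothetical sample containing LRM (U+200E), ZWJ (U+200D) and ZWNJ (U+200C).
sample = "Solum\u200e sogn\u200d og\u200c prestegjeld"
report = find_format_chars(sample)
for hit in report:
    print(hit)
```

Note that category Cf also includes the soft hyphen (U+00AD), which does exist in LATIN1, so a hit is a reason to look at the text, not proof that the export will fail.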

Re: [GENERAL] How to find freak UTF-8 character?

2011-10-02 Thread pasman pasmański
It's simple to remove strange chars with regexp_replace. 2011/10/1, Leif Biberg Kristensen l...@solumslekt.org: On Saturday 1. October 2011 21.29.45 Andrew Sullivan wrote: I see you found it, but note that it's _not_ a spurious UTF-8 character: it's a left-to-right mark, and is a perfectly ok

Re: [GENERAL] How to find freak UTF-8 character?

2011-10-02 Thread Leif Biberg Kristensen
On Sunday 2. October 2011 15.53.50 pasman pasmański wrote: It's simple to remove strange chars with regexp_replace. True, but first you have to know how to represent a «strange char» in PostgreSQL :P It isn't all that obvious, and it's difficult to search for the solution. I tried a lot of
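
As an illustration of the representation problem, here is a rough sketch of the same clean-up done in Python before the text reaches the database. The character class U+200C to U+200F (ZWNJ, ZWJ, LRM, RLM) is an assumption about which marks are unwanted, and the names `INVISIBLE_MARKS` and `strip_invisible_marks` are hypothetical.

```python
import re

# \u escapes let the pattern name the invisible characters explicitly.
# The range U+200C..U+200F covers ZWNJ, ZWJ, LRM and RLM.
INVISIBLE_MARKS = re.compile("[\u200c-\u200f]")


def strip_invisible_marks(text):
    """Remove zero-width joiners and directional marks from text."""
    return INVISIBLE_MARKS.sub("", text)


# Hypothetical pasted value with a trailing left-to-right mark inside it.
cleaned = strip_invisible_marks("Gjerpen\u200e kirkebok")
print(repr(cleaned))
```

Stripping ZWJ and ZWNJ is safe for Norwegian text, but those two characters carry meaning in some scripts, so the range should be narrowed if such text is expected.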

Re: [GENERAL] How to find freak UTF-8 character?

2011-10-02 Thread Cédric Villemain
2011/10/2 Leif Biberg Kristensen l...@solumslekt.org: On Sunday 2. October 2011 15.53.50 pasman pasmański wrote: It's simple to remove strange chars with regexp_replace. True, but first you have to know how to represent a «strange char» in PostgreSQL :P It isn't all that obvious, and it's

Re: [GENERAL] How to find freak UTF-8 character?

2011-10-02 Thread Leif Biberg Kristensen
On Sunday 2. October 2011 16.34.27 Cédric Villemain wrote: you may have missed this one: http://tapoueh.org/blog/2010/02/23-getting-out-of-sql_ascii-part-2.html That's an, uh, interesting article, but as far as I can see, it doesn't tell anything about how to find a perfectly legal three-byte

Re: [GENERAL] How to find freak UTF-8 character?

2011-10-02 Thread Raymond O'Donnell
On 02/10/2011 15:55, Leif Biberg Kristensen wrote: On Sunday 2. October 2011 16.34.27 Cédric Villemain wrote: you may have missed this one: http://tapoueh.org/blog/2010/02/23-getting-out-of-sql_ascii-part-2.html That's an, uh, interesting article, but as far as I can see, it doesn't tell

Re: [GENERAL] How to find freak UTF-8 character?

2011-10-02 Thread Leif Biberg Kristensen
On Sunday 2. October 2011 17.54.52 Raymond O'Donnell wrote: I may have missed it upthread, but if you haven't already, would you consider writing up your solution for the benefit of the archives? I did, in my own first reply to the original message: SELECT * FROM foo WHERE bar LIKE
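
The query above is cut off in the archive, and the sketch below does not try to reconstruct it. It shows a different, client-side way to locate offending rows: try to encode each value and report the ones that fail. The in-memory `rows` list stands in for (id, bar) pairs fetched from `foo`; the id column, the sample values and the function names are all hypothetical.

```python
def _encodable(char, encoding):
    """Return True if a single character can be encoded in the given encoding."""
    try:
        char.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def rows_not_encodable(rows, encoding="latin-1"):
    """Yield (row_id, code points) for rows whose text cannot be encoded."""
    for row_id, text in rows:
        bad = sorted({c for c in text if not _encodable(c, encoding)})
        if bad:
            yield row_id, [f"U+{ord(c):04X}" for c in bad]


# Hypothetical data: row 2 carries a pasted left-to-right mark.
rows = [(1, "Skien"), (2, "Porsgrunn\u200e"), (3, "Bø i Telemark")]
offenders = list(rows_not_encodable(rows))
print(offenders)
```

This reads every row, so it only suits tables small enough to scan from the client; it has the advantage of catching any character without a LATIN1 equivalent, not just the one named in the error.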

[GENERAL] How to find freak UTF-8 character?

2011-10-01 Thread Leif Biberg Kristensen
I've somehow introduced a spurious UTF-8 character in my database. When I try to export to an application that requires LATIN1 encoding, my export script bombs out with this message: psycopg2.DataError: character 0xe2808e of encoding UTF8 has no equivalent in LATIN1 I figure that it should be
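
A small sketch of how the bytes in that error message can be identified from Python. The hex string is taken from the message above; everything else (the variable names, the use of `unicodedata`) is just one way to do it.

```python
import unicodedata

# The three bytes reported by the error: 0xe2808e.
raw = bytes.fromhex("e2808e")
char = raw.decode("utf-8")

code_point = f"U+{ord(char):04X}"
name = unicodedata.name(char)

# Reproduce the failure locally: LATIN1 has no slot for this character.
try:
    char.encode("latin-1")
    has_latin1_equivalent = True
except UnicodeEncodeError:
    has_latin1_equivalent = False

print(code_point, name, has_latin1_equivalent)
# prints: U+200E LEFT-TO-RIGHT MARK False
```

Knowing the code point makes the character searchable, since it is invisible when displayed.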

Re: [GENERAL] How to find freak UTF-8 character?

2011-10-01 Thread Leif Biberg Kristensen
On Saturday 1. October 2011 07.55.01 Leif Biberg Kristensen wrote: I've somehow introduced a spurious UTF-8 character in my database. When I try to export to an application that requires LATIN1 encoding, my export script bombs out with this message: psycopg2.DataError: character 0xe2808e of

Re: [GENERAL] How to find freak UTF-8 character?

2011-10-01 Thread Andrew Sullivan
On Sat, Oct 01, 2011 at 07:55:01AM +0200, Leif Biberg Kristensen wrote: I've somehow introduced a spurious UTF-8 character in my database. When I try to export to an application that requires LATIN1 encoding, my export script bombs out with this message: psycopg2.DataError: character

Re: [GENERAL] How to find freak UTF-8 character?

2011-10-01 Thread Leif Biberg Kristensen
On Saturday 1. October 2011 21.29.45 Andrew Sullivan wrote: I see you found it, but note that it's _not_ a spurious UTF-8 character: it's a left-to-right mark, and is a perfectly ok UTF-8 code point. Andrew, thank you for your reply. Yes I know that this is a perfectly legal UTF-8 character.