Re: [HACKERS] Chinese in Postgres
[ Could you please trim your citations, i.e., please don’t top-post: https://en.wikipedia.org/wiki/Posting_style#Top-posting ]

2013/8/16 Francesco ciifrance...@tiscali.it:
> Thanks for your answer. Yes, the client is also UTF8:
>
> MyDB=# show client_encoding;
>  client_encoding
> -----------------
>  UTF8
> (1 row)

I guess that this is the client encoding used by psql. I suspect your C++ program doesn’t use client encoding UTF8. What library are you using, libpq? Did you run the psql instance (whose output you pasted) on Windows or on some kind of UNIX machine over SSH? Does your problematic C++ program run on Windows or on the UNIX machine?

(The “client encoding” is not a property of the database, but of the specific client you are using. The C++ program’s client encoding might therefore be entirely different from the one used by psql, especially if you don’t run them on the same machine.)

[ BTW, I think this question really doesn’t belong on -hackers, as no one seems to think it is a bug, nor is it a question about PostgreSQL internals. ]

Nicolas

--
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
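To make the point about client encoding concrete, here is a minimal sketch (in Python, for brevity) of what can happen when UTF-8 bytes arrive on a connection whose client encoding is a single-byte one. LATIN1 is only an assumed example; nothing in the thread establishes which encoding the C++ program's connection actually declares, and the helper name is hypothetical.

```python
def misread_as_single_byte(text: str, assumed_encoding: str = "latin-1") -> str:
    """Simulate UTF-8 bytes being interpreted under a single-byte client encoding.

    Each UTF-8 byte is treated as one character of the assumed encoding,
    which is roughly what a conversion to a UTF8 database would then store.
    """
    raw = text.encode("utf-8")           # the bytes the client really sends
    return raw.decode(assumed_encoding)  # what they mean if declared as LATIN1


original = "漢語1-3漢語"
stored = misread_as_single_byte(original)

print(len(original))  # 7 characters, as intended
print(len(stored))    # 15: three characters per ideogram, ASCII untouched
print(stored[6:9])    # the ASCII part survives: 1-3
```

This reproduces the reported symptom (three unreadable characters per ideogram, ASCII intact), but it is only one possible explanation, not a diagnosis.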
Re: [HACKERS] Chinese in Postgres
On Fri, Aug 16, 2013 at 4:25 AM, ciifrance...@tiscali.it wrote:
> If I insert the data using a C++ program I have empty squares, in this format: ��� (3 empty squares for each chinese ideogram as that is the length in UTF-8)
> If the string contains chinese mixed with ASCII, the ASCII is OK but the Chinese is broken:
> 漢語1-3漢語 -- ��1-3��

You mentioned nothing about what platform this is or how you've built the program, and nothing about the operating system locale. If this is a Windows program (you mention PuTTY), I'd read up on the differences between what are known as Unicode and Multibyte encodings on MSDN:

http://msdn.microsoft.com/en-us/library/2dax2h36.aspx

Of course, this is a total stab in the dark, but then people with the problem that you describe don't tend to be on *nix systems as a rule. As someone said upthread, if Postgres does that then it's because the bytes you sent aren't what you think they are when rendered as UTF-8.

--
Peter Geoghegan
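One way to test whether the bytes being sent really are UTF-8 is to try decoding them strictly. The sketch below is in Python for brevity and uses a hypothetical helper name; it assumes that "Unicode" in the Windows sense means UTF-16, and shows that the same text encoded that way does not pass as UTF-8.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


text = "漢語"
print(is_valid_utf8(text.encode("utf-8")))      # True
print(is_valid_utf8(text.encode("utf-16-le")))  # False
```

Running an equivalent check on the buffer just before it is handed to the database library would show whether the program or the connection is at fault.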
[HACKERS] Chinese in Postgres
Hello all,

before writing this message, I wrote about this on other mailing lists without solving my problem. Maybe some of you can help me.

I have problems with a DB in Postgres when I try to insert Chinese strings in UTF-8 format. If I insert the data using a C++ program I get empty squares, in this format: ��� (3 empty squares for each Chinese ideogram, as that is its length in UTF-8).
If the string contains Chinese mixed with ASCII, the ASCII is OK but the Chinese is broken:
漢語1-3漢語 -- ��1-3��

All the data is read from a binary file. It seems to be read correctly, but something happens when the query is executed. (If the text is in a different language that uses only 2 bytes for each letter, e.g. Hebrew, I see only 2 empty squares per character, but this is not good anyway...)

Strange things:
1. If I insert the record by running a query from the command line (PuTTY), the Chinese text is OK. The problem occurs only when I insert via the C++ program.
2. I checked the C++ functions involved by writing unit tests; if I run some other tests (on another virtual machine) the text is not damaged.

These strange things are confusing me, but maybe they will be useful information for somebody who has had the same problem.

The DB is set for UTF-8:

   Name    | Owner | Encoding |   Collate   |    Ctype    | Access privileges
-----------+-------+----------+-------------+-------------+-------------------
 postgres  | pgsql | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 MyDB      | pgsql | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 template0 | pgsql | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 template1 | pgsql | UTF8     | en_US.UTF-8 | en_US.UTF-8 |

Previously I also tried with:

   Name    | Owner | Encoding | Collate | Ctype | Access privileges
-----------+-------+----------+---------+-------+-------------------
 postgres  | pgsql | UTF8     | C       | C     |
 MyDB      | pgsql | UTF8     | C       | C     |
 ...

But the problem was the same.

I know that you would like to see the code, but it's too long (anyway, if you want I can try to write some lines of code, like the connection to the DB and so on). I don't know if Postgres creates some log when damaged data is inserted; that would be useful.
For now, in order to save your time, my question is: has any of you had the same problem? (And how did you solve it?)

Thanks,
Francesco
Re: [HACKERS] Chinese in Postgres
On 08/16/2013 01:25 PM, ciifrance...@tiscali.it wrote:
> Hello all,
> before writing this message, I wrote about this in other mailing lists without solving my problem. Maybe some of you can help me.
> I have problems with a DB in postgres, when i try to insert Chinese strings in UTF-8 format.
> If I insert the data using a C++ program I have empty squares, in this format: ��� (3 empty squares for each chinese ideogram as that is the length in UTF-8)
> If the string contains chinese mixed with ASCII, the ASCII is OK but the Chinese is broken:
> 漢語1-3漢語 -- ��1-3��

Can you check that your client encoding is also UTF8?

hannu=# show client_encoding ;
 client_encoding
-----------------
 UTF8
(1 row)

Cheers

--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ
Re: R: Re: [HACKERS] Chinese in Postgres
On 08/16/2013 02:40 PM, ciifrance...@tiscali.it wrote:
> Thanks for your answer. Yes, the client is also UTF8:
>
> MyDB=# show client_encoding;
>  client_encoding
> -----------------
>  UTF8
> (1 row)

Strange, it works for me:

hannu@hannu-900X3E:~/workspace/my-app$ psql
psql (9.3beta2, server 9.2.4)
Type "help" for help.

hannu=# select * from pg_stat_activity;
hannu=# show client_encoding ;
 client_encoding
-----------------
 UTF8
(1 row)

hannu=# create table tchinese(data text);
CREATE TABLE
hannu=# insert into tchinese values('漢語1-3漢語');
INSERT 0 1
hannu=# select * from tchinese ;
    data
------------
 漢語1-3漢語
(1 row)

hannu=# \q

Are you sure that the client encoding is also the same when you are actually doing the import?

Or, when you get the wrong results on reading, what does length() of the bad field give you?

hannu=# select data, length(data) from tchinese ;
    data    | length
------------+--------
 漢語1-3漢語 |      7
(1 row)

--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ
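The length() check works because a correctly stored string has fewer characters than bytes. A small sketch in Python (chosen for brevity; the variable names are illustrative) shows the two numbers to expect for the test string above. The comparison with octet_length() assumes a UTF8 database.

```python
text = "漢語1-3漢語"

char_count = len(text)                  # code points, comparable to length() on a UTF8 database
byte_count = len(text.encode("utf-8"))  # bytes, comparable to octet_length()

print(char_count)  # 7
print(byte_count)  # 15
```

If length() on a damaged field comes out near the byte count instead of the character count, each byte was probably stored as a character of its own.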
R: Re: [HACKERS] Chinese in Postgres
Thanks for your answer. Yes, the client is also UTF8:

MyDB=# show client_encoding;
 client_encoding
-----------------
 UTF8
(1 row)

Cheers
Francesco

Original message
From: ha...@2ndquadrant.com
Date: 16/08/2013 14.16
To: ciifrance...@tiscali.it
Cc: pgsql-hackers@postgresql.org, pgsql-zh-gene...@postgresql.org, pgsql-ru-gene...@postgresql.org
Subject: Re: [HACKERS] Chinese in Postgres

> Can you check that your client encoding is also UTF8?
>
> hannu=# show client_encoding ;
>  client_encoding
> -----------------
>  UTF8
> (1 row)
>
> Cheers
>
> --
> Hannu Krosing
> PostgreSQL Consultant
> Performance, Scalability and High Availability
> 2ndQuadrant Nordic OÜ
R: 回复: [pgsql-zh-general] R: Re: [HACKERS] Chinese in Postgres
[I reply to both in one email]

Song: that C++ program has a log file. In the log file the queries look like this:

UPDATE MY_table SET UTF8_field = 'e58fb0203132333427205748455245204944203d2031

Starting from the first Chinese character, all the rest of the query is in hex. But this is not a problem, because the query is executed fine (except for the Chinese character). And if I use a hex converter, I get the correct query:

UPDATE MY_table SET UTF8_field = '台 1234' WHERE ID = 1

Hannu: the length in the database counts each of the empty squares:

 UTF8_field | length
------------+--------
 ��� 1234   |      8

Cheers
Francesco

Original message
From: mark3...@yahoo.cn
Date: 16/08/2013 14.52
To: ciifrance...@tiscali.it, ha...@2ndquadrant.com
Cc: pgsql-hackers@postgresql.org, pgsql-zh-gene...@postgresql.org, pgsql-ru-gene...@postgresql.org
Subject: 回复: [pgsql-zh-general] R: Re: [HACKERS] Chinese in Postgres

> maybe your C++ program has something (such as charset or configuration) causing this strange thing
>
> mark
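The hex run from the log can be checked directly. The following sketch (Python, for brevity; variable names are illustrative) decodes the logged bytes as UTF-8 and compares the character and byte counts of the value with the length reported above.

```python
logged_hex = "e58fb0203132333427205748455245204944203d2031"

raw = bytes.fromhex(logged_hex)
decoded = raw.decode("utf-8")
print(decoded)  # 台 1234' WHERE ID = 1

value = decoded.split("'")[0]      # the text before the closing quote
print(len(value))                  # 6 characters: 台, a space, and 1234
print(len(value.encode("utf-8")))  # 8 bytes, the same number length() reported
```

So the logged bytes are valid UTF-8, while the stored length of 8 equals the byte count and not the character count. That is consistent with the bytes being interpreted under some other encoding on the way in, though the log alone cannot say where.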
[HACKERS] 回复: [pgsql-zh-general] R: Re: [HACKERS] Chinese in Postgres
Maybe your C++ program has something (such as a charset or configuration setting) that causes this strange behavior.

mark

From: ciifrance...@tiscali.it
To: ha...@2ndquadrant.com
Cc: pgsql-hackers@postgresql.org; pgsql-zh-gene...@postgresql.org; pgsql-ru-gene...@postgresql.org
Sent: Friday, 16 August 2013, 8:40 PM
Subject: [pgsql-zh-general] R: Re: [HACKERS] Chinese in Postgres

> Thanks for your answer. Yes, the client is also UTF8:
>
> MyDB=# show client_encoding;
>  client_encoding
> -----------------
>  UTF8
> (1 row)
>
> Cheers
> Francesco

--
Sent via pgsql-zh-general mailing list (pgsql-zh-gene...@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-zh-general