Re: [HACKERS] Chinese in Postgres

2013-08-18 Thread Nicolas Barbier
[ Could you please trim your citations, i.e., please don’t top-post:
https://en.wikipedia.org/wiki/Posting_style#Top-posting ]

2013/8/16 Francesco ciifrance...@tiscali.it:

 Thanks for your answer.
 Yes, the client is also UTF8:

 MyDB=# show client_encoding;
  client_encoding
 -----------------
  UTF8
 (1 row)

I guess that this is the client encoding used by psql. I suspect your
C++-program doesn’t use client encoding UTF8. What library are you
using, libpq? Did you run the psql instance (whose output you pasted)
on Windows or on some kind of UNIX-machine over SSH? Does your
problematic C++-program run on Windows or the UNIX-machine?

(The “client encoding” is not a property of the database, but of the
specific client you are using. The C++-program’s client encoding might
therefore be entirely different from the one used by psql, especially
if you don’t run them on the same machine.)

[ BTW, I think this question really doesn’t belong on -hackers, as
no-one seems to think it is a bug, nor is it a question about
PostgreSQL internals. ]

Nicolas

-- 
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Chinese in Postgres

2013-08-17 Thread Peter Geoghegan
On Fri, Aug 16, 2013 at 4:25 AM, ciifrance...@tiscali.it wrote:
 If I insert the data using a C++ program I have empty squares, in this
 format: ��� (3 empty squares for each chinese ideogram as that is the length
 in UTF-8)
 If the string contains chinese mixed with ASCII, the ASCII is OK but the
 Chinese is broken:
 漢語1-3漢語  -- ��1-3��

You mentioned nothing about what platform this is or how you've built
the program, and nothing about operating system locale.

If this is a Windows program (you mention PuTTY), I'd read up on
differences between what are known as Unicode and Multibyte
encodings on MSDN:

http://msdn.microsoft.com/en-us/library/2dax2h36.aspx

Of course, this is a total stab in the dark, but then people with the
problem that you describe don't tend to be on *nix systems as a rule.
As someone said upthread, if Postgres does that then it's because the
bytes you sent aren't what you think they are when rendered as UTF-8.
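
A rough way to check that on the client side, before the query is sent, is to dump the bytes and test whether they are well-formed UTF-8. The sketch below uses only standard C++. The function names `is_valid_utf8` and `to_hex` are made up for illustration; they are not part of libpq or of the program under discussion.

```cpp
#include <cstddef>
#include <string>

// Returns true if `s` is well-formed UTF-8. Rejects truncated sequences,
// stray continuation bytes, overlong forms, surrogates, and code points
// above U+10FFFF.
bool is_valid_utf8(const std::string& s) {
    const auto* p = reinterpret_cast<const unsigned char*>(s.data());
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = p[i];
        std::size_t len = 0;
        unsigned long cp = 0;
        if (c < 0x80) { len = 1; cp = c; }
        else if (c >= 0xC2 && c <= 0xDF) { len = 2; cp = c & 0x1F; }
        else if (c >= 0xE0 && c <= 0xEF) { len = 3; cp = c & 0x0F; }
        else if (c >= 0xF0 && c <= 0xF4) { len = 4; cp = c & 0x07; }
        else return false;              // continuation byte or invalid lead byte
        if (i + len > n) return false;  // sequence runs past the end
        for (std::size_t k = 1; k < len; ++k) {
            if ((p[i + k] & 0xC0) != 0x80) return false;  // not 10xxxxxx
            cp = (cp << 6) | (p[i + k] & 0x3F);
        }
        if (len == 3 && cp < 0x800) return false;        // overlong 3-byte form
        if (len == 4 && cp < 0x10000) return false;      // overlong 4-byte form
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                 // outside Unicode
        i += len;
    }
    return true;
}

// Lowercase hex dump of the raw bytes, two digits per byte.
std::string to_hex(const std::string& s) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (unsigned char c : s) {
        out += digits[c >> 4];
        out += digits[c & 0x0F];
    }
    return out;
}
```

Calling both on the exact buffer that is about to be handed to the database library shows whether the bytes are already damaged at that point or only later.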

-- 
Peter Geoghegan




[HACKERS] Chinese in Postgres

2013-08-16 Thread ciifrance...@tiscali.it
Hello all,
before writing this message, I asked about this on other mailing lists without 
solving my problem.
Maybe some of you can help me.

I have problems with a Postgres DB when I try to insert Chinese strings in 
UTF-8 format.
If I insert the data using a C++ program I get empty squares, in this format: 
��� (3 empty squares for each Chinese ideogram, as that is its length in UTF-8).
If the string contains Chinese mixed with ASCII, the ASCII is OK but the 
Chinese is broken:
漢語1-3漢語  -- ��1-3��

All the data is read from a binary file. It seems it's read correctly, but 
something happens when the query is executed.
(If the text is in a different language that uses only 2 bytes per letter, 
e.g. Hebrew, I see only 2 empty squares per character, but this is not good 
either...)

Strange things:
1. If I insert the record by running a query from the command line (PuTTY), the 
Chinese text is OK. The problem occurs only when I insert via the C++ program.
2. I checked the C++ functions involved by writing unit tests; if I run 
some other tests (on another virtual machine) the text is not damaged.
These strange things confuse me, but maybe they will be useful 
information for somebody who has had the same problem.

The DB is set for UTF-8:

   Name    | Owner | Encoding |   Collate   |    Ctype    | Access privileges
-----------+-------+----------+-------------+-------------+-------------------
 postgres  | pgsql | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 MyDB      | pgsql | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 template0 | pgsql | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 template1 | pgsql | UTF8     | en_US.UTF-8 | en_US.UTF-8 |

Previously I also tried with:

   Name   | Owner | Encoding | Collate | Ctype | Access privileges
----------+-------+----------+---------+-------+-------------------
 postgres | pgsql | UTF8     | C       | C     |
 MyDB     | pgsql | UTF8     | C       | C     |
...

But the problem was the same.
I know that you would like to see the code, but it's too long (anyway, if you 
want I can try to write some lines of code, such as the connection to the DB 
and so on). I don't know if Postgres creates a log when inserting damaged data; 
that would be useful.

For now, to save your time, my question is: has any of you had the 
same problem?
(And how did you solve it?)

Thanks,
Francesco




Re: [HACKERS] Chinese in Postgres

2013-08-16 Thread Hannu Krosing
On 08/16/2013 01:25 PM, ciifrance...@tiscali.it wrote:
 Hello all,
 before writing this message, I wrote about this in other mailing lists
 without solving my problem.
 Maybe some of you can help me.

 I have problems with a DB in postgres, when i try to insert Chinese
 strings in UTF-8 format.
 If I insert the data using a C++ program I have empty squares, in this
 format: ��� (3 empty squares for each chinese ideogram as that is the
 length in UTF-8)
 If the string contains chinese mixed with ASCII, the ASCII is OK but
 the Chinese is broken:
 漢語1-3漢語  -- ��1-3��
Can you check that your client encoding is also UTF8?

hannu=# show client_encoding ;
 client_encoding
-----------------
 UTF8
(1 row)


Cheers


-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ





Re: R: Re: [HACKERS] Chinese in Postgres

2013-08-16 Thread Hannu Krosing
On 08/16/2013 02:40 PM, ciifrance...@tiscali.it wrote:
 Thanks for your answer.
 Yes, the client is also UTF8:

 MyDB=# show client_encoding;
  client_encoding
 -----------------
  UTF8
 (1 row)
Strange, it works for me:

hannu@hannu-900X3E:~/workspace/my-app$ psql
psql (9.3beta2, server 9.2.4)
Type "help" for help.

hannu=# select * from pg_stat_activity;
hannu=# show client_encoding ;
 client_encoding
-----------------
 UTF8
(1 row)

hannu=# create table tchinese(data text);
CREATE TABLE
hannu=# insert into tchinese values('漢語1-3漢語');
INSERT 0 1
hannu=# select * from tchinese ;
    data
-------------
 漢語1-3漢語
(1 row)

hannu=# \q


Are you sure that the client encoding is also the same when you are
actually doing the import?

Or, when you are getting the wrong results when reading,
what does length() of the bad field give you?

hannu=# select data, length(data) from tchinese ;
    data     | length
-------------+--------
 漢語1-3漢語 |      7
(1 row)
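
For comparison, the 7 above is a count of characters, not of bytes. The following minimal C++ sketch shows the difference for the same string; the helper name `utf8_code_points` is hypothetical, and it assumes its input is already valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Counts code points in a UTF-8 string by counting every byte that is
// not a continuation byte (bit pattern 10xxxxxx).
// Assumption: the input is valid UTF-8; no validation is done here.
std::size_t utf8_code_points(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For 漢語1-3漢語 this gives 7 code points over 15 bytes (four ideograms of 3 bytes each plus 3 ASCII bytes). A length() clearly larger than the number of characters that were meant to be stored would suggest that the text was not interpreted as UTF-8 on the way in.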




-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ





R: Re: [HACKERS] Chinese in Postgres

2013-08-16 Thread ciifrance...@tiscali.it
Thanks for your answer.
Yes, the client is also UTF8:

MyDB=# show client_encoding;
 client_encoding
-----------------
 UTF8
(1 row)


Cheers
Francesco



R: 回复: [pgsql-zh-general] R: Re: [HACKERS] Chinese in Postgres

2013-08-16 Thread ciifrance...@tiscali.it
[I am replying to both in one email]

Song:
that C++ program has a log file.
In the log file the queries look like this:

UPDATE MY_table SET UTF8_field = 'e58fb0203132333427205748455245204944203d2031

Starting from the first Chinese character, all the rest of the query is in hex.
But this is not a problem, because the query runs fine (except for the
Chinese character). And if I use a hex converter, I get the correct query:

UPDATE MY_table SET UTF8_field = '台 1234' WHERE ID = 1
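
The hex conversion described above can be reproduced with a few lines of standard C++. This is only a sketch of such a converter; `hex_value` and `hex_to_bytes` are illustrative names, not functions from the logged program.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Value of one hex digit; throws on anything that is not 0-9, a-f, A-F.
int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    throw std::invalid_argument("not a hex digit");
}

// Turns a string of hex digit pairs back into the raw bytes they encode.
std::string hex_to_bytes(const std::string& hex) {
    if (hex.size() % 2 != 0) {
        throw std::invalid_argument("odd number of hex digits");
    }
    std::string out;
    out.reserve(hex.size() / 2);
    for (std::size_t i = 0; i < hex.size(); i += 2) {
        const int byte = hex_value(hex[i]) * 16 + hex_value(hex[i + 1]);
        out += static_cast<char>(byte);
    }
    return out;
}
```

Applied to the hex run from the log, the first three bytes are e5 8f b0, which is the UTF-8 encoding of 台, followed by the ASCII bytes for ` 1234' WHERE ID = 1`. So the bytes in the log are correct UTF-8 at the point where the program writes them out.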


Hannu:
the length in the database counts each of the empty squares:

 UTF8_field | length
------------+--------
 ��� 1234   |      8
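
A length of 8 equals the number of bytes in the UTF-8 form of '台 1234' (3 + 1 + 4), whereas the string has only 6 characters. One way this could happen, offered here as an assumption and not something confirmed in this thread, is that the connection treats each incoming byte as one character of a single-byte encoding such as Latin-1 and converts it to UTF-8 for storage. The C++ sketch below simulates that; `latin1_to_utf8` and `utf8_code_points` are hypothetical helper names.

```cpp
#include <cstddef>
#include <string>

// Re-encodes every input byte as if it were a Latin-1 character
// (U+0000..U+00FF), which is what a byte-per-character misreading
// of UTF-8 input would amount to. Illustration only.
std::string latin1_to_utf8(const std::string& bytes) {
    std::string out;
    for (unsigned char c : bytes) {
        if (c < 0x80) {
            out += static_cast<char>(c);                  // ASCII stays as is
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));    // lead byte
            out += static_cast<char>(0x80 | (c & 0x3F));  // continuation byte
        }
    }
    return out;
}

// Counts code points by skipping continuation bytes (10xxxxxx).
// Assumes valid UTF-8 input.
std::size_t utf8_code_points(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) ++count;
    }
    return count;
}
```

Under that assumption, the 8 bytes of '台 1234' become 8 separate characters, which matches the reported length, while correctly interpreted UTF-8 would give 6.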

cheers
Francesco




[HACKERS] 回复: [pgsql-zh-general] R: Re: [HACKERS] Chinese in Postgres

2013-08-16 Thread Song
maybe your C++ program has something (such as a charset or configuration 
setting) causing this strange thing

mark







-- 
Sent via pgsql-zh-general mailing list (pgsql-zh-gene...@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-zh-general