Re: [GENERAL] [BUGS] main log encoding problem

2012-07-19 Thread Alexander Law
Hello, C. We have one logfile with UTF-8. Pros: Log messages of all our clients can fit in it. We can use any generic editor/viewer to open it. Nothing changes for Linux (and other OSes with UTF-8 encoding). Cons: All the strings written to the log file should go through some conversion function.

Re: [GENERAL] [BUGS] main log encoding problem

2012-07-19 Thread Tatsuo Ishii
I am thinking about a variant of C. The problem with C is that converting from other encodings to UTF-8 is not cheap, because it requires huge conversion tables. This may be a serious problem on a busy server. It is also possible that some information is lost in this conversion. This is because

Re: [GENERAL] [BUGS] main log encoding problem

2012-07-19 Thread Tatsuo Ishii
Hello, Implementing any of these isn't trivial - especially making sure messages emitted to stderr from things like segfaults and dynamic linker messages are always correct. Ensuring that the logging collector knows when setlocale() has been called to change the encoding and translation of

Re: [GENERAL] [BUGS] main log encoding problem

2012-07-19 Thread Alexander Law
The initial issue was that the log file contains messages in different encodings. So transcoding is performed already, but it's not This is not true. Transcoding happens only when PostgreSQL is built with the --enable-nls option (default is no nls). I'll restate the initial issue as I see it. I have

Re: [GENERAL] [BUGS] main log encoding problem

2012-07-19 Thread Alexander Law
And regarding mule internal encoding - reading about Mule http://www.emacswiki.org/emacs/UnicodeEncoding I found: "In future (probably Emacs 22), Mule will use an internal encoding which is a UTF-8 encoding of a superset of Unicode." So I still see UTF-8 as a common denominator for all the

Re: [GENERAL] [BUGS] main log encoding problem

2012-07-19 Thread Tatsuo Ishii
You can google "encoding EUC_JP has no equivalent in UTF8" or some such to find such an example. In this case PostgreSQL just throws an error. For frontend/backend encoding conversion this is fine. But what should we do for logs? Apparently we cannot throw an error here. Unification is
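The "cannot throw an error" constraint for logs can be illustrated with a small sketch. It is not PostgreSQL code: the function name `to_log_utf8` is hypothetical, and Python's codec tables differ from PostgreSQL's conversion tables. It only shows one possible policy, which is to escape bytes that cannot be converted instead of failing.

```python
def to_log_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Convert message bytes to UTF-8 for a log file without ever raising.

    Hypothetical helper for illustration; not part of PostgreSQL.
    """
    try:
        # Bytes that cannot be decoded are kept as visible \xNN escapes.
        text = raw.decode(source_encoding, errors="backslashreplace")
    except LookupError:
        # Unknown encoding name: treat input as ASCII and escape the rest.
        text = raw.decode("ascii", errors="backslashreplace")
    return text.encode("utf-8")
```

With this policy an unconvertible byte costs readability in that one spot, but the log line is still written and the file stays valid UTF-8.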

Re: [GENERAL] [BUGS] main log encoding problem

2012-07-19 Thread Alexander Law
Ok, maybe the time of a truly universal encoding has not yet come. Then maybe we should just add a new parameter log_encoding (UTF-8 by default) to postgresql.conf, and use this encoding consistently within logging_collector. If this encoding is not available then fall back to 7-bit ASCII. What
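A minimal sketch of the proposed behaviour, assuming a single configured log encoding with a 7-bit ASCII fallback. The function `encode_log_line` is hypothetical, and `log_encoding` here is only a Python argument standing in for the proposed postgresql.conf parameter, which does not exist in PostgreSQL as far as this thread shows.

```python
import codecs


def encode_log_line(message: str, log_encoding: str = "utf-8") -> bytes:
    """Encode one log message in the configured encoding.

    Falls back to 7-bit ASCII when the configured encoding is unknown.
    Characters the target encoding cannot represent become escapes.
    """
    try:
        codecs.lookup(log_encoding)
    except LookupError:
        log_encoding = "ascii"  # the fallback suggested in the message
    return message.encode(log_encoding, errors="backslashreplace")
```

Every line then ends up in one encoding, so a generic viewer can open the file; the price is that unrepresentable characters appear as escape sequences.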

Re: [GENERAL] [BUGS] main log encoding problem

2012-07-19 Thread Tatsuo Ishii
Sorry, that was an inaccurate phrase. I meant if the conversion to this encoding is not available. For example, when we have a database in EUC_JP and log_encoding set to Latin1. I think that we can even fall back to UTF-8 as we can convert all encodings to it (with some exceptions that you noticed).

Re: [GENERAL] [BUGS] main log encoding problem

2012-07-19 Thread Alban Hertroys
On 19 July 2012 10:40, Alexander Law exclus...@gmail.com wrote: Ok, maybe the time of a truly universal encoding has not yet come. Then maybe we should just add a new parameter log_encoding (UTF-8 by default) to postgresql.conf, and use this encoding consistently within logging_collector. If

Re: [GENERAL] [BUGS] main log encoding problem

2012-07-19 Thread Alban Hertroys
Yikes, messed up my grammar a bit I see! On 19 July 2012 10:58, Alban Hertroys haram...@gmail.com wrote: I like Craig's idea of adding the client encoding to the log lines. A possible problem with that (I'm not an encoding expert) is that a log line like that will contain data about the

Re: [GENERAL] [BUGS] main log encoding problem

2012-07-19 Thread Alexander Law
Sorry, that was an inaccurate phrase. I meant if the conversion to this encoding is not available. For example, when we have a database in EUC_JP and log_encoding set to Latin1. I think that we can even fall back to UTF-8 as we can convert all encodings to it (with some exceptions that you noticed).

Re: [GENERAL] [BUGS] main log encoding problem

2012-07-19 Thread Alexander Law
I like Craig's idea of adding the client encoding to the log lines. A possible problem with that (I'm not an encoding expert) is that a log line like that will contain data about the database server meta-data (log time, client encoding, etc) in the database default encoding and database data (the
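The idea of recording the client encoding on each log line can be sketched as follows. The line layout (an ASCII header, then ` enc=NAME `, then the raw payload) and both function names are assumptions made for illustration; nothing here reflects an actual PostgreSQL log format. It also assumes the payload encoding is an ASCII superset, so the header can be parsed before the payload encoding is known.

```python
def tag_log_line(meta: str, payload: bytes, client_encoding: str) -> bytes:
    """Prefix an ASCII header that names the encoding of the payload bytes."""
    header = f"{meta} enc={client_encoding} ".encode("ascii")
    return header + payload


def read_log_line(line: bytes):
    """Split a tagged line and decode the payload with the named encoding."""
    head, _, rest = line.partition(b" enc=")
    encoding, _, payload = rest.partition(b" ")
    name = encoding.decode("ascii")
    # Undecodable bytes are replaced so that reading never fails.
    text = payload.decode(name, errors="replace")
    return head.decode("ascii"), name, text
```

This shows the concern raised in the message: one line carries bytes in two encodings, so a generic editor cannot display it correctly and a dedicated reader like `read_log_line` is needed.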

Re: [GENERAL] [BUGS] main log encoding problem

2012-07-19 Thread Alban Hertroys
On 19 July 2012 13:50, Alexander Law exclus...@gmail.com wrote: I like Craig's idea of adding the client encoding to the log lines. A possible problem with that (I'm not an encoding expert) is that a log line like that will contain data about the database server meta-data (log time, client

Re: [GENERAL] [BUGS] main log encoding problem

2012-07-19 Thread Craig Ringer
On 07/19/2012 03:24 PM, Tatsuo Ishii wrote: BTW, I'm not stuck on mule-internal encoding. What we need here is a super encoding which could include any existing encodings without information loss. For this purpose, I think we can even invent a new encoding (maybe something like very first

Re: [GENERAL] [BUGS] main log encoding problem

2012-07-19 Thread Craig Ringer
On 07/19/2012 04:58 PM, Alban Hertroys wrote: On 19 July 2012 10:40, Alexander Law exclus...@gmail.com wrote: Ok, maybe the time of a truly universal encoding has not yet come. Then maybe we should just add a new parameter log_encoding (UTF-8 by default) to postgresql.conf, and use this

Re: [GENERAL] [BUGS] main log encoding problem

2012-07-18 Thread Tatsuo Ishii
C. We have one logfile with UTF-8. Pros: Log messages of all our clients can fit in it. We can use any generic editor/viewer to open it. Nothing changes for Linux (and other OSes with UTF-8 encoding). Cons: All the strings written to the log file should go through some conversion function.

Re: [GENERAL] [BUGS] main log encoding problem

2012-07-18 Thread Tom Lane
Tatsuo Ishii is...@postgresql.org writes: My idea is using mule-internal encoding for the log file instead of UTF-8. There are several advantages: 1) Conversion to mule-internal encoding is cheap because no conversion table is required. Also no information loss happens in this

Re: [GENERAL] [BUGS] main log encoding problem

2012-07-18 Thread Tatsuo Ishii
Tatsuo Ishii is...@postgresql.org writes: My idea is using mule-internal encoding for the log file instead of UTF-8. There are several advantages: 1) Conversion to mule-internal encoding is cheap because no conversion table is required. Also no information loss happens in this