Bugs item #2177734, was opened at 2008-10-18 21:58
Message generated for change (Comment added) made by skinkie
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=482468&aid=2177734&group_id=56967
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Core
Group: Clients CVS Head
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Stefan de Konink (skinkie)
Assigned to: Nobody/Anonymous (nobody)
Summary: Output breaks on foreign characters
Initial Comment:
I presume NLS must be enabled system-wide in order to get working 'foreign'
output in mclient. What I now see is that the pipe to stdout probably breaks
on foreign characters:
| 22771762 | power | line | 22771762 | 22 | 290062167 | 290062167 |
50.91285649999 | 6.130952899999 | Team_Alpha | 2008-08-22 |
: : : : : : : :
9997 : 9996 : : 06:04:08.000000 :
: : : : : : : :
: : : +00:00 :
| 22772346 | power | line | 22772346 | 0 | 244473999 | 244473999 |
50.79486810000 | 6.973957699999 |12315 tuples
sql>select * from way_tags, way_nds, nodes_legacy where k='power' and v='line'
and way_tags.way = way_nds.way and way_nds.to_node = nodes_legacy.id;
12315 tuples
----------------------------------------------------------------------
Comment By: Stefan de Konink (skinkie)
Date: 2008-11-08 15:54
Message:
/opt/monetdb/bin/mclient -lsql
sql>create table MyTab (MyAtt string);
0 tuples
sql>insert into MyTab values ('FSürth');
more>!unexpected end of input
0 tuples
/opt/monetdb/bin/mclient -lsql -Eutf8
sql>insert into MyTab values ('FSürth');
Rows affected 1
I guess this is the expected behaviour for a client in a POSIX locale. It
might even be worthwhile to pursue a fix for Xterm: the locale should affect
what is displayed in the client, so if pasting a 'ü' works but the current
locale prohibits it, then by local rendering rules it should not be pasted
*as* a 'ü' in the first place.
/opt/monetdb/bin/mclient -lsql
sql>select * from MyTab;
+---------+
| myatt |
+=========+
|1 tuple
write error
sql>select * from users;
+---------+---------------+----------------+
| name | fullname | default_schema |
+=========+===============+================+
| monetdb | MonetDB Admin | 1061 |
+---------+---------------+----------------+
1 tuple
/opt/monetdb/bin/mclient -lsql -Eutf-8
sql>select * from MyTab;
+---------+
| myatt |
+=========+
| FSürth |
+---------+
----------------------------------------------------------------------
Comment By: Stefan de Konink (skinkie)
Date: 2008-10-22 22:12
Message:
Is it possible that the patch has not yet propagated to the public CVS?
I don't see any difference after updating; I will try again tomorrow.
----------------------------------------------------------------------
Comment By: Stefan Manegold (stmane)
Date: 2008-10-22 13:53
Message:
Stefan ("skinkie"),
could you "live" with the behavior below, following Sjoerd's changes of
yesterday?
If so, please feel free to close this bug report.
===================================================================
2008/10/21 - sjoerd: MonetDB/src/common/stream.mx,1.173
Added function to clear stream error so that we can continue after an
error.
This is needed for mclient.
===================================================================
2008/10/21 - sjoerd: clients/src/mapiclient/MapiClient.mx,1.124
Report write error and clear the stream so that we can continue.
Unfortunately the reason for the write error is lost, so the best we
can do now is to report that there was an error.
Convert command line query according to specified or implied encoding.
===================================================================
Basically, mclient now issues a "write error" if the chosen encoding does
not allow mclient to output certain (unsupported) characters:
========
$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=nl_NL.UTF-8
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
$ mclient -lsql
sql>create table MyTab (MyAtt string);
0 tuples
sql>insert into MyTab values ('FSürth');
Rows affected 1
sql>select * from MyTab;
+---------+
| myatt |
+=========+
| FSürth |
+---------+
1 tuple
sql>
$ mclient -lsql -Eiso-8859-1
sql>select * from MyTab;
+---------+
| myatt |
+=========+
| FS�rth |
+---------+
1 tuple
sql>insert into MyTab values ('FSürth');
Rows affected 1
sql>select * from MyTab;
+-----------+
| myatt |
+===========+
| FS�rth |
| FSürth |
+-----------+
2 tuples
sql>
$ mclient -lsql -EANSI_X3.4-1968
sql>select * from MyTab;
+-----------+
| myatt |
+===========+
|2 tuples
write error
sql>insert into MyTab values ('FSürth');
more>;
more>'
more>;
!syntax error, unexpected SCOLON, expecting ')' or ',' in: "insert into
mytab values ('FS;
!'
!;"
0 tuples
sql>
$ export LANG=''
$ locale
LANG=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER=nl_NL.UTF-8
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=
$ mclient -lsql
sql>select * from MyTab;
+-----------+
| myatt |
+===========+
|2 tuples
write error
sql>
$ mclient -lsql -EANSI_X3.4-1968
sql>select * from MyTab;
+-----------+
| myatt |
+===========+
|2 tuples
write error
sql>
$ mclient -lsql -Eiso-8859-1
sql>select * from MyTab;
+-----------+
| myatt |
+===========+
| FS�rth |
| FSürth |
+-----------+
2 tuples
sql>
$ mclient -lsql -Een_US.UTF-8
mclient: warning: cannot convert local character set en_US.UTF-8 to UTF-8
mclient: warning: cannot convert UTF-8 to local character set en_US.UTF-8
sql>select * from MyTab;
+-----------+
| myatt |
+===========+
| FSürth |
| FSürth |
+-----------+
2 tuples
sql>
========
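The 2008/10/21 changes quoted above make the stream error clearable, so mclient can report "write error" and carry on. The following minimal Python sketch illustrates that sticky-error-plus-clear pattern under stated assumptions. It is not the C code in stream.mx or MapiClient.mx, and the names ConvertingStream, clear_error and print_rows are hypothetical. Python's codec machinery accepts "ANSI_X3.4-1968" as an alias for ASCII.

```python
import io


class ConvertingStream:
    """Hypothetical output stream that converts text to a target charset.

    A failed conversion sets a sticky error flag; every later write is
    refused until clear_error() is called. Only the flag is kept, not
    the reason, mirroring the limitation noted in the commit message.
    """

    def __init__(self, sink, charset):
        self.sink = sink          # binary file-like object
        self.charset = charset    # e.g. "ANSI_X3.4-1968" (Python: ASCII)
        self.error = False

    def write(self, text):
        if self.error:
            return False          # sticky error: output is dropped
        try:
            self.sink.write(text.encode(self.charset))
        except UnicodeEncodeError:
            self.error = True
            return False
        return True

    def clear_error(self):
        self.error = False


def print_rows(stream, rows, report):
    """Write result rows; on the first failure, report and recover."""
    for row in rows:
        if not stream.write(row + "\n"):
            report("write error")
            stream.clear_error()  # so the next statement's output works
            return


# Usage: the ASCII stream fails on the row containing 'ü'.
out = io.BytesIO()
stream = ConvertingStream(out, "ANSI_X3.4-1968")
print_rows(stream, ["| FSurth |", "| FSürth |"], print)  # prints: write error
stream.write("sql>\n")
print(out.getvalue())  # b'| FSurth |\nsql>\n'
```

Without the clear_error() call, every later write in this sketch would be dropped silently. That matches the symptom in the original report, where output stopped at the first unconvertible character.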
----------------------------------------------------------------------
Comment By: Sjoerd Mullender (sjoerd)
Date: 2008-10-21 22:15
Message:
Finally we have enough information to tell what's going on.
Because your locale is POSIX, the character set is ANSI_X3.4-1968. This
is a 7-bit encoding, so it cannot represent non-ASCII characters. This means
that mclient cannot convert non-ASCII characters such as the ü. Because
the conversion fails, the output stream refuses to do anything more.
So there is a bug in that if character conversion fails, the error is not
properly reported and the stream stops working.
If you want to use non-ASCII characters, don't use the C/POSIX locale but
use a locale that actually supports the character set you want to use.
mclient uses the locale automatically, but you can override it with the -E
option, so you could use -Eiso-8859-1. But remember, because of your
environment, that is not the locale you're actually using.
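Sjoerd's diagnosis can be checked outside mclient. The Python sketch below is Unix only and is not mclient's C code; the helper names c_locale_codeset and representable are hypothetical. It asks the C library which codeset the C/POSIX locale uses, which glibc typically reports as ANSI_X3.4-1968. It then tests whether 'FSürth' can be encoded in each charset mentioned in this thread.

```python
import locale


def c_locale_codeset():
    """Codeset the C library reports for the C/POSIX locale (Unix only)."""
    saved = locale.setlocale(locale.LC_CTYPE)       # query current setting
    try:
        locale.setlocale(locale.LC_CTYPE, "C")
        return locale.nl_langinfo(locale.CODESET)   # glibc: "ANSI_X3.4-1968"
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)    # restore


def representable(text, charset):
    """True if every character of text can be encoded in charset."""
    try:
        text.encode(charset)
    except UnicodeEncodeError:
        return False
    return True


print("C/POSIX codeset:", c_locale_codeset())
for cs in ("ANSI_X3.4-1968", "ISO-8859-1", "UTF-8"):
    print(cs, representable("FSürth", cs))
# ANSI_X3.4-1968 False
# ISO-8859-1 True
# UTF-8 True
```

The 7-bit charset has no code for 'ü' (U+00FC), while ISO-8859-1 and UTF-8 both do. This is why -Eiso-8859-1 avoids the failure, even though it does not match the terminal's real encoding.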
----------------------------------------------------------------------
Comment By: Stefan de Konink (skinkie)
Date: 2008-10-21 20:33
Message:
I can understand the behaviour that happens when inserting in a 'wrong'
encoding, i.e. one that differs from the client's. However, shouldn't the
client respect the encoding that the terminal is in, i.e. the environment
variable?
But the other way around: it seems that data inside the database is able
to manipulate the client. It goes too far for me to say it is exploitable,
because I don't know where the console output is redirected to when such a
character is found (hence I was also unable to see what goes wrong under
the hood). But the presence of such a character either causes a redirect or
leaves a console parameter uninitialised.
sql>\>/tmp/test
sql>select * from nodes_legacy where id = 244473999;
This results in the correct output in the file /tmp/test.
----------------------------------------------------------------------
Comment By: Stefan Manegold (stmane)
Date: 2008-10-21 20:13
Message:
Also for input, the encoding used by your mclient must match the encoding
of the input data --- with interactive input, this is the encoding that
your shell / terminal uses, where you started mclient.
By default, mclient uses (and hence expects) UTF-8 encoding; you can
change this by using mclient's "-E"/"--encoding" command line option; see
also `mclient --help`.
If the encodings do not match, the behavior may be "unexpected", as the
characters are interpreted differently from how they (seem to) look.
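To make the mismatch concrete, here is a small Python sketch, offered as one plausible reading rather than a trace of mclient itself. It assumes a terminal that sends UTF-8 while the client was started with -Eiso-8859-1, as in the transcript above. The helper name misread is hypothetical.

```python
def misread(text, terminal="utf-8", client="iso-8859-1"):
    """What a client assuming `client` sees when the terminal actually
    sends `text` encoded as `terminal` (both charsets are assumptions)."""
    return text.encode(terminal).decode(client)


seen = misread("FSürth")
print(seen, len(seen))                 # FSÃ¼rth 7
# Converting back to ISO-8859-1 on output reproduces the original UTF-8
# bytes, so the same UTF-8 terminal displays the value as "FSürth" again.
print(seen.encode("iso-8859-1").decode("utf-8"))   # FSürth
```

Under this assumption, the two UTF-8 bytes of 'ü' are stored as two separate characters, 'Ã' and '¼'. They round-trip only as long as the same mismatched -E setting is used for output.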
----------------------------------------------------------------------
Comment By: Stefan de Konink (skinkie)
Date: 2008-10-21 18:57
Message:
@Stefan:
And what is a "'>'-option"?
The option to redirect all output of the server to a file?
----------------------------------------------------------------------
Comment By: Stefan de Konink (skinkie)
Date: 2008-10-21 18:30
Message:
sql>create table MyTab (MyAtt string);
!CREATE TABLE: name 'mytab' already in use
0 tuples
sql>insert into MyTab values ('FSürth');
more>select * from MyTab;
more>
This is what you requested. (More important things are on my mind right
now; see private mail.) Yes, it was insufficient.
----------------------------------------------------------------------
Comment By: Stefan Manegold (stmane)
Date: 2008-10-21 17:39
Message:
@skinkie: FYI, here is how I read your initial report --- judge for yourself
whether it is sufficient information to reproduce and understand your
problem:
"
I presume NLS must be enabled system-wide in order to get working 'foreign'
output in mclient.
"
Ok, so you do have NLS enabled (or not?) on your system --- hm, which
(operating) system are you actually using here?
In fact, what do you mean by NLS here? --- See, e.g.,
http://en.wikipedia.org/wiki/NLS for a list of alternatives; even in
"Computing" there are (at least) three (cf.
http://en.wikipedia.org/wiki/NLS#Computing) --- let's assume "Native
Language Support" --- so, what is your native language? --- ah, I happen to
know (I guess...): Dutch --- so, you're using a Dutch system. Good.
Hence, all non-Dutch characters are 'foreign' to you? --- hm, 'ë' should
work fine, then ...
"
What I now see is that the pipe to stdout probably breaks on foreign
characters:
"
"Now?" --- but at least it works fine when redirecting into a file?
"
| 22771762 | power | line | 22771762 | 22 | 290062167 | 290062167 |
50.91285649999 | 6.130952899999 | Team_Alpha | 2008-08-22 |
: : : : : : : :
9997 : 9996 : : 06:04:08.000000 :
: : : : : : : :
: : : +00:00 :
| 22772346 | power | line | 22772346 | 0 | 244473999 | 244473999 |
50.79486810000 | 6.973957699999 |12315 tuples
"
Some output; good.
But what is wrong with that output?
What is / would have been correct?
What did you expect?
Who/what produced that output?
Ok, yet another guessing exercise: Let's assume this is a result of some
SQL query, run on some (unknown) data, most probably using some
not-specified version of MonetDB/SQL, run via mclient in some terminal or
command shell with Dutch NLS (see above).
The query produces 12315 tuples with 33 attributes each; like the query
itself, the first 12313 tuples have not been added to this bug report; the
last-but-one tuple appears to be complete; the last tuple has been cut off
after 9 attributes --- still no indication of what kind of 'foreign'
character might have been expected and/or caused any problem ...
Since there is no other information, we must assume that the number of
result tuples (12315) is correct.
"
sql>select * from way_tags, way_nds, nodes_legacy where k='power' and
v='line' and way_tags.way = way_nds.way and way_nds.to_node =
nodes_legacy.id;
12315 tuples
"
Ah, now we have a query --- but no indication whatsoever of why it appears
here...
Is it related to the problem? If so, how?
Is it identical with / similar to / different from the query that created
the above output?
At least it produces the same number of tuples: 12315 --- guess this is
correct, again --- isn't it??
For brevity of the report, the user chose to omit all 12315 tuples;
hence, they were most probably what he expected. Good.
So --- What was the actual problem, again?
How was the problem triggered?
How can we reproduce it?
... Oh, the first followup of the user tells us more:
"
[...] there is enough information to reproduce the bug on something like
UTF8 characters.
"
Yes? What bug? What information?
"
If even the authors of mclient are not surprised what the output of the
second query is <<12315 tuples>> without setting any '>'-option, better to
not improve my bug reporting skills.
"
Hm, since you refuse to give the slightest information about what kind of
data you ran your query on, or at least which result you would have
expected, how shall we be able to guess whether 12315 tuples is correct or
not?
And what is a "'>'-option"?
...
Well, I guess that should be enough to get the picture --- even without
http://www.chiark.greenend.org.uk/~sgtatham/bugs.html, common sense might
suggest that "showing output" without any indication of how that output was
produced, what is wrong with it, and what would have been correct, is just
as much information as showing no output at all.
----------------------------------------------------------------------
Comment By: Stefan de Konink (skinkie)
Date: 2008-10-21 16:57
Message:
@sjoerd: personally I would associate non-NLS with the POSIX locale, but
that is me. Still, the bug exists. I read the document after you pointed me
to it. This document also says 'show the programmer the output of your
program', which is what I did in the initial posting. I also mentioned what
I *thought* would be the problem. But I'll try to be more verbose.
@mr-meltdown: do you agree with me that in this case GDB should not have
been able to output it either, while it actually does?
----------------------------------------------------------------------
Comment By: Sjoerd Mullender (sjoerd)
Date: 2008-10-21 16:20
Message:
mclient -lsql -Eiso-8859-1
We had to ask lots of questions for you to finally provide some useful
information. If you had internalized the document I pointed you to, that
would not have been necessary. If you feel insulted, tough. But please do
read that document. We have better things to do than trying to be
clairvoyant.
----------------------------------------------------------------------
Comment By: Fabian (mr-meltdown)
Date: 2008-10-21 15:58
Message:
utf-8 chars on a posix/C locale is guaranteed to be problematic, IMO.
----------------------------------------------------------------------
Comment By: Stefan de Konink (skinkie)
Date: 2008-10-21 14:54
Message:
locale
------
LANG=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=
perl -e ''
--------
<< returns nothing >>
----------------------------------------------------------------------
Comment By: Stefan Manegold (stmane)
Date: 2008-10-21 09:09
Message:
I know, but for completeness, I added mine, too ;-)
----------------------------------------------------------------------
Comment By: Fabian (mr-meltdown)
Date: 2008-10-21 09:01
Message:
sorry, I meant
@skinkie: what does `locale` say, and what does `perl -e ''` say?
----------------------------------------------------------------------
Comment By: Stefan Manegold (stmane)
Date: 2008-10-21 08:56
Message:
$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=nl_NL.UTF-8
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
----------------------------------------------------------------------
Comment By: Fabian (mr-meltdown)
Date: 2008-10-21 08:44
Message:
what does `locale` say on your system?
----------------------------------------------------------------------
Comment By: Stefan de Konink (skinkie)
Date: 2008-10-20 23:26
Message:
I presume you have a Linux distro that has NLS enabled, like I did before I
thought I could strip it to save more space for my VM/LiveCD.
SQLrow (len=0x8713698, numeric=0x87136c8, rest=0x87136b0, fields=5,
trim=1) at ../../../src/mapiclient/MapiClient.mx:4
408 for (i = 0; i < fields; i++) {
(gdb)
409 if ((t = rest[i]) != NULL && utf8strlen(t) > (size_t) len[i]) {
(gdb)
utf8strlen (s=0x8713610 "FSürth") at ../../../src/mapiclient/MapiClient.mx:370
Considering that, I presume something goes wrong in stream_printf where
toConsole is changed. If you want to debug it, a guest account on my VM is
of course possible.
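The trace shows SQLrow comparing utf8strlen(t) against the column width len[i]. MonetDB's actual utf8strlen is not shown in this report. The following Python sketch is only an illustration of what a function with that name typically computes: the number of characters, not bytes, in a UTF-8 string.

```python
def utf8strlen(s):
    """Number of characters in UTF-8 encoded bytes s.

    Every character starts with exactly one non-continuation byte, so
    counting bytes that do not match 0b10xxxxxx counts characters.
    """
    return sum(1 for b in s if (b & 0xC0) != 0x80)


raw = "FSürth".encode("utf-8")
print(len(raw), utf8strlen(raw))   # 7 6
```

Counting characters instead of bytes lets the column for 'FSürth' be six positions wide rather than seven. This length check runs before any output is written, which is consistent with the trace reaching utf8strlen on the offending value.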
----------------------------------------------------------------------
Comment By: Stefan Manegold (stmane)
Date: 2008-10-20 20:20
Message:
Works for me:
sql>create table MyTab (MyAtt string);
0 tuples
sql>insert into MyTab values ('FSürth');
Rows affected 1
sql>select * from MyTab;
+---------+
| myatt |
+=========+
| FSürth |
+---------+
1 tuple
sql>
----------------------------------------------------------------------
Comment By: Stefan de Konink (skinkie)
Date: 2008-10-20 12:38
Message:
(I'm using the new and improved bug tracker; maybe that helps.)
The issue is not the whitespace; the issue is the sudden loss of output
to stdout. If you take a peek at the 'line' with 22771762, you notice the
string Team_Alpha. Now look ahead to '22772346': you notice "12315 tuples".
The reason, which I can positively mark as 'in some way related to my
recompiling with -nls' (in Gentoo terms), is
http://openstreetmap.org/api/0.5/node/244473999, where the user value is set
to 'FSürth'.
Now, how do I know that I didn't break Mserver5 or Mapi in that respect?
Because my alternative lookup mechanism still works, on this same dataset:
http://thuis.konink.de/api/0.5/node/244473999
----------------------------------------------------------------------
Comment By: Fabian (mr-meltdown)
Date: 2008-10-20 12:28
Message:
Unfortunately SF eats whitespace, so I'm not able to see the problem
properly, but could it be that you're bitten by mclient breaking up long
values in an attempt not to get wider than your terminal size?
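The ':' rows in the initial report look like continuation lines of values that were folded to fit the terminal. mclient's real layout code is not shown here. The Python sketch below only illustrates folding a value into column-width pieces counted in characters; the helper name wrap_value is hypothetical.

```python
def wrap_value(value, width):
    """Split value into pieces of at most `width` characters.

    Slicing a Python str works on code points, so a multi-byte character
    such as 'ü' is never cut in half.
    """
    if width < 1:
        raise ValueError("width must be positive")
    pieces = [value[i:i + width] for i in range(0, len(value), width)]
    return pieces or [""]


print(wrap_value("Team_Alpha", 4))   # ['Team', '_Alp', 'ha']
print(wrap_value("FSürth", 3))       # ['FSü', 'rth']
```

In such a layout, each piece after the first would be printed on its own continuation row under the same column.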
----------------------------------------------------------------------
Comment By: Stefan de Konink (skinkie)
Date: 2008-10-20 12:12
Message:
I take this as offensive; there is enough information to reproduce the bug
on something like UTF8 characters. If even the authors of mclient are not
surprised what the output of the second query is <<12315 tuples>> without
setting any '>'-option, better to not improve my bug reporting skills.
----------------------------------------------------------------------
Comment By: Sjoerd Mullender (sjoerd)
Date: 2008-10-20 10:33
Message:
Please provide details.
Read and internalize
<http://www.chiark.greenend.org.uk/~sgtatham/bugs.html>.
----------------------------------------------------------------------
_______________________________________________
Monetdb-bugs mailing list
https://lists.sourceforge.net/lists/listinfo/monetdb-bugs