Hello all,
the fix worked, thank you Gustaf! But we still have a problem with
emojis when writing them to the database. The error we get is:
Database operation "dml" failed (exception ERROR, "ERROR: invalid byte
sequence for encoding "UTF8": 0xf0 0x9f 0x98 0xff
when trying to write the emoji to a TEXT or VARCHAR field in the
database. Inserting the same string in the database console works as
expected. When we read the string and reinsert it, it also works
flawlessly.
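As a sanity check on the bytes quoted in the error message, here is a minimal Python sketch, run outside NaviServer. It compares the standard UTF-8 encoding of the smiley (U+1F603) with the reported sequence. The helper name `is_valid_utf8` is made up for illustration, and the "reported" bytes are simply copied from the error text above.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


smiley = "\U0001F603"                         # Smiling Face with Open Mouth
expected = smiley.encode("utf-8")             # standard UTF-8 encoding
reported = bytes([0xF0, 0x9F, 0x98, 0xFF])    # bytes copied from the error message

print(expected.hex(" "))                      # hex dump of the expected bytes
print(is_valid_utf8(expected))
print(is_valid_utf8(reported))
```

The byte 0xff never occurs in well-formed UTF-8, so the reported sequence cannot be valid regardless of what precedes it. This suggests the string is altered somewhere before it reaches the database.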
We compared the two strings, wrote them to files and inspected them
with a hex viewer, and converted them with Tcl's "encoding convertto"
and with iconv, all with no luck.
We are using PostgreSQL 12 and the nsdbpg module with
naviserver-4.99.22-16-g67adf3c34710+
Here is the test case:
In the database console:
CREATE TABLE test (
idx SERIAL,
txt TEXT
);
INSERT INTO test (txt) VALUES ('<smiley>😃</smiley>');
In the NaviServer console or in a script:
# V1: working
set db [ns_db gethandle]
set sql "SELECT txt FROM test WHERE idx=1"
set selection [ns_db 1row $db $sql]
set str [ns_set value $selection 0]
set sql "INSERT INTO test (txt) VALUES ('$str')"
ns_db dml $db $sql
ns_db releasehandle $db
# V2: not working
set db [ns_db gethandle]
set sql "INSERT INTO test (txt) VALUES ('<smiley>😃</smiley>')"
ns_db dml $db $sql
ns_db releasehandle $db
With nscp, pasting the string of V2 already shows a wrong string in the log:
Notice: nscp: 3: set sql "INSERT INTO test (txt) VALUES
('<smiley>�������</smiley>')"
Whereas V1 works (the smiley is not printed here, but works in the console):
Notice: nscp: 5: puts $str
<smiley></smiley>
Any help is greatly appreciated!
Wolfgang Winkler
On 09.11.21 at 09:36, Gustaf Neumann wrote:
Dear all,
The situation is trickier than one might hope. Aside from the Tcl
version dependencies (as Brian pointed out), Tcl versions before 8.7
do not support TCL_UTF_MAX with multi-byte sequences longer than 4
(see Tcl TIP 389), which matter mostly for some newer emojis. So,
for full emoji support, Tcl 8.7 with the proper compilation options is
needed.
Anyhow, in Wolfgang's case, the "Smiling Face with Open Mouth", we
have just a 4-byte UTF-8 character, which is supported by
out-of-the-box Tcl 8.6. However, this emoji is represented
Tcl-internally as a 6-byte sequence. NaviServer omitted the
conversion step for UTF-8 as an optimization, on the assumption that
Tcl-internal representations are also accepted as external
representations, which is not always true.
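To make the 4-byte versus 6-byte difference concrete, here is a rough Python sketch, not NaviServer or Tcl code. It assumes that the 6-byte internal form consists of the two UTF-16 surrogate halves, each encoded as a 3-byte sequence (CESU-8 style). That layout is an assumption for illustration, and the function name `surrogate_pair_bytes` is hypothetical.

```python
def surrogate_pair_bytes(ch: str) -> bytes:
    """Encode one non-BMP character as two 3-byte surrogate halves.

    Assumed to mimic the 6-byte internal form; only meaningful for
    characters above U+FFFF.
    """
    utf16 = ch.encode("utf-16-be")               # 4 bytes: high + low surrogate
    halves = [utf16[i:i + 2] for i in (0, 2)]
    return b"".join(
        chr(int.from_bytes(h, "big")).encode("utf-8", "surrogatepass")
        for h in halves
    )


smiley = "\U0001F603"
external = smiley.encode("utf-8")                # what the database expects
internal = surrogate_pair_bytes(smiley)          # assumed internal form

print(len(external), external.hex(" "))
print(len(internal), internal.hex(" "))

# A strict UTF-8 consumer rejects the 6-byte form.
try:
    internal.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False
print(strict_ok)

# The conversion step: rejoin the surrogate halves, then emit real UTF-8.
halves_text = internal.decode("utf-8", "surrogatepass")
repaired = (
    halves_text.encode("utf-16-be", "surrogatepass")
    .decode("utf-16-be")
    .encode("utf-8")
)
print(repaired == external)
```

Under this assumption, passing the internal bytes to the database unchanged hands it a sequence that a strict UTF-8 decoder refuses, whereas an explicit conversion yields the expected 4-byte form.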
In the tip version of NaviServer on Bitbucket, this optimization has
now been removed, the examples work as expected, and the regression
test has been extended to cover this case.
Many thanks to Wolfgang for the good bug report.
-g
_______________________________________________
naviserver-devel mailing list
naviserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/naviserver-devel
--
*Wolfgang Winkler*
Management
wolfgang.wink...@digital-concepts.com
mobile +43.699.19971172
dc:*büro*
digital concepts Novak Winkler OG
Software & Design
Landstraße 68, 5. Stock, 4020 Linz
www.digital-concepts.com <http://www.digital-concepts.com>
tel +43.732.997117.72
tel +43.699.1997117.2
Company register number: 192003h
Commercial register court: Landesgericht Linz