Hello all,

The fix worked, thank you Gustaf! However, we still have a problem with emojis when writing them to the database. The error we get is:

Database operation "dml" failed (exception ERROR, "ERROR:  invalid byte sequence for encoding "UTF8": 0xf0 0x9f 0x98 0xff

when trying to write the emoji to a TEXT or VARCHAR field in the database. Inserting the same string in the database console works as expected. When we read the string and reinsert it, it also works flawlessly.

We've compared the two strings, written them to files and inspected them in a hex viewer, and converted them with Tcl's "encoding convertto" and with iconv, all with no luck.

We are using PostgreSQL 12 and the nsdbpg module with NaviServer (naviserver-4.99.22-16-g67adf3c34710+).

Here is the test case:

In the database console:

CREATE TABLE test (
    idx SERIAL,
    txt TEXT
);

INSERT INTO test (txt) VALUES ('<smiley>😃</smiley>');

In the naviserver console or in a script:

# V1: working
set db [ns_db gethandle]
set sql "SELECT txt FROM test WHERE idx=1"
set selection [ns_db 1row $db $sql]
set str [ns_set value $selection 0]
set sql "INSERT INTO test (txt) VALUES ('$str')"
ns_db dml $db $sql
ns_db releasehandle $db

# V2: not working
set db [ns_db gethandle]
set sql "INSERT INTO test (txt) VALUES ('<smiley>😃</smiley>')"
ns_db dml $db $sql
ns_db releasehandle $db

With nscp, pasting the string of V2 already shows a wrong string in the log:

Notice: nscp:  3: set sql "INSERT INTO test (txt) VALUES ('<smiley>�������</smiley>')"


Whereas V1 works (the smiley is not printed here, but works in the console):

Notice: nscp:  5: puts $str
<smiley></smiley>

Any help is greatly appreciated!

Wolfgang Winkler

On 09.11.21 at 09:36, Gustaf Neumann wrote:
Dear all,

The situation is trickier than one might hope. Aside from the Tcl version dependencies (as Brian pointed out), Tcl versions before 8.7 do not support TCL_UTF_MAX with multi-byte sequences longer than 4 (see Tcl TIP 389), which are mostly relevant for some newer emojis. So, for full emoji support, Tcl 8.7 with the proper compilation options is needed.

Anyhow, in Wolfgang's case, the "Smiling Face with Open Mouth", we have just a 4-byte UTF-8 character, which is supported by out-of-the-box Tcl 8.6. However, this emoji is represented Tcl-internally as a 6-byte sequence. NaviServer wrongly assumed that Tcl-internal representations are always accepted as external representations, and therefore omitted the conversion step for utf-8. That assumption does not always hold.

In the tip version of NaviServer on Bitbucket, this optimization is now removed, the examples work as expected, and the regression test is extended for this case.
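
To make the 4-byte versus 6-byte difference concrete, here is a small sketch of the byte arithmetic. It is written in Python rather than Tcl, only for illustration. It assumes that the 6-byte internal form is the surrogate-pair encoding (each UTF-16 surrogate written as its own 3-byte sequence), and the helper names are made up for this example. Nothing here is Tcl or NaviServer API.

```python
# Illustration only: U+1F603 as standard UTF-8 vs. a surrogate-pair form.
# Assumption: the 6-byte internal form encodes each UTF-16 surrogate
# as a separate 3-byte sequence. Helper names are hypothetical.

def encode_3byte(unit):
    """Encode one 16-bit code unit with the 3-byte UTF-8 bit pattern."""
    return bytes([
        0xE0 | (unit >> 12),
        0x80 | ((unit >> 6) & 0x3F),
        0x80 | (unit & 0x3F),
    ])

def surrogate_pair_bytes(code_point):
    """Split a code point above U+FFFF into surrogates, 3 bytes each."""
    v = code_point - 0x10000
    high = 0xD800 + (v >> 10)
    low = 0xDC00 + (v & 0x3FF)
    return encode_3byte(high) + encode_3byte(low)

def is_strict_utf8(data):
    """True if a strict UTF-8 decoder accepts the byte string."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

def hexdump(data):
    return " ".join(f"{b:02x}" for b in data)

emoji = "\U0001F603"  # Smiling Face with Open Mouth
utf8 = emoji.encode("utf-8")
six_byte = surrogate_pair_bytes(ord(emoji))

print(hexdump(utf8))             # f0 9f 98 83
print(hexdump(six_byte))         # ed a0 bd ed b8 83
print(is_strict_utf8(utf8))      # True
print(is_strict_utf8(six_byte))  # False
```

Only the 4-byte form is valid UTF-8 for a strict decoder, which is why bytes that were never converted from the internal form to the external encoding can be rejected on the database side.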

Many thanks to Wolfgang for the good bug report.

-g

_______________________________________________
naviserver-devel mailing list
naviserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/naviserver-devel
--

*Wolfgang Winkler*
Management
wolfgang.wink...@digital-concepts.com
mobile +43.699.19971172

dc:*büro*
digital concepts Novak Winkler OG
Software & Design
Landstraße 68, 5. Stock, 4020 Linz
www.digital-concepts.com <http://www.digital-concepts.com>
tel +43.732.997117.72
tel +43.699.1997117.2

Company register number: 192003h
Commercial register court: Landesgericht Linz
