On 16-05-2022 21:00, Adriano dos Santos Fernandes wrote:
On 16/05/2022 12:07, Mark Rotteveel wrote:
On 16-05-2022 16:50, Mark Rotteveel wrote:
I was running some tests against Firebird-5.0.0.494-0_x64 (latest
snapshot, from last Saturday), and I notice that I get incorrect
string right truncation errors with CHAR/VARCHAR.

I currently cannot dive deeper into it, but as a data point, the error
does not occur with Firebird-5.0.0.488-0_x64.

Based solely on the commit message, maybe this commit is at fault?
https://github.com/FirebirdSQL/firebird/commit/dd18a3b11b28c3ed8126a6f54b829989954bfa03


The Jaybird test that triggers this is
org.firebirdsql.jdbc.TestFBPreparedStatementUTF8, specifically tests:

- connectionUtf8_insertMultiByte_inWin1252_char_1_win1252_succeeds
(fails with "expected length 1, actual 2")
[..]
- connectionUtf8_insertMultiple_lessThanMax_varchar_5_utf8_succeeds
(fails with "expected length 5, actual 6")

For CHAR, it seems to count 1 character too many; for VARCHAR, 2-3
characters too many.


Is this happening with fbclient library too?

Good question: no, it doesn't, which suggests Jaybird is doing something different. Jaybird uses blr_varying/blr_text, not blr_varying2/blr_text2, when sending the BLR of the execute. Could that make a difference?

Changing Jaybird to write blr_varying2/blr_text2 with the sub type (character set) fixes the issue.

However, I thought that when using blr_varying/blr_text (judging by MetadataFromBlr::MetadataFromBlr in common\classes\InternalMessageBuffer.cpp), Firebird selects CS_dynamic, which should use the connection character set (UTF8 in this case). Is it possible that `CharSet* const fromCharSet = INTL_charset_lookup(tdbb, from_cs);` doesn't handle CS_dynamic properly? Or that this happens in a codepath where BLR decoding doesn't assign CS_dynamic, so it remains zero (NONE)?
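
To make the difference between the two descriptor forms concrete, here is a rough sketch of the bytes involved. The type codes are the values I recall from Firebird's blr.h, and the UTF8 character set id of 4 and the 20-byte length for a VARCHAR(5) CHARACTER SET UTF8 are likewise assumptions; the helper name is made up for illustration and is not Jaybird's actual code.

```python
import struct

# Type codes as recalled from Firebird's blr.h; treat these as assumptions.
BLR_TEXT, BLR_TEXT2 = 14, 15
BLR_VARYING, BLR_VARYING2 = 37, 38

CS_UTF8 = 4  # assumed character set id of UTF8


def varchar_blr(byte_length, charset_id=None):
    """Hypothetical helper: BLR descriptor bytes for one VARCHAR parameter."""
    if charset_id is None:
        # blr_varying: type code + length; the character set is left implicit
        return struct.pack("<BH", BLR_VARYING, byte_length)
    # blr_varying2: type code + character set id + length
    return struct.pack("<BHH", BLR_VARYING2, charset_id, byte_length)


# VARCHAR(5) CHARACTER SET UTF8, assuming up to 4 bytes per character
without_charset = varchar_blr(20)
with_charset = varchar_blr(20, CS_UTF8)
```

With the first form the server has to decide which character set the bytes are in; with the second form the client states it explicitly, which is consistent with the blr_varying2/blr_text2 change making the error go away.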

Is the error in insert or select?

The error happens on insert.

I'm failing to reproduce it in isql, for example for
connectionUtf8_insertMultipleInWin1252_varchar_5_win1252_succeeds:

-----
CREATE TABLE utf8table (
  id INTEGER,
  char_1_none CHAR(1) CHARACTER SET NONE,
  char_1_utf8 CHAR(1) CHARACTER SET UTF8,
  char_1_win1252 CHAR(1) CHARACTER SET WIN1252,
  char_5_none CHAR(5) CHARACTER SET NONE,
  char_5_utf8 CHAR(5) CHARACTER SET UTF8,
  char_5_win1252 CHAR(5) CHARACTER SET WIN1252,
  varchar_5_none VARCHAR(5) CHARACTER SET NONE,
  varchar_5_utf8 VARCHAR(5) CHARACTER SET UTF8,
  varchar_5_win1252 VARCHAR(5) CHARACTER SET WIN1252
);

/*
select unicode_char(0x00FE) || unicode_char(0x00A3) || 'a' ||
unicode_char(0x0160) || ',' from rdb$database;
*/

set bulk_insert INSERT INTO utf8table (id, char_5_win1252) VALUES (?, ?);
(1, 'þ£aŠ,')
stop
;

select char_5_win1252 from utf8table where id = 1;
-----
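
As a sanity check on the value used in the script above, this small sketch shows its length in characters and in bytes for both character sets. It uses Python's cp1252 codec as a stand-in for WIN1252, which is an assumption about equivalence for these particular characters.

```python
# The five-character test value: thorn, pound sign, 'a', S with caron, comma
value = "\u00fe\u00a3a\u0160,"

chars = len(value)                          # characters
win1252_bytes = len(value.encode("cp1252"))  # one byte per character
utf8_bytes = len(value.encode("utf-8"))      # three of the five need two bytes
```

So the value fits a CHAR(5)/VARCHAR(5) column when counted in characters, but not if the UTF-8 bytes are each counted as one character.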

The problem seems to be that it is counting UTF8 characters as if they were WIN1252 or maybe NONE, so UTF8 characters that take two bytes are counted as two characters.

As an example, if I change a failing test to use "12345", it works; if I change it to "\u00fe2345" (\u00fe is two bytes in UTF-8), the error is "expected length 5, actual 6".

If I change it to "\u20ac2345" (\u20ac is three bytes in UTF-8), the error is "expected length 5, actual 7".
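
The numbers in those errors can be reproduced outside Firebird with a small sketch that encodes the string as UTF-8 and then counts the bytes as if each were one character. Decoding as latin-1 is only a stand-in for "a single-byte character set"; which character set the server actually applies is exactly the open question, and the function name is made up for illustration.

```python
def miscounted_length(text):
    """Length seen when UTF-8 bytes are treated as single-byte characters."""
    utf8 = text.encode("utf-8")
    # latin-1 maps every byte to exactly one character
    return len(utf8.decode("latin-1"))


ascii_only = miscounted_length("12345")
two_byte = miscounted_length("\u00fe2345")
three_byte = miscounted_length("\u20ac2345")
```

The three results line up with the behaviour above: the ASCII-only value stays within 5, while the other two come out as 6 and 7.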

Mark
--
Mark Rotteveel


Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel
