[ https://issues.apache.org/jira/browse/CASSANDRA-5180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nabeel Shahzad updated CASSANDRA-5180:
--------------------------------------
Description:
When a double/float is used in a map (key or value), list, or set types, the
decoding is done as a utf8 string, which then incorrectly parses and adds extra
bytes.
For example:
The bytes of a map <double, double> (this is coming out of the Thrift call)
{noformat}
00 01 00 08 3f f4 00 00 00 00 00 00 00 08 40 02 00 00 00 00 00 00
{noformat}
But after it's been parsed out from the field as UTF8:
{noformat}
00 01 00 08 3f 3f 00 00 00 00 00 00 00 08 40 02 00 00 00 00 00 00
{noformat}
As you can see there's an incorrect byte (a 3f where the f4 was, and an extra
00). For reference, this value was map<double, double> = {1.25: 2.25}. The
behavior is the same for floats. The f4 is 244 in decimal, which I believe
isn't valid UTF-8 in this position.
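To illustrate why decoding these bytes as text is lossy, here is a minimal Node.js sketch. It assumes only the built-in Buffer API; the variable names are made up for this example, and it is not the code from the generated Thrift file.

```javascript
// Bytes of map<double, double> = {1.25: 2.25}, copied from the dump above:
// count, key length, key bytes, value length, value bytes.
const original = Buffer.from(
  ['0001', '0008', '3ff4000000000000', '0008', '4002000000000000'].join(''),
  'hex'
);

// Decode as UTF-8 text and encode back, which is roughly what happens
// when a binary field is handled as a string (an assumption about the cause).
const asString = original.toString('utf8');
const roundTrip = Buffer.from(asString, 'utf8');

console.log('original  :', original.toString('hex'));
console.log('round trip:', roundTrip.toString('hex'));
console.log('identical :', original.equals(roundTrip));

// Reading the doubles straight from the untouched buffer still works.
console.log(original.readDoubleBE(4), original.readDoubleBE(14));
```

On a current Node.js the f4 should be replaced by U+FFFD, which re-encodes as more than one byte, so the round-tripped buffer no longer matches the original. The exact substitute byte (such as the 3f seen above) likely depends on the Node version and on the encoding used when writing the string back out.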
I have seen cases (which I'm trying to reproduce right now) where *extra*
bytes are added in, which breaks the parsing based on byte size.
The actual value of the field becomes:
{noformat}
value:
'\u0000\u0002\u0000\b??\u0000\u0000\u0000\u0000\u0000\u0000\u0000\b@\u0002\u0000\u0000\u0000\u0000\u0000\u0000''
{noformat}
Where the \b = 8, ? = f4, ? = unknown char.
It seems to me when the "ftype" is parsed (int16) before the actual field, it's
returning a TYPE value of "11" (string) - instead of the proper value of a
map/set/list.
So this messes up any parsing based on the byte-length for the field, since
there are a variable number of extra bytes added, either to the key or value of
the map, and any values of a list.
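As a rough sketch of the length-based parsing described above, the following assumes the layout visible in the hex dump: a 2-byte entry count, then for each entry a 2-byte length followed by the key bytes and a 2-byte length followed by the value bytes. The function name decodeDoubleMap is hypothetical, not part of any driver.

```javascript
// Hypothetical helper: decode a map<double, double> from raw bytes,
// assuming the layout inferred from the dump (all integers big-endian).
function decodeDoubleMap(buf) {
  let offset = 0;
  const count = buf.readUInt16BE(offset);
  offset += 2;
  const result = new Map();

  // Read one 2-byte length prefix and the 8-byte double that follows it.
  const readDouble = () => {
    const len = buf.readUInt16BE(offset);
    offset += 2;
    if (len !== 8) {
      throw new Error('unexpected length ' + len + ' at offset ' + (offset - 2));
    }
    const value = buf.readDoubleBE(offset);
    offset += len;
    return value;
  };

  for (let i = 0; i < count; i++) {
    const key = readDouble();
    const value = readDouble();
    result.set(key, value);
  }
  if (offset !== buf.length) {
    throw new Error('trailing bytes after map entries');
  }
  return result;
}

const raw = Buffer.from('000100083ff4000000000000000840020000000000' + '00', 'hex');
console.log(decodeDoubleMap(raw)); // one entry: 1.25 mapped to 2.25
```

With the untouched bytes the length prefixes line up. Once a byte inside the key is expanded or dropped, every later length prefix is read from the wrong offset, which is why the parser has to receive the field as raw bytes rather than as a decoded string.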
For reference, the table, and an insert example:
{noformat}
CREATE TABLE sample_map (
id text PRIMARY KEY,
map_col_text map < text, text >,
map_col_int map < int, text >,
map_col_float map < float, float >,
map_col_double map < double, double >
);
INSERT INTO sample_map (id, map_col_double) VALUES('DOUBLE_ROW_SINGLE',
{10.1415: 20.9876});
{noformat}
Not sure if it matters, but this was using CQL3
was:
When a double/float is used in a map (key or value), list, or set types, the
decoding is done as a utf8 string, which then incorrectly parses and adds extra
bytes.
For example:
The bytes of a map <double, double> (this is coming out of the Thrift call)
{noformat}
00 01 00 08 3f f4 00 00 00 00 00 00 00 08 40 02 00 00 00 00 00 00
{noformat}
But after it's been parsed out from the field as UTF8:
{noformat}
00 01 00 08 3f 3f 00 00 00 00 00 00 00 08 40 02 00 00 00 00 00 00
{noformat}
As you can see there's an extra byte (a 3f where the f4 was, and an extra 00).
For reference, this value was map<double, double> = {1.25: 2.25}. The behavior
is the same for floats. The f4 is 244 in decimal, which I believe isn't valid
UTF-8 in this position.
The actual value of the field becomes:
{noformat}
value:
'\u0000\u0002\u0000\b??\u0000\u0000\u0000\u0000\u0000\u0000\u0000\b@\u0002\u0000\u0000\u0000\u0000\u0000\u0000''
{noformat}
Where the \b = 8, ? = f4, ? = unknown char.
It seems to me when the "ftype" is parsed (int16) before the actual field, it's
returning a TYPE value of "11" (string) - instead of the proper value of a
map/set/list.
So this messes up any parsing based on the byte-length for the field, since
there are a variable number of extra bytes added, either to the key or value of
the map, and any values of a list.
For reference, the table, and an insert example:
{noformat}
CREATE TABLE sample_map (
id text PRIMARY KEY,
map_col_text map < text, text >,
map_col_int map < int, text >,
map_col_float map < float, float >,
map_col_double map < double, double >
);
INSERT INTO sample_map (id, map_col_double) VALUES('DOUBLE_ROW_SINGLE',
{10.1415: 20.9876});
{noformat}
Not sure if it matters, but this was using CQL3
> NodeJS Thrift generated file incorrectly parses map/list/sets when
> doubles/floats are used
> ------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-5180
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5180
> Project: Cassandra
> Issue Type: Bug
> Reporter: Nabeel Shahzad