Tyler Hobbs created CASSANDRA-8101:
--------------------------------------
Summary: Invalid ASCII and UTF-8 chars not rejected in CQL string literals
Key: CASSANDRA-8101
URL: https://issues.apache.org/jira/browse/CASSANDRA-8101
Project: Cassandra
Issue Type: Bug
Components: Core
Reporter: Tyler Hobbs
Assignee: Tyler Hobbs
Priority: Critical
Fix For: 2.0.11, 2.1.1
When processing CQL string literals, we ultimately use
{{String.getBytes(Charset)}}, which has the following note:
{quote}
This method always replaces malformed-input and unmappable-character sequences
with this charset's default replacement byte array. The CharsetEncoder class
should be used when more control over the encoding process is required.
{quote}
So, if we insert a non-ASCII character into an ascii string literal, it is
silently replaced with a {{?}} char instead of being rejected. Something
similar happens for malformed input in UTF-8 string literals.
For example:
{noformat}
cqlsh:ks1> create table badstrings (a int primary key, b ascii);
cqlsh:ks1> insert into badstrings (a, b) VALUES ( 0, 'ΎΔδϠ');
cqlsh:ks1> select * from badstrings;
a | b
---+------
0 | ????
{noformat}
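
As a rough illustration of the {{CharsetEncoder}} alternative mentioned in the quoted note, here is a minimal Java sketch. The class {{StrictEncoding}} and its methods are hypothetical names made up for this example; this is not Cassandra code and not necessarily how the fix will be implemented. It only contrasts the lossy {{String.getBytes(Charset)}} behaviour with an encoder configured to report errors.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Cassandra.
final class StrictEncoding {

    // Encodes s with the given charset, throwing instead of substituting
    // replacement bytes when the input is malformed or unmappable.
    static ByteBuffer encodeStrict(String s, Charset charset) throws CharacterCodingException {
        return charset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .encode(CharBuffer.wrap(s));
    }

    // Returns true if s can be encoded in charset without any replacement.
    static boolean isEncodable(String s, Charset charset) {
        try {
            encodeStrict(s, charset);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The four Greek characters from the cqlsh example, written as escapes
        // so the source file encoding does not matter.
        String greek = "\u038E\u0394\u03B4\u03E0";

        // Lossy path: each unmappable character becomes '?'.
        byte[] lossy = greek.getBytes(StandardCharsets.US_ASCII);
        System.out.println(new String(lossy, StandardCharsets.US_ASCII)); // ????

        // Strict path: the same input is reported as not encodable.
        System.out.println(isEncodable(greek, StandardCharsets.US_ASCII)); // false

        // A lone surrogate is malformed input for the UTF-8 encoder.
        System.out.println(isEncodable("\uD800", StandardCharsets.UTF_8)); // false

        // Well-formed Greek text is fine in UTF-8.
        System.out.println(isEncodable(greek, StandardCharsets.UTF_8)); // true
    }
}
```

With a check along these lines, the insert above could be rejected with an error rather than storing {{????}}.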