[
https://issues.apache.org/jira/browse/CASSANDRA-10539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andy Tolbert updated CASSANDRA-10539:
-------------------------------------
Description:
[From the java-driver mailing
list|https://groups.google.com/a/lists.datastax.com/forum/#!topic/java-driver-user/3Aa7s0u2ZrI]
/ [JAVA-955|https://datastax-oss.atlassian.net/browse/JAVA-955]
If nodes in your cluster use different default character sets, it's possible
for them to generate different prepared statement ids for the same
'keyspace + query string' combination. I imagine this is not a very typical or
desired configuration (thus the low severity).
This is because
[MD5Digest.compute(String)|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/utils/MD5Digest.java#L51-L54]
uses
[String.getBytes()|http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#getBytes()]
which relies on the default charset.
In the general case this is fine, but if your query string contains certain
characters, such as
[Character.MAX_VALUE|http://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#MAX_VALUE]
('\uffff'), the byte representation may vary with the encoding.
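To illustrate the effect outside of Cassandra, here is a minimal sketch. The class {{CharsetDigestDemo}}, its helpers, and the query string are hypothetical and are not Cassandra's code; the charset is passed explicitly to simulate two nodes whose default charsets are {{UTF-8}} and {{ISO-8859-1}}.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class CharsetDigestDemo {

    // Hypothetical stand-in for an MD5-of-string helper. The charset is an
    // explicit parameter here so one JVM can simulate two differently
    // configured nodes.
    static byte[] md5(String s, Charset charset) {
        try {
            return MessageDigest.getInstance("MD5").digest(s.getBytes(charset));
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Render a digest as lowercase hex for printing.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Illustrative query containing Character.MAX_VALUE.
        String query = "SELECT * FROM ks.t WHERE v = '\uffff'";

        // UTF-8 encodes U+FFFF as three bytes; ISO-8859-1 cannot represent
        // it, so getBytes substitutes a single replacement byte.
        System.out.println("UTF-8      bytes: " + query.getBytes(StandardCharsets.UTF_8).length);
        System.out.println("ISO-8859-1 bytes: " + query.getBytes(StandardCharsets.ISO_8859_1).length);

        // Different input bytes, hence different digests.
        System.out.println("UTF-8      md5: " + hex(md5(query, StandardCharsets.UTF_8)));
        System.out.println("ISO-8859-1 md5: " + hex(md5(query, StandardCharsets.ISO_8859_1)));
    }
}
```

For a query made up only of ASCII characters, both charsets produce the same bytes and therefore the same digest, which is consistent with this going unnoticed in the general case.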
I was able to reproduce this by configuring a 2-node cluster with node1 using
file.encoding {{UTF-8}} and node2 using file.encoding {{ISO-8859-1}}. A
java-driver test that demonstrates this can be found
[here|https://github.com/datastax/java-driver/blob/java955/driver-core/src/test/java/com/datastax/driver/core/RetryOnUnpreparedTest.java].
was:
[From the java-driver mailing
list|https://groups.google.com/a/lists.datastax.com/forum/#!topic/java-driver-user/3Aa7s0u2ZrI]
/ [JAVA-955|https://datastax-oss.atlassian.net/browse/JAVA-955]
If you have nodes in your cluster that are using a different default character
set it's possible for nodes to generate different prepared statement ids for
the same 'keyspace + query string' combination. I imagine this is not a very
typical or desired configuration (thus the low severity).
This is because
[MD5Digest.compute(String)|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/utils/MD5Digest.java#L51-L54]
uses
[String.getBytes()|http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#getBytes()]
which relies on the default charset.
In the general case this is fine, but if you use some characters in your query
string such as
[Character.MAX_VALUE|http://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#MAX_VALUE]
('\uffff') the byte representation may vary based on the coding.
I was able to reproduce this configuring a 2-node cluster with node1 using
file.encoding {{UTF-8}} and node2 using file.encoding {{ISO-8859-1}}. The
java-driver test demonstrates this can be found
[here|https://github.com/datastax/java-driver/blob/java955/driver-core/src/test/java/com/datastax/driver/core/RetryOnUnpreparedTest.java].
> Different encodings used between nodes can cause inconsistently generated
> prepared statement ids
> -------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-10539
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10539
> Project: Cassandra
> Issue Type: Bug
> Reporter: Andy Tolbert
> Priority: Minor