[
https://issues.apache.org/jira/browse/CASSANDRA-16429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17286781#comment-17286781
]
Yifan Cai commented on CASSANDRA-16429:
---------------------------------------
Thanks everyone for the input!
I would like to revert all the parts necessary to maintain
compatibility. The attached patch does that. We should also update the
documentation.
Regarding the performance optimization,
* Looking at commit
[c3b014a|https://github.com/apache/cassandra/commit/647bdd6a11970f80666d7f20b53af76fbda4ff14#diff-82bdd361868471a5287c3b014ace6b5d3e6307557983d6eb9ef5dff27b97a408R144-R154],
I introduced {{writeAsciiString}}, which is the fastest option for ASCII strings, and
a new implementation of {{writeString}}, which computes the exact size to avoid
overallocation. According to the [micro bench
result|https://issues.apache.org/jira/browse/CASSANDRA-15410?focusedCommentId=16975536&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16975536],
both changes provide a noticeable speedup compared with the original implementation, which
calls {{ByteBufUtil.writeUtf8}}. In the attached patch, I only reverted the
improper invocations of {{writeAsciiString}} back to the new {{writeString}}.
It should still perform better than the pre-4.0 code.
* The overall speedup for request execution may not be significant. The
encoding speedup is at the nanosecond level, while query execution is
at the millisecond level. Even if we were allowed to encode column names using
{{writeAsciiString}}, unless there are thousands of columns to be filled, it
should not affect the overall execution time much.
* I would argue against fully reverting the optimization patch, because the
patch also avoids overallocation, which reduces memory usage at runtime.
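To illustrate the difference between the two encoding paths discussed above, here is a minimal sketch using only the JDK. It is not the code from the patch: the class name {{StringEncodingSketch}}, the helper {{utf8Length}}, and the use of {{java.nio.ByteBuffer}} in place of Netty's {{ByteBuf}} are assumptions made for the example, and the method names {{writeString}} and {{writeAsciiString}} are borrowed only as labels. It assumes the protocol's string layout of a 2-byte length followed by the UTF-8 bytes, and it imitates the reported symptom by substituting "?" for any non-ASCII character on the ASCII path.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class StringEncodingSketch {

    // Hypothetical helper: exact UTF-8 byte count of s, so the buffer can be
    // sized precisely instead of being overallocated.
    static int utf8Length(String s) {
        int bytes = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                bytes += 1;
            } else if (c < 0x800) {
                bytes += 2;
            } else if (!Character.isSurrogate(c)) {
                bytes += 3;
            } else if (Character.isHighSurrogate(c) && i + 1 < s.length()
                       && Character.isLowSurrogate(s.charAt(i + 1))) {
                bytes += 4; // a valid surrogate pair encodes as 4 bytes
                i++;
            } else {
                bytes += 1; // String.getBytes replaces a lone surrogate with '?'
            }
        }
        return bytes;
    }

    // UTF-8 path: 2-byte length prefix followed by exactly that many bytes.
    static ByteBuffer writeString(String s) {
        int len = utf8Length(s);
        if (len > 0xFFFF) {
            throw new IllegalArgumentException("string too long for a 2-byte length");
        }
        ByteBuffer buf = ByteBuffer.allocate(2 + len);
        buf.putShort((short) len);
        buf.put(s.getBytes(StandardCharsets.UTF_8));
        buf.flip();
        return buf;
    }

    // ASCII-only path: one byte per char. Assumption for this sketch: any
    // non-ASCII char is written as '?', which is only safe for ASCII input.
    static ByteBuffer writeAsciiString(String s) {
        if (s.length() > 0xFFFF) {
            throw new IllegalArgumentException("string too long for a 2-byte length");
        }
        ByteBuffer buf = ByteBuffer.allocate(2 + s.length());
        buf.putShort((short) s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            buf.put((byte) (c < 0x80 ? c : '?'));
        }
        buf.flip();
        return buf;
    }

    // Reads back a length-prefixed string, decoding the payload as UTF-8.
    static String readString(ByteBuffer buf) {
        ByteBuffer b = buf.duplicate();
        int len = b.getShort() & 0xFFFF;
        byte[] bytes = new byte[len];
        b.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String column = "\u540d\u524d"; // a two-character Japanese column name
        System.out.println(readString(writeString(column)));
        System.out.println(readString(writeAsciiString(column)));
    }
}
```

In this sketch the UTF-8 path round-trips the Japanese name unchanged, while the ASCII path loses it, which is why only strings known to be ASCII should take the faster route.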
> Fix incorrect encoding for strings can be UTF8
> ----------------------------------------------
>
> Key: CASSANDRA-16429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16429
> Project: Cassandra
> Issue Type: Bug
> Components: CQL/Interpreter
> Reporter: Yoshi Kimoto
> Assignee: Yifan Cai
> Priority: Normal
> Fix For: 4.0-beta
>
> Attachments: jptest.cql
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Tables created with Japanese-character column names work well in C*
> 3.11.10 when doing a SELECT * in cqlsh, but the names are garbled (shown as
> "?") in 4.0-beta4. DESCRIBE shows the column names correctly in both cases.
> Run the attached jptest.cql script in both environments with cqlsh -f. They will
> yield different results.
> My test env (MacOS 10.15.7):
> C* 3.11.10 with
> - OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_252-b09)
> - Python 2.7.16
> C* 4.0-beta4
> - OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.9.1+1)
> - Python 3.8.2