[jira] [Commented] (CASSANDRA-16429) Fix incorrect encoding for strings can be UTF8

Yifan Cai (Jira) Thu, 18 Feb 2021 16:16:05 -0800


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-16429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17286781#comment-17286781
 ]


Yifan Cai commented on CASSANDRA-16429:
---------------------------------------

Thanks everyone for the input!

I would like to revert all the parts necessary to maintain compatibility; the 
attached patch does that. We should also update the documentation. 

Regarding the performance optimization,
 * Looking at commit 
[c3b014a|https://github.com/apache/cassandra/commit/647bdd6a11970f80666d7f20b53af76fbda4ff14#diff-82bdd361868471a5287c3b014ace6b5d3e6307557983d6eb9ef5dff27b97a408R144-R154],
 I introduced {{writeAsciiString}}, which is the fastest for ASCII strings, and 
a new implementation of {{writeString}}, which computes the exact size to avoid 
overallocation. Both changes, according to the [micro bench 
result|https://issues.apache.org/jira/browse/CASSANDRA-15410?focusedCommentId=16975536&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16975536],
 provide a noticeable speedup compared with the original implementation that 
calls {{ByteBufUtil.writeUtf8}}. In the attached patch, I only reverted the 
improper invocations of {{writeAsciiString}} back to the new {{writeString}}. 
It should still perform better than the pre-4.0 code. 
 * The overall speedup for request execution may not be significant. The 
encoding speedup is at the nanosecond level, while query execution is at the 
millisecond level. Even if we were allowed to encode column names using 
{{writeAsciiString}}, unless there are thousands of columns to be filled, it 
should not affect the overall execution time much. 
 * I would argue against fully reverting the optimization patch, because the 
patch also avoids overallocation, which reduces memory usage at runtime.  
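
To illustrate the two points above, here is a minimal, standalone sketch using 
only the JDK. It is not the code from the patch: the class name 
{{StringEncodingSketch}} and the methods {{utf8Length}} and {{encodeAscii}} are 
hypothetical, and the three-bytes-per-char worst case is only an assumed 
example of a conservative reservation. It shows why an ASCII-only write turns 
non-ASCII column names into "?", and how the exact UTF-8 size can be computed 
up front so the buffer is not overallocated.

```java
import java.nio.charset.StandardCharsets;

public class StringEncodingSketch {

    /** Exact UTF-8 byte count for s, computed without encoding it first. */
    static int utf8Length(String s) {
        int bytes = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                bytes += 1;                       // ASCII
            } else if (c < 0x800) {
                bytes += 2;                       // two-byte sequence
            } else if (!Character.isSurrogate(c)) {
                bytes += 3;                       // rest of the BMP, e.g. Japanese
            } else if (Character.isHighSurrogate(c) && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                bytes += 4;                       // valid surrogate pair
                i++;                              // skip the low surrogate
            } else {
                bytes += 1;                       // unpaired surrogate: getBytes substitutes '?'
            }
        }
        return bytes;
    }

    /** ASCII-only encoding: one byte per char, anything non-ASCII becomes '?'. */
    static byte[] encodeAscii(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            out[i] = (byte) (c < 0x80 ? c : '?');
        }
        return out;
    }

    public static void main(String[] args) {
        String ascii = "column";
        // Unicode escapes for a two-character Japanese name, so the source stays ASCII.
        String japanese = "\u540d\u524d";

        // Exact size versus an assumed worst-case reservation of 3 bytes per char.
        System.out.println("exact=" + utf8Length(ascii)
                + " worstCase=" + 3 * ascii.length());

        // The ASCII path loses the Japanese characters.
        System.out.println(new String(encodeAscii(japanese), StandardCharsets.US_ASCII));

        // The computed size matches what the JDK encoder actually produces.
        System.out.println(utf8Length(japanese)
                == japanese.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

In this sketch the ASCII path turns the two-character Japanese name into "??", 
which matches the symptom reported for column names, while the exact-size 
computation reserves 6 bytes for "column" instead of 18.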

> Fix incorrect encoding for strings can be UTF8
> ----------------------------------------------
>
>                 Key: CASSANDRA-16429
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16429
>             Project: Cassandra
>          Issue Type: Bug
>          Components: CQL/Interpreter
>            Reporter: Yoshi Kimoto
>            Assignee: Yifan Cai
>            Priority: Normal
>             Fix For: 4.0-beta
>
>         Attachments: jptest.cql
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Tables created with Japanese character name columns are working well in C* 
> 3.11.10 when doing a SELECT * in cqlsh but will show as garbled (shown as 
> "?") in 4.0-beta4. DESCRIBE shows the column names correctly in both cases.
> Run the attached jptest.cql script in both envs with cqlsh -f. They will 
> yield different results.
> My test env (MacOS 10.15.7):
> C* 3.11.10 with
>  - OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_252-b09)
>  - Python 2.7.16
> C* 4.0-beta4
>  - OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.9.1+1)
>  - Python 3.8.2



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
