[jira] [Updated] (CASSANDRA-17617) CQLSH unicode control character list is too liberal

Tanuj Nayak (Jira) Mon, 09 May 2022 20:03:06 -0700


     [ https://issues.apache.org/jira/browse/CASSANDRA-17617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Tanuj Nayak updated CASSANDRA-17617:
------------------------------------
    Description: 
The list of escaped Unicode control characters 
[here|https://github.com/apache/cassandra/blob/53a67ff2c36d90d337aba1409498de29931d4279/pylib/cqlshlib/formatting.py#L32]
appears to be too liberal. It seems to include characters such as '1' (0x31) and 
'0' (0x30), which do not need to be escaped. The intended range appears to be 
0x00 - 0x1F and 0x7F+, as corroborated [by this 
page|https://en.wikipedia.org/wiki/Unicode_control_characters].
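
To make the difference concrete, here is a minimal sketch. Both patterns are assumptions written for illustration and are not copied from formatting.py: TOO_LIBERAL ends its first range at 0x31, which would be consistent with '0' and '1' being matched, and PROPOSED ends it at 0x1F. The second range is taken as 0x7F - 0x9F (DEL plus the C1 controls), which is only one possible reading of "0x7F+". The helper matched_printables is hypothetical as well.

```python
import re

# Hypothetical patterns for illustration; not copied from cqlshlib/formatting.py.
# TOO_LIBERAL ends its first range at 0x31 ('1'), so it also covers 0x20-0x31.
TOO_LIBERAL = re.compile(r'[\x00-\x31\x7f-\x9f]')
# PROPOSED ends the first range at 0x1F, the last C0 control character.
PROPOSED = re.compile(r'[\x00-\x1f\x7f-\x9f]')


def matched_printables(pattern):
    """Return the printable ASCII characters (0x20-0x7E) that the pattern matches."""
    return ''.join(chr(c) for c in range(0x20, 0x7F) if pattern.match(chr(c)))


if __name__ == '__main__':
    # Printable characters each pattern would treat as needing an escape.
    print(repr(matched_printables(TOO_LIBERAL)))
    print(repr(matched_printables(PROPOSED)))
```

Under these assumptions, the first pattern matches the space character, the punctuation from '!' through '/', and the digits '0' and '1', while the second matches no printable ASCII character at all.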

 

This causes unnecessary escaping and regex substitutions in CQLSH whenever 
common characters such as punctuation, '0', or '1' appear in a text column of 
a table. For this reason, a table with a text column filled with 2's takes 
much less time to print than one filled with 0's.
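
The timing difference can be checked with a rough sketch like the one below. It does not call CQLSH: the patterns, the escape helper, and the \xNN output format are all assumptions made for illustration, so the absolute numbers say nothing about CQLSH itself. The point is only that a string of 0's triggers one substitution per character under the broader pattern, while a string of 2's triggers none.

```python
import re
import timeit

# Hypothetical patterns for illustration; not copied from cqlshlib/formatting.py.
TOO_LIBERAL = re.compile(r'[\x00-\x31\x7f-\x9f]')
PROPOSED = re.compile(r'[\x00-\x1f\x7f-\x9f]')


def escape(pattern, text):
    """Replace every character the pattern matches with a \\xNN escape."""
    return pattern.sub(lambda m: '\\x%02x' % ord(m.group(0)), text)


def time_escape(pattern, text, repeats=200):
    """Return the total seconds taken to escape `text` `repeats` times."""
    return timeit.timeit(lambda: escape(pattern, text), number=repeats)


if __name__ == '__main__':
    zeros = '0' * 1000
    twos = '2' * 1000
    for name, pattern in (('too liberal', TOO_LIBERAL), ('proposed', PROPOSED)):
        print('%-12s zeros: %.4fs  twos: %.4fs'
              % (name, time_escape(pattern, zeros), time_escape(pattern, twos)))
```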

  was:
It appears that the list of escaped unicode control characters 
[here|https://github.com/apache/cassandra/blob/53a67ff2c36d90d337aba1409498de29931d4279/pylib/cqlshlib/formatting.py#L32]
 is a bit too liberal. It seems to include characters such as '1' (0x31) and 
'0' (0x30) which do not need to be escaped. It seems that the actual range 
should be 0x00 - 0x1F and 0x7F+ as corroborated 
[here|[https://en.wikipedia.org/wiki/Unicode_control_characters].]

 

This causes unnecessary escaping and regex substitutions on the CQLSH end 
whenever common characters such as any punctuation or a 0 or a 1 appear in the 
text column of a table. One might notice that a table with a text column filled 
with 2's will take much less time to print than one with all 0's for this 
reason.


> CQLSH unicode control character list is too liberal
> ---------------------------------------------------
>
>                 Key: CASSANDRA-17617
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17617
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: CQL/Interpreter
>            Reporter: Tanuj Nayak
>            Assignee: Tanuj Nayak
>            Priority: Normal
>             Fix For: 3.11.x, 4.0.x, 4.1.x
>
>
> The list of escaped Unicode control characters 
> [here|https://github.com/apache/cassandra/blob/53a67ff2c36d90d337aba1409498de29931d4279/pylib/cqlshlib/formatting.py#L32]
> appears to be too liberal. It seems to include characters such as '1' (0x31) and 
> '0' (0x30), which do not need to be escaped. The intended range appears to be 
> 0x00 - 0x1F and 0x7F+, as corroborated [by this 
> page|https://en.wikipedia.org/wiki/Unicode_control_characters].
>  
> This causes unnecessary escaping and regex substitutions in CQLSH whenever 
> common characters such as punctuation, '0', or '1' appear in a text column of 
> a table. For this reason, a table with a text column filled with 2's takes 
> much less time to print than one filled with 0's.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
