[ 
https://issues.apache.org/jira/browse/CASSANDRA-20102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17900097#comment-17900097
 ] 

Stefan Miklosovic commented on CASSANDRA-20102:
-----------------------------------------------

Somewhat tangential to this is CASSANDRA-19546 I would love to see both in and 
I think that it is great idea to simplify life for people in this regard.

https://github.com/apache/cassandra/pull/3239/files#diff-29608698944826e9532ac0678a1c6166e00e0f07ec6d81ead60a2bba484f4407

> Support simple length functions for blob and string
> ---------------------------------------------------
>
>                 Key: CASSANDRA-20102
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20102
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: CQL/Semantics
>            Reporter: Joey Lynch
>            Assignee: Joey Lynch
>            Priority: Low
>
> A common question from users is how to query for the length of {{blob}} and 
> {{text}} columns, and the answer is often for users to enable UDFs and then 
> write their own 
> ([1|https://stackoverflow.com/questions/49674114/cassandra-query-string-length],
>  
> [2|https://stackoverflow.com/questions/58401750/how-can-i-get-size-of-blob-column-of-cassandra-using-user-defined-functions],
>  ...). I think it would make sense for Cassandra to support a subset of the 
> SQL99 [binary length 
> functions|https://sql-99.readthedocs.io/en/latest/chapters/05.html#octet-length-expression]
>  and a UTF8 codepoint aware {{length}} function similar to PostgreSQL's 
> [{{length}}|https://www.postgresql.org/docs/9.1/functions-string.html] 
> function.
> Specifically I propose two new system functions:
> # {{octetlength}} - Defined on all types and returns the number of underlying 
> bytes in the bytebuffer representation of the column. Does _not_ account for 
> storage overhead (e.g. does not account for metadata)
> # {{length}} - Defined on UTF8 strings, and returns the number of UTF8 
> codepoints in the string, equivalent to Java's 
> [{{String.length}}|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/String.html#length()]
>  function.
> Patch coming shortly. Particularly interested in feedback if we A) think 
> these functions are reasonable to have inside Cassandra, and B) like the 
> names of the functions to align with SQL or if we want to go nonstandard and 
> do like {{blength}} instead of octet length. 
> An example of the difference between octet length and string length
> {code}
> // A table with text key and blob value
> cqlsh> select key, value from test.test;
>  key   | value
> -------+--------
>  hello | 0x12af
> cqlsh> select length(key), octetlength(value) from test.test;
>  system.length(key) | system.octetlength(value)
> --------------------+---------------------------
>                   5 |                         2
> cqlsh> INSERT INTO test.test (key, value) VALUES ('こんにちは世界', 0x12AF);
> cqlsh> select length(key), octetlength(key), octetlength(value) from 
> test.test where key = 'こんにちは世界';
>  system.length(key) | system.octetlength(key) | system.octetlength(value)
> --------------------+-------------------------+---------------------------
>                   7 |                      21 |                         2
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to