[ 
https://issues.apache.org/jira/browse/CASSANDRA-20102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17900967#comment-17900967
 ] 

Chris Lohfink commented on CASSANDRA-20102:
-------------------------------------------

+1 for trunk, wont be for 4.1 for sure since this is new feature

> Support simple length functions for blob and string
> ---------------------------------------------------
>
>                 Key: CASSANDRA-20102
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20102
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: CQL/Semantics
>            Reporter: Joey Lynch
>            Assignee: Joey Lynch
>            Priority: Low
>             Fix For: 5.x
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> A common question from users is how to query for the length of {{blob}} and 
> {{text}} columns, and the answer is often for users to enable UDFs and then 
> write their own 
> ([1|https://stackoverflow.com/questions/49674114/cassandra-query-string-length],
>  
> [2|https://stackoverflow.com/questions/58401750/how-can-i-get-size-of-blob-column-of-cassandra-using-user-defined-functions],
>  ...). I think it would make sense for Cassandra to support a subset of the 
> SQL99 [binary length 
> functions|https://sql-99.readthedocs.io/en/latest/chapters/05.html#octet-length-expression]
>  and a UTF8 codepoint aware {{length}} function similar to PostgreSQL's 
> [{{length}}|https://www.postgresql.org/docs/9.1/functions-string.html] 
> function.
> Specifically I propose two new system functions:
> # {{octet_length}} - Defined on all types and returns the number of 
> underlying bytes in the bytebuffer representation of the column. Does _not_ 
> account for storage overhead (e.g. does not account for metadata)
> # {{length}} - Defined on UTF8 strings, and returns the number of UTF8 
> codepoints in the string, equivalent to Java's 
> [{{String.length}}|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/String.html#length()]
>  function.
> Patch coming shortly. Particularly interested in feedback if we A) think 
> these functions are reasonable to have inside Cassandra, and B) like the 
> names of the functions to align with SQL or if we want to go nonstandard and 
> do like {{blength}} instead of octet length. 
> An example of the difference between octet length and string length
> {code}
> // A table with text key and blob value
> cqlsh> select key, value from test.test;
>  key   | value
> -------+--------
>  hello | 0x12af
> cqlsh> select length(key), octet_length(value) from test.test;
>  system.length(key) | system.octet_length(value)
> --------------------+---------------------------
>                   5 |                         2
> cqlsh> INSERT INTO test.test (key, value) VALUES ('こんにちは世界', 0x12AF);
> cqlsh> select length(key), octet_length(key), octet_length(value) from 
> test.test where key = 'こんにちは世界';
>  system.length(key) | system.octet_length(key) | system.octet_length(value)
> --------------------+-------------------------+---------------------------
>                   7 |                      21 |                         2
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to