[ https://issues.apache.org/jira/browse/CASSANDRA-20102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17900967#comment-17900967 ]
Chris Lohfink commented on CASSANDRA-20102: ------------------------------------------- +1 for trunk, wont be for 4.1 for sure since this is new feature > Support simple length functions for blob and string > --------------------------------------------------- > > Key: CASSANDRA-20102 > URL: https://issues.apache.org/jira/browse/CASSANDRA-20102 > Project: Cassandra > Issue Type: Improvement > Components: CQL/Semantics > Reporter: Joey Lynch > Assignee: Joey Lynch > Priority: Low > Fix For: 5.x > > Time Spent: 10m > Remaining Estimate: 0h > > A common question from users is how to query for the length of {{blob}} and > {{text}} columns, and the answer is often for users to enable UDFs and then > write their own > ([1|https://stackoverflow.com/questions/49674114/cassandra-query-string-length], > > [2|https://stackoverflow.com/questions/58401750/how-can-i-get-size-of-blob-column-of-cassandra-using-user-defined-functions], > ...). I think it would make sense for Cassandra to support a subset of the > SQL99 [binary length > functions|https://sql-99.readthedocs.io/en/latest/chapters/05.html#octet-length-expression] > and a UTF8 codepoint aware {{length}} function similar to PostgreSQL's > [{{length}}|https://www.postgresql.org/docs/9.1/functions-string.html] > function. > Specifically I propose two new system functions: > # {{octet_length}} - Defined on all types and returns the number of > underlying bytes in the bytebuffer representation of the column. Does _not_ > account for storage overhead (e.g. does not account for metadata) > # {{length}} - Defined on UTF8 strings, and returns the number of UTF8 > codepoints in the string, equivalent to Java's > [{{String.length}}|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/String.html#length()] > function. > Patch coming shortly. Particularly interested in feedback if we A) think > these functions are reasonable to have inside Cassandra, and B) like the > names of the functions to align with SQL or if we want to go nonstandard and > do like {{blength}} instead of octet length. > An example of the difference between octet length and string length > {code} > // A table with text key and blob value > cqlsh> select key, value from test.test; > key | value > -------+-------- > hello | 0x12af > cqlsh> select length(key), octet_length(value) from test.test; > system.length(key) | system.octet_length(value) > --------------------+--------------------------- > 5 | 2 > cqlsh> INSERT INTO test.test (key, value) VALUES ('こんにちは世界', 0x12AF); > cqlsh> select length(key), octet_length(key), octet_length(value) from > test.test where key = 'こんにちは世界'; > system.length(key) | system.octet_length(key) | system.octet_length(value) > --------------------+-------------------------+--------------------------- > 7 | 21 | 2 > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org