jonahgao commented on issue #7344:
URL: 
https://github.com/apache/arrow-datafusion/issues/7344#issuecomment-1690099314

   
   > The docs would suggest that what is happening here is it is coercing the 
arguments to a string, I'll try to confirm this
   
   I have just checked the 
[documentation](https://www.postgresql.org/docs/current/functions-binarystring.html)
 and found that the length function in PostgresSQL has two different forms:
   ```
   length(bytea) → integer
       Returns the number of bytes in the binary string.
       length('\x1234567890'::bytea) → 5
   
   length(bytes bytea, encoding name) → integer
      Returns the number of characters in the binary string, assuming that it 
is text in the given encoding.
      length('jose'::bytea, 'UTF8') → 4
   ```
   It seems that there will be no type coercion happening.
   
   -----
   
   I think we might need three different types of length functions.
   - `character_length` only operates on text strings.
   - `octet_length` can be applied to both text strings and binaries, and it 
returns the length of the underlying byte array.
   - `length` can be applied to both text strings and binaries.
     - For text strings, it has the same effect as `character_length`.
     - For binaries,  it has the same effect as `octet_length`. 
    
   -----
   The following are the execution results of PostgreSQL for reference only.
   ```
   postgres=#  create table text_table(c text);
   CREATE TABLE
   postgres=#  insert into text_table values('abc'), ('efg你好');
   INSERT 0 2
   postgres=# select length(c), octet_length(c), character_length(c) from 
text_table;
    length | octet_length | character_length
   --------+--------------+------------------
         3 |            3 |                3
         5 |            9 |                5
   (2 rows)
   
   postgres=#
   postgres=#
   postgres=# create table binary_table(c bytea);
   CREATE TABLE
   postgres=#
   postgres=# insert into binary_table values('abc'), ('\x00010203'::bytea);
   INSERT 0 2
   postgres=# select length(c), octet_length(c) from binary_table;
    length | octet_length
   --------+--------------
         3 |            3
         4 |            4
   (2 rows)
   
   postgres=# select character_length(c) from binary_table;
   ERROR:  function character_length(bytea) does not exist
   LINE 1: select character_length(c) from binary_table;
                  ^
   HINT:  No function matches the given name and argument types. You might need 
to add explicit type casts.
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to