tshauck opened a new issue, #11834:
URL: https://github.com/apache/datafusion/issues/11834

   ### Is your feature request related to a problem or challenge?
   
   We are working to add complete StringView support in DataFusion, which 
permits potentially much faster processing of string data. See 
https://github.com/apache/datafusion/issues/10918 for more background.
   
   Today, most DataFusion string functions support DataType::Utf8 and 
DataType::LargeUtf8 and when called with a StringView argument DataFusion will 
cast the argument back to DataType::Utf8 which is expensive.
   
   To realize the full speed of StringView, we need to ensure that all string 
functions support the DataType::Utf8View directly.
   
   ### Describe the solution you'd like
   
   Update the function to support DataType::Utf8View directly
   
   ### Describe alternatives you've considered
   
   The typical steps are:
   1. Write some tests showing the function doesn't support Utf8View (see the 
tests in 
[`string_view.slt`](https://github.com/apache/datafusion/blob/main/datafusion/sqllogictest/test_files/string_view.slt)
 to ensure the arguments are not being cast
   2. Change the `Signature` of the function to accept `Utf8View` in addition 
to `Utf8`/`LargeUtf8`
   3. Update the implementation of the function to operate on `Utf8View`
   
   Example PRs
   * Update to use an arrow kernel that already supports StringView:  
https://github.com/apache/datafusion/pull/11787
   * Change the implementation to support StringView directly: 
https://github.com/apache/datafusion/pull/11676
   * Change implementation (option 2): 
https://github.com/apache/datafusion/pull/11556
   
   ### Additional context
   
   The documenation of string functions can be found here: 
https://datafusion.apache.org/user-guide/sql/scalar_functions.html#string-functions
   
   To test a function with StringView with `datafusion-cli` you can use an 
example like this (replacing `starts_with` with the relevant function)
   ```sql
   > create table foo as values (arrow_cast('foo', 'Utf8View'), 
arrow_cast('bar', 'Utf8View'));
   0 row(s) fetched.
   Elapsed 0.043 seconds.
   
   > select starts_with(column1, column2) from foo;
   +--------------------------------------+
   | starts_with(foo.column1,foo.column2) |
   +--------------------------------------+
   | false                                |
   +--------------------------------------+
   1 row(s) fetched.
   Elapsed 0.015 seconds.
   ```
   
   To see if it is using utf8 view, use `EXPLAIN` to see the plan and verify 
there is no `CAST`. In this example the `CAST(column1@0 AS Utf8)` indicates 
that the function is not using `Utf8View` natively
   
   
   ```sql
   > explain select starts_with(column1, column2) from foo;
   
+---------------+------------------------------------------------------------------------------------------------------------------------------+
   | plan_type     | plan                                                       
                                                                  |
   
+---------------+------------------------------------------------------------------------------------------------------------------------------+
   | logical_plan  | Projection: starts_with(CAST(foo.column1 AS Utf8), 
CAST(foo.column2 AS Utf8))                                                |
   |               |   TableScan: foo projection=[column1, column2]             
                                                                  |
   | physical_plan | ProjectionExec: expr=[starts_with(CAST(column1@0 AS Utf8), 
CAST(column2@1 AS Utf8)) as starts_with(foo.column1,foo.column2)] |
   |               |   MemoryExec: partitions=1, partition_sizes=[1]            
                                                                  |
   |               |                                                            
                                                                  |
   
+---------------+------------------------------------------------------------------------------------------------------------------------------+
   2 row(s) fetched.
   Elapsed 0.006 seconds.
   ```
   
   It is also often good to test with a constant as well (likewise there should 
be no cast):
   
   ```sql
   > explain select starts_with(column1, 'foo') from foo;
   
+---------------+----------------------------------------------------------------------------------------------------------+
   | plan_type     | plan                                                       
                                              |
   
+---------------+----------------------------------------------------------------------------------------------------------+
   | logical_plan  | Projection: starts_with(CAST(foo.column1 AS Utf8), 
Utf8("foo"))                                          |
   |               |   TableScan: foo projection=[column1]                      
                                              |
   | physical_plan | ProjectionExec: expr=[starts_with(CAST(column1@0 AS Utf8), 
foo) as starts_with(foo.column1,Utf8("foo"))] |
   |               |   MemoryExec: partitions=1, partition_sizes=[1]            
                                              |
   |               |                                                            
                                              |
   
+---------------+----------------------------------------------------------------------------------------------------------+
   2 row(s) fetched.
   Elapsed 0.002 seconds.
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to