goldmedal opened a new issue, #12637:
URL: https://github.com/apache/datafusion/issues/12637

   ### Describe the bug
   
   While working on #12415, I found the `LIKE` and `ILIKE` behavior differs 
between `StringView` and other string types. Given the following data and SQL:
   ```sql
   DataFusion CLI v42.0.0
   > create table test_source as values
     ('Andrew', 'X', 'datafusion📊🔥', '🔥'),
     ('Xiangpeng', 'Xiangpeng', 'datafusion数据融合', 'datafusion数据融合'),
     ('Raphael', 'R', 'datafusionДатаФусион', 'аФус'),
     (NULL, 'R', NULL, '🔥');
   0 row(s) fetched. 
   Elapsed 0.028 seconds.
   
   > create table test_basic_operator_utf8view as
   select
       arrow_cast(column1, 'Utf8View') as ascii_1,
       arrow_cast(column2, 'Utf8View') as ascii_2,
       arrow_cast(column3, 'Utf8View') as unicode_1,
       arrow_cast(column4, 'Utf8View') as unicode_2
   from test_source;
   0 row(s) fetched. 
   Elapsed 0.002 seconds.
   
   > create table test_basic_operator_utf8 as
   select
       arrow_cast(column1, 'Utf8') as ascii_1,
       arrow_cast(column2, 'Utf8') as ascii_2,
       arrow_cast(column3, 'Utf8') as unicode_1,
       arrow_cast(column4, 'Utf8') as unicode_2
   from test_source;
   0 row(s) fetched. 
   Elapsed 0.004 seconds.
   
   -- StringView Table ( pattern contain % )
   > select ascii_1, unicode_1,
          ascii_1 like 'An%' as ascii_like,
          unicode_1 like '%ion数据%' as unicode_like,
          ascii_1 ilike 'An%' as ascii_ilike,
          unicode_1 ilike '%ion数据%' as unicode_ilik
   from test_basic_operator_utf8view;
   
+-----------+----------------------+------------+--------------+-------------+--------------+
   | ascii_1   | unicode_1            | ascii_like | unicode_like | ascii_ilike 
| unicode_ilik |
   
+-----------+----------------------+------------+--------------+-------------+--------------+
   | Andrew    | datafusion📊🔥       | true       | false        | true        | 
false        |
   | Xiangpeng | datafusion数据融合   | false      | true         | false       | 
true         |
   | Raphael   | datafusionДатаФусион | false      | false        | false       
| false        |
   |           |                      | false      | false        | false       
|              |
   
+-----------+----------------------+------------+--------------+-------------+--------------+
   4 row(s) fetched. 
   Elapsed 0.004 seconds.
   
   -- StringView Table ( pattern without % )
   > select ascii_1, unicode_1,
          ascii_1 like 'An' as ascii_like,
          unicode_1 like '%ion数据' as unicode_like,
          ascii_1 ilike 'An' as ascii_ilike,
          unicode_1 ilike 'ion数据' as unicode_ilik
   from test_basic_operator_utf8view;
   
+-----------+----------------------+------------+--------------+-------------+--------------+
   | ascii_1   | unicode_1            | ascii_like | unicode_like | ascii_ilike 
| unicode_ilik |
   
+-----------+----------------------+------------+--------------+-------------+--------------+
   | Andrew    | datafusion📊🔥       | false      | false        | false       | 
false        |
   | Xiangpeng | datafusion数据融合   | false      | false        | false       | 
false        |
   | Raphael   | datafusionДатаФусион | false      | false        | false       
| false        |
   |           |                      |            | false        |             
|              |
   
+-----------+----------------------+------------+--------------+-------------+--------------+
   4 row(s) fetched. 
   Elapsed 0.004 seconds.
   
   -- String Table (same as LargeString and DictionaryString)
   > select ascii_1, unicode_1,
          ascii_1 like 'An%' as ascii_like,
          unicode_1 like '%ion数据%' as unicode_like,
          ascii_1 ilike 'An%' as ascii_ilike,
          unicode_1 ilike '%ion数据%' as unicode_ilik
   from test_basic_operator_utf8;
   
+-----------+----------------------+------------+--------------+-------------+--------------+
   | ascii_1   | unicode_1            | ascii_like | unicode_like | ascii_ilike 
| unicode_ilik |
   
+-----------+----------------------+------------+--------------+-------------+--------------+
   | Andrew    | datafusion📊🔥       | true       | false        | true        | 
false        |
   | Xiangpeng | datafusion数据融合   | false      | true         | false       | 
true         |
   | Raphael   | datafusionДатаФусион | false      | false        | false       
| false        |
   |           |                      |            |              |             
|              |
   
+-----------+----------------------+------------+--------------+-------------+--------------+
   4 row(s) fetched. 
   Elapsed 0.004 seconds.
   ```
   
   When the input value is NULL, string type will return `NULL` but string view 
will return false. (Something is interesting about the ILIKE operation is 
different between  ASCII-only and Unicode `StringView` 🤔 )
   
   ## Some testing for StringView ScalarValue
   When the matching pattern contains `%`, it will return `false` instead 
`null`.
   ```sql
   > select
       arrow_cast(null, 'Utf8View') like '123' as utf8view_like_ascii,
       arrow_cast(null, 'Utf8View') ilike '123' as utf8view_ilike_ascii,
       arrow_cast(null, 'Utf8View') like '123%' as "utf8view_like_ascii%",
       arrow_cast(null, 'Utf8View') ilike '123%' as "utf8view_ilike_ascii%";
   
+---------------------+----------------------+----------------------+-----------------------+
   | utf8view_like_ascii | utf8view_ilike_ascii | utf8view_like_ascii% | 
utf8view_ilike_ascii% |
   
+---------------------+----------------------+----------------------+-----------------------+
   |                     |                      | false                | false  
               |
   
+---------------------+----------------------+----------------------+-----------------------+
   1 row(s) fetched. 
   Elapsed 0.001 seconds.
   
   > select    
       arrow_cast(null, 'Utf8View') like '數據' as "utf8view_like_unicode",
       arrow_cast(null, 'Utf8View') ilike '數據' as "utf8view_ilike_unicode",
       arrow_cast(null, 'Utf8View') like '數據%' as "utf8view_like_unicode%",
       arrow_cast(null, 'Utf8View') ilike '數據%' as "utf8view_ilike_unicode%";
   
+-----------------------+------------------------+------------------------+-------------------------+
   | utf8view_like_unicode | utf8view_ilike_unicode | utf8view_like_unicode% | 
utf8view_ilike_unicode% |
   
+-----------------------+------------------------+------------------------+-------------------------+
   |                       |                        | false                  |  
                       |
   
+-----------------------+------------------------+------------------------+-------------------------+
   1 row(s) fetched. 
   Elapsed 0.001 seconds.
   ```
   
   ## Some testing for String ScalarValue  (Same as LargeString and 
DictionaryString)
   ```sql
   > select
       arrow_cast(null, 'Utf8') like '123' as utf8_like_ascii,
       arrow_cast(null, 'Utf8') ilike '123' as utf8_ilike_ascii,
       arrow_cast(null, 'Utf8') like '123%' as "utf8_like_ascii%",
       arrow_cast(null, 'Utf8') ilike '123%' as "utf8_ilike_ascii%";
   +-----------------+------------------+------------------+-------------------+
   | utf8_like_ascii | utf8_ilike_ascii | utf8_like_ascii% | utf8_ilike_ascii% |
   +-----------------+------------------+------------------+-------------------+
   |                 |                  |                  |                   |
   +-----------------+------------------+------------------+-------------------+
   1 row(s) fetched. 
   Elapsed 0.001 seconds.
   
   > select    
       arrow_cast(null, 'Utf8') like '數據' as "utf8_like_unicode",
       arrow_cast(null, 'Utf8') ilike '數據' as "utf8_ilike_unicode",
       arrow_cast(null, 'Utf8') like '數據%' as "utf8_like_unicode%",
       arrow_cast(null, 'Utf8') ilike '數據%' as "utf8_ilike_unicode%";
   
+-------------------+--------------------+--------------------+---------------------+
   | utf8_like_unicode | utf8_ilike_unicode | utf8_like_unicode% | 
utf8_ilike_unicode% |
   
+-------------------+--------------------+--------------------+---------------------+
   |                   |                    |                    |              
       |
   
+-------------------+--------------------+--------------------+---------------------+
   1 row(s) fetched. 
   Elapsed 0.002 seconds.
   ````
   
   
   ### To Reproduce
   
   Run the SQLs mentioned above.
   
   ### Expected behavior
   
   I'm not really sure if the behavior of StringView is expected 🤔 but I think 
their behavior should be consistent.
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to