goldmedal opened a new issue, #12637: URL: https://github.com/apache/datafusion/issues/12637
### Describe the bug While working on #12415, I found the `LIKE` and `ILIKE` behavior differs between `StringView` and other string types. Given the following data and SQL: ```sql DataFusion CLI v42.0.0 > create table test_source as values ('Andrew', 'X', 'datafusion📊🔥', '🔥'), ('Xiangpeng', 'Xiangpeng', 'datafusion数据融合', 'datafusion数据融合'), ('Raphael', 'R', 'datafusionДатаФусион', 'аФус'), (NULL, 'R', NULL, '🔥'); 0 row(s) fetched. Elapsed 0.028 seconds. > create table test_basic_operator_utf8view as select arrow_cast(column1, 'Utf8View') as ascii_1, arrow_cast(column2, 'Utf8View') as ascii_2, arrow_cast(column3, 'Utf8View') as unicode_1, arrow_cast(column4, 'Utf8View') as unicode_2 from test_source; 0 row(s) fetched. Elapsed 0.002 seconds. > create table test_basic_operator_utf8 as select arrow_cast(column1, 'Utf8') as ascii_1, arrow_cast(column2, 'Utf8') as ascii_2, arrow_cast(column3, 'Utf8') as unicode_1, arrow_cast(column4, 'Utf8') as unicode_2 from test_source; 0 row(s) fetched. Elapsed 0.004 seconds. -- StringView Table ( pattern contain % ) > select ascii_1, unicode_1, ascii_1 like 'An%' as ascii_like, unicode_1 like '%ion数据%' as unicode_like, ascii_1 ilike 'An%' as ascii_ilike, unicode_1 ilike '%ion数据%' as unicode_ilik from test_basic_operator_utf8view; +-----------+----------------------+------------+--------------+-------------+--------------+ | ascii_1 | unicode_1 | ascii_like | unicode_like | ascii_ilike | unicode_ilik | +-----------+----------------------+------------+--------------+-------------+--------------+ | Andrew | datafusion📊🔥 | true | false | true | false | | Xiangpeng | datafusion数据融合 | false | true | false | true | | Raphael | datafusionДатаФусион | false | false | false | false | | | | false | false | false | | +-----------+----------------------+------------+--------------+-------------+--------------+ 4 row(s) fetched. Elapsed 0.004 seconds. -- StringView Table ( pattern without % ) > select ascii_1, unicode_1, ascii_1 like 'An' as ascii_like, unicode_1 like '%ion数据' as unicode_like, ascii_1 ilike 'An' as ascii_ilike, unicode_1 ilike 'ion数据' as unicode_ilik from test_basic_operator_utf8view; +-----------+----------------------+------------+--------------+-------------+--------------+ | ascii_1 | unicode_1 | ascii_like | unicode_like | ascii_ilike | unicode_ilik | +-----------+----------------------+------------+--------------+-------------+--------------+ | Andrew | datafusion📊🔥 | false | false | false | false | | Xiangpeng | datafusion数据融合 | false | false | false | false | | Raphael | datafusionДатаФусион | false | false | false | false | | | | | false | | | +-----------+----------------------+------------+--------------+-------------+--------------+ 4 row(s) fetched. Elapsed 0.004 seconds. -- String Table (same as LargeString and DictionaryString) > select ascii_1, unicode_1, ascii_1 like 'An%' as ascii_like, unicode_1 like '%ion数据%' as unicode_like, ascii_1 ilike 'An%' as ascii_ilike, unicode_1 ilike '%ion数据%' as unicode_ilik from test_basic_operator_utf8; +-----------+----------------------+------------+--------------+-------------+--------------+ | ascii_1 | unicode_1 | ascii_like | unicode_like | ascii_ilike | unicode_ilik | +-----------+----------------------+------------+--------------+-------------+--------------+ | Andrew | datafusion📊🔥 | true | false | true | false | | Xiangpeng | datafusion数据融合 | false | true | false | true | | Raphael | datafusionДатаФусион | false | false | false | false | | | | | | | | +-----------+----------------------+------------+--------------+-------------+--------------+ 4 row(s) fetched. Elapsed 0.004 seconds. ``` When the input value is NULL, string type will return `NULL` but string view will return false. (Something is interesting about the ILIKE operation is different between ASCII-only and Unicode `StringView` 🤔 ) ## Some testing for StringView ScalarValue When the matching pattern contains `%`, it will return `false` instead `null`. ```sql > select arrow_cast(null, 'Utf8View') like '123' as utf8view_like_ascii, arrow_cast(null, 'Utf8View') ilike '123' as utf8view_ilike_ascii, arrow_cast(null, 'Utf8View') like '123%' as "utf8view_like_ascii%", arrow_cast(null, 'Utf8View') ilike '123%' as "utf8view_ilike_ascii%"; +---------------------+----------------------+----------------------+-----------------------+ | utf8view_like_ascii | utf8view_ilike_ascii | utf8view_like_ascii% | utf8view_ilike_ascii% | +---------------------+----------------------+----------------------+-----------------------+ | | | false | false | +---------------------+----------------------+----------------------+-----------------------+ 1 row(s) fetched. Elapsed 0.001 seconds. > select arrow_cast(null, 'Utf8View') like '數據' as "utf8view_like_unicode", arrow_cast(null, 'Utf8View') ilike '數據' as "utf8view_ilike_unicode", arrow_cast(null, 'Utf8View') like '數據%' as "utf8view_like_unicode%", arrow_cast(null, 'Utf8View') ilike '數據%' as "utf8view_ilike_unicode%"; +-----------------------+------------------------+------------------------+-------------------------+ | utf8view_like_unicode | utf8view_ilike_unicode | utf8view_like_unicode% | utf8view_ilike_unicode% | +-----------------------+------------------------+------------------------+-------------------------+ | | | false | | +-----------------------+------------------------+------------------------+-------------------------+ 1 row(s) fetched. Elapsed 0.001 seconds. ``` ## Some testing for String ScalarValue (Same as LargeString and DictionaryString) ```sql > select arrow_cast(null, 'Utf8') like '123' as utf8_like_ascii, arrow_cast(null, 'Utf8') ilike '123' as utf8_ilike_ascii, arrow_cast(null, 'Utf8') like '123%' as "utf8_like_ascii%", arrow_cast(null, 'Utf8') ilike '123%' as "utf8_ilike_ascii%"; +-----------------+------------------+------------------+-------------------+ | utf8_like_ascii | utf8_ilike_ascii | utf8_like_ascii% | utf8_ilike_ascii% | +-----------------+------------------+------------------+-------------------+ | | | | | +-----------------+------------------+------------------+-------------------+ 1 row(s) fetched. Elapsed 0.001 seconds. > select arrow_cast(null, 'Utf8') like '數據' as "utf8_like_unicode", arrow_cast(null, 'Utf8') ilike '數據' as "utf8_ilike_unicode", arrow_cast(null, 'Utf8') like '數據%' as "utf8_like_unicode%", arrow_cast(null, 'Utf8') ilike '數據%' as "utf8_ilike_unicode%"; +-------------------+--------------------+--------------------+---------------------+ | utf8_like_unicode | utf8_ilike_unicode | utf8_like_unicode% | utf8_ilike_unicode% | +-------------------+--------------------+--------------------+---------------------+ | | | | | +-------------------+--------------------+--------------------+---------------------+ 1 row(s) fetched. Elapsed 0.002 seconds. ```` ### To Reproduce Run the SQLs mentioned above. ### Expected behavior I'm not really sure if the behavior of StringView is expected 🤔 but I think their behavior should be consistent. ### Additional context _No response_ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org