[ 
https://issues.apache.org/jira/browse/SPARK-47413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17832949#comment-17832949
 ] 

Gideon P commented on SPARK-47413:
----------------------------------

[~uros-db] All that sounds good.

Here is my PR (WIP): https://github.com/apache/spark/pull/45738

I have some tests, and the Substring implementation is basically passing them.

However, my redefined implementations of Left and Right are failing. They 
currently throw the following exception in my new test cases in 
CollationSuite.scala:
```
[COMPLEX_EXPRESSION_UNSUPPORTED_INPUT.MISMATCHED_TYPES] Cannot process input 
data types for the expression: "(IF((1 <= 0), , substring(collate(klm), (- 1), 
2147483647)))". All input types must be the same except nullable, containsNull, 
valueContainsNull flags, but found the input types ["STRING", "STRING COLLATE 
UTF8_BINARY_LCASE"]. SQLSTATE: 42K09
```

I think the cause is the overridden implementation of `replacement` within 
`Left` and `Right` in `stringExpressions.scala`, which needs additional work. 
The second parameter to `Literal` in the replacement is a concrete DataType 
rather than an AbstractDataType, so the empty-string literal seems to stay 
plain STRING while the substring branch carries the collated type, which would 
match the error above. What do you think? Please advise on how to debug this 
issue. 
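
To make the suspected cause concrete, here is a minimal, self-contained sketch. It is a toy model, not Spark code: `CollationTypeSketch`, `IfExpr`, `typeCheck`, `rightHardCoded`, and `rightFromInput` are hypothetical names, and the classes only mimic the shape of the Catalyst expressions. It assumes the conditional requires both branches to have the same type, as the error message suggests.

```scala
// Toy model only: these types mimic, but are NOT, Spark's Catalyst classes.
object CollationTypeSketch {
  // A string type tagged with a collation name; "UTF8_BINARY" stands in for the default.
  final case class StringType(collation: String = "UTF8_BINARY")

  sealed trait Expr { def dataType: StringType }
  final case class Literal(value: String, dataType: StringType) extends Expr
  final case class Collate(child: Expr, collation: String) extends Expr {
    val dataType: StringType = StringType(collation)
  }
  final case class Substring(str: Expr, pos: Int, len: Int) extends Expr {
    // A substring keeps the type (and so the collation) of its input.
    def dataType: StringType = str.dataType
  }
  // A conditional whose two branches are expected to share one data type.
  final case class IfExpr(cond: Boolean, whenTrue: Expr, whenFalse: Expr) extends Expr {
    def dataType: StringType = whenTrue.dataType
  }

  // Reports a mismatch when the two branches disagree on their type.
  def typeCheck(e: IfExpr): Either[String, StringType] =
    if (e.whenTrue.dataType == e.whenFalse.dataType) Right(e.whenTrue.dataType)
    else Left(s"mismatched types: ${e.whenTrue.dataType} vs ${e.whenFalse.dataType}")

  // Variant 1: the empty-string branch hard-codes the default string type.
  def rightHardCoded(str: Expr, len: Int): IfExpr =
    IfExpr(len <= 0, Literal("", StringType()), Substring(str, -len, Int.MaxValue))

  // Variant 2: the empty-string branch reuses the input's type.
  def rightFromInput(str: Expr, len: Int): IfExpr =
    IfExpr(len <= 0, Literal("", str.dataType), Substring(str, -len, Int.MaxValue))
}
```

With `Collate(Literal("klm", StringType()), "UTF8_BINARY_LCASE")` as input, `typeCheck(rightHardCoded(input, 1))` yields a `Left` describing the mismatch, while `typeCheck(rightFromInput(input, 1))` yields the collated type. If the real `replacement` behaves the same way, deriving the literal's type from the input expression might be one direction to try.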



> Substring, Right, Left (all collations)
> ---------------------------------------
>
>                 Key: SPARK-47413
>                 URL: https://issues.apache.org/jira/browse/SPARK-47413
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Uroš Bojanić
>            Priority: Major
>              Labels: pull-request-available
>
> Enable collation support for the *Substring* built-in string function in 
> Spark (including *Right* and *Left* functions). First confirm what is the 
> expected behaviour for these functions when given collated strings, then move 
> on to the implementation that would enable handling strings of all collation 
> types. Implement the corresponding unit tests 
> (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect 
> how this function should be used with collation in SparkSQL, and feel free to 
> use your chosen Spark SQL Editor to experiment with the existing functions to 
> learn more about how they work. In addition, look into the possible use-cases 
> and implementation of similar functions within other open-source DBMS, 
> such as [PostgreSQL|https://www.postgresql.org/docs/].
>  
> The goal for this Jira ticket is to implement the *Substring*, 
> *Right*, and *Left* functions so that they support all collation types 
> currently supported in Spark. To understand what changes were introduced in 
> order to enable full collation support for other existing functions in Spark, 
> take a look at the Spark PRs and Jira tickets for completed tasks in this 
> parent (for example: Contains, StartsWith, EndsWith).
>  
> Read more about ICU [Collation Concepts|http://example.com/] and 
> [Collator|http://example.com/] class. Also, refer to the Unicode Technical 
> Standard for 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

