[ 
https://issues.apache.org/jira/browse/SPARK-55430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18057450#comment-18057450
 ] 

Natea Eshetu Beshada commented on SPARK-55430:
----------------------------------------------

I would like to assign this issue to myself, but I can't seem to.

>   [SQL] Cache ICU StringSearch for collation string predicates with constant 
> patterns
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-55430
>                 URL: https://issues.apache.org/jira/browse/SPARK-55430
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 4.1.0
>            Reporter: Natea Eshetu Beshada
>            Priority: Major
>
>   This PR adds StringSearch object caching for Contains, StartsWith, and
> EndsWith expressions when used with ICU-based collations (UNICODE,
> UNICODE_CI) and a compile-time constant (foldable) pattern.
>   Currently, every row evaluation creates a new com.ibm.icu.text.StringSearch
> object, which involves setting up the ICU collator and pattern matcher from
> scratch. When the pattern is a constant (e.g., col LIKE '%abc%' or
> contains(col, 'abc')), this repeated construction is unnecessary.
>   With this change, a single StringSearch is created once and reused across
> rows by calling setTarget() for each new input string. This applies to both
> the interpreted path (via @transient private lazy val) and the codegen path
> (via ctx.addMutableState).
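
The create-once, retarget-per-row idea described above can be sketched in plain Java. This is only an illustration, not the code from the PR: java.util.regex.Matcher stands in for com.ibm.icu.text.StringSearch so that the example runs without ICU4J, Matcher.reset(CharSequence) plays the role of setTarget(), and the class name CachedContains and its methods are hypothetical. The stand-in does no collation-aware matching.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: holds one matcher for a constant pattern and retargets it per row. */
final class CachedContains {
    // Built once per constant pattern. In the ICU case this is the expensive
    // step (collator plus pattern setup) that the issue wants to avoid repeating.
    private final Matcher matcher;

    CachedContains(String constantPattern) {
        // Pattern.LITERAL treats the pattern as plain text rather than a regex.
        this.matcher = Pattern.compile(constantPattern, Pattern.LITERAL).matcher("");
    }

    /** Per-row call: swap in the new input and search, without rebuilding the matcher. */
    boolean contains(String row) {
        matcher.reset(row); // stand-in for StringSearch.setTarget(...)
        return matcher.find();
    }

    /** Counts matching rows while reusing a single matcher for the whole batch. */
    static long countMatches(String constantPattern, List<String> rows) {
        CachedContains cached = new CachedContains(constantPattern);
        long hits = 0;
        for (String row : rows) {
            if (cached.contains(row)) {
                hits++;
            }
        }
        return hits;
    }
}
```

Like the matcher used here, a reused search object carries mutable state, so a sketch of this kind assumes each instance is used by one thread at a time.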



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

