[
https://issues.apache.org/jira/browse/HUDI-3594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Raymond Xu updated HUDI-3594:
-----------------------------
Sprint: Hudi-Sprint-Mar-07, Hudi-Sprint-Mar-14, Hudi-Sprint-Mar-21 (was:
Hudi-Sprint-Mar-07, Hudi-Sprint-Mar-14)
> Support standard Spark functions in Filter Exprs in Data Skipping
> -----------------------------------------------------------------
>
> Key: HUDI-3594
> URL: https://issues.apache.org/jira/browse/HUDI-3594
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: Alexey Kudinkin
> Assignee: Alexey Kudinkin
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.11.0
>
>
> As part of this effort we're planning to support (at the very least) a suite
> of standard Spark functions when evaluating data filtering expressions within
> the Data Skipping flow. For example, when a user issues the following query:
>
> {code:sql}
> SELECT ... WHERE date_format(ts, 'dd-MM-yyyy') > '01-01-2022'
> {code}
> We're able to relate such a query to our Column Stats Index appropriately,
> and are therefore able to do Data Skipping not only on the "raw" columns, but
> also on simple derivative expressions on top of them (like standard
> function calls).
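The idea can be illustrated with a minimal sketch. It is not Hudi's implementation: `FileStats`, `iso_format`, and `files_to_scan` are hypothetical names, and the per-file min/max values stand in for what a column stats index would record. For a predicate of the form `f(ts) > literal`, where `f` is assumed to be order-preserving, a file can be skipped whenever `f(max_ts)` does not exceed the literal.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, List


@dataclass(frozen=True)
class FileStats:
    """Hypothetical per-file min/max statistics for the `ts` column."""
    path: str
    min_ts: date
    max_ts: date


def iso_format(d: date) -> str:
    # Year-first formatting: string order matches chronological order
    # (for four-digit years), so this transformation is order-preserving.
    return d.strftime("%Y-%m-%d")


def files_to_scan(
    stats: List[FileStats],
    transform: Callable[[date], str],
    lower_bound: str,
) -> List[str]:
    """Keep files that may contain rows where transform(ts) > lower_bound.

    Assumes `transform` is order-preserving, so the largest transformed
    value in a file is transform(max_ts).
    """
    return [s.path for s in stats if transform(s.max_ts) > lower_bound]


stats = [
    FileStats("a.parquet", date(2021, 1, 1), date(2021, 12, 31)),
    FileStats("b.parquet", date(2021, 12, 15), date(2022, 3, 1)),
]
selected = files_to_scan(stats, iso_format, "2022-01-01")
```

Here only `b.parquet` is selected: the largest transformed value in `a.parquet` is `"2021-12-31"`, which is not greater than `"2022-01-01"`, so no row in that file can match.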
>
> *It is important to note here that only transformations that _preserve the
> ordering of the source column_ can be applied. Transformations that do not
> preserve the ordering will render the Column Stats Index practically
> irrelevant (since no assumption can be made that values in the column derived
> by such transformations are ordered).*
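A small sketch of why the ordering requirement matters. The helper names (`preserves_order`, `iso_format`, `day_first_format`) are hypothetical, and the check is an empirical one over sample values, not a proof. A year-first date format keeps string order aligned with chronological order, while a day-first format does not, so min/max statistics on the source column say nothing useful about the range of the formatted values.

```python
from datetime import date, timedelta
from typing import Callable, List


def iso_format(d: date) -> str:
    # Year-first: lexicographic order matches chronological order.
    return d.strftime("%Y-%m-%d")


def day_first_format(d: date) -> str:
    # Day-first: lexicographic order is dominated by the day of month.
    return d.strftime("%d-%m-%Y")


def preserves_order(transform: Callable[[date], str],
                    sorted_values: List[date]) -> bool:
    """Check that transform never decreases over already sorted inputs."""
    mapped = [transform(v) for v in sorted_values]
    return all(a <= b for a, b in zip(mapped, mapped[1:]))


# 90 consecutive days starting 2021-12-01, already in ascending order.
days = [date(2021, 12, 1) + timedelta(days=i) for i in range(90)]
iso_ok = preserves_order(iso_format, days)
day_first_ok = preserves_order(day_first_format, days)

# A file covering 2022-01-31 .. 2022-02-15, with predicate
# day_first_format(ts) > '30-01-2022'.
file_min, file_max = date(2022, 1, 31), date(2022, 2, 15)
# Pruning on the transformed max alone would skip this file...
skipped_by_max = not (day_first_format(file_max) > "30-01-2022")
# ...even though the row at file_min satisfies the predicate.
min_row_matches = day_first_format(file_min) > "30-01-2022"
```

With the day-first format, `"15-02-2022"` compares lower than `"30-01-2022"`, so a max-based check would skip the file even though it contains the matching value `"31-01-2022"`.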
--
This message was sent by Atlassian Jira
(v8.20.1#820001)