[jira] [Commented] (SPARK-34285) Implement Parquet StringEndsWith、StringContains Filter

Attila Zsolt Piros (Jira) Fri, 29 Jan 2021 05:59:06 -0800


    [ 
https://issues.apache.org/jira/browse/SPARK-34285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17274458#comment-17274458
 ]


Attila Zsolt Piros commented on SPARK-34285:
--------------------------------------------

[~Xudingyu] predicate pushdown is most useful when a whole row group can be 
skipped. 

To support this, Parquet stores statistics for each column chunk of a row 
group, including the min and max value.
For "StringStartsWith", deciding whether a row group can be skipped is easy 
(let's say the min is "BBB" and the max is "EEE" in the current row group):
- when the pattern is after the max (e.g. "F.*") or
- when the pattern is before the min (e.g. "A.*")
you can safely skip the whole row group.
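
The two rules above can be sketched as a small check. This is only an 
illustration, not Spark's or Parquet's actual code: the class and method names 
are hypothetical, and plain java.lang.String comparison stands in for the 
byte-wise comparison a real reader would use (the two agree for ASCII values 
like the ones in the example).

```java
class PrefixPruning {
    // Hypothetical helper: returns true when no value in [min, max]
    // can start with the given prefix, so the group can be skipped.
    static boolean canDrop(String min, String max, String prefix) {
        // Compare only the leading part of min/max that lines up with the prefix.
        String maxHead = max.substring(0, Math.min(prefix.length(), max.length()));
        String minHead = min.substring(0, Math.min(prefix.length(), min.length()));
        // Prefix sorts after the max, or before the min: nothing can match.
        return maxHead.compareTo(prefix) < 0 || minHead.compareTo(prefix) > 0;
    }

    public static void main(String[] args) {
        System.out.println(canDrop("BBB", "EEE", "F")); // true: "F" is after the max
        System.out.println(canDrop("BBB", "EEE", "A")); // true: "A" is before the min
        System.out.println(canDrop("BBB", "EEE", "C")); // false: "C..." may be present
    }
}
```

Note that a false result does not mean a match exists, only that the 
statistics cannot rule one out.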

For "StringEndsWith" and "StringContains" you cannot make any such decision 
based on the min and max values, because the sort order says nothing about 
how a value ends or what it contains. 
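
A small counterexample may make this concrete. The sketch below uses 
hypothetical names and assumes the same "BBB"/"EEE" range as above: for any 
pattern it builds a value that lies inside the range and still ends with (and 
therefore contains) that pattern, so min and max alone can never exclude it.

```java
class SuffixCounterexample {
    // Hypothetical helper: a value starting with "C" always sorts between
    // "BBB" and "EEE", whatever follows the first character.
    static String witness(String pattern) {
        return "C" + pattern;
    }

    // True when min <= value <= max in String order.
    static boolean inRange(String value, String min, String max) {
        return min.compareTo(value) <= 0 && value.compareTo(max) <= 0;
    }

    public static void main(String[] args) {
        for (String pattern : new String[] {"A", "F", "ZZZ"}) {
            String v = witness(pattern);
            // Each witness is inside [BBB, EEE] and ends with the pattern.
            System.out.println(v + " inRange=" + inRange(v, "BBB", "EEE")
                    + " endsWith=" + v.endsWith(pattern)
                    + " contains=" + v.contains(pattern));
        }
    }
}
```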


> Implement Parquet StringEndsWith、StringContains Filter
> ------------------------------------------------------
>
>                 Key: SPARK-34285
>                 URL: https://issues.apache.org/jira/browse/SPARK-34285
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Xudingyu
>            Priority: Major
>
> When creating parquetFilters, Spark currently only implements
> {code:java}
> case sources.StringStartsWith(name, prefix)
> {code}
> But StringEndsWith and StringContains also exist in 
> /spark/sql/catalyst/src/main/scala/org/apache/spark/sql/sources/filters.scala
> We can implement these two filters and rename 
> {code:java}
> PARQUET_FILTER_PUSHDOWN_STRING_STARTSWITH_ENABLED 
> {code}
>  to
> {code:java}
> PARQUET_FILTER_PUSHDOWN_STRING_ENABLED 
> {code}



