[ https://issues.apache.org/jira/browse/SPARK-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14909153#comment-14909153 ]

Apache Spark commented on SPARK-10841:
--------------------------------------

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/8922

> Add pushdown support of UDF for parquet
> ---------------------------------------
>
>                 Key: SPARK-10841
>                 URL: https://issues.apache.org/jira/browse/SPARK-10841
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>            Reporter: Liang-Chi Hsieh
>
> JIRA: 
> Currently we can't push down filters involving UDFs to Parquet. In practice,
> we do use UDFs in filters, e.g.,
>      SELECT * FROM table WHERE udf(customer_id) = "ABC"
> In the above query, `customer_id` is a column that stores customer IDs in an
> encoded form, and `udf` is a function that parses this column into a string
> value. Without pushing the filter down to Parquet, we have to read all data
> from many Parquet files and then perform the filtering in Spark.
> Using Parquet's `UserDefinedPredicate`, we can push these filters down to
> Parquet. This patch adds that support. It currently implements only the
> `EqualTo` predicate; other predicates such as `LessThan` and `GreaterThan`
> will be implemented in follow-up PRs.
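To make the mechanism more concrete, here is a minimal, self-contained Java sketch of what an `EqualTo` predicate over a UDF has to answer. It is not code from the pull request and does not depend on parquet-mr. `UdfEqualTo`, `parseCustomerId`, `filter`, and the "cust:ABC" encoding are all hypothetical. In parquet-mr, `UserDefinedPredicate` declares `keep`, `canDrop`, and `inverseCanDrop`, and the last two receive column `Statistics`. In this simplified stand-in they take plain min/max values instead.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class Main {

    /**
     * Hypothetical, simplified stand-in (NOT the parquet-mr API) for a
     * UserDefinedPredicate that evaluates `udf(column) = literal` (EqualTo).
     */
    static final class UdfEqualTo<T> {
        private final Function<T, String> udf; // UDF applied to each column value
        private final String literal;          // right-hand side of the equality

        UdfEqualTo(Function<T, String> udf, String literal) {
            this.udf = udf;
            this.literal = literal;
        }

        /** Record-level check; this sketch assumes null never matches. */
        boolean keep(T value) {
            return value != null && literal.equals(udf.apply(value));
        }

        /**
         * Row-group check from min/max statistics. An arbitrary UDF is not known
         * to preserve ordering, so min/max say nothing about udf(value); this
         * sketch conservatively never drops a row group.
         */
        boolean canDrop(T min, T max) {
            return false;
        }

        /** Same reasoning for the negated predicate NOT (udf(column) = literal). */
        boolean inverseCanDrop(T min, T max) {
            return false;
        }
    }

    /** Hypothetical UDF: parses an encoded id such as "cust:ABC" into "ABC". */
    static String parseCustomerId(String raw) {
        int idx = raw.indexOf(':');
        return idx >= 0 ? raw.substring(idx + 1) : raw;
    }

    /** Mimics record-level filtering of one column's values inside a reader. */
    static List<String> filter(List<String> column, UdfEqualTo<String> pred) {
        return column.stream().filter(pred::keep).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        UdfEqualTo<String> pred = new UdfEqualTo<>(Main::parseCustomerId, "ABC");
        List<String> column = Arrays.asList("cust:ABC", "cust:XYZ", "legacy:ABC");
        System.out.println(pred.canDrop("cust:ABC", "legacy:ABC")); // false
        System.out.println(filter(column, pred)); // [cust:ABC, legacy:ABC]
    }
}
```

The sketch shows a design constraint. A UDF need not preserve ordering, so min/max statistics cannot rule out `udf(customer_id) = "ABC"` for a row group. In this sketch, any benefit therefore comes from filtering records inside the reader rather than from skipping row groups. The issue does not say whether the actual patch does more in `canDrop`.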



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
