[
https://issues.apache.org/jira/browse/SPARK-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Apache Spark reassigned SPARK-10841:
------------------------------------
Assignee: Apache Spark
> Add pushdown support of UDF for parquet
> ---------------------------------------
>
> Key: SPARK-10841
> URL: https://issues.apache.org/jira/browse/SPARK-10841
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Reporter: Liang-Chi Hsieh
> Assignee: Apache Spark
>
> JIRA:
> Currently, filters that involve UDFs cannot be pushed down to Parquet. In
> practice, we do use UDFs in filters, e.g.,
> SELECT * FROM table WHERE udf(customer_id) = "ABC"
> In the query above, `customer_id` is a column that stores the customer ID in
> some form, and `udf` is a function that parses this column into a string
> value. Without pushing the filter down to Parquet, we fetch all the data
> from many Parquet files and only then perform the filtering in Spark.
> Using Parquet's `UserDefinedPredicate`, we can push these filters down to
> Parquet. This patch adds support for that. It currently implements only the
> `EqualTo` predicate; other predicates such as `LessThan` and `GreaterThan`
> will be implemented in follow-up PRs.
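
To illustrate the idea in the description, here is a minimal Python sketch of how a filter of the form `udf(column) = value` could be evaluated inside the reader instead of in Spark. It is not the patch's code and does not use the Parquet API. Parquet's Java `UserDefinedPredicate` defines `keep`, `canDrop`, and `inverseCanDrop`; the sketch models only the first two. The names `ColumnStats`, `UdfEqualTo`, `scan`, and `parse_customer`, and the sample data, are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ColumnStats:
    """Min/max statistics for one column chunk (hypothetical stand-in)."""
    min_value: int
    max_value: int


class UdfEqualTo:
    """Toy analogue of a user-defined predicate for `udf(column) = expected`."""

    def __init__(self, udf: Callable[[int], str], expected: str) -> None:
        self.udf = udf
        self.expected = expected

    def keep(self, value: int) -> bool:
        # Record-level check: evaluate the UDF on the raw column value.
        return self.udf(value) == self.expected

    def can_drop(self, stats: ColumnStats) -> bool:
        # An arbitrary UDF need not be monotonic, so min/max statistics
        # cannot prove that no value matches. Conservatively never drop.
        return False


def scan(chunks: List[Tuple[ColumnStats, List[int]]],
         predicate: UdfEqualTo) -> List[int]:
    """Return the column values that survive the pushed-down predicate."""
    kept: List[int] = []
    for stats, values in chunks:
        if predicate.can_drop(stats):
            continue  # whole chunk skipped without reading its values
        kept.extend(v for v in values if predicate.keep(v))
    return kept


def parse_customer(customer_id: int) -> str:
    # Hypothetical UDF: map a numeric id to a string code.
    return "ABC" if customer_id % 10 == 3 else "OTHER"


chunks = [
    (ColumnStats(1, 9), [1, 3, 9]),
    (ColumnStats(10, 25), [10, 13, 25]),
]
result = scan(chunks, UdfEqualTo(parse_customer, "ABC"))
print(result)
```

In this sketch the benefit comes from the per-record `keep` check, which discards non-matching values before they reach the caller. Chunk-level skipping through `can_drop` would only be safe for UDFs whose behavior over a min/max range is known.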
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)