Hyukjin Kwon created SPARK-17310:
------------------------------------
Summary: Disable Parquet's record-by-record filter in the normal
Parquet reader and do it on the Spark side
Key: SPARK-17310
URL: https://issues.apache.org/jira/browse/SPARK-17310
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 2.0.0
Reporter: Hyukjin Kwon
Currently, we push filters down to the normal Parquet reader, which then also
filters record-by-record.
Spark-side codegen row-by-row filtering is likely faster than Parquet's in
general, because Parquet's record-level evaluation involves type-boxing and
virtual function calls, which Spark's generated code avoids.
We should run a benchmark and, if the results confirm this, disable Parquet's
record-by-record filtering. This ticket came out of
https://github.com/apache/spark/pull/14671
Please refer to the discussion in that PR.
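To illustrate the overhead being discussed, here is a minimal, hypothetical Scala sketch (not Spark or Parquet source code). It contrasts a generic predicate interface that receives values as `Any` (boxing each primitive and dispatching through a virtual call, roughly how a generic record-level filter API operates) with a monomorphic primitive comparison of the kind Spark's codegen emits. The trait and names here are invented for illustration only.

```scala
// Generic, boxing path: each Int is boxed to java.lang.Integer and the
// predicate is invoked through a virtual call. (Hypothetical interface,
// illustrating the shape of a generic record-by-record filter API.)
trait BoxedPredicate { def keep(value: Any): Boolean }

val boxedGt: BoxedPredicate = new BoxedPredicate {
  def keep(value: Any): Boolean = value.asInstanceOf[Int] > 100
}

// Specialized path: a monomorphic comparison on a primitive Int, similar
// in spirit to what generated code can do without boxing.
def specializedGt(value: Int): Boolean = value > 100

val data = Array.tabulate(1000)(identity)

// Both paths select the same records; only the per-record cost differs.
val boxedCount = data.count(v => boxedGt.keep(v))
val codegenCount = data.count(specializedGt)
```

Both counts are identical by construction; the point of the benchmark proposed above would be to measure the per-record cost difference on real Parquet scans.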
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]