Among the people working on Spark, there is a lot of confusion about what Parquet's filter pushdown actually accomplishes. Depending on who I talk to, I hear either "it filters rows one by one" or "it skips blocks via min/max value tracking". Can I get a more authoritative answer on this?
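For context on the two descriptions above: as I understand it they aren't mutually exclusive. Here is a toy sketch (hypothetical, not Parquet's actual code or data structures) of how min/max-based block skipping and row-level filtering could both apply to the same scan:

```python
# Hypothetical sketch: each "row group" carries min/max statistics,
# so a predicate like `x > 50` can skip entire groups without decoding
# any rows; surviving groups may still be filtered row by row.

row_groups = [
    {"min": 0,  "max": 40, "rows": [5, 12, 40]},
    {"min": 41, "max": 80, "rows": [41, 60, 80]},
    {"min": 81, "max": 99, "rows": [81, 99]},
]

def scan_greater_than(groups, threshold):
    groups_scanned = 0
    out = []
    for g in groups:
        if g["max"] <= threshold:
            # Entire group eliminated from the stats alone.
            continue
        groups_scanned += 1
        # Row-level filtering inside groups that survive the stats check.
        out.extend(r for r in g["rows"] if r > threshold)
    return out, groups_scanned

result, groups_scanned = scan_greater_than(row_groups, 50)
# Only two of the three groups are decoded.
```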
The reason I'm asking is that we have seen so many bugs related to filter pushdown (either in Parquet itself, or in Spark's implementation of it) that we are considering disabling it permanently unless the performance gain is substantial. Let me know. Thanks.
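For what it's worth, the setting we would flip is the documented Spark config key (assuming no other pushdown-related knobs are involved):

```properties
# spark-defaults.conf — disable Parquet filter pushdown cluster-wide
spark.sql.parquet.filterPushdown  false
```

It can also be toggled per session with `spark.conf.set("spark.sql.parquet.filterPushdown", "false")`, which is how we've been working around the bugs so far.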
