Among the people working on Spark there is a lot of confusion about what
Parquet's filter pushdown actually accomplishes. Depending on whom I talk
to, I get "it filters rows one by one" or "it skips row groups (blocks)
via min/max statistics". Can I get a more authoritative answer on this?

The reason I'm asking is that we have seen so many bugs related to filter
pushdown (either bugs in Parquet itself, or bugs in Spark's use of it)
that we are considering permanently disabling it, unless the performance
gain turns out to be substantial.

Let me know. Thanks.
