Among the people working on Spark, there is a lot of confusion about what Parquet's filter pushdown actually accomplishes. Depending on who I talk to, I hear either "it filters rows one by one" or "it skips blocks via min/max value tracking". Can I get a more authoritative answer on this?
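For context on the two descriptions above: as I understand it they aren't mutually exclusive. Here is a toy sketch (hypothetical, not Parquet's actual code or data structures) of how min/max-based block skipping and row-level filtering could both apply to the same scan:

```python
# Hypothetical sketch: each "row group" carries min/max statistics,
# so a predicate like `x > 50` can skip entire groups without decoding
# any rows; surviving groups may still be filtered row by row.

row_groups = [
    {"min": 0,  "max": 40, "rows": [5, 12, 40]},
    {"min": 41, "max": 80, "rows": [41, 60, 80]},
    {"min": 81, "max": 99, "rows": [81, 99]},
]

def scan_greater_than(groups, threshold):
    groups_scanned = 0
    out = []
    for g in groups:
        if g["max"] <= threshold:
            # Entire group eliminated from the stats alone.
            continue
        groups_scanned += 1
        # Row-level filtering inside groups that survive the stats check.
        out.extend(r for r in g["rows"] if r > threshold)
    return out, groups_scanned

result, groups_scanned = scan_greater_than(row_groups, 50)
# Only two of the three groups are decoded.
```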
The reason I'm asking is that we have seen so many bugs related to filter pushdown (either in Parquet itself, or in Spark's implementation of it) that we are considering disabling it permanently unless the performance gain is substantial. Let me know. Thanks.
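For what it's worth, the setting we would flip is the documented Spark config key (assuming no other pushdown-related knobs are involved):

```properties
# spark-defaults.conf — disable Parquet filter pushdown cluster-wide
spark.sql.parquet.filterPushdown  false
```

It can also be toggled per session with `spark.conf.set("spark.sql.parquet.filterPushdown", "false")`, which is how we've been working around the bugs so far.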
