Re: Why Parquet Predicate Pushdown doesn't work?

2015-01-19 Thread Jerry Lam
Hi guys, Does this issue affect 1.2.0 only or all previous releases as well? Best Regards, Jerry On Thu, Jan 8, 2015 at 1:40 AM, Xuelin Cao xuelincao2...@gmail.com wrote: Yes, the problem is, I've turned the flag on. One possible reason for this is, the parquet file supports predicate

Re: Why Parquet Predicate Pushdown doesn't work?

2015-01-17 Thread Yana Kadiyska
Just wondering if you've made any progress on this -- I'm having the same issue. My attempts to help myself are documented here http://mail-archives.apache.org/mod_mbox/spark-user/201501.mbox/%3CCAJ4HpHFVKvdNgKes41DvuFY=+f_nTJ2_RT41+tadhNZx=bc...@mail.gmail.com%3E . I don't believe I have the

Re: Why Parquet Predicate Pushdown doesn't work?

2015-01-07 Thread Cody Koeninger
But Xuelin already posted in the original message that the code was using SET spark.sql.parquet.filterPushdown=true On Wed, Jan 7, 2015 at 12:42 AM, Daniel Haviv danielru...@gmail.com wrote: Quoting Michael: Predicate push down into the input format is turned off by default because there is

Re: Why Parquet Predicate Pushdown doesn't work?

2015-01-07 Thread Xuelin Cao
Yes, the problem is that I've already turned the flag on. One possible reason is that Parquet supports predicate pushdown by storing min/max statistics for each column in every Parquet block. If, in my test, groupID=10113000 is scattered across all Parquet blocks, then the predicate pushdown
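
To illustrate the min/max reasoning in the message above, here is a minimal pure-Python sketch. It does not use Spark or the Parquet library; `RowGroupStats` and `blocks_to_scan` are hypothetical names invented for this illustration, and the sample groupID values are made up. It only shows why a filter on a value that appears in every block cannot skip any block, while the same filter on data sorted by that column can.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RowGroupStats:
    """Hypothetical stand-in for the per-column min/max kept with a Parquet block."""
    min_value: int
    max_value: int

    @classmethod
    def from_values(cls, values: List[int]) -> "RowGroupStats":
        # Derive the statistics from the column values stored in the block.
        return cls(min(values), max(values))


def blocks_to_scan(stats: List[RowGroupStats], target: int) -> List[int]:
    """Return indexes of blocks whose [min, max] range could contain target."""
    return [i for i, s in enumerate(stats) if s.min_value <= target <= s.max_value]


TARGET = 10113000

# Data sorted by groupID: the target falls inside one block's range only.
clustered = [
    RowGroupStats.from_values([10111000, 10112000]),
    RowGroupStats.from_values([10113000, 10113500]),
    RowGroupStats.from_values([10114000, 10115000]),
]

# Unsorted data: every block's range spans the target, so none can be skipped.
scattered = [
    RowGroupStats.from_values([10111000, 10113000, 10115000]),
    RowGroupStats.from_values([10112000, 10113000, 10114000]),
    RowGroupStats.from_values([10110000, 10113000, 10116000]),
]

print(blocks_to_scan(clustered, TARGET))  # [1]
print(blocks_to_scan(scattered, TARGET))  # [0, 1, 2]
```

Under these assumptions, the scattered layout reads every block even with pushdown enabled, which would look the same from the outside as pushdown not working at all.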

Re: Why Parquet Predicate Pushdown doesn't work?

2015-01-06 Thread Daniel Haviv
Quoting Michael: Predicate push down into the input format is turned off by default because there is a bug in the current parquet library that null pointers when there are full row groups that are null. https://issues.apache.org/jira/browse/SPARK-4258 You can turn it on if you want:

Why Parquet Predicate Pushdown doesn't work?

2015-01-06 Thread Xuelin Cao
Hi, I'm testing the Parquet file format, and predicate pushdown is a very useful feature for us. However, it looks like predicate pushdown doesn't work after I set sqlContext.sql("SET spark.sql.parquet.filterPushdown=true"). Here is my sql: