[ https://issues.apache.org/jira/browse/SPARK-17636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mitesh updated SPARK-17636:
---------------------------
    Description:
There's a `PushedFilters` entry for a simple numeric field, but not for a numeric field inside a struct. Not sure if this is a Spark limitation because of Parquet, or only a Spark limitation.

{quote}
scala> hc.read.parquet("s3a://some/parquet/file").select("day_timestamp", "sale_id")
res5: org.apache.spark.sql.DataFrame = [day_timestamp: struct<timestamp:bigint,timezone:string>, sale_id: bigint]

scala> res5.filter("sale_id > 4").queryExecution.executedPlan
res9: org.apache.spark.sql.execution.SparkPlan =
Filter[23814] [args=(sale_id#86324L > 4)][outPart=UnknownPartitioning(0)][outOrder=List()]
+- Scan ParquetRelation[day_timestamp#86302,sale_id#86324L] InputPaths: s3a://some/parquet/file, PushedFilters: [GreaterThan(sale_id,4)]

scala> res5.filter("day_timestamp.timestamp > 4").queryExecution.executedPlan
res10: org.apache.spark.sql.execution.SparkPlan =
Filter[23815] [args=(day_timestamp#86302.timestamp > 4)][outPart=UnknownPartitioning(0)][outOrder=List()]
+- Scan ParquetRelation[day_timestamp#86302,sale_id#86324L] InputPaths: s3a://some/parquet/file
{quote}

  was:
The filter gets pushed down for a simple numeric field, but not for a numeric field inside a struct. Not sure if this is a Spark limitation because of Parquet, or only a Spark limitation.
{quote}
scala> hc.read.parquet("s3a://some/parquet/file").select("day_timestamp", "sale_id")
res5: org.apache.spark.sql.DataFrame = [day_timestamp: struct<timestamp:bigint,timezone:string>, sale_id: bigint]

scala> res5.filter("sale_id > 4").queryExecution.executedPlan
res9: org.apache.spark.sql.execution.SparkPlan =
Filter[23814] [args=(sale_id#86324L > 4)][outPart=UnknownPartitioning(0)][outOrder=List()]
+- Scan ParquetRelation[day_timestamp#86302,sale_id#86324L] InputPaths: s3a://some/parquet/file, PushedFilters: [GreaterThan(sale_id,4)]

scala> res5.filter("day_timestamp.timestamp > 4").queryExecution.executedPlan
res10: org.apache.spark.sql.execution.SparkPlan =
Filter[23815] [args=(day_timestamp#86302.timestamp > 4)][outPart=UnknownPartitioning(0)][outOrder=List()]
+- Scan ParquetRelation[day_timestamp#86302,sale_id#86324L] InputPaths: s3a://some/parquet/file
{quote}


> Parquet filter push down doesn't handle struct fields
> -----------------------------------------------------
>
>                 Key: SPARK-17636
>                 URL: https://issues.apache.org/jira/browse/SPARK-17636
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 1.6.2
>            Reporter: Mitesh
>
> There's a `PushedFilters` entry for a simple numeric field, but not for a numeric
> field inside a struct. Not sure if this is a Spark limitation because of
> Parquet, or only a Spark limitation.
> {quote}
> scala> hc.read.parquet("s3a://some/parquet/file").select("day_timestamp", "sale_id")
> res5: org.apache.spark.sql.DataFrame = [day_timestamp: struct<timestamp:bigint,timezone:string>, sale_id: bigint]
>
> scala> res5.filter("sale_id > 4").queryExecution.executedPlan
> res9: org.apache.spark.sql.execution.SparkPlan =
> Filter[23814] [args=(sale_id#86324L > 4)][outPart=UnknownPartitioning(0)][outOrder=List()]
> +- Scan ParquetRelation[day_timestamp#86302,sale_id#86324L] InputPaths: s3a://some/parquet/file, PushedFilters: [GreaterThan(sale_id,4)]
>
> scala> res5.filter("day_timestamp.timestamp > 4").queryExecution.executedPlan
> res10: org.apache.spark.sql.execution.SparkPlan =
> Filter[23815] [args=(day_timestamp#86302.timestamp > 4)][outPart=UnknownPartitioning(0)][outOrder=List()]
> +- Scan ParquetRelation[day_timestamp#86302,sale_id#86324L] InputPaths: s3a://some/parquet/file
> {quote}
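
One possible reading of the two plans in the report, offered as an assumption rather than a confirmed diagnosis: the planner may only turn a predicate into a data-source filter when its left side is a plain top-level column, so `day_timestamp.timestamp > 4` (a struct field access) never reaches the Parquet scan and stays in the `Filter` operator. The toy model below illustrates that idea only. Every type and function in it (`Expr`, `GetStructField`, `SourceGreaterThan`, `translate`) is a hypothetical stand-in written for this sketch, not Spark's actual classes or API.

```scala
// Toy model only: hypothetical stand-ins, not Spark's real classes.
object PushdownSketch {
  // Simplified expression tree for predicates.
  sealed trait Expr
  // A top-level column, e.g. sale_id.
  final case class Attribute(name: String) extends Expr
  // A field inside a struct column, e.g. day_timestamp.timestamp.
  final case class GetStructField(child: Expr, field: String) extends Expr
  final case class Literal(value: Long) extends Expr
  final case class GreaterThan(left: Expr, right: Expr) extends Expr

  // A data-source filter that names its column by a flat string.
  final case class SourceGreaterThan(column: String, value: Long)

  // Translate a predicate into a source filter when possible.
  // Only "top-level attribute > literal" is recognised (assumption);
  // anything else returns None and would be evaluated above the scan.
  def translate(predicate: Expr): Option[SourceGreaterThan] = predicate match {
    case GreaterThan(Attribute(name), Literal(v)) => Some(SourceGreaterThan(name, v))
    case _                                        => None
  }

  def main(args: Array[String]): Unit = {
    val flat   = GreaterThan(Attribute("sale_id"), Literal(4L))
    val nested = GreaterThan(GetStructField(Attribute("day_timestamp"), "timestamp"), Literal(4L))

    println(translate(flat))   // Some(SourceGreaterThan(sale_id,4))
    println(translate(nested)) // None
  }
}
```

In this model the nested predicate is still applied (by the equivalent of the `Filter` node), so results stay correct; what is lost is the chance for the Parquet reader to skip row groups using the nested column's statistics.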