[ https://issues.apache.org/jira/browse/SPARK-6432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Cheng Lian updated SPARK-6432:
------------------------------
    Priority: Critical  (was: Major)

> Cannot load parquet data with partitions if not all partition columns match data columns
> ----------------------------------------------------------------------------------------
>
>                 Key: SPARK-6432
>                 URL: https://issues.apache.org/jira/browse/SPARK-6432
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.3.0, 1.3.1
>            Reporter: Jianshi Huang
>            Assignee: Cheng Lian
>            Priority: Critical
>
> Suppose we have a dataset in the following folder structure:
> {noformat}
> parquet/source=live/date=2015-03-18/
> parquet/source=live/date=2015-03-19/
> ...
> {noformat}
> And the data schema has the following columns:
> - id
> - *event_date*
> - source
> - value
> The partition key source matches the data column source, but the partition key date
> doesn't match any column in the data.
> Then we cannot load the dataset in Spark using parquetFile. It reports:
> {code}
> org.apache.spark.sql.AnalysisException: Ambiguous references to source: (source#2,List()),(source#5,List());
> ...
> {code}
> Currently, if the partition columns overlap with the data columns, the partition
> columns have to be a subset of the data columns.
> Jianshi



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
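The restriction stated at the end of the report (partition columns that overlap with data columns must be a subset of them) can be illustrated with a small Python sketch. It is hypothetical and does not reflect how Spark SQL actually discovers partitions or resolves columns. Only the `key=value` directory layout and the column names come from the report. The helpers `partition_columns` and `satisfies_subset_rule` are made up for illustration.

```python
from __future__ import annotations

# Hypothetical sketch, NOT Spark's implementation. It models the rule quoted in
# the issue: when partition columns overlap with data columns, all partition
# columns must also be data columns (i.e. form a subset).


def partition_columns(path: str) -> dict[str, str]:
    """Parse key=value directory segments, e.g. 'source=live' -> {'source': 'live'}."""
    cols = {}
    for segment in path.strip("/").split("/"):
        if "=" in segment:  # plain directories such as 'parquet' are skipped
            key, value = segment.split("=", 1)
            cols[key] = value
    return cols


def satisfies_subset_rule(partition_cols: set[str], data_cols: set[str]) -> bool:
    """True if partition columns are disjoint from, or a subset of, the data columns."""
    overlap = partition_cols & data_cols
    return not overlap or partition_cols <= data_cols


if __name__ == "__main__":
    data_cols = {"id", "event_date", "source", "value"}
    parts = set(partition_columns("parquet/source=live/date=2015-03-18/"))
    print(sorted(parts))                            # ['date', 'source']
    print(satisfies_subset_rule(parts, data_cols))  # False: 'source' overlaps, 'date' does not
```

In the reported layout, `source` appears both as a partition key and as a data column while `date` appears only in the path, so the partition columns overlap with the data columns without being a subset of them. That is the situation the report describes as failing to load.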