Jianshi Huang created SPARK-6432:
------------------------------------

             Summary: Cannot load parquet data with partitions if not all partition columns match data columns
                 Key: SPARK-6432
                 URL: https://issues.apache.org/jira/browse/SPARK-6432
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.3.0, 1.3.1
            Reporter: Jianshi Huang
Suppose we have a dataset in the following folder structure:

{noformat}
parquet/source=live/date=2015-03-18/
parquet/source=live/date=2015-03-19/
...
{noformat}

And the data schema has the following columns:
- id
- *event_date*
- source
- value

The partition key source matches the data column source, but the partition key date doesn't match any column in the data. In this case, the dataset cannot be loaded in Spark using parquetFile. It reports:

{noformat}
org.apache.spark.sql.AnalysisException: Ambiguous references to source: (source#2,List()),(source#5,List());
...
{noformat}

Currently, if partition columns overlap with data columns, the partition columns must be a subset of the data columns.

Jianshi
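As a rough illustration of the restriction described in this issue, here is a minimal Python sketch that models it without Spark, using the folder layout and columns from the report. All helper names (`partition_columns_from_path`, `naive_merged_schema`, `violates_reported_rule`, `ambiguous_names`) are hypothetical. The assumption that data columns and partition columns end up concatenated without deduplication is only inferred from the two `source` attributes in the error message; it is not taken from Spark's source code.

```python
# Hypothetical model of the constraint described in SPARK-6432.
# This is NOT Spark's implementation, only an illustration of the reported rule.
from typing import List


def partition_columns_from_path(path: str) -> List[str]:
    """Extract partition column names from a Hive-style path such as
    'parquet/source=live/date=2015-03-18/'."""
    return [seg.split("=", 1)[0] for seg in path.split("/") if "=" in seg]


def naive_merged_schema(data_cols: List[str], partition_cols: List[str]) -> List[str]:
    """Append partition columns to data columns without deduplication
    (assumed behavior), which yields a duplicated 'source' column."""
    return data_cols + partition_cols


def violates_reported_rule(data_cols: List[str], partition_cols: List[str]) -> bool:
    """True when partition columns overlap the data columns but are not a
    subset of them, the case the report says cannot be loaded."""
    data, parts = set(data_cols), set(partition_cols)
    overlaps = bool(data & parts)
    return overlaps and not parts <= data


def ambiguous_names(columns: List[str]) -> List[str]:
    """Return column names that appear more than once, in first-seen order."""
    seen, dupes = set(), []
    for c in columns:
        if c in seen and c not in dupes:
            dupes.append(c)
        seen.add(c)
    return dupes


if __name__ == "__main__":
    data_cols = ["id", "event_date", "source", "value"]
    parts = partition_columns_from_path("parquet/source=live/date=2015-03-18/")
    print(parts)                                                    # ['source', 'date']
    print(violates_reported_rule(data_cols, parts))                 # True
    print(ambiguous_names(naive_merged_schema(data_cols, parts)))   # ['source']
```

In this model, the layout from the report fails the check because `date` has no matching data column while `source` does, and a naive merge produces two `source` columns, mirroring the ambiguity in the exception.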