Jianshi Huang created SPARK-6432:
------------------------------------
Summary: Cannot load parquet data with partitions if not all partition columns match data columns
Key: SPARK-6432
URL: https://issues.apache.org/jira/browse/SPARK-6432
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.3.0, 1.3.1
Reporter: Jianshi Huang
Suppose we have a dataset with the following folder structure:
{noformat}
parquet/source=live/date=2015-03-18/
parquet/source=live/date=2015-03-19/
...
{noformat}
And the data schema has the following columns:
- id
- *event_date*
- source
- value
The partition key source matches the data column source, but the partition key date does not match any column in the data.
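A minimal Python sketch of this comparison (plain Python, not Spark code). It pulls the Hive-style key=value segments out of one of the directories above and checks each key against the data columns listed. The names parse_partition_dir and classify_partition_columns are hypothetical. Spark's own partition discovery does more than this and is not reproduced here.

```python
from typing import Dict, List


def parse_partition_dir(path: str) -> Dict[str, str]:
    """Extract Hive-style key=value segments from a directory path.

    Segments without '=' (such as the leading 'parquet') are skipped.
    Hypothetical helper for illustration only.
    """
    spec: Dict[str, str] = {}
    for segment in path.strip("/").split("/"):
        if "=" in segment:
            key, value = segment.split("=", 1)
            spec[key] = value
    return spec


def classify_partition_columns(path: str, data_columns: List[str]) -> Dict[str, List[str]]:
    """Split the partition keys of `path` into keys that also appear in the
    data schema ("matching") and keys that do not ("extra")."""
    keys = list(parse_partition_dir(path))  # dicts keep insertion order
    return {
        "matching": [k for k in keys if k in data_columns],
        "extra": [k for k in keys if k not in data_columns],
    }


if __name__ == "__main__":
    data_columns = ["id", "event_date", "source", "value"]
    print(classify_partition_columns("parquet/source=live/date=2015-03-18/", data_columns))
    # {'matching': ['source'], 'extra': ['date']}
```

For the layout in this report, source lands in "matching" and date lands in "extra". That mix is the situation that triggers the failure.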
With this layout, loading the dataset in Spark using parquetFile fails with:
{noformat}
org.apache.spark.sql.AnalysisException: Ambiguous references to source: (source#2,List()),(source#5,List());
...
{noformat}
Currently, if the partition columns overlap with the data columns, the partition columns must be a subset of the data columns.
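The restriction described above can be modeled as a small set check. This is a sketch of the rule as stated in this report, not Spark's actual implementation. The function name layout_supported is hypothetical, and the "no overlap is fine" branch follows from the conditional wording of the rule.

```python
from typing import Iterable


def layout_supported(partition_columns: Iterable[str], data_columns: Iterable[str]) -> bool:
    """Model of the reported restriction (hypothetical, not Spark code).

    A layout is accepted if the partition columns are disjoint from the
    data columns, or if every partition column is also a data column.
    """
    parts = set(partition_columns)
    data = set(data_columns)
    if not (parts & data):
        return True  # no shared names, so no ambiguous references
    return parts <= data  # overlap is only tolerated when parts is a subset


if __name__ == "__main__":
    data = ["id", "event_date", "source", "value"]
    print(layout_supported(["source", "date"], data))  # False: the case in this report
    print(layout_supported(["source"], data))          # True: subset of the data columns
    print(layout_supported(["date"], data))            # True: no overlap at all
```

Under this model, the layout in this report is rejected only because source overlaps while date does not.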
Jianshi
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)