Jianshi Huang created SPARK-6432:
------------------------------------

             Summary: Cannot load parquet data with partitions if not all 
partition columns match data columns
                 Key: SPARK-6432
                 URL: https://issues.apache.org/jira/browse/SPARK-6432
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.3.0, 1.3.1
            Reporter: Jianshi Huang


Suppose we have a dataset in the following folder structure:

{noformat}
parquet/source=live/date=2015-03-18/
parquet/source=live/date=2015-03-19/
...
{noformat}
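
For illustration, the partition keys implied by such a layout can be read off the directory names. This is only a minimal sketch of the key=value convention, not Spark's actual partition discovery code; the helper name partition_columns is hypothetical.

```python
def partition_columns(path):
    """Return the (key, value) pairs encoded as key=value directory names."""
    columns = []
    for segment in path.strip("/").split("/"):
        key, sep, value = segment.partition("=")
        # Only segments of the form key=value count as partition directories.
        if sep and key:
            columns.append((key, value))
    return columns


paths = [
    "parquet/source=live/date=2015-03-18/",
    "parquet/source=live/date=2015-03-19/",
]
for p in paths:
    print(partition_columns(p))
```

Both paths yield the partition keys source and date, in that order.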

And the data schema has the following columns:
- id
- *event_date*
- source
- value

Here the partition key source matches the data column source, but the
partition key date does not match any column in the data.

Then we cannot load the dataset in Spark using parquetFile. It reports:

org.apache.spark.sql.AnalysisException: Ambiguous references to source: 
(source#2,List()),(source#5,List());
...

Currently, if the partition columns overlap with the data columns, the
partition columns have to be a subset of the data columns.
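
The constraint can be written down as a small check. This is a sketch of the behaviour as reported in this issue, not Spark's implementation; the function name is_loadable is hypothetical, and treating the fully disjoint case as loadable is an assumption read from the wording above.

```python
def is_loadable(partition_cols, data_cols):
    """Model the reported rule: partial overlap between the two sets fails."""
    partition_cols = set(partition_cols)
    data_cols = set(data_cols)
    overlap = partition_cols & data_cols
    if not overlap:
        # Assumption: no shared names, so nothing can be ambiguous.
        return True
    # With any overlap, every partition column must also be a data column.
    return partition_cols <= data_cols


data_cols = ["id", "event_date", "source", "value"]

# The layout from this report: source overlaps, date does not.
print(is_loadable(["source", "date"], data_cols))

# Partition columns that are a subset of the data columns.
print(is_loadable(["source"], data_cols))
```

Under this model the first call returns False, matching the failure described here, and the second returns True.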

Jianshi



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
