[jira] [Created] (SPARK-6432) Cannot load parquet data with partitions if not all partition columns match data columns

Jianshi Huang (JIRA) Fri, 20 Mar 2015 02:08:00 -0700

Jianshi Huang created SPARK-6432:
------------------------------------

             Summary: Cannot load parquet data with partitions if not all 
partition columns match data columns
                 Key: SPARK-6432
                 URL: https://issues.apache.org/jira/browse/SPARK-6432
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.3.0, 1.3.1
            Reporter: Jianshi Huang



Suppose we have a dataset in the following folder structure:

{noformat}
parquet/source=live/date=2015-03-18/
parquet/source=live/date=2015-03-19/
...
{noformat}

And the data schema has the following columns:
- id
- *event_date*
- source
- value

Here the partition key source matches the data column source, but the
partition key date doesn't match any column in the data.

Then we cannot load the dataset in Spark using parquetFile. It reports:

{noformat}
org.apache.spark.sql.AnalysisException: Ambiguous references to source: (source#2,List()),(source#5,List());
...
{noformat}

Currently, if the partition columns overlap with the data columns, the
partition columns have to be a subset of the data columns.
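
To make the failing condition concrete, here is a minimal sketch in plain Python (no Spark involved). It only restates the rule above against the example layout and is not how Spark performs partition discovery. The helper names partition_keys, classify and hits_reported_case are hypothetical and exist only for this illustration.

```python
def partition_keys(path):
    """Return the partition column names encoded as key=value directories."""
    segments = path.strip("/").split("/")
    return [seg.split("=", 1)[0] for seg in segments if "=" in seg]


def classify(partition_cols, data_cols):
    """Split partition columns into those also present in the data and the rest."""
    overlapping = [c for c in partition_cols if c in data_cols]
    partition_only = [c for c in partition_cols if c not in data_cols]
    return overlapping, partition_only


def hits_reported_case(partition_cols, data_cols):
    """True when some, but not all, partition columns also exist as data columns."""
    overlapping, partition_only = classify(partition_cols, data_cols)
    return bool(overlapping) and bool(partition_only)


# Layout and schema taken from the example in this report.
path = "parquet/source=live/date=2015-03-18/"
data_columns = ["id", "event_date", "source", "value"]

keys = partition_keys(path)
print(keys)                                    # ['source', 'date']
print(classify(keys, data_columns))            # (['source'], ['date'])
print(hits_reported_case(keys, data_columns))  # True
```

In this example source overlaps with the data columns while date does not, so the partition columns are not a subset of the data columns, which is the case that fails.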

Jianshi



