[ https://issues.apache.org/jira/browse/SPARK-49163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-49163:
-----------------------------------

    Assignee: Nikola Mandic

> Attempt to create table based on broken parquet partition data results in internal error
> ----------------------------------------------------------------------------------------
>
>                 Key: SPARK-49163
>                 URL: https://issues.apache.org/jira/browse/SPARK-49163
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Nikola Mandic
>            Assignee: Nikola Mandic
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>
> Create an example parquet table with partitions and insert data in Spark:
> {code:sql}
> create table t(col1 string, col2 string, col3 string) using parquet location 'some/path/parquet-test' partitioned by (col1, col2);
> insert into t (col1, col2, col3) values ('a', 'b', 'c');{code}
> In the filesystem, go to the parquet-test directory and copy the parquet data
> file from the col1=a/col2=b directory into col1=a. Then try to create a new
> table from the parquet data in Spark:
> {code:sql}
> create table broken_table using parquet location 'some/path/parquet-test';
> {code}
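The manual copy step above can be scripted so the broken layout is reproducible. This is a minimal sketch, assuming the table location resolves to `some/path/parquet-test` on the local filesystem and that Spark's data files end in `.parquet`; `break_partition_layout` is a hypothetical helper, not part of Spark.

```python
import shutil
from pathlib import Path


def break_partition_layout(table_dir: str) -> list[Path]:
    """Hypothetical helper: copy the data files of the leaf partition
    col1=a/col2=b one level up into col1=a, producing the inconsistent
    layout described in this issue. Returns the copied file paths."""
    root = Path(table_dir)
    leaf = root / "col1=a" / "col2=b"
    parent = root / "col1=a"
    copied = []
    # Assumption: data files end in .parquet; checksum files such as
    # .part-....parquet.crc end in .crc and are therefore not matched.
    for data_file in sorted(leaf.glob("*.parquet")):
        target = parent / data_file.name
        shutil.copy2(data_file, target)  # copies file contents and metadata
        copied.append(target)
    return copied
```

Call `break_partition_layout('some/path/parquet-test')` after the `insert into` statement and before running `create table broken_table`; adjust the path if the table location resolves elsewhere.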
> Creating broken_table fails with an internal error. Stack trace excerpts:
> {code:java}
> org.apache.spark.SparkException: [INTERNAL_ERROR] Eagerly executed command failed. You hit a bug in Spark or the Spark plugins you use. Please, report this bug to the corresponding communities or vendors, and provide the full stack trace. SQLSTATE: XX000
> ...
> Caused by: java.lang.AssertionError: assertion failed: Conflicting partition column names detected:
>         Partition column name list #0: col1
>         Partition column name list #1: col1, col2
> For partitioned table directories, data files should only live in leaf directories.
> And directories at the same level should have the same partition column name.
> Please check the following directories for unexpected files or inconsistent partition column names:
>         file:some/path/parquet-test/col1=a
>         file:some/path/parquet-test/col1=a/col2=b
>   at scala.Predef$.assert(Predef.scala:279)
>   at org.apache.spark.sql.execution.datasources.PartitioningUtils$.resolvePartitions(PartitioningUtils.scala:391)
> ...{code}
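The assertion fires because data files now sit in two directories with different partition column lists: `col1=a` (columns `col1`) and `col1=a/col2=b` (columns `col1, col2`). The sketch below is a simplified Python model of that consistency check, assuming paths relative to the table root. It is not Spark's `PartitioningUtils` code, which also handles cases such as `basePath`, escaped values, and partition type inference; `partition_columns` and `find_conflicts` are hypothetical names.

```python
from pathlib import PurePosixPath


def partition_columns(data_file: str) -> tuple[str, ...]:
    """Column names from the key=value directories above a data file.
    Simplified model: no unescaping, no basePath, no type inference."""
    dirs = PurePosixPath(data_file).parent.parts
    return tuple(part.split("=", 1)[0] for part in dirs if "=" in part)


def find_conflicts(data_files: list[str]) -> dict[tuple[str, ...], list[str]]:
    """Group data-file directories by their partition column list.

    Returns the groups only when there is more than one list (an
    inconsistent layout); returns {} for a consistent layout. Files with
    no key=value directories above them are skipped."""
    groups: dict[tuple[str, ...], list[str]] = {}
    for data_file in data_files:
        cols = partition_columns(data_file)
        if not cols:
            continue
        directory = str(PurePosixPath(data_file).parent)
        dirs = groups.setdefault(cols, [])
        if directory not in dirs:
            dirs.append(directory)
    return groups if len(groups) > 1 else {}


# Layout after the copy step in the reproduction (file names are illustrative).
files = [
    "col1=a/col2=b/part-00000.snappy.parquet",
    "col1=a/part-00000.snappy.parquet",
]
print(find_conflicts(files))
# {('col1', 'col2'): ['col1=a/col2=b'], ('col1',): ['col1=a']}
```

The two groups correspond to "Partition column name list #0" and "#1" in the error above.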



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
