[ https://issues.apache.org/jira/browse/SPARK-49163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan reassigned SPARK-49163:
-----------------------------------
Assignee: Nikola Mandic
> Attempt to create table based on broken parquet partition data results in
> internal error
> ----------------------------------------------------------------------------------------
>
> Key: SPARK-49163
> URL: https://issues.apache.org/jira/browse/SPARK-49163
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Nikola Mandic
> Assignee: Nikola Mandic
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Create an example parquet table with partitions and insert data in Spark:
> {code:java}
> create table t(col1 string, col2 string, col3 string) using parquet
> location 'some/path/parquet-test' partitioned by (col1, col2);
> insert into t (col1, col2, col3) values ('a', 'b', 'c');{code}
> In the filesystem, go to the parquet-test directory and copy the Parquet data
> file from the col1=a/col2=b directory into col1=a. Then try to create a new
> table over the same Parquet data in Spark:
> {code:java}
> create table broken_table using parquet location 'some/path/parquet-test';
> {code}
> This query fails with an internal error. Stack trace excerpts:
> {code:java}
> org.apache.spark.SparkException: [INTERNAL_ERROR] Eagerly executed command
> failed. You hit a bug in Spark or the Spark plugins you use. Please, report
> this bug to the corresponding communities or vendors, and provide the full
> stack trace. SQLSTATE: XX000
> ...
> Caused by: java.lang.AssertionError: assertion failed: Conflicting partition column names detected:
> Partition column name list #0: col1
> Partition column name list #1: col1, col2
> For partitioned table directories, data files should only live in leaf directories.
> And directories at the same level should have the same partition column name.
> Please check the following directories for unexpected files or inconsistent partition column names:
> file:some/path/parquet-test/col1=a
> file:some/path/parquet-test/col1=a/col2=b
> at scala.Predef$.assert(Predef.scala:279)
> at org.apache.spark.sql.execution.datasources.PartitioningUtils$.resolvePartitions(PartitioningUtils.scala:391)
> ...{code}
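The filesystem step of the reproduction and the layout condition behind the assertion can be sketched in Python. This is a hypothetical helper script, not Spark code: `copy_leaf_file_to_parent`, `implied_partition_columns` and `find_partition_conflicts` are illustrative names. It assumes data files end in `.parquet` and treats names starting with `_` or `.` as hidden, and its grouping only approximates the condition that `PartitioningUtils.resolvePartitions` asserts on: every directory that holds data files should imply the same list of partition column names.

{code:python}
import shutil
from pathlib import Path


def copy_leaf_file_to_parent(table_root: str) -> Path:
    """Hypothetical helper: copy one data file from col1=a/col2=b up into
    col1=a, mirroring the manual step in the reproduction."""
    leaf = Path(table_root) / "col1=a" / "col2=b"
    # Assumption: data files end in ".parquet"; take the first in sorted order.
    data_files = sorted(leaf.glob("*.parquet"))
    if not data_files:
        raise FileNotFoundError(f"no .parquet files under {leaf}")
    target = leaf.parent / data_files[0].name
    shutil.copy(data_files[0], target)
    return target


def implied_partition_columns(table_root: str, data_dir: Path) -> tuple:
    """Partition column names implied by a directory path relative to the
    table root, e.g. <root>/col1=a/col2=b -> ("col1", "col2")."""
    rel = data_dir.relative_to(table_root)
    return tuple(part.split("=", 1)[0] for part in rel.parts if "=" in part)


def find_partition_conflicts(table_root: str) -> dict:
    """Group directories that hold data files by the partition column names
    their paths imply. More than one key means an inconsistent layout."""
    groups = {}
    for f in Path(table_root).rglob("*.parquet"):
        # Skip names starting with "_" or "." (assumption: a loose
        # approximation of which files Spark treats as hidden).
        if f.name.startswith(("_", ".")):
            continue
        cols = implied_partition_columns(table_root, f.parent)
        groups.setdefault(cols, set()).add(f.parent)
    return groups
{code}

Run against some/path/parquet-test after the copy, `find_partition_conflicts` returns two keys, `('col1',)` and `('col1', 'col2')`, which correspond to partition column name lists #0 and #1 in the assertion message. A single key means the directories agree on their partition columns.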
--
This message was sent by Atlassian Jira
(v8.20.10#820010)