[
https://issues.apache.org/jira/browse/SPARK-48649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ivan Sadikov updated SPARK-48649:
---------------------------------
Description:
When a table directory contains invalid partition paths, such as:
{code:java}
table/
  invalid/...
  part=1/...
  part=2/...
  part=3/...{code}
a SQL query that reads all of the partitions fails with:
{code:java}
java.lang.AssertionError: assertion failed: Conflicting directory structures detected. Suspicious paths:
table
table/invalid{code}
I propose adding a data source option and a Spark SQL config to ignore invalid
partition paths. The config will be disabled by default to retain the current
behaviour.
{code:java}
spark.conf.set("spark.sql.files.ignoreInvalidPartitionPaths", "true"){code}
{code:java}
spark.read.format("parquet").option("ignoreInvalidPartitionPaths", "true").load(...){code}
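To illustrate the intended behaviour, here is a minimal, Spark-free sketch of how the flag could affect directory selection. It is not Spark's partition discovery code: the helper names (is_partition_dir, select_partition_dirs), the ignore_invalid parameter, and the simple column=value pattern are all assumptions made for this example.

```python
import re

# Assumption: a partition directory is named `column=value`.
# Spark's real parsing rules are more involved than this pattern.
_PARTITION_DIR = re.compile(r"^[^=/]+=[^/]+$")


def is_partition_dir(name):
    """Return True if a directory name looks like `column=value`."""
    return _PARTITION_DIR.match(name) is not None


def select_partition_dirs(children, ignore_invalid=False):
    """Hypothetical helper: choose which child directories to read.

    With ignore_invalid=False (mirroring the proposed default), any
    directory that is not `column=value` raises an error. With
    ignore_invalid=True, such directories are skipped.
    """
    valid = [c for c in children if is_partition_dir(c)]
    invalid = [c for c in children if not is_partition_dir(c)]
    if invalid and not ignore_invalid:
        raise ValueError(
            "Conflicting directory structures detected: " + ", ".join(invalid)
        )
    return valid


children = ["invalid", "part=1", "part=2", "part=3"]
selected = select_partition_dirs(children, ignore_invalid=True)
```

With the flag enabled, only the part=1, part=2 and part=3 directories from the layout above are selected; with the flag left at its default, the same input raises an error, which corresponds to the current failure.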
> Add "ignoreInvalidPartitionPaths" and
> "spark.sql.files.ignoreInvalidPartitionPaths" configs to allow ignoring
> invalid partition paths
> -------------------------------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-48649
> URL: https://issues.apache.org/jira/browse/SPARK-48649
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Ivan Sadikov
> Priority: Major