Kent Yao created SPARK-55716:
--------------------------------
Summary: V1 file-based DataSource writes silently accept null
values into NOT NULL columns
Key: SPARK-55716
URL: https://issues.apache.org/jira/browse/SPARK-55716
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 4.2.0
Reporter: Kent Yao
V1 file-based DataSource writes (parquet/orc/json) silently accept null values
into NOT NULL columns. The root cause has two parts:
1. `DataSource.resolveRelation()` calls `dataSchema.asNullable` at line 439,
which recursively strips NOT NULL constraints. This was added in SPARK-13738
(2016) for read safety, since files may contain nulls regardless of the
declared schema. However, the same relaxed schema is also used on the write
path.
2. `CreateDataSourceTableCommand` stores `dataSource.schema` (post-asNullable)
in the catalog at line 111, permanently losing NOT NULL information.
As a result, `PreprocessTableInsertion` never injects `AssertNotNull` for V1
file source tables because the schema it sees is all-nullable.
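The effect of `asNullable` can be sketched with simplified stand-ins for the
real `org.apache.spark.sql.types` classes (the types below are illustrative
only, not Spark's actual API):

```scala
// Simplified stand-ins for Spark's schema classes (illustration only).
sealed trait DataType
case object IntegerType extends DataType
final case class ArrayType(elementType: DataType, containsNull: Boolean) extends DataType
final case class StructField(name: String, dataType: DataType, nullable: Boolean)
final case class StructType(fields: Seq[StructField]) extends DataType {
  // Mirrors asNullable: every field and array element becomes nullable,
  // recursively, so NOT NULL constraints are erased from the whole tree.
  def asNullable: StructType = StructType(fields.map { f =>
    f.copy(dataType = makeNullable(f.dataType), nullable = true)
  })
  private def makeNullable(dt: DataType): DataType = dt match {
    case ArrayType(e, _) => ArrayType(makeNullable(e), containsNull = true)
    case s: StructType   => s.asNullable
    case other           => other
  }
}

val userSchema = StructType(Seq(
  StructField("id", IntegerType, nullable = false),
  StructField("tags", ArrayType(IntegerType, containsNull = false), nullable = false)))

// After resolveRelation, every nullable flag is true, so the analyzer sees
// no NOT NULL column and never injects AssertNotNull.
val resolved = userSchema.asNullable
```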
Note that `InsertableRelation` (e.g., `SimpleInsertSource`) does NOT have this
problem because it preserves the original schema (SPARK-24583).
**Fix:**
- Fix `CreateDataSourceTableCommand` to preserve user-specified nullability
using recursive nullability merging (the resolved `dataSource.schema` may have
CharVarchar normalization and metadata that must be kept).
- Fix `PreprocessTableInsertion` to restore nullability flags from the catalog
schema before null checks.
- Add a legacy config `spark.sql.legacy.allowNullInsertForFileSourceTables`
(default false) to gate the write-side enforcement for backward compatibility.
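The merging step in the first bullet might look roughly like the following
(a self-contained sketch with simplified stand-in types, not the actual
patch; the real code operates on `org.apache.spark.sql.types.StructType`):

```scala
// Simplified stand-ins for Spark's schema classes (illustration only).
sealed trait DataType
case object StringType extends DataType
final case class StructField(name: String, dataType: DataType, nullable: Boolean)
final case class StructType(fields: Seq[StructField]) extends DataType

// Keep everything from the resolved schema (which may carry Char/Varchar
// normalization and metadata), but restore the user-specified nullable
// flags, recursing into nested structs.
def mergeNullability(resolved: DataType, user: DataType): DataType =
  (resolved, user) match {
    case (StructType(rf), StructType(uf)) if rf.length == uf.length =>
      StructType(rf.zip(uf).map { case (r, u) =>
        r.copy(
          dataType = mergeNullability(r.dataType, u.dataType),
          nullable = u.nullable)
      })
    case (r, _) => r  // leaf types: resolved schema wins
  }

val user     = StructType(Seq(StructField("id", StringType, nullable = false)))
val resolved = StructType(Seq(StructField("id", StringType, nullable = true)))
val merged   = mergeNullability(resolved, user)
```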
**Scope:**
- This fix covers catalog-based table writes (INSERT INTO, INSERT OVERWRITE).
- DataFrame `df.write.format().save()` without a catalog table is NOT affected
(no catalog schema to reference).
- Both top-level and nested type nullability (array elements, struct fields,
map values) are enforced.
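The nested enforcement in the last bullet amounts to a recursive null check
over written values. A sketch with simplified value/type stand-ins (the real
implementation injects `AssertNotNull` expressions rather than checking
values directly):

```scala
// Simplified stand-ins for runtime values and schema types (illustration only).
sealed trait Value
case object NullV extends Value
final case class IntV(i: Int) extends Value
final case class ArrayV(elems: Seq[Value]) extends Value

sealed trait DataType
case object IntT extends DataType
final case class ArrayT(elementType: DataType, containsNull: Boolean) extends DataType

// A value passes iff no null appears where the schema says NOT NULL,
// descending into array elements (struct fields and map values would
// recurse the same way).
def respectsNullability(v: Value, dt: DataType, nullable: Boolean): Boolean =
  (v, dt) match {
    case (NullV, _) => nullable
    case (ArrayV(es), ArrayT(et, containsNull)) =>
      es.forall(respectsNullability(_, et, containsNull))
    case _ => true
  }
```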
--
This message was sent by Atlassian Jira
(v8.20.10#820010)