This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new e0d4ef4b0bd [SPARK-39706][SQL] Set missing column with defaultValue as
constant in `ParquetColumnVector`
e0d4ef4b0bd is described below
commit e0d4ef4b0bd2c8641b830106b0cb6063351ad5da
Author: yangjie01 <[email protected]>
AuthorDate: Tue Jul 12 09:20:24 2022 -0700
[SPARK-39706][SQL] Set missing column with defaultValue as constant in
`ParquetColumnVector`
### What changes were proposed in this pull request?
This PR calls `vector.setIsConstant()` during `ParquetColumnVector`
initialization when a column is missing, has a default value, and
`vector.appendObjects(capacity, defaultValue).isPresent()` is true.
### Why are the changes needed?
This is a minor improvement: for a missing column with a default value,
setting `isConstant` to true prevents the `reset()` method from restoring
the internal state of `WritableColumnVector`.
`OrcColumnarBatchReader` already does something similar for missing columns:
https://github.com/apache/spark/blob/bb4c4778713c7ba1ee92d0bb0763d7d3ce54374f/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnarBatchReader.java#L178-L191
Without this change there is no bug: a missing column is initialized only
once and its corresponding columnReader is null, so the `reset()` method
only resets `WritableColumnVector#elementsAppended` to 0, which does not
affect anything.
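The effect described above can be illustrated with a minimal sketch. It is a simplified model, not Spark's actual code: `ToyVector`, its `appendInts` method, and `ConstantVectorSketch` are hypothetical stand-ins for `WritableColumnVector` and its callers, and the only behavior assumed is the one stated above, namely that `reset()` leaves a constant vector untouched.

```java
// Hypothetical, simplified stand-in for a writable column vector.
// This is NOT Spark's WritableColumnVector; it only models the isConstant/reset() interaction.
final class ToyVector {
    private final int[] data;
    private int elementsAppended = 0;
    private boolean isConstant = false;

    ToyVector(int capacity) {
        data = new int[capacity];
    }

    // Appends `count` copies of `value`, the way a default value fills a missing column.
    // The caller must keep the total number of appended elements within the capacity.
    void appendInts(int count, int value) {
        for (int i = 0; i < count; i++) {
            data[elementsAppended++] = value;
        }
    }

    void setIsConstant() {
        isConstant = true;
    }

    // A constant vector keeps its state; any other vector has its append position rewound.
    void reset() {
        if (isConstant) {
            return;
        }
        elementsAppended = 0;
    }

    int elementsAppended() {
        return elementsAppended;
    }

    int getInt(int rowId) {
        return data[rowId];
    }
}

public class ConstantVectorSketch {
    public static void main(String[] args) {
        int capacity = 4;
        int defaultValue = 7;

        // Missing column filled with its default value and marked constant.
        ToyVector constant = new ToyVector(capacity);
        constant.appendInts(capacity, defaultValue);
        constant.setIsConstant();
        constant.reset();

        // Same column without the constant flag.
        ToyVector plain = new ToyVector(capacity);
        plain.appendInts(capacity, defaultValue);
        plain.reset();

        System.out.println("constant vector, elementsAppended after reset: "
            + constant.elementsAppended());
        System.out.println("plain vector, elementsAppended after reset: "
            + plain.elementsAppended());
        // The stored values are still readable in both cases.
        System.out.println("plain vector, value at row 0: " + plain.getInt(0));
    }
}
```

In this model the plain vector only loses its append counter while the stored values remain readable, which matches the observation that the missing flag was harmless in practice.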
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Pass GitHub Actions
Closes #37115 from LuciferYang/setIsConstant.
Lead-authored-by: yangjie01 <[email protected]>
Co-authored-by: YangJie <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
---
.../spark/sql/execution/datasources/parquet/ParquetColumnVector.java | 2 ++
1 file changed, 2 insertions(+)
diff --git a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetColumnVector.java b/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetColumnVector.java
index 2ad8cdfcca6..47774e0a397 100644
--- a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetColumnVector.java
+++ b/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetColumnVector.java
@@ -89,6 +89,8 @@ final class ParquetColumnVector {
throw new IllegalArgumentException("Cannot assign default column value to result " +
"column batch in vectorized Parquet reader because the data type is not supported: " +
defaultValue);
+ } else {
+ vector.setIsConstant();
}
}