This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new e0d4ef4b0bd [SPARK-39706][SQL] Set missing column with defaultValue as constant in `ParquetColumnVector`
e0d4ef4b0bd is described below

commit e0d4ef4b0bd2c8641b830106b0cb6063351ad5da
Author: yangjie01 <[email protected]>
AuthorDate: Tue Jul 12 09:20:24 2022 -0700

    [SPARK-39706][SQL] Set missing column with defaultValue as constant in `ParquetColumnVector`
    
    ### What changes were proposed in this pull request?
    This PR calls `vector.setIsConstant()` during `ParquetColumnVector` initialization when a column is missing, has a default value, and `vector.appendObjects(capacity, defaultValue).isPresent()` is true.
    
    ### Why are the changes needed?
    This is a minor improvement. For a missing column with a default value, setting isConstant to true prevents the `reset()` method from resetting the internal state of the `WritableColumnVector`. `OrcColumnarBatchReader` already does something similar for missing columns:
    
    
    https://github.com/apache/spark/blob/bb4c4778713c7ba1ee92d0bb0763d7d3ce54374f/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnarBatchReader.java#L178-L191
    
    Without this change there is no bug: a missing column is initialized only once and its columnReader is null, so the `reset()` method only resets `WritableColumnVector#elementsAppended` to 0, which has no effect on the values that are read.
    
    ### Does this PR introduce _any_ user-facing change?
    No
    
    ### How was this patch tested?
    Pass GitHub Actions
    
    Closes #37115 from LuciferYang/setIsConstant.
    
    Lead-authored-by: yangjie01 <[email protected]>
    Co-authored-by: YangJie <[email protected]>
    Signed-off-by: Dongjoon Hyun <[email protected]>
---
 .../spark/sql/execution/datasources/parquet/ParquetColumnVector.java    | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetColumnVector.java b/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetColumnVector.java
index 2ad8cdfcca6..47774e0a397 100644
--- a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetColumnVector.java
+++ b/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetColumnVector.java
@@ -89,6 +89,8 @@ final class ParquetColumnVector {
         throw new IllegalArgumentException("Cannot assign default column value to result " +
           "column batch in vectorized Parquet reader because the data type is not supported: " +
           defaultValue);
+      } else {
+        vector.setIsConstant();
       }
     }
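
The effect of the added `else` branch can be illustrated with a small, self-contained sketch. It is not Spark code: `SketchVector`, `SetIsConstantSketch`, `appendInts`, and `missingColumn` are hypothetical stand-ins, and the `reset()` guard is an assumption modelled on the behaviour described in the commit message (a constant vector skips the reset; otherwise only the append counter goes back to 0).

```java
import java.util.Arrays;

// Hypothetical, simplified stand-in for a writable column vector (not the Spark class).
final class SketchVector {
    private final int[] data;
    private int elementsAppended = 0;
    private boolean isConstant = false;

    SketchVector(int capacity) {
        data = new int[capacity];
    }

    // Appends `count` copies of `value`; returns false if they do not fit.
    boolean appendInts(int count, int value) {
        if (elementsAppended + count > data.length) {
            return false;
        }
        Arrays.fill(data, elementsAppended, elementsAppended + count, value);
        elementsAppended += count;
        return true;
    }

    void setIsConstant() {
        isConstant = true;
    }

    // Assumed guard: a constant vector keeps its state across resets.
    void reset() {
        if (isConstant) {
            return;
        }
        elementsAppended = 0;
    }

    int getElementsAppended() {
        return elementsAppended;
    }

    int getInt(int rowId) {
        return data[rowId];
    }
}

final class SetIsConstantSketch {
    // Fills a vector with a default value, the way a missing column is populated once.
    static SketchVector missingColumn(int capacity, int defaultValue, boolean markConstant) {
        SketchVector v = new SketchVector(capacity);
        if (!v.appendInts(capacity, defaultValue)) {
            throw new IllegalArgumentException("default value does not fit");
        } else if (markConstant) {
            v.setIsConstant(); // corresponds to the branch added by this commit
        }
        return v;
    }

    public static void main(String[] args) {
        SketchVector withoutFlag = missingColumn(4, 7, false);
        SketchVector withFlag = missingColumn(4, 7, true);
        withoutFlag.reset();
        withFlag.reset();
        // Prints "0 4": only the unflagged vector had its append counter reset.
        System.out.println(withoutFlag.getElementsAppended() + " " + withFlag.getElementsAppended());
    }
}
```

In both cases the stored default values stay readable after `reset()`, which matches the commit message's point that the change is an improvement and not a bug fix.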
 

