[spark] branch master updated: [SPARK-31159][SQL][FOLLOWUP] Move checking of the `rebaseDateTime` flag out of the loop in `VectorizedColumnReader`

gurwls223 Mon, 23 Mar 2020 07:04:09 -0700

This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/master by this push:
     new aa3a742  [SPARK-31159][SQL][FOLLOWUP] Move checking of the `rebaseDateTime` flag out of the loop in `VectorizedColumnReader`
aa3a742 is described below

commit aa3a7429f44970eb95a58e1f6cfbf7d89d6753a0
Author: Maxim Gekk <[email protected]>
AuthorDate: Mon Mar 23 23:02:48 2020 +0900

    [SPARK-31159][SQL][FOLLOWUP] Move checking of the `rebaseDateTime` flag out of the loop in `VectorizedColumnReader`
    
    ### What changes were proposed in this pull request?
    In this PR, I propose to refactor how `VectorizedColumnReader` reads timestamps of the `TIMESTAMP_MILLIS` logical type from Parquet files, moving the check of the `rebaseDateTime` flag out of the inner loop so that it is evaluated once per batch instead of once per value.
    
    ### Why are the changes needed?
    To avoid the additional overhead of checking the SQL config `spark.sql.legacy.parquet.rebaseDateTime.enabled`, introduced by the PR https://github.com/apache/spark/pull/27915, for every value read in the loop.
    
    ### Does this PR introduce any user-facing change?
    No
    
    ### How was this patch tested?
    By running the test suite `ParquetIOSuite`.
    
    Closes #27973 from MaxGekk/rebase-parquet-datetime-followup.
    
    Authored-by: Maxim Gekk <[email protected]>
    Signed-off-by: HyukjinKwon <[email protected]>
---
 .../parquet/VectorizedColumnReader.java            | 23 ++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java b/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java
index 13225b2..2214c16 100644
--- a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java
+++ b/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java
@@ -462,15 +462,22 @@ public class VectorizedColumnReader {
          num, column, rowId, maxDefLevel, (VectorizedValuesReader) dataColumn);
       }
     } else if (originalType == OriginalType.TIMESTAMP_MILLIS) {
-      for (int i = 0; i < num; i++) {
-        if (defColumn.readInteger() == maxDefLevel) {
-          long micros = DateTimeUtils.millisToMicros(dataColumn.readLong());
-          if (rebaseDateTime) {
-            micros = DateTimeUtils.rebaseJulianToGregorianMicros(micros);
+      if (rebaseDateTime) {
+        for (int i = 0; i < num; i++) {
+          if (defColumn.readInteger() == maxDefLevel) {
+            long micros = DateTimeUtils.millisToMicros(dataColumn.readLong());
+            column.putLong(rowId + i, DateTimeUtils.rebaseJulianToGregorianMicros(micros));
+          } else {
+            column.putNull(rowId + i);
+          }
+        }
+      } else {
+        for (int i = 0; i < num; i++) {
+          if (defColumn.readInteger() == maxDefLevel) {
+            column.putLong(rowId + i, DateTimeUtils.millisToMicros(dataColumn.readLong()));
+          } else {
+            column.putNull(rowId + i);
           }
-          column.putLong(rowId + i, micros);
-        } else {
-          column.putNull(rowId + i);
         }
       }
     } else {
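
The null handling in both new loops follows the Parquet encoding: the data column stores only non-null values, and a slot is non-null exactly when its definition level equals `maxDefLevel`. Spark's `TimestampType` holds microseconds since the epoch while `TIMESTAMP_MILLIS` stores milliseconds, hence the conversion. Here is a minimal sketch of the non-rebasing branch under stated assumptions: the class and method names are hypothetical, `int[]`/`long[]` arrays stand in for `defColumn`/`dataColumn`, a `null` entry in the `Long[]` result stands in for `column.putNull(rowId + i)`, and `Math.multiplyExact(..., 1000L)` is assumed to match `DateTimeUtils.millisToMicros`.

```java
/**
 * HYPOTHETICAL sketch of the TIMESTAMP_MILLIS branch without rebasing.
 * defLevels stands in for defColumn, values for dataColumn, and a null
 * entry in the result for column.putNull(rowId + i).
 */
final class TimestampMillisBatch {
  static Long[] decode(int[] defLevels, long[] values, int maxDefLevel) {
    Long[] out = new Long[defLevels.length];
    int next = 0; // index of the next stored (non-null) value
    for (int i = 0; i < defLevels.length; i++) {
      if (defLevels[i] == maxDefLevel) {
        // Value is present: millis -> micros (throws ArithmeticException on overflow).
        out[i] = Math.multiplyExact(values[next++], 1000L);
      } else {
        // Value is null: nothing is read from the data column.
        out[i] = null;
      }
    }
    return out;
  }
}
```

For example, definition levels `{1, 0, 1}` with `maxDefLevel = 1` and stored values `{5, -3}` decode to `5000`, `null`, `-3000` microseconds.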


