This is an automated email from the ASF dual-hosted git repository.
gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.0 by this push:
new ac79767 [SPARK-31159][SQL][FOLLOWUP] Move checking of the
`rebaseDateTime` flag out of the loop in `VectorizedColumnReader`
ac79767 is described below
commit ac7976791819ed9a2726c72d80528982e142c818
Author: Maxim Gekk <[email protected]>
AuthorDate: Mon Mar 23 23:02:48 2020 +0900
[SPARK-31159][SQL][FOLLOWUP] Move checking of the `rebaseDateTime` flag out
of the loop in `VectorizedColumnReader`
In the PR, I propose to refactor reading of timestamps of the
`TIMESTAMP_MILLIS` logical type from Parquet files in `VectorizedColumnReader`,
and move checking of the `rebaseDateTime` flag out of the internal loop.
The change avoids the per-value overhead of checking the SQL config
`spark.sql.legacy.parquet.rebaseDateTime.enabled`, which was introduced by the PR
https://github.com/apache/spark/pull/27915.
No user-facing changes.
Tested by running the test suite `ParquetIOSuite`.
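The refactoring described above hoists a loop-invariant flag check out of the loop, so the flag is tested once and each branch gets its own specialized loop. The following is a minimal, self-contained sketch of that pattern, not the Spark code itself: the class `LoopUnswitchSketch`, the methods `convertCheckInside` and `convertCheckHoisted`, and the placeholder `rebase` function are hypothetical names introduced for illustration, and `fromMillis` here is a simplified stand-in for the real `DateTimeUtils.fromMillis`.

```java
public class LoopUnswitchSketch {
    // Hypothetical placeholder for a per-value conversion such as a calendar rebase.
    // It is NOT the real rebaseJulianToGregorianMicros; it only makes the two paths differ.
    static long rebase(long micros) {
        return micros + 1L;
    }

    // Simplified stand-in: convert milliseconds to microseconds, failing on overflow.
    static long fromMillis(long millis) {
        return Math.multiplyExact(millis, 1000L);
    }

    // Before: the flag is tested on every iteration of the loop.
    static long[] convertCheckInside(long[] millis, boolean rebaseFlag) {
        long[] out = new long[millis.length];
        for (int i = 0; i < millis.length; i++) {
            long micros = fromMillis(millis[i]);
            if (rebaseFlag) {
                micros = rebase(micros);
            }
            out[i] = micros;
        }
        return out;
    }

    // After: the flag is tested once, and each branch runs its own loop.
    static long[] convertCheckHoisted(long[] millis, boolean rebaseFlag) {
        long[] out = new long[millis.length];
        if (rebaseFlag) {
            for (int i = 0; i < millis.length; i++) {
                out[i] = rebase(fromMillis(millis[i]));
            }
        } else {
            for (int i = 0; i < millis.length; i++) {
                out[i] = fromMillis(millis[i]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        long[] input = {0L, 1L, 1_584_972_168_000L};
        for (boolean flag : new boolean[] {true, false}) {
            long[] inside = convertCheckInside(input, flag);
            long[] hoisted = convertCheckHoisted(input, flag);
            // Both variants are expected to produce identical arrays for either flag value.
            System.out.println("flag=" + flag + " same=" + java.util.Arrays.equals(inside, hoisted));
        }
    }
}
```

The trade-off is some duplicated loop code in exchange for a branch-free loop body. Whether this yields a measurable gain depends on the JIT compiler, which may already perform a similar transformation on its own.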
Closes #27973 from MaxGekk/rebase-parquet-datetime-followup.
Authored-by: Maxim Gekk <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
(cherry picked from commit aa3a7429f44970eb95a58e1f6cfbf7d89d6753a0)
Signed-off-by: HyukjinKwon <[email protected]>
---
.../parquet/VectorizedColumnReader.java | 23 ++++++++++++++--------
1 file changed, 15 insertions(+), 8 deletions(-)
diff --git a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java b/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java
index f9b6139..fa1838f 100644
--- a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java
+++ b/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java
@@ -462,15 +462,22 @@ public class VectorizedColumnReader {
num, column, rowId, maxDefLevel, (VectorizedValuesReader) dataColumn);
}
} else if (originalType == OriginalType.TIMESTAMP_MILLIS) {
- for (int i = 0; i < num; i++) {
- if (defColumn.readInteger() == maxDefLevel) {
- long micros = DateTimeUtils.fromMillis(dataColumn.readLong());
- if (rebaseDateTime) {
- micros = DateTimeUtils.rebaseJulianToGregorianMicros(micros);
+ if (rebaseDateTime) {
+ for (int i = 0; i < num; i++) {
+ if (defColumn.readInteger() == maxDefLevel) {
+ long micros = DateTimeUtils.fromMillis(dataColumn.readLong());
+ column.putLong(rowId + i, DateTimeUtils.rebaseJulianToGregorianMicros(micros));
+ } else {
+ column.putNull(rowId + i);
+ }
+ }
+ } else {
+ for (int i = 0; i < num; i++) {
+ if (defColumn.readInteger() == maxDefLevel) {
+ column.putLong(rowId + i, DateTimeUtils.fromMillis(dataColumn.readLong()));
+ } else {
+ column.putNull(rowId + i);
}
- column.putLong(rowId + i, micros);
- } else {
- column.putNull(rowId + i);
}
}
} else {