This is an automated email from the ASF dual-hosted git repository.
gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new aa3a742 [SPARK-31159][SQL][FOLLOWUP] Move checking of the
`rebaseDateTime` flag out of the loop in `VectorizedColumnReader`
aa3a742 is described below
commit aa3a7429f44970eb95a58e1f6cfbf7d89d6753a0
Author: Maxim Gekk <[email protected]>
AuthorDate: Mon Mar 23 23:02:48 2020 +0900
[SPARK-31159][SQL][FOLLOWUP] Move checking of the `rebaseDateTime` flag out
of the loop in `VectorizedColumnReader`
### What changes were proposed in this pull request?
In the PR, I propose to refactor the reading of timestamps of the
`TIMESTAMP_MILLIS` logical type from Parquet files in `VectorizedColumnReader`,
and to move the check of the `rebaseDateTime` flag out of the inner loop.
### Why are the changes needed?
To avoid the additional overhead of checking the SQL config
`spark.sql.legacy.parquet.rebaseDateTime.enabled` introduced by the PR
https://github.com/apache/spark/pull/27915.
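The general idea of hoisting a loop-invariant flag check can be sketched outside of Spark as follows. This is only an illustration under stated assumptions: the class `HoistedFlagCheck`, the method `read`, and the helpers `millisToMicros` and `rebase` are hypothetical stand-ins and are not the Spark `DateTimeUtils` implementations. In particular, `rebase` is a placeholder that shifts the value by one so that the two branches can be told apart.

```java
public class HoistedFlagCheck {
    // Hypothetical stand-in for a milliseconds-to-microseconds conversion.
    static long millisToMicros(long millis) {
        return Math.multiplyExact(millis, 1000L);
    }

    // Hypothetical placeholder for a rebase step. It only adds 1 so the
    // two branches are distinguishable; it is not real calendar logic.
    static long rebase(long micros) {
        return micros + 1L;
    }

    // The flag is tested once, and each branch runs its own loop,
    // so the loop body no longer branches on the flag per value.
    static Long[] read(Long[] millis, boolean rebaseFlag) {
        Long[] out = new Long[millis.length];
        if (rebaseFlag) {
            for (int i = 0; i < millis.length; i++) {
                if (millis[i] != null) {
                    out[i] = rebase(millisToMicros(millis[i]));
                }
            }
        } else {
            for (int i = 0; i < millis.length; i++) {
                if (millis[i] != null) {
                    out[i] = millisToMicros(millis[i]);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Long[] input = {1L, null, 2L};
        System.out.println(java.util.Arrays.toString(read(input, false)));
        System.out.println(java.util.Arrays.toString(read(input, true)));
    }
}
```

The cost of this layout is a duplicated loop body. A JIT compiler may perform the same transformation on its own, so the benefit of doing it by hand is best confirmed with a benchmark.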
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
By running the test suite `ParquetIOSuite`.
Closes #27973 from MaxGekk/rebase-parquet-datetime-followup.
Authored-by: Maxim Gekk <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
---
.../parquet/VectorizedColumnReader.java | 23 ++++++++++++++--------
1 file changed, 15 insertions(+), 8 deletions(-)
diff --git a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java b/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java
index 13225b2..2214c16 100644
--- a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java
+++ b/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java
@@ -462,15 +462,22 @@ public class VectorizedColumnReader {
             num, column, rowId, maxDefLevel, (VectorizedValuesReader) dataColumn);
         }
       } else if (originalType == OriginalType.TIMESTAMP_MILLIS) {
-        for (int i = 0; i < num; i++) {
-          if (defColumn.readInteger() == maxDefLevel) {
-            long micros = DateTimeUtils.millisToMicros(dataColumn.readLong());
-            if (rebaseDateTime) {
-              micros = DateTimeUtils.rebaseJulianToGregorianMicros(micros);
+        if (rebaseDateTime) {
+          for (int i = 0; i < num; i++) {
+            if (defColumn.readInteger() == maxDefLevel) {
+              long micros = DateTimeUtils.millisToMicros(dataColumn.readLong());
+              column.putLong(rowId + i, DateTimeUtils.rebaseJulianToGregorianMicros(micros));
+            } else {
+              column.putNull(rowId + i);
+            }
+          }
+        } else {
+          for (int i = 0; i < num; i++) {
+            if (defColumn.readInteger() == maxDefLevel) {
+              column.putLong(rowId + i, DateTimeUtils.millisToMicros(dataColumn.readLong()));
+            } else {
+              column.putNull(rowId + i);
             }
-            column.putLong(rowId + i, micros);
-          } else {
-            column.putNull(rowId + i);
           }
         }
       } else {