[GitHub] [iceberg] bkahloon commented on a change in pull request #2282: [SPARK] Spark parquet read timestamp without timezone

GitBox Wed, 03 Mar 2021 21:11:59 -0800


bkahloon commented on a change in pull request #2282:
URL: https://github.com/apache/iceberg/pull/2282#discussion_r587124519




##########
File path: spark2/src/main/java/org/apache/iceberg/spark/source/Reader.java
##########
@@ -170,6 +174,13 @@
     this.batchSize = options.get(SparkReadOptions.VECTORIZATION_BATCH_SIZE).map(Integer::parseInt).orElseGet(() ->
         PropertyUtil.propertyAsInt(table.properties(),
           TableProperties.PARQUET_BATCH_SIZE, TableProperties.PARQUET_BATCH_SIZE_DEFAULT));
+    // Allow reading timestamp without time zone as timestamp with time zone. Generally, this is not safe as timestamp
+    // without time zone is supposed to represent wall clock time semantics, i.e. no matter the reader/writer timezone
+    // 3PM should always be read as 3PM, but timestamp with time zone represents instant semantics, i.e the timestamp
+    // is adjusted so that the corresponding time in the reader timezone is displayed.
+    // When set to false (default), we throw an exception at runtime
+    // "Spark does not support timestamp without time zone fields" if reading timestamp without time zone fields
+    this.readTimestampWithoutZone = options.get("read-timestamp-without-zone").map(Boolean::parseBoolean).orElse(false);

Review comment:
       addressed it
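The spark2 hunk reads the flag with an Optional-style lookup that falls back to `false`; the chained `.map(...).orElse(false)` implies `options.get(...)` returns an `Optional<String>` there. A minimal standalone sketch of the same pattern follows, assuming the options are available as a plain `Map<String, String>` (an illustrative stand-in, not the Spark options class). It also shows that `Boolean.parseBoolean` accepts only "true" (ignoring case), so a value such as "yes" silently leaves the option disabled.

```java
import java.util.Map;
import java.util.Optional;

public class ReadOptionSketch {
  // Option key taken from the diff; the Map-based options are an assumption for illustration.
  static final String READ_TIMESTAMP_WITHOUT_ZONE = "read-timestamp-without-zone";

  // Mirrors options.get(key).map(Boolean::parseBoolean).orElse(false):
  // absent key -> false; present key -> Boolean.parseBoolean(value).
  static boolean readTimestampWithoutZone(Map<String, String> options) {
    return Optional.ofNullable(options.get(READ_TIMESTAMP_WITHOUT_ZONE))
        .map(Boolean::parseBoolean)
        .orElse(false);
  }

  public static void main(String[] args) {
    System.out.println(readTimestampWithoutZone(Map.of()));                                    // false (default)
    System.out.println(readTimestampWithoutZone(Map.of(READ_TIMESTAMP_WITHOUT_ZONE, "true")));  // true
    System.out.println(readTimestampWithoutZone(Map.of(READ_TIMESTAMP_WITHOUT_ZONE, "TRUE")));  // true
    System.out.println(readTimestampWithoutZone(Map.of(READ_TIMESTAMP_WITHOUT_ZONE, "yes")));   // false
  }
}
```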

##########
File path: spark2/src/main/java/org/apache/iceberg/spark/source/Reader.java
##########
@@ -193,6 +204,8 @@ private Expression filterExpression() {
 
   private StructType lazyType() {
     if (type == null) {
+      Preconditions.checkArgument(readTimestampWithoutZone || !hasTimestampWithoutZone(lazySchema()),
+              "Spark does not support timestamp without time zone fields");

Review comment:
       addressed it
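The `lazyType()` hunk fails fast unless the flag is set or the schema has no timestamp-without-zone fields. The body of `hasTimestampWithoutZone` is not shown in the diff; the sketch below assumes it walks nested types recursively, and uses a toy type model (`Type`, `Primitive`, `Struct`, `ListOf`) that is hypothetical and is not Iceberg's type API. Guava's `Preconditions.checkArgument` is replaced by an explicit `IllegalArgumentException` so the sketch has no dependencies.

```java
import java.util.Arrays;
import java.util.List;

public class TimestampCheckSketch {
  // Hypothetical toy type model for illustration only (NOT Iceberg's types).
  interface Type {}

  enum Primitive implements Type { INT, STRING, TIMESTAMP_WITH_ZONE, TIMESTAMP_WITHOUT_ZONE }

  static final class Struct implements Type {
    final List<Type> fields;
    Struct(Type... fields) { this.fields = Arrays.asList(fields); }
  }

  static final class ListOf implements Type {
    final Type element;
    ListOf(Type element) { this.element = element; }
  }

  // Assumed behavior: true if the type, or any type nested inside it, is a timestamp without zone.
  static boolean hasTimestampWithoutZone(Type type) {
    if (type == Primitive.TIMESTAMP_WITHOUT_ZONE) {
      return true;
    } else if (type instanceof Struct) {
      for (Type field : ((Struct) type).fields) {
        if (hasTimestampWithoutZone(field)) {
          return true;
        }
      }
      return false;
    } else if (type instanceof ListOf) {
      return hasTimestampWithoutZone(((ListOf) type).element);
    }
    return false;
  }

  // Same condition as the checkArgument call in the diff: pass if the flag is set or no such field exists.
  static void checkReadable(Type schema, boolean readTimestampWithoutZone) {
    if (!(readTimestampWithoutZone || !hasTimestampWithoutZone(schema))) {
      throw new IllegalArgumentException("Spark does not support timestamp without time zone fields");
    }
  }

  public static void main(String[] args) {
    Type schema = new Struct(Primitive.INT, new ListOf(Primitive.TIMESTAMP_WITHOUT_ZONE));
    checkReadable(schema, true);                         // passes: flag set
    checkReadable(new Struct(Primitive.STRING), false);  // passes: no timestamp-without-zone field
    try {
      checkReadable(schema, false);                      // nested field triggers the check
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```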

##########
File path: spark3/src/main/java/org/apache/iceberg/spark/source/SparkBatchScan.java
##########
@@ -84,6 +89,13 @@
     this.localityPreferred = Spark3Util.isLocalityEnabled(io.value(), table.location(), options);
     this.batchReadsEnabled = Spark3Util.isVectorizationEnabled(table.properties(), options);
     this.batchSize = Spark3Util.batchSize(table.properties(), options);
+    // Allow reading timestamp without time zone as timestamp with time zone. Generally, this is not safe as timestamp
+    // without time zone is supposed to represent wall clock time semantics, i.e. no matter the reader/writer timezone
+    // 3PM should always be read as 3PM, but timestamp with time zone represents instant semantics, i.e the timestamp
+    // is adjusted so that the corresponding time in the reader timezone is displayed.
+    // When set to false (default), we throw an exception at runtime
+    // "Spark does not support timestamp without time zone fields" if reading timestamp without time zone fields
+    this.readTimestampWithoutZone = options.getBoolean("read-timestamp-without-zone", false);

Review comment:
       addressed it
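The code comment in both hunks contrasts wall-clock semantics with instant semantics. A small `java.time` sketch of why the conversion is "not safe", assuming (for illustration) that a timestamp-without-zone value of 3 PM is stored as microseconds since the epoch with its wall-clock fields encoded as UTC: decoded as a wall-clock value it stays 15:00 for every reader, but treated as an instant and rendered in a reader zone such as `America/Los_Angeles` it shows a different time. The helper names are hypothetical.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class TimestampSemanticsSketch {
  // Assumed storage convention: wall-clock fields encoded as micros since epoch, as if in UTC.
  static long encodeWallClock(LocalDateTime wallClock) {
    return wallClock.toEpochSecond(ZoneOffset.UTC) * 1_000_000L;
  }

  // Wall-clock semantics: decode with the same UTC convention, independent of the reader's zone.
  // Sub-second precision is dropped for brevity.
  static LocalDateTime readAsWallClock(long micros) {
    return LocalDateTime.ofEpochSecond(Math.floorDiv(micros, 1_000_000L), 0, ZoneOffset.UTC);
  }

  // Instant semantics: treat the value as a point in time and render it in the reader's zone.
  static LocalDateTime readAsInstant(long micros, ZoneId readerZone) {
    return Instant.ofEpochSecond(Math.floorDiv(micros, 1_000_000L)).atZone(readerZone).toLocalDateTime();
  }

  public static void main(String[] args) {
    long stored = encodeWallClock(LocalDateTime.of(2021, 3, 3, 15, 0)); // 3 PM, no zone
    System.out.println(readAsWallClock(stored));                                   // 2021-03-03T15:00
    System.out.println(readAsInstant(stored, ZoneId.of("America/Los_Angeles")));  // 2021-03-03T07:00
  }
}
```

With the option left at its default of false, the reader rejects such columns instead of silently showing 07:00 where the writer meant 15:00.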




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



