[GitHub] [iceberg] rdblue commented on a change in pull request #1508: Use schema at the time of the snapshot when reading a snapshot.

GitBox Sun, 19 Sep 2021 10:18:30 -0700


rdblue commented on a change in pull request #1508:
URL: https://github.com/apache/iceberg/pull/1508#discussion_r711775490




##########
File path: spark2/src/main/java/org/apache/iceberg/spark/source/Reader.java
##########
@@ -166,13 +167,17 @@
     this.readTimestampWithoutZone = SparkUtil.canHandleTimestampWithoutZone(options.asMap(), sessionConf);
   }
 
+  protected Schema snapshotSchema() {
+    return SnapshotUtil.schemaFor(table, snapshotId, asOfTimestamp);

Review comment:
I don't think the snapshot to scan should be determined in more than one place, but right now the logic to find the snapshot is both here and in `tasks()`.

Instead, I think this class should create a `TableScan`, configure it with the snapshot selection criteria (`snapshotId` or `asOfTimestamp`), and use the `schema()` from that table scan here. In `tasks()`, this base scan can then be refined further before getting tasks. That is safe because each `TableScan` object is immutable and independent: refining it returns a new scan and leaves the base scan unchanged.
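To make the suggestion concrete, here is a minimal, self-contained sketch of the pattern: resolve the snapshot once into a base scan, read the schema from that scan, and refine a copy of it when planning tasks. `ToyTable`, `ToyScan`, and `ToyReader` are hypothetical stand-ins invented for this illustration, not Iceberg classes. In the real code the equivalent calls would presumably be `table.newScan()`, `useSnapshot(...)` or `asOfTime(...)`, and `schema()` on `TableScan`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SnapshotScanSketch {

  /** Hypothetical stand-in for a table: remembers the column names as of each snapshot. */
  static final class ToyTable {
    private final Map<Long, List<String>> columnsBySnapshot = new HashMap<>();
    private long currentSnapshotId = -1L;

    ToyTable commit(long snapshotId, List<String> columns) {
      columnsBySnapshot.put(snapshotId, Collections.unmodifiableList(new ArrayList<>(columns)));
      currentSnapshotId = snapshotId;
      return this;
    }

    /** A new scan starts at the current snapshot with no filter. */
    ToyScan newScan() {
      return new ToyScan(this, currentSnapshotId, null);
    }
  }

  /** Hypothetical immutable scan: every refinement returns a new object. */
  static final class ToyScan {
    private final ToyTable table;
    private final long snapshotId;
    private final String filter; // null means "no filter"

    private ToyScan(ToyTable table, long snapshotId, String filter) {
      this.table = table;
      this.snapshotId = snapshotId;
      this.filter = filter;
    }

    ToyScan useSnapshot(long id) {
      if (!table.columnsBySnapshot.containsKey(id)) {
        throw new IllegalArgumentException("Unknown snapshot: " + id);
      }
      return new ToyScan(table, id, filter);
    }

    ToyScan filter(String expr) {
      return new ToyScan(table, snapshotId, expr);
    }

    /** Schema as of the snapshot this scan is pinned to. */
    List<String> schema() {
      return table.columnsBySnapshot.get(snapshotId);
    }

    String describeTasks() {
      return "snapshot=" + snapshotId + ", filter=" + filter;
    }
  }

  /** Hypothetical reader: the snapshot is selected in exactly one place, the constructor. */
  static final class ToyReader {
    private final ToyScan baseScan;

    ToyReader(ToyTable table, Long snapshotId) {
      ToyScan scan = table.newScan();
      this.baseScan = snapshotId != null ? scan.useSnapshot(snapshotId) : scan;
    }

    List<String> snapshotSchema() {
      return baseScan.schema();
    }

    String tasks(String filterExpr) {
      // Refining produces a new scan; baseScan itself is never modified.
      return baseScan.filter(filterExpr).describeTasks();
    }
  }

  public static void main(String[] args) {
    ToyTable table = new ToyTable()
        .commit(1L, Arrays.asList("id"))
        .commit(2L, Arrays.asList("id", "data"));
    ToyReader reader = new ToyReader(table, 1L);
    System.out.println(reader.snapshotSchema());
    System.out.println(reader.tasks("id > 5"));
  }
}
```

With this shape, `snapshotSchema()` and `tasks()` cannot disagree about which snapshot is being read, because both derive from the same base scan.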




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


