[GitHub] [iceberg] rdblue commented on a change in pull request #1508: Use schema at the time of the snapshot when reading a snapshot.


rdblue commented on a change in pull request #1508:
URL: https://github.com/apache/iceberg/pull/1508#discussion_r716273873




##########
File path: 
spark3/src/main/java/org/apache/iceberg/spark/source/IcebergSource.java
##########
@@ -101,24 +103,35 @@ public Table getTable(StructType schema, Transform[] 
partitioning, Map<String, S
     SparkSession spark = SparkSession.active();
     setupDefaultSparkCatalog(spark);
     String path = options.get("path");
+    Long snapshotId = Spark3Util.propertyAsLong(options, 
SparkReadOptions.SNAPSHOT_ID, null);
+    Long asOfTimestamp = Spark3Util.propertyAsLong(options, 
SparkReadOptions.AS_OF_TIMESTAMP, null);

Review comment:
       I think that modifications to the source for snapshot and timestamp are 
unnecessary. Looking at the Spark code, the table will be loaded using the 
[catalog/identifier from 
`SupportsCatalogOptions`](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Utils.scala#L114-L120)
 and then [wrapped in a `DataSourceV2Relation` that uses the reader 
options](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Utils.scala#L131).
 Those read options still include `snapshot-id` and `as-of-timestamp`.
   
   When a scan is created, those [options get passed into the scan 
builder](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala#L42-L45).
   
   There should be no need to handle these in `IcebergSource` because the 
options will be set at planning time. Not considering them here avoids the need 
to add `SnapshotAwareIdentifier` and removes ambiguous cases where the scan 
options identify a snapshot and the identifier did as well.
   
   Can you update to remove this?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [iceberg] rdblue commented on a change in pull request #1508: Use schema at the time of the snapshot when reading a snapshot.

Reply via email to