[GitHub] [iceberg] rdblue commented on a change in pull request #1508: Use schema at the time of the snapshot when reading a snapshot.

GitBox Sun, 10 Oct 2021 15:20:16 -0700


rdblue commented on a change in pull request #1508:
URL: https://github.com/apache/iceberg/pull/1508#discussion_r725707350




##########
File path: 
spark3/src/main/java/org/apache/iceberg/spark/source/IcebergSource.java
##########
@@ -101,24 +103,35 @@ public Table getTable(StructType schema, Transform[] 
partitioning, Map<String, S
     SparkSession spark = SparkSession.active();
     setupDefaultSparkCatalog(spark);
     String path = options.get("path");
+    Long snapshotId = Spark3Util.propertyAsLong(options, 
SparkReadOptions.SNAPSHOT_ID, null);
+    Long asOfTimestamp = Spark3Util.propertyAsLong(options, 
SparkReadOptions.AS_OF_TIMESTAMP, null);

Review comment:
       I think that we want to generalize this a bit more rather than adding 
special classes. I've taken a shot at building the needed feature in #3269. In 
addition to solving the problem here of needing to pass a snapshot ID through 
an identifier, it defines a syntax for allowing users to do it, which basically 
adds support for time travel queries to SQL.
   
   I think that's a cleaner way of doing this because we only have to pass a 
snapshot ID through -- timestamp can be resolved when the table is loaded from 
the identifier. I think it's also a bit less code.
   
   Could you take a look at that issue? To move forward with this one, I think 
we should either get the identifier format in #3269 in first, or just remove 
the additions here for Spark 3 and add Spark 3 support once that goes in. What 
do you think?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [iceberg] rdblue commented on a change in pull request #1508: Use schema at the time of the snapshot when reading a snapshot.

Reply via email to