[GitHub] [iceberg] waterlx commented on a change in pull request #1508: Use schema at the time of the snapshot when reading a snapshot.

GitBox Tue, 12 Jan 2021 07:50:55 -0800


waterlx commented on a change in pull request #1508:
URL: https://github.com/apache/iceberg/pull/1508#discussion_r555877223




##########
File path: spark3/src/main/java/org/apache/iceberg/spark/source/IcebergSource.java
##########
@@ -60,10 +61,13 @@ public SparkTable getTable(StructType schema, Transform[] partitioning, Map<Stri
     // Get Iceberg table from options
     Configuration conf = SparkSession.active().sessionState().newHadoopConf();
     Table icebergTable = getTableAndResolveHadoopConfiguration(options, conf);
+    CaseInsensitiveStringMap cIOptions = new CaseInsensitiveStringMap(options);
+    Long snapshotId = Spark3Util.propertyAsLong(cIOptions, "snapshot-id", null);
+    Long asOfTimestamp = Spark3Util.propertyAsLong(cIOptions, "as-of-timestamp", null);
 
     // Build Spark table based on Iceberg table, and return it
     // Eagerly refresh the table before reading to ensure views containing this table show up-to-date data
-    return new SparkTable(icebergTable, schema, true);
+    return new SparkTable(icebergTable, schema, snapshotId, asOfTimestamp, true);

Review comment:
       @wypoon As exposed by the UT you added, when reading through a
DataFrame (`SparkCatalog#loadTable()` is called by `DataFrameReader`), I think
we might need an `Identifier` with options here, so that snapshot-id and
as-of-timestamp travel with the identifier. I tried wrapping
`org.apache.spark.sql.connector.catalog.Identifier` in a new interface and
updating related code such as `Spark3Util` to construct an implementation of
it in `catalogAndIdentifier`, and it seems to work here. But I think that
approach is too hacky and crude.
   
   BTW, in `testSnapshotReadAfterAddAndDropColumn`, I got `iceberg does not 
support user specified schema. Please don't specify the schema.` because a 
schema is specified on the DataFrame reader. I checked the Spark 3 code and
found that this error message is expected, but your UT does not seem to expect
it. Could you please share your thoughts on that?
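
   To make the "Identifier with options" idea above concrete, here is a rough, 
self-contained sketch. It is not the code I actually tried and not an API in 
Spark or Iceberg: `IdentifierWithOptions` and `SnapshotAwareIdentifier` are 
hypothetical names, and the small `Identifier` interface is only a stand-in for 
`org.apache.spark.sql.connector.catalog.Identifier` so the snippet compiles 
without Spark on the classpath.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Stand-in for org.apache.spark.sql.connector.catalog.Identifier, reduced to
// its two accessors so that this sketch compiles without Spark.
interface Identifier {
  String[] namespace();

  String name();
}

// Hypothetical interface (an assumption, not part of Spark or Iceberg): an
// identifier that also carries the read options set on the DataFrameReader.
interface IdentifierWithOptions extends Identifier {
  Map<String, String> options();
}

// Hypothetical implementation exposing the two time-travel options.
final class SnapshotAwareIdentifier implements IdentifierWithOptions {
  private final String[] namespace;
  private final String name;
  private final Map<String, String> options;

  SnapshotAwareIdentifier(String[] namespace, String name, Map<String, String> options) {
    this.namespace = namespace.clone();
    this.name = name;
    // Lower-case the keys so lookups ignore case, similar in spirit to
    // CaseInsensitiveStringMap in the diff above.
    Map<String, String> normalized = new HashMap<>();
    for (Map.Entry<String, String> entry : options.entrySet()) {
      normalized.put(entry.getKey().toLowerCase(Locale.ROOT), entry.getValue());
    }
    this.options = Collections.unmodifiableMap(normalized);
  }

  @Override
  public String[] namespace() {
    return namespace.clone();
  }

  @Override
  public String name() {
    return name;
  }

  @Override
  public Map<String, String> options() {
    return options;
  }

  // Both accessors return null when the option is absent, mirroring the null
  // default passed to propertyAsLong in the diff.
  Long snapshotId() {
    return optionAsLong("snapshot-id");
  }

  Long asOfTimestamp() {
    return optionAsLong("as-of-timestamp");
  }

  private Long optionAsLong(String key) {
    String value = options.get(key);
    return value == null ? null : Long.valueOf(value);
  }
}
```

   With something like this, `loadTable()` could check whether the identifier 
it receives is an `IdentifierWithOptions` and, if so, pass `snapshotId()` and 
`asOfTimestamp()` on to the `SparkTable` constructor. Whether that is 
acceptable is exactly the open question above.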




