rdblue commented on a change in pull request #3722:
URL: https://github.com/apache/iceberg/pull/3722#discussion_r768101609
##########
File path:
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java
##########
@@ -65,16 +68,22 @@
this.spark = spark;
this.table = table;
this.readConf = new SparkReadConf(spark, table, options);
+ this.snapshotId = readConf.snapshotId();
+ this.asOfTimestamp = readConf.asOfTimestamp();
this.caseSensitive = readConf.caseSensitive();
}
+ private Schema snapshotSchema() {
Review comment:
I think the schema should be passed into this builder, not resolved
here. The problem with resolving it here is that Spark has already analyzed
the query using the schema returned by `SparkTable`. Whatever `SparkTable`
reported as the schema must be what this class uses as the basis for
projection, or else Iceberg could break resolution -- and that's worse than
using a different projection schema.
For cases where `snapshot-id` or `as-of-timestamp` are passed through read
options, I think we have to use the current table schema.
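To illustrate the suggestion, here is a minimal sketch in plain Java (hypothetical simplified `Schema` and `ScanBuilder` classes, not Iceberg's actual API): the caller, which knows what schema `SparkTable` reported to Spark's analyzer, injects that schema through the constructor, rather than the builder re-resolving it from a snapshot internally.

```java
// Hedged sketch with hypothetical classes; the real SparkScanBuilder has a
// different constructor signature and uses Iceberg's Schema type.
class Schema {
    final java.util.List<String> columns;

    Schema(java.util.List<String> columns) {
        this.columns = columns;
    }
}

class ScanBuilder {
    // Injected by the caller: the same schema SparkTable reported, so the
    // analyzed query and the projection base can never diverge, even when
    // snapshot-id / as-of-timestamp read options select an older snapshot.
    private final Schema schema;

    ScanBuilder(Schema schema) {
        this.schema = schema;
    }

    Schema projectionBase() {
        // No snapshot lookup here; projection uses the injected schema.
        return this.schema;
    }
}
```

The design point is constructor injection: the component that performed analysis owns the schema choice, and the builder stays consistent with it by construction.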
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]