wypoon commented on a change in pull request #1508: URL: https://github.com/apache/iceberg/pull/1508#discussion_r501463178
########## File path: spark2/src/main/java/org/apache/iceberg/spark/source/Reader.java ########## @@ -165,13 +165,23 @@ TableProperties.PARQUET_BATCH_SIZE, TableProperties.PARQUET_BATCH_SIZE_DEFAULT)); } + private Schema getTableSchema() { + if (snapshotId != null) { + return table.schemaForSnapshot(snapshotId); + } else if (asOfTimestamp != null) { + return table.schemaForSnapshotAsOfTime(asOfTimestamp); + } else { + return table.schema(); + } + } + private Schema lazySchema() { if (schema == null) { if (requestedSchema != null) { // the projection should include all columns that will be returned, including those only used in filters - this.schema = SparkSchemaUtil.prune(table.schema(), requestedSchema, filterExpression(), caseSensitive); + this.schema = SparkSchemaUtil.prune(getTableSchema(), requestedSchema, filterExpression(), caseSensitive); } else { - this.schema = table.schema(); + this.schema = getTableSchema(); Review comment: This is the only point I don't understand. As far as I can tell, `BaseTableScan` is constructed with the user's requested schema. If the user is wanting to read from an old snapshot, then they should be supplying a requested schema compatible with the snapshot's schema. I couldn't see anything I needed to change in `BaseTableScan`. If I'm missing something here, can you please point out to me the parts I need to change? I added a new test case where a requested schema is used, and that revealed a bug in the spark2 `IcebergSource#createReader`. In that method, there is a call to `SparkSchemaUtil.convert(Schema, StructType)` whose only purpose seems to be to check that the requested `StructType` does not contain fields in the `Schema` (for then the call would cause an exception), and here I needed to use the snapshot `Schema` rather than the table `Schema`. Other than that, the projection returned by the scan appears to be correct. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org