kbendick commented on a change in pull request #2617:
URL: https://github.com/apache/iceberg/pull/2617#discussion_r637817843
##########
File path: core/src/main/java/org/apache/iceberg/StaticTableScan.java
##########
@@ -25,16 +25,33 @@
class StaticTableScan extends BaseTableScan {
private final Function<StaticTableScan, DataTask> buildTask;
+ // Metadata table name that the buildTask that this StaticTableScan will
return data for.
+ private final String scannedTableName;
- StaticTableScan(TableOperations ops, Table table, Schema schema,
Function<StaticTableScan, DataTask> buildTask) {
+ StaticTableScan(TableOperations ops, Table table, Schema schema, String
scannedTableName,
+ Function<StaticTableScan, DataTask> buildTask) {
super(ops, table, schema);
this.buildTask = buildTask;
+ this.scannedTableName = scannedTableName;
}
- private StaticTableScan(TableOperations ops, Table table, Schema schema,
+ private StaticTableScan(TableOperations ops, Table table, Schema schema,
String scannedTableName,
Function<StaticTableScan, DataTask> buildTask,
TableScanContext context) {
super(ops, table, schema, context);
this.buildTask = buildTask;
+ this.scannedTableName = scannedTableName;
+ }
+
+ @Override
+ public TableScan appendsBetween(long fromSnapshotId, long toSnapshotId) {
+ throw new UnsupportedOperationException(
+ String.format("Incremental scan is not supported for metadata table
%s", scannedTableName));
+ }
+
+ @Override
+ public TableScan appendsAfter(long fromSnapshotId) {
+ throw new UnsupportedOperationException(
+ String.format("Incremental scan is not supported for metadata table
%s", scannedTableName));
Review comment:
**TLDR**
It seems to me that the inheritance is mostly because of the time at which
some of this code was written, but I believe it could (and should) be unified
such that for each `MetadataTableType`:
1. The overrides occur in the base class for that metadata table type scan
(either the aggregate `BaseAllMetadataTableScan` or the current
`StaticTableScan`, which is only used for metadata table scans except ENTRIES
and FILES).
2. Metadata table scans that inherit from BaseTableScan are refactored to
use `StaticTableScan` (which is feasible).
The current layout of refined error messages for time based / snapshot
specific / incremental metadata table scans:
```
MetadataTableType Current Scan Class Base Snapshot Scan Overrides
ENTRIES BaseTableScan None
FILES BaseTableScan None
HISTORY StaticTableScan None
SNAPSHOTS StaticTableScan None
MANIFESTS StaticTableScan None
PARTITIONS StaticTableScan None
ALL_DATA_FILES BaseAllMetadataTableScan [asOfTime, useSnapshot]
ALL_MANIFESTS BaseAllMetadataTableScan [asOfTime, useSnapshot]
ALL_ENTRIES BaseAllMetadataTableScan None
```
[AllDataFilesTableScan specific
overrides](https://github.com/apache/iceberg/blob/32ce969218b75915cb7396c541bdc19f4ac8bd9f/core/src/main/java/org/apache/iceberg/AllDataFilesTable.java#L92-L100)
[AllManifestsTableScan specific
overrides](https://github.com/apache/iceberg/blob/32ce969218b75915cb7396c541bdc19f4ac8bd9f/core/src/main/java/org/apache/iceberg/AllManifestsTable.java#L99-L107)
All of the above that have overrides are implemented in the final
implementing class, as shown in the links.
I think it would be much better to pass the actually scanned metadata table
name (and not just the table object, which references the Iceberg table and not
its metadata table) as well as the metadata table type (from the enum) to the
two base scan classes, and then have consistent, enriched error messages for
all metadata table types which state the table name attempting to be scanned
for users (as users encounter this problem often enough when switching back and
forth between editing a snapshot specific query and inspecting a metadata
table).
So I propose changing the inheritance of these classes to match their
`MetadataTableType` counterparts
```
MetadataTableType New Scan Class
ENTRIES **EntriesTableScan extends StaticTableScan**
FILES **FilesTableScan extends StaticTableScan**
HISTORY StaticTableScan
SNAPSHOTS StaticTableScan
MANIFESTS StaticTableScan
PARTITIONS StaticTableScan
ALL_DATA_FILES BaseAllMetadataTableScan
ALL_MANIFESTS BaseAllMetadataTableScan
ALL_ENTRIES BaseAllMetadataTableScan
```
If we decide to implement the snapshot specific functionality, then the
actual concrete scan class can override it. This would cut down on code
duplication and, imho, make the code much more readable and provide much more
user friendly error messages.
I also checked, and passing the additional metadata (actual table name, as
well as the metadata table type) would not affect the serialization done on
static table scans for Spark (or really any engine), as the information is only
used when building the query (e.g. prior to distributing the information across
the cluster which requires the serialization).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]