[GitHub] [iceberg] kbendick commented on a change in pull request #2617: Refine error message on most incremental scans on metadata tables

GitBox Mon, 24 May 2021 02:36:56 -0700


kbendick commented on a change in pull request #2617:
URL: https://github.com/apache/iceberg/pull/2617#discussion_r637817843




##########
File path: core/src/main/java/org/apache/iceberg/StaticTableScan.java
##########
@@ -25,16 +25,33 @@
 
 class StaticTableScan extends BaseTableScan {
   private final Function<StaticTableScan, DataTask> buildTask;
+  // Metadata table name that the buildTask that this StaticTableScan will 
return data for.
+  private final String scannedTableName;
 
-  StaticTableScan(TableOperations ops, Table table, Schema schema, 
Function<StaticTableScan, DataTask> buildTask) {
+  StaticTableScan(TableOperations ops, Table table, Schema schema,  String 
scannedTableName,
+                  Function<StaticTableScan, DataTask> buildTask) {
     super(ops, table, schema);
     this.buildTask = buildTask;
+    this.scannedTableName = scannedTableName;
   }
 
-  private StaticTableScan(TableOperations ops, Table table, Schema schema,
+  private StaticTableScan(TableOperations ops, Table table, Schema schema,  
String scannedTableName,
                           Function<StaticTableScan, DataTask> buildTask, 
TableScanContext context) {
     super(ops, table, schema, context);
     this.buildTask = buildTask;
+    this.scannedTableName = scannedTableName;
+  }
+
+  @Override
+  public TableScan appendsBetween(long fromSnapshotId, long toSnapshotId) {
+    throw new UnsupportedOperationException(
+        String.format("Incremental scan is not supported for metadata table 
%s", scannedTableName));
+  }
+
+  @Override
+  public TableScan appendsAfter(long fromSnapshotId) {
+    throw new UnsupportedOperationException(
+        String.format("Incremental scan is not supported for metadata table 
%s", scannedTableName));

Review comment:
       **TLDR**
   
   It seems to me that the inheritance is mostly because of the time at which 
some of this code was written, but I believe it could (and should) be unified 
such that for each `MetadataTableType`:
   
   The current layout of refined error messages for time based / snapshot 
specific / incremental metadata table scans:
   ```
   MetadataTableType  Current Scan Class Super Class         Refined Time Based 
Scan Specific Overrides
     ENTRIES                       BaseTableScan                                
       None
     FILES                             BaseTableScan                            
           None
     HISTORY                       StaticTableScan                              
       None
     SNAPSHOTS                StaticTableScan                                   
   None
     MANIFESTS                  StaticTableScan                                 
    None
     PARTITIONS                 StaticTableScan                                 
    None
     ALL_DATA_FILES        BaseAllMetadataTableScan                   
[asOfTime, 
useSnapshot](https://github.com/apache/iceberg/blob/32ce969218b75915cb7396c541bdc19f4ac8bd9f/core/src/main/java/org/apache/iceberg/AllDataFilesTable.java#L92-L100)
     ALL_MANIFESTS         BaseAllMetadataTableScan                  [asOfTime, 
useSnapshot](https://github.com/apache/iceberg/blob/32ce969218b75915cb7396c541bdc19f4ac8bd9f/core/src/main/java/org/apache/iceberg/AllManifestsTable.java#L99-L107)
     ALL_ENTRIES               BaseAllMetadataTableScan                  None
     ```
   
   All of the above that have overrides are implemented in the final 
implementing class, as shown in the links.
   
   I think it would be much better to pass the actually scanned table name (and 
not just the table object, which references the Iceberg table and not its 
metadata table) to the two base scan classes, and then have consistent error 
messages for all which state the table name attempting to be scanned for users 
(as users encounter this problem often enough when switching back and forth 
between editing a snapshot specific query and inspecting a metadata table).
   
   So I propose:
   ```
   MetadataTableType            New Scan Class
     ENTRIES               **EntriesTableScan extends `StaticTableScan**`       
                           None
     FILES                 **FilesTableScan extends StaticTableScan**           
                      None
     HISTORY                 StaticTableScan                                    
 None
     SNAPSHOTS               StaticTableScan                                    
  None
     MANIFESTS               StaticTableScan                                    
 None
     PARTITIONS              StaticTableScan                                    
 None
     ALL_DATA_FILES        BaseAllMetadataTableScan                   _None -> 
Moved to BaseAllMetadataTableScan_
     ALL_MANIFESTS         BaseAllMetadataTableScan                  _None -> 
Moved to BaseAllMetadataTableScan_
     ALL_ENTRIES               BaseAllMetadataTableScan                  None
     ```
   
   
   
   If we decide to implement the functionality, then the final class can 
override it. This would cut down on code duplication and, imho, make the code 
much more readable.
   
   Passing the additional metadata (actual table name, as well as the metadata 
table type ideally) would not affect the serialization done on static table 
scans, as the information is only used when building the query (e.g. prior to 
distributing the information across the cluster for spark and likey all 
engines).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [iceberg] kbendick commented on a change in pull request #2617: Refine error message on most incremental scans on metadata tables

Reply via email to