FrankChen021 commented on code in PR #19449:
URL: https://github.com/apache/druid/pull/19449#discussion_r3226691456


##########
extensions-contrib/druid-iceberg-extensions/src/main/java/org/apache/druid/iceberg/input/IcebergCatalog.java:
##########
@@ -88,15 +92,27 @@ public List<String> extractSnapshotDataFiles(
     try {
       Thread.currentThread().setContextClassLoader(getClass().getClassLoader());
       TableIdentifier icebergTableIdentifier = catalog.listTables(namespace).stream()
-                                                      .filter(tableId -> tableId.toString().equals(tableIdentifier))
-                                                      .findFirst()
-                                                      .orElseThrow(() -> new IAE(
-                                                          " Couldn't retrieve table identifier for '%s'. Please verify that the table exists in the given catalog",
-                                                          tableIdentifier
-                                                      ));
+              .filter(tableId -> tableId.toString().equals(tableIdentifier))
+              .findFirst()
+              .orElseThrow(() -> new IAE(
+                      " Couldn't retrieve table identifier for '%s'. Please verify that the table exists in the given catalog",
+                      tableIdentifier
+              ));
 
       long start = System.currentTimeMillis();
-      TableScan tableScan = catalog.loadTable(icebergTableIdentifier).newScan();
+      Table table = catalog.loadTable(icebergTableIdentifier);
+      TableScan tableScan = table.newScan();
+
+      if (columnsFilter != null) {
+        List<String> projectedColumns = table
+            .schema()
+            .columns()
+            .stream()
+            .map(Types.NestedField::name)
+            .filter(columnsFilter::apply)
+            .collect(Collectors.toList());
+        tableScan = tableScan.select(new ArrayList<>(projectedColumns));

Review Comment:
   [P2] Projection is discarded before data is read
   
   This selects projected columns on the Iceberg TableScan, but the method only
   returns task.file().location() afterward. The projected FileScanTask schema
   is discarded, and IcebergInputSource builds the warehouse delegate from the
   same raw file paths, so Druid's Parquet reader still opens the original
   files without the Iceberg projection. The new test also manually projects
   with Parquet.read(...).project(...), so it would pass even if this select
   had no effect. To make column projection work, the projected schema/split
   information needs to be carried into the reader path, or pruning needs to
   happen in the delegate input format.
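
   One way to picture the first option (carrying the projection into the reader path) is the plain-Java sketch below. It is only an illustration under stated assumptions: `ProjectedScanSketch`, `ProjectedFile`, `project`, and `plan` are hypothetical names that do not exist in Druid or Iceberg, and `java.util.function.Predicate` stands in for the `columnsFilter` used in the diff. The point is that the method would return the file location together with the projected column names, rather than the location alone.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical sketch: none of these types exist in Druid or Iceberg.
class ProjectedScanSketch {

  /** Hypothetical carrier: a data file path plus the columns the reader should keep. */
  public static final class ProjectedFile {
    public final String location;
    public final List<String> columns;

    public ProjectedFile(String location, List<String> columns) {
      this.location = location;
      this.columns = columns;
    }
  }

  /** Mirrors the filter step in the diff: keep the schema column names the filter accepts. */
  public static List<String> project(List<String> schemaColumns, Predicate<String> columnsFilter) {
    if (columnsFilter == null) {
      // No filter configured: keep every column, as the unprojected scan does.
      return new ArrayList<>(schemaColumns);
    }
    return schemaColumns.stream().filter(columnsFilter).collect(Collectors.toList());
  }

  /** Pairs every file location with the projection so a downstream reader can prune columns. */
  public static List<ProjectedFile> plan(
      List<String> fileLocations,
      List<String> schemaColumns,
      Predicate<String> columnsFilter
  ) {
    List<String> projected = Collections.unmodifiableList(project(schemaColumns, columnsFilter));
    return fileLocations.stream()
                        .map(location -> new ProjectedFile(location, projected))
                        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Made-up paths and column names, for illustration only.
    List<ProjectedFile> files = plan(
        Arrays.asList("warehouse/a.parquet", "warehouse/b.parquet"),
        Arrays.asList("id", "name", "payload"),
        column -> !column.equals("payload")
    );
    for (ProjectedFile file : files) {
      System.out.println(file.location + " -> " + file.columns);
    }
  }
}
```

   With a shape like this, a test could assert on the columns handed to the reader instead of projecting manually, so it would fail if the select had no effect.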



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


