szehon-ho commented on code in PR #5376:
URL: https://github.com/apache/iceberg/pull/5376#discussion_r957690407


##########
core/src/main/java/org/apache/iceberg/BaseFilesTable.java:
##########
@@ -140,42 +143,92 @@ protected CloseableIterable<FileScanTask> doPlanFiles() {
   }
 
   static class ManifestReadTask extends BaseFileScanTask implements DataTask {
+
+    static final Set<Integer> READABLE_METRICS_FIELD_IDS =
+        TypeUtil.getProjectedIds(DataFile.READABLE_METRICS.type());
+    static final Schema MIN_PROJECTION_FOR_READABLE_METRICS =
+        new Schema(
+            DataFile.COLUMN_SIZES,
+            DataFile.VALUE_COUNTS,
+            DataFile.NULL_VALUE_COUNTS,
+            DataFile.NAN_VALUE_COUNTS,
+            DataFile.LOWER_BOUNDS,
+            DataFile.UPPER_BOUNDS);
+
     private final FileIO io;
     private final Map<Integer, PartitionSpec> specsById;
     private final ManifestFile manifest;
-    private final Schema schema;
+    private final Schema dataTableSchema;
+    private final Schema projection;
 
     ManifestReadTask(
         Table table,
         ManifestFile manifest,
-        Schema schema,
+        Schema projection,
         String schemaString,
         String specString,
         ResidualEvaluator residuals) {
       super(DataFiles.fromManifest(manifest), null, schemaString, specString, 
residuals);
       this.io = table.io();
       this.specsById = Maps.newHashMap(table.specs());
       this.manifest = manifest;
-      this.schema = schema;
+      this.dataTableSchema = table.schema();
+      this.projection = projection;
     }
 
     @Override
     public CloseableIterable<StructLike> rows() {
-      return CloseableIterable.transform(manifestEntries(), file -> 
(StructLike) file);
+      if (projection.findColumnName(DataFile.READABLE_METRICS.fieldId()) == 
null) {
+        return CloseableIterable.transform(files(projection), file -> 
(StructLike) file);
+      } else {
+        Schema fileProjection = TypeUtil.selectNot(projection, 
READABLE_METRICS_FIELD_IDS);
+        Schema minProjection =

Review Comment:
   I think putting it there will break the scan right, as its not the 
projection the user requested.  
   
   Note, this is actually a bit subtle here.  Because we are doing the join, 
(original projection + minimum metrics), the file becomes 
   {any_projected_field_on_file} : {readable_metrics because its also 
projected} : {un-projected but required metrics fields}
   
   So the ContentFileWithMetrics works because it will discard any of the 
"un-projected but required metrics fields", given they are outside the range it 
will read.  For the remaining fields it uses the existing logic (delegate to 
file for the first n-1, and then get from MetricsStruct for nth field).
   
   I mean, we could add a select method to GenericDataFile to modify its 
internal 'fromProjectionPos' map to conform back to the original projection 
(without the "un-projected but required metrics fields") for safety, but it's 
strictly needed.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to