aokolnychyi commented on code in PR #5376:
URL: https://github.com/apache/iceberg/pull/5376#discussion_r949553559


##########
core/src/main/java/org/apache/iceberg/BaseFilesTable.java:
##########
@@ -185,5 +232,60 @@ public Iterable<FileScanTask> split(long splitSize) {
     ManifestFile manifest() {
       return manifest;
     }
+
+    private List<Function<ContentFile<?>, Object>> accessors(boolean partitioned) {

Review Comment:
   One way to make the custom struct approach work is to pass the projected
   schema, which tells us the number of projected columns. Based on that count,
   we can either delegate to the content file struct or return the metrics.
   
   ```
   public CloseableIterable<StructLike> rows() {
     if (...) {
       return CloseableIterable.transform(files(), this::wrapWithMetrics);
     } else {
       return CloseableIterable.transform(files(), file -> (StructLike) file);
     }
   }
   
   private StructLike wrapWithMetrics(ContentFile<?> file) {
     Map<String, StructLike> metrics = MetricsUtil.readableMetricsMap(dataTableSchema, file);
     return new ContentFileStructWithMetrics(projectedSchema, (StructLike) file, metrics);
   }
   ```
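   
   To make the delegation idea concrete, here is a minimal, self-contained
   sketch of a wrapper struct that resolves positions against either the file
   struct or the metrics. It is not the Iceberg implementation: `Row` is a
   hypothetical stand-in for `StructLike`, `RowWithMetrics` stands in for the
   proposed `ContentFileStructWithMetrics`, the wrapper takes a column count
   rather than a schema, and it assumes the metrics column is the last
   projected column.
   
   ```java
   import java.util.Arrays;
   import java.util.List;

   final class StructWithMetricsSketch {

     // Hypothetical stand-in for Iceberg's StructLike, reduced to read access.
     interface Row {
       int size();

       <T> T get(int pos, Class<T> javaClass);
     }

     // List-backed row used here for both the file struct and the metrics struct.
     static final class ListRow implements Row {
       private final List<Object> values;

       ListRow(Object... values) {
         this.values = Arrays.asList(values);
       }

       @Override
       public int size() {
         return values.size();
       }

       @Override
       public <T> T get(int pos, Class<T> javaClass) {
         return javaClass.cast(values.get(pos));
       }
     }

     // Delegates leading positions to the file struct and returns the metrics
     // struct for the last projected position.
     static final class RowWithMetrics implements Row {
       private final int fileFieldCount;
       private final Row file;
       private final Row metrics;

       RowWithMetrics(int projectedFieldCount, Row file, Row metrics) {
         // Assumption: metrics occupy exactly one column, projected last.
         this.fileFieldCount = projectedFieldCount - 1;
         this.file = file;
         this.metrics = metrics;
       }

       @Override
       public int size() {
         return fileFieldCount + 1;
       }

       @Override
       public <T> T get(int pos, Class<T> javaClass) {
         if (pos < fileFieldCount) {
           return file.get(pos, javaClass);
         } else if (pos == fileFieldCount) {
           return javaClass.cast(metrics);
         } else {
           throw new IndexOutOfBoundsException("No field at position " + pos);
         }
       }
     }

     public static void main(String[] args) {
       Row file = new ListRow("a.parquet", 100L);
       Row metrics = new ListRow(7L);
       Row row = new RowWithMetrics(3, file, metrics);
       System.out.println(row.get(0, String.class));
       System.out.println(row.get(2, Row.class).get(0, Long.class));
     }
   }
   ```
   
   The wrapper holds no per-row state beyond the two references, so creating
   one per file inside `CloseableIterable.transform` stays cheap.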
   
   I don't mind the accessor-based approach either, but we can't initialize the
   accessor list, pick columns, etc. for every row. We also can't assign the
   accessors to a field, as the task will be serialized. We will have to build
   them lazily.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.