rdblue commented on a change in pull request #735: Refactor FilteredManifest and ManifestGroup URL: https://github.com/apache/incubator-iceberg/pull/735#discussion_r370291337
########## File path: core/src/main/java/org/apache/iceberg/FilteredManifest.java ########## @@ -95,60 +94,38 @@ public FilteredManifest caseSensitive(boolean isCaseSensitive) { Evaluator evaluator = evaluator(); InclusiveMetricsEvaluator metricsEvaluator = metricsEvaluator(); - return CloseableIterable.filter(reader.entries(projection(fileSchema, columns, caseSensitive)), + // ensure stats columns are present for metrics evaluation + boolean requireStatsProjection = requireStatsProjection(); + Collection<String> projectColumns = requireStatsProjection ? withStatsColumns(columns) : columns; + + CloseableIterable<ManifestEntry> entries = CloseableIterable.filter( + reader.entries(projection(fileSchema, projectColumns, caseSensitive)), entry -> entry != null && evaluator.eval(entry.file().partition()) && metricsEvaluator.eval(entry.file())); + // note that columns itself could have stats projected, that's ok. + // We only drop stats if we have forced stats projection. + return CloseableIterable.transform(entries, e -> requireStatsProjection ? e.copyWithoutStats() : e); Review comment: For performance, I think it is worse to copy here than to return the extra columns. We don't copy because we assume that the caller will use the row immediately -- e.g. to aggregate stats -- or will make a defensive copy. The caller is responsible to decide. That really cuts down on object allocation and helps performance. Copying just to get rid of stats columns forces a copy, when the caller may not need one. Let's just return what was read here. In `iterator`, where we are copying anyway we can do the copy with or without stats. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org