aokolnychyi commented on code in PR #5301:
URL: https://github.com/apache/iceberg/pull/5301#discussion_r923837639
##########
core/src/main/java/org/apache/iceberg/ManifestGroup.java:
##########
@@ -279,4 +287,84 @@ public void close() throws IOException {
}
});
}
+
+ abstract static class ScanTaskFactory<T extends ScanTask> {
+ private final String schemaAsString;
+ private final String specAsString;
+ private final DeleteFileIndex deletes;
+ private final ResidualEvaluator residuals;
+ private final boolean dropStats;
+
+ ScanTaskFactory(PartitionSpec spec, DeleteFileIndex deletes, ResidualEvaluator residuals, boolean dropStats) {
+ this.schemaAsString = SchemaParser.toJson(spec.schema());
+ this.specAsString = PartitionSpecParser.toJson(spec);
+ this.deletes = deletes;
+ this.residuals = residuals;
+ this.dropStats = dropStats;
+ }
+
+ abstract CloseableIterable<T> createTasks(CloseableIterable<ManifestEntry<DataFile>> entries);
+
+ String schemaAsString() {
+ return schemaAsString;
+ }
+
+ String specAsString() {
+ return specAsString;
+ }
+
+ DeleteFileIndex deletes() {
+ return deletes;
+ }
+
+ ResidualEvaluator residuals() {
+ return residuals;
+ }
+
+ boolean shouldKeepStats() {
+ return !dropStats;
+ }
+
+ abstract static class Builder<T extends ScanTask> {
Review Comment:
I am not thrilled about having a builder, as it adds complexity.
However, I did it this way so that we can keep a loading cache of task
factories per spec in `ManifestGroup`. Right now, we convert the schema and spec
to their JSON representations for each manifest, which is unnecessary. As those
JSON strings can get pretty large, I think doing the conversion once per spec is
an important optimization.
If we want to get rid of the builder, I'll have to implement per-spec
caching in each task factory.
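
To illustrate the per-spec caching idea, here is a minimal sketch. It is not the code in this PR: `PerSpecFactoryCache`, `FactorySketch`, and the `specToJson` function are hypothetical stand-ins, and a plain `ConcurrentHashMap` takes the place of whatever loading cache `ManifestGroup` ends up using. The only point is that the JSON conversion runs once per spec ID rather than once per manifest.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntFunction;

// Hypothetical sketch: caches one factory per partition spec ID.
class PerSpecFactoryCache {

  // Hypothetical stand-in for a task factory that holds the spec JSON.
  static final class FactorySketch {
    final int specId;
    final String specAsString;

    FactorySketch(int specId, String specAsString) {
      this.specId = specId;
      this.specAsString = specAsString;
    }
  }

  private final Map<Integer, FactorySketch> factoriesBySpecId = new ConcurrentHashMap<>();
  private final IntFunction<String> specToJson; // assumed to be the expensive conversion
  private final AtomicInteger conversions = new AtomicInteger();

  PerSpecFactoryCache(IntFunction<String> specToJson) {
    this.specToJson = specToJson;
  }

  // Returns the cached factory for the spec, building it on first use only.
  FactorySketch factoryFor(int specId) {
    return factoriesBySpecId.computeIfAbsent(specId, id -> {
      conversions.incrementAndGet();
      return new FactorySketch(id, specToJson.apply(id));
    });
  }

  // Number of times the JSON conversion actually ran.
  int conversions() {
    return conversions.get();
  }
}
```

With this shape, a loop over many manifests that share a handful of specs calls `factoryFor(manifestSpecId)` per manifest but pays for the conversion only once per distinct spec.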