[GitHub] [iceberg] rdblue commented on a change in pull request #2358: Core: Add predicate push down for Partitions table

GitBox Tue, 25 May 2021 13:19:03 -0700


rdblue commented on a change in pull request #2358:
URL: https://github.com/apache/iceberg/pull/2358#discussion_r639158941




##########
File path: core/src/main/java/org/apache/iceberg/PartitionsTable.java
##########
@@ -80,21 +88,51 @@ private DataTask task(TableScan scan) {
     return StaticDataTask.Row.of(partition.key, partition.recordCount, 
partition.fileCount);
   }
 
-  private static Iterable<Partition> partitions(Table table, Long snapshotId) {
-    PartitionMap partitions = new PartitionMap(table.spec().partitionType());
-    TableScan scan = table.newScan();
+  private Iterable<Partition> partitions(StaticTableScan scan) {
+    CloseableIterable<FileScanTask> tasks = planTasks(scan);
 
-    if (snapshotId != null) {
-      scan = scan.useSnapshot(snapshotId);
+    PartitionMap partitions = new PartitionMap(table().spec().partitionType());
+    for (FileScanTask task : tasks) {
+      partitions.get(task.file().partition()).update(task.file());
     }
+    return partitions.all();
+  }
 
-    for (FileScanTask task : scan.planFiles()) {
-      partitions.get(task.file().partition()).update(task.file());
+  @VisibleForTesting
+  CloseableIterable<FileScanTask> planTasks(StaticTableScan scan) {
+    boolean caseSensitive = scan.isCaseSensitive();
+    boolean ignoreResiduals = scan.shouldIgnoreResiduals();
+    long snapshotId = scan.snapshot().snapshotId();
+
+    PartitionSpec tableSpec = table().spec();
+    PartitionSpec.Builder identitySpecBuilder = 
PartitionSpec.builderFor(schema());
+    tableSpec.fields().stream().forEach(pf -> 
identitySpecBuilder.identity(PARTITION_FIELD_PREFIX + pf.name()));
+    PartitionSpec identitySpec = identitySpecBuilder.build();
+
+    ManifestEvaluator eval = ManifestEvaluator.forPartitionFilter(
+        Projections.inclusive(identitySpec, 
caseSensitive).project(scan.filter()),

Review comment:
       It would be more clear if you produced an expression from this that was 
the extracted partition filter, then passed that into the `ManifestGroup`.
   
   I'm guessing that you're doing it this way because you need to rewrite the 
column names and this accomplishes that by binding to IDs rather than rewriting 
the names. I think to produce the right expression, you'd need to construct 
your partition spec so that the source field names are `PARTITION_FIELD_PREFIX 
+ pf.name()` and the partition name is `pf.name()`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [iceberg] rdblue commented on a change in pull request #2358: Core: Add predicate push down for Partitions table

Reply via email to