lcspinter commented on a change in pull request #2137:
URL: https://github.com/apache/hive/pull/2137#discussion_r618093086
##########
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##########
@@ -194,6 +210,54 @@ public boolean canProvideBasicStatistics() {
     return stats;
   }

+  public boolean addDynamicSplitPruningEdge(org.apache.hadoop.hive.ql.metadata.Table table,
+      ExprNodeDesc syntheticFilterPredicate) {
+    try {
+      Collection<String> partitionColumns = ((HiveIcebergSerDe) table.getDeserializer()).partitionColumns();
+      if (partitionColumns.size() > 0) {
+        // Collect the column names from the predicate
+        Set<String> filterColumns = Sets.newHashSet();
+        columns(syntheticFilterPredicate, filterColumns);
+
+        // While Iceberg could handle multiple columns, the current pruning is only able to handle filters for a
+        // single column. We keep the logic below to handle multiple columns so that if pruning becomes available
+        // on the executor side we can easily adapt to it as well.
+        if (filterColumns.size() > 1) {
Review comment:
We collect every column name into the `filterColumns` set via the `columns()` method. That method traverses every node recursively, so it might be time-consuming. Only afterwards is the size of the set checked, and if it's greater than 1 we return false.
Could we introduce some logic to fail fast, without having to traverse every node? I'm just thinking aloud; I don't know whether it's feasible.
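As a minimal sketch of what such a fail-fast check could look like (the class and method names below are hypothetical; only `ExprNodeDesc` and `ExprNodeColumnDesc` are the real Hive types already referenced in the diff), the walk could carry the first column name it sees and abort as soon as a second distinct one shows up, instead of collecting the full set first:

```java
import java.util.List;
import org.apache.hadoop.hive.ql.plan.ExprNodeColumnDesc;
import org.apache.hadoop.hive.ql.plan.ExprNodeDesc;

public final class FailFastColumnCheck {

  private FailFastColumnCheck() {
  }

  /**
   * Hypothetical fail-fast alternative to columns(): returns the single column
   * name referenced by the predicate, or null if the predicate references no
   * column or more than one distinct column.
   */
  public static String singleFilterColumn(ExprNodeDesc predicate) {
    String[] seen = new String[] {null};
    return walk(predicate, seen) ? seen[0] : null;
  }

  // Depth-first walk; returns false as soon as a second distinct column is
  // found, which short-circuits the traversal of all remaining subtrees.
  private static boolean walk(ExprNodeDesc node, String[] seen) {
    if (node instanceof ExprNodeColumnDesc) {
      String column = ((ExprNodeColumnDesc) node).getColumn();
      if (seen[0] == null) {
        seen[0] = column;
      } else if (!seen[0].equals(column)) {
        return false; // second distinct column: fail fast
      }
    }
    List<ExprNodeDesc> children = node.getChildren(); // may be null for leaves
    if (children != null) {
      for (ExprNodeDesc child : children) {
        if (!walk(child, seen)) {
          return false;
        }
      }
    }
    return true;
  }
}
```

Note this only saves work for predicates that actually reference multiple columns; in the single-column case it still visits every node, so whether it is a win depends on how common multi-column synthetic filters are in practice.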
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]