[GitHub] [hive] szlta opened a new pull request, #3203: HIVE-26137 Optimized transfer of Iceberg residual expressions from AM to execution

GitBox Tue, 12 Apr 2022 07:36:39 -0700


szlta opened a new pull request, #3203:
URL: https://github.com/apache/hive/pull/3203


   The filter expression attached to the file scan tasks is not actually a 
"residual" expression but the original data filter. This works in our favor: 
for any given Hive job the expression is the same object, so we can get it to 
the Hive execution processes another way:
   
   The expression itself is generated by 
https://github.com/apache/iceberg/blob/master/mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergInputFormat.java#L82-L93
before split generation within the AM. Nothing prevents us from reusing this 
same logic on the executors.
   At the same time we can call ignoreResiduals() on the table scan, so that 
Iceberg uses the filter only for split generation and does not attach it to 
the file scan tasks, and therefore not to the splits that wrap them. On the 
execution side we can simply retrieve the original filter expression with the 
logic above and evaluate it against the current task (whose spec and partition 
values are available anyway), which yields the actual residual expression for 
that task. This is then passed to the underlying file formats the same way as 
before.
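
   To make the executor-side step concrete, here is a minimal sketch of what 
deriving a per-task residual from the original filter means. It is a 
simplified stand-alone model, not Iceberg's API and not the code in this PR: 
the class ResidualSketch, the Eq predicate type and residualFor() are 
hypothetical names, the filter is assumed to be a plain conjunction of 
equality predicates, and partition values are assumed to be identity values 
of the partition columns. In Iceberg the corresponding evaluation is handled 
by ResidualEvaluator, given a partition spec and the expression.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Simplified model, NOT the Iceberg API: the filter is a conjunction of
// "column == value" predicates and a task is described only by its
// partition values (identity partitioning assumed).
public class ResidualSketch {

    // One equality predicate of the original data filter (hypothetical type).
    public static final class Eq {
        final String column;
        final Object value;

        public Eq(String column, Object value) {
            this.column = column;
            this.value = value;
        }

        @Override
        public String toString() {
            return column + " == " + value;
        }
    }

    // Derives the residual of `filter` for a task with the given partition values.
    // Returns null if the partition can never match (the task can be skipped),
    // an empty list if the partition already satisfies the whole filter,
    // otherwise the predicates that still have to be checked row by row.
    public static List<Eq> residualFor(List<Eq> filter, Map<String, Object> partition) {
        List<Eq> residual = new ArrayList<>();
        for (Eq predicate : filter) {
            if (partition.containsKey(predicate.column)) {
                if (!predicate.value.equals(partition.get(predicate.column))) {
                    return null; // contradicted by the partition value
                }
                // Satisfied by the partition value: nothing left to check.
            } else {
                residual.add(predicate); // not decidable from partition data
            }
        }
        return residual;
    }

    public static void main(String[] args) {
        // The original data filter, identical for every task of the job.
        List<Eq> filter = Arrays.asList(new Eq("region", "EU"), new Eq("status", "open"));

        Map<String, Object> euTask = Collections.singletonMap("region", "EU");
        Map<String, Object> usTask = Collections.singletonMap("region", "US");

        System.out.println(residualFor(filter, euTask)); // [status == open]
        System.out.println(residualFor(filter, usTask)); // null
    }
}
```

   The point of the model is that the residual depends only on the shared 
filter and on information the task already carries, so it does not need to be 
serialized into every split.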


