szlta opened a new pull request, #3203: URL: https://github.com/apache/hive/pull/3203
The filter expression attached to the file scan tasks is not actually a "residual" expression, but rather the original data filter. This works in our favor: for any given Hive job the expression is the same object, so we can transfer it to the Hive execution processes another way.

The expression itself is generated via https://github.com/apache/iceberg/blob/master/mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergInputFormat.java#L82-L93 before split generation within the AM. Nothing prevents us from reusing this same logic on the executors. At the same time we can call ignoreResiduals() on the table scan, so that Iceberg uses the filter only for split generation and does not attach it to the file scan tasks, and therefore not to the splits that wrap them either.

On the execution side we can simply retrieve the original filter expression using the logic above and evaluate it against the current task (whose spec and partition values are available anyway), which yields the actual residual expression for the task. This is then passed to the underlying file formats the same way as before.
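To illustrate what "evaluate the original filter against the current task" means, here is a minimal, self-contained sketch. It is a toy model and not the code in this PR: the `ResidualSketch` class, the `Eq` predicate and the `residualFor` helper are hypothetical names, the filter is assumed to be a plain AND of equality predicates, and partitioning is assumed to be identity-based. In Iceberg itself this step would presumably go through `ResidualEvaluator` rather than hand-written logic like this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class ResidualSketch {

    /** Toy leaf predicate "column = value"; the whole filter is an AND of these. */
    static final class Eq {
        final String column;
        final String value;

        Eq(String column, String value) {
            this.column = column;
            this.value = value;
        }

        @Override
        public String toString() {
            return column + " = '" + value + "'";
        }
    }

    /**
     * Evaluates the original filter against one task's partition values.
     * Returns Optional.empty() if no row of the task can match; otherwise
     * returns the predicates that still need a row-level check (possibly none).
     */
    static Optional<List<Eq>> residualFor(List<Eq> filter, Map<String, String> partition) {
        List<Eq> residual = new ArrayList<>();
        for (Eq predicate : filter) {
            if (partition.containsKey(predicate.column)) {
                // Partition column: the predicate is decided for the whole task.
                if (!partition.get(predicate.column).equals(predicate.value)) {
                    return Optional.empty();
                }
                // Holds for every row in the task, so it drops out of the residual.
            } else {
                // Data column: must still be evaluated by the file format reader.
                residual.add(predicate);
            }
        }
        return Optional.of(residual);
    }

    public static void main(String[] args) {
        // The same original filter is rebuilt on every executor.
        List<Eq> filter = List.of(new Eq("region", "eu"), new Eq("status", "open"));

        // Each task only contributes its own partition values.
        System.out.println(residualFor(filter, Map.of("region", "eu")));
        System.out.println(residualFor(filter, Map.of("region", "us")));
    }
}
```

For a task in partition `region=eu` only the `status` predicate remains to be pushed down to the reader, while a task in `region=us` can be skipped entirely. The point of the sketch is that the residual is derivable from the original filter plus the task's own partition data, so it does not need to be serialized into each split.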
