marton-bod commented on a change in pull request #2161:
URL: https://github.com/apache/hive/pull/2161#discussion_r609888812
##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java
##########
@@ -250,9 +255,32 @@ public int execute() {
this.setException(new HiveException(monitor.getDiagnostics()));
}
- // fetch the counters
try {
Set<StatusGetOpts> statusGetOpts = EnumSet.of(StatusGetOpts.GET_COUNTERS);
+ // save useful commit information into session conf, e.g. for custom commit hooks
+ List<BaseWork> allWork = work.getAllWork();
+ boolean hasReducer = allWork.stream().map(workToVertex::get).anyMatch(v -> v.getName().startsWith("Reducer"));
+ for (BaseWork baseWork : allWork) {
+ Vertex vertex = workToVertex.get(baseWork);
+ if (!hasReducer || vertex.getName().startsWith("Reducer")) {
Review comment:
Correct, the goal is to pick out only those vertices that wrote to a
table. I will look into the `dataSinks()` method; it could be a better way to
identify them.
In principle, the mixed-table scenario you mentioned should work. If we have
two reducer vertices that each wrote to a table, we check both of them here. The
vertex that wrote to the Iceberg table publishes its info (jobId, task num)
into the session conf with the table suffix, and the vertex that wrote to the
Hive table does the same, except without any table suffix. Since the Hive table
needs no commit, that second entry is more or less irrelevant for now.
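
To make the mixed-table case concrete, here is a rough, self-contained sketch of the selection and publishing logic described above. It is not the code in the PR: `CommitInfoSketch`, `VertexInfo`, the two key prefixes, and the use of the table name as the suffix are all hypothetical stand-ins, and a plain `Map` takes the place of the session conf.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names come from Hive or Tez.
final class CommitInfoSketch {

  // Assumed key prefixes; the real property names in the PR may differ.
  static final String JOB_ID_PREFIX = "sketch.commit.job.id";
  static final String TASK_NUM_PREFIX = "sketch.commit.task.num";

  // A vertex reduced to the fields this sketch needs.
  // tableName is null for a vertex that wrote to a plain Hive table.
  static final class VertexInfo {
    final String name;
    final String tableName;
    final String jobId;
    final int taskNum;

    VertexInfo(String name, String tableName, String jobId, int taskNum) {
      this.name = name;
      this.tableName = tableName;
      this.jobId = jobId;
      this.taskNum = taskNum;
    }
  }

  // Mirrors the condition in the diff: if the DAG has any reducer,
  // keep only reducer vertices; otherwise keep every vertex.
  static List<VertexInfo> selectWriters(List<VertexInfo> vertices) {
    boolean hasReducer = vertices.stream().anyMatch(v -> v.name.startsWith("Reducer"));
    List<VertexInfo> selected = new ArrayList<>();
    for (VertexInfo v : vertices) {
      if (!hasReducer || v.name.startsWith("Reducer")) {
        selected.add(v);
      }
    }
    return selected;
  }

  // Publishes jobId and task num per selected vertex. Keys carry a table
  // suffix when a table name is known (assumed format: "." + tableName)
  // and no suffix otherwise.
  static Map<String, String> publish(List<VertexInfo> vertices) {
    Map<String, String> conf = new LinkedHashMap<>();
    for (VertexInfo v : selectWriters(vertices)) {
      String suffix = (v.tableName == null) ? "" : "." + v.tableName;
      conf.put(JOB_ID_PREFIX + suffix, v.jobId);
      conf.put(TASK_NUM_PREFIX + suffix, String.valueOf(v.taskNum));
    }
    return conf;
  }
}
```

With one map vertex and two reducers (one writing to an Iceberg table, one to a Hive table), the map vertex is skipped and the two reducers end up under distinct keys, so the Iceberg entry is not overwritten by the unsuffixed Hive one.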