pvary commented on a change in pull request #1407:
URL: https://github.com/apache/iceberg/pull/1407#discussion_r487214980
##########
File path:
mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputFormat.java
##########
@@ -271,28 +271,25 @@ public void setupTask(TaskAttemptContext
taskAttemptContext) {
}
@Override
- public boolean needsTaskCommit(TaskAttemptContext taskAttemptContext) {
- return true;
+ public boolean needsTaskCommit(TaskAttemptContext context) {
+ // We need to commit if this is the last phase of a MapReduce process
+ return
TaskType.REDUCE.equals(context.getTaskAttemptID().getTaskID().getTaskType()) ||
+ context.getJobConf().getNumReduceTasks() == 0;
Review comment:
I have tested this with adding the jars to the Hive master from the
apache repo and running the queries in local mode. I got some exception and
started to investigate the source.
The actual query I have used for this had an order by so it had both a
Mapper and a Reducer phase:
```
insert into purchases select * from purchases order by id;
```
My debugging revealed that the commitTask was called for Mappers and
Reducers as well. The JobRunner infrastructure calls those and it does not have
any information about if the task has actually written anything to anywhere or
not. I think this is the purpose of the _needsTaskCommit()_ method, so the
developer of the OutputCommitter could decide.
@massdosage: Do you have an easy example for HiveRunner write tests?
Thanks,
Peter
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]