[
https://issues.apache.org/jira/browse/HIVE-24857?focusedWorklogId=562309&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-562309
]
ASF GitHub Bot logged work on HIVE-24857:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 08/Mar/21 11:27
Start Date: 08/Mar/21 11:27
Worklog Time Spent: 10m
Work Description: marton-bod opened a new pull request #2048:
URL: https://github.com/apache/hive/pull/2048
### What changes were proposed in this pull request?
Move the output commit so that it runs after the proc.close() operation.
### Why are the changes needed?
The commit currently runs before close(), so it can miss records that the
close() operation flushes out.
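
The ordering problem can be illustrated with a minimal, self-contained sketch. This is not the actual TezProcessor or OutputCommitter code: `BufferingProcessor`, `commit` and `runTask` are hypothetical stand-ins, and the sketch assumes a processor that holds its last record in memory until close() is called.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CommitOrderSketch {

    /** Hypothetical processor that keeps the most recent record buffered until close(). */
    static class BufferingProcessor {
        private final List<String> output; // records that have actually been written out
        private String pending;            // record still held in memory

        BufferingProcessor(List<String> output) {
            this.output = output;
        }

        void run(List<String> input) {
            for (String rec : input) {
                if (pending != null) {
                    output.add(pending);   // write the previously buffered record
                }
                pending = rec;             // buffer the current one
            }
        }

        void close() {
            if (pending != null) {
                output.add(pending);       // flush the last buffered record
                pending = null;
            }
        }
    }

    /** Hypothetical committer: publishes whatever has been written at call time. */
    static List<String> commit(List<String> output) {
        return new ArrayList<>(output);
    }

    /** Runs one task and returns what was committed, with the commit before or after close(). */
    static List<String> runTask(boolean commitBeforeClose) {
        List<String> output = new ArrayList<>();
        BufferingProcessor proc = new BufferingProcessor(output);
        proc.run(Arrays.asList("a", "b", "c"));

        List<String> committed = null;
        if (commitBeforeClose) {
            committed = commit(output);    // old order: the record flushed by close() is missed
        }
        proc.close();
        if (!commitBeforeClose) {
            committed = commit(output);    // proposed order: the flushed record is included
        }
        return committed;
    }

    public static void main(String[] args) {
        System.out.println("commit before close: " + runTask(true));
        System.out.println("commit after close:  " + runTask(false));
    }
}
```

In this sketch, committing before close() publishes only the first two records, while committing after close() publishes all three.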
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Tested manually on a cluster with TPC-DS inserts, using the
`HiveIcebergOutputCommitter`.
Issue Time Tracking
-------------------
Worklog Id: (was: 562309)
Remaining Estimate: 0h
Time Spent: 10m
> Trigger Tez output commit after close operation
> -----------------------------------------------
>
> Key: HIVE-24857
> URL: https://issues.apache.org/jira/browse/HIVE-24857
> Project: Hive
> Issue Type: Improvement
> Reporter: Marton Bod
> Assignee: Marton Bod
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Currently Tez triggers the OutputCommitter.commit() operation between the
> proc.run() and proc.close() operations in TezProcessor. However, when writing
> out data, calling the proc.close() operation may still produce some extra
> records, which would be missed by the output committer.