yuquan wang created HIVE-25295:
----------------------------------
Summary: "File already exists" exception during mapper/reducer
retry with old Hive (0.13)
Key: HIVE-25295
URL: https://issues.apache.org/jira/browse/HIVE-25295
Project: Hive
Issue Type: Bug
Components: Hive
Affects Versions: 0.13.0
Reporter: yuquan wang
We are still using a very old Hive version (0.13) for historical reasons, and we
often hit the following issue:
{code:java}
Caused by: java.io.IOException: File already
exists:s3://smart-dmp/warehouse/uploaded/ad_dmp_pixel/dt=2021-06-21/key=259f3XXXXXXX
{code}
We have investigated this issue for quite a long time but have not found a good
fix, so we would like to ask the Hive community whether there are any known
solutions.
The error occurs during the map/reduce stage: once a task instance fails for
some unexpected reason (for example, an unstable spot instance gets killed), the
subsequent retry throws the above exception instead of overwriting the file left
behind by the failed attempt.
We have several guesses:
1. Is it caused by the ORC file type? We found a similar issue,
https://issues.apache.org/jira/browse/HIVE-6341, but it has no comments there, and
our table is stored as ORC.
2. Is the problem fixed in a newer Hive version? We also run Hive 2.3.6 and have
not hit this issue there, so we want to know whether a version upgrade would
solve it.
3. Is there a configuration option that always cleans up existing files/folders
during a retry of the mapper/reducer stage? We have searched all the MapReduce
configs but could not find one.
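To illustrate what we believe is happening (this is a plain-Java sketch of the semantics, not actual Hive or Hadoop code): a create-if-not-exists call fails on the leftover file from the killed attempt, and the workaround we are looking for would delete the stale output before retrying, e.g.:

{code:java}
import java.io.IOException;
import java.nio.file.*;

public class RetryOverwrite {
    public static void main(String[] args) throws IOException {
        // Hypothetical output file; stands in for the s3:// path in this ticket.
        Path out = Files.createTempDirectory("demo").resolve("part-00000");
        Files.write(out, "attempt-1".getBytes()); // leftover from a killed attempt
        try {
            // CREATE_NEW fails if the file already exists, which we assume
            // mirrors the "File already exists" error seen during the retry.
            Files.write(out, "attempt-2".getBytes(),
                    StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
        } catch (FileAlreadyExistsException e) {
            // The cleanup we are asking about: remove the stale file, then retry.
            Files.deleteIfExists(out);
            Files.write(out, "attempt-2".getBytes(),
                    StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
        }
        System.out.println(new String(Files.readAllBytes(out))); // prints attempt-2
    }
}
{code}

If such a delete-before-retry step (or an overwrite flag) is configurable anywhere in the 0.13 write path, that would likely resolve our problem.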
--
This message was sent by Atlassian Jira
(v8.3.4#803005)