Tim Robertson created HBASE-21183:
-------------------------------------
Summary: LoadIncrementalHFiles sometimes throws
FileNotFoundException on retry
Key: HBASE-21183
URL: https://issues.apache.org/jira/browse/HBASE-21183
Project: HBase
Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Tim Robertson
On a nightly batch job that prepares hundreds of well-balanced HFiles of around
2 GB each, we see sporadic failures during the bulk load.
I'm unable to paste the logs here (they are on a different network), but on a
failing day they show, for example, the following:
{code}
Trying to load hfile... /my/input/path/...
Attempt to bulk load region containing ... failed. This is recoverable and will
be retried
Attempt to bulk load region containing ... failed. This is recoverable and will
be retried
Attempt to bulk load region containing ... failed. This is recoverable and will
be retried
Split occurred while grouping HFiles, retry attempt 1 with 3 files remaining to
group or split
Trying to load hfile...
IOException during splitting
java.io.FileNotFoundException: File does not exist: /my/input/path/...
{code}
The exception is thrown from [this
line|https://github.com/apache/hbase/blob/branch-1.2/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java#L685].
I should note that this is a secure cluster (CDH 5.12.x).
I've gone through the code and don't see an obvious race condition. I also
don't see any related changes in the later 1.x versions, so I presume the
problem also exists in 1.5.
I have yet to get access to the NameNode audit logs from a failing run, which
I need in order to trace the rename() calls around these particular files.
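Once the audit logs are available, something like the following could pull out the relevant rename() entries. This is only a minimal sketch: the class and method names are hypothetical, and it assumes the usual audit line layout of whitespace-separated key=value pairs containing cmd=rename, src=... and dst=..., which may differ depending on the log configuration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HBase or Hadoop.
final class AuditRenameFilter {

    // ASSUMPTION: each audit line holds "cmd=rename", then "src=<path>" and
    // "dst=<path>" as whitespace-separated key=value pairs.
    // Paths containing whitespace are not handled by this pattern.
    private static final Pattern RENAME =
        Pattern.compile("cmd=rename\\b.*?\\bsrc=(\\S+)\\s+dst=(\\S+)");

    // Returns "src -> dst" for every rename whose source path starts with
    // the given prefix (e.g. the bulk load input directory).
    static List<String> findRenames(List<String> auditLines, String srcPrefix) {
        List<String> hits = new ArrayList<>();
        for (String line : auditLines) {
            Matcher m = RENAME.matcher(line);
            if (m.find() && m.group(1).startsWith(srcPrefix)) {
                hits.add(m.group(1) + " -> " + m.group(2));
            }
        }
        return hits;
    }
}
```

Calling findRenames with the audit log lines and the bulk load input directory as the prefix would list every file that was moved out of that directory. That list can then be compared against the path named in the FileNotFoundException.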