[
https://issues.apache.org/jira/browse/HADOOP-14512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Duo Xu updated HADOOP-14512:
----------------------------
Attachment: HADOOP-14512.001.patch
[~nitin_matrix]
I have added a WARN message. Thanks for the suggestion.
> WASB atomic rename should not throw exception if the file is neither in src
> nor in dst when doing the rename
> ------------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-14512
> URL: https://issues.apache.org/jira/browse/HADOOP-14512
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/azure
> Reporter: Duo Xu
> Attachments: HADOOP-14512.001.patch
>
>
> During an atomic rename operation, WASB creates a rename pending JSON file
> that records which files need to be renamed and their destination. WASB then
> reads this file and renames the files one by one.
> A recent customer incident in HBase exposed a potential bug in the atomic
> rename implementation.
> For example, below is a rename pending JSON file:
> {code}
> {
> FormatVersion: "1.0",
> OperationUTCTime: "2017-04-29 06:08:57.465",
> OldFolderName: "hbase\/data\/default\/abc",
> NewFolderName: "hbase\/.tmp\/data\/default\/abc",
> FileList: [
> ".tabledesc",
> ".tabledesc\/.tableinfo.0000000001",
> ".tmp",
> "08e698e0b7d4132c0456b16dcf3772af",
> "08e698e0b7d4132c0456b16dcf3772af\/.regioninfo",
> "08e698e0b7d4132c0456b16dcf3772af\/0\/617294e0737e4d37920e1609cf539a83",
> "08e698e0b7d4132c0456b16dcf3772af\/recovered.edits\/185.seqid",
> "08e698e0b7d4132c0456b16dcf3772af\/.regioninfo",
> "08e698e0b7d4132c0456b16dcf3772af\/0",
> "08e698e0b7d4132c0456b16dcf3772af\/0\/617294e0737e4d37920e1609cf539a83",
> "08e698e0b7d4132c0456b16dcf3772af\/recovered.edits",
> "08e698e0b7d4132c0456b16dcf3772af\/recovered.edits\/185.seqid"
> ]
> }
> {code}
> While the HBase regionserver process (which uses the WASB driver underneath)
> was renaming "08e698e0b7d4132c0456b16dcf3772af\/.regioninfo", the
> regionserver process crashed or the VM was rebooted for system maintenance.
> When the regionserver process started running again, it found the rename
> pending JSON file and tried to redo the rename operation.
> However, when it read the first file in the file list, ".tabledesc", it
> could find the file in neither the src folder nor the destination folder.
> The file was not in the src folder because it had already been renamed
> (moved) to the destination folder. It was not in the destination folder
> because HBase cleans up all the files under /hbase/.tmp when it starts.
> The current implementation throws an exception in this case:
> {code}
> else {
> throw new IOException(
> "Attempting to complete rename of file " + srcKey + "/" + fileName
> + " during folder rename redo, and file was not found in source "
> + "or destination.");
> }
> {code}
> This causes HBase HMaster initialization to fail, and restarting HMaster
> does not help because the same exception is thrown again.
> My proposal is that if, during the redo, WASB finds a file that is in
> neither src nor dst, it should skip that file and process the next one
> rather than throw an error and leave the user to fix it manually. The
> reasons are:
> 1. Since the rename pending JSON file lists file A, if file A is not in
> src, it must already have been renamed.
> 2. If file A is in neither src nor dst, the upper-layer service must have
> removed it. Note that the folder is locked during the atomic rename, so the
> only way the file can be deleted is after a VM reboot or a service process
> crash. When the service process restarts, other operations may run before
> the atomic rename redo, as in the HBase example above.
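
For illustration, the proposed redo behavior could be sketched roughly as follows. This is a toy model under stated assumptions, not the code in the attached patch: the class RenameRedoSketch, the Outcome enum, the redo method, and the in-memory set of keys standing in for the blob store are all hypothetical names made up for this example. The WARN line goes to stderr only to keep the sketch free of dependencies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Toy model of a folder rename redo. Blobs are plain string keys in a set.
 * Hypothetical illustration only; this is not the WASB driver code.
 */
class RenameRedoSketch {

    /** What happened to each entry of the pending file list. */
    enum Outcome { RENAMED, ALREADY_RENAMED, SKIPPED_MISSING }

    /**
     * Replays a pending rename. The store is modified in place.
     * An entry found in neither folder is skipped with a warning
     * instead of aborting the whole redo with an exception.
     */
    static List<Outcome> redo(String srcFolder, String dstFolder,
                              List<String> fileList, Set<String> store) {
        List<Outcome> outcomes = new ArrayList<>();
        for (String fileName : fileList) {
            String srcKey = srcFolder + "/" + fileName;
            String dstKey = dstFolder + "/" + fileName;
            if (store.contains(srcKey)) {
                // Still in the source folder: finish the rename.
                store.remove(srcKey);
                store.add(dstKey);
                outcomes.add(Outcome.RENAMED);
            } else if (store.contains(dstKey)) {
                // Renamed before the crash: nothing left to do.
                outcomes.add(Outcome.ALREADY_RENAMED);
            } else {
                // In neither folder: assume it was renamed and then removed
                // by the upper layer; warn and continue with the next file.
                System.err.println("WARN: " + srcKey
                    + " not found in source or destination during redo; skipping");
                outcomes.add(Outcome.SKIPPED_MISSING);
            }
        }
        return outcomes;
    }

    public static void main(String[] args) {
        String src = "hbase/data/default/abc";
        String dst = "hbase/.tmp/data/default/abc";
        // ".tabledesc" is in neither folder, "r1/.regioninfo" is still in src,
        // "r1/0" was already moved to dst before the simulated crash.
        Set<String> store = new HashSet<>(Arrays.asList(
            src + "/r1/.regioninfo",
            dst + "/r1/0"));
        List<Outcome> result = redo(src, dst,
            Arrays.asList(".tabledesc", "r1/.regioninfo", "r1/0"), store);
        System.out.println(result);
    }
}
```

In this model the redo always runs to the end of the file list, so a restart after the crash described above would not fail repeatedly on the same missing entry.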