[ 
https://issues.apache.org/jira/browse/HADOOP-14512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16051023#comment-16051023
 ] 

Mingliang Liu commented on HADOOP-14512:
----------------------------------------

Steve, sorry for the late report. I ran all the unit and live tests against US 
West. All pass. It's a good convention to post the test report before commit.

{code}
hadoop-tools/hadoop-azure $ mvn test -q

-------------------------------------------------------
 T E S T S
-------------------------------------------------------

Running org.apache.hadoop.fs.azure.contract.TestAzureNativeContractAppend
Tests run: 5, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 27.74 sec - in org.apache.hadoop.fs.azure.contract.TestAzureNativeContractAppend
Running org.apache.hadoop.fs.azure.contract.TestAzureNativeContractCreate
Tests run: 11, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 30.296 sec - in org.apache.hadoop.fs.azure.contract.TestAzureNativeContractCreate
Running org.apache.hadoop.fs.azure.contract.TestAzureNativeContractDelete
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 27.426 sec - in org.apache.hadoop.fs.azure.contract.TestAzureNativeContractDelete
Running org.apache.hadoop.fs.azure.contract.TestAzureNativeContractDistCp
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 210.658 sec - in org.apache.hadoop.fs.azure.contract.TestAzureNativeContractDistCp
Running org.apache.hadoop.fs.azure.contract.TestAzureNativeContractGetFileStatus
Tests run: 18, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 67.542 sec - in org.apache.hadoop.fs.azure.contract.TestAzureNativeContractGetFileStatus
Running org.apache.hadoop.fs.azure.contract.TestAzureNativeContractMkdir
Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 87.217 sec - in org.apache.hadoop.fs.azure.contract.TestAzureNativeContractMkdir
Running org.apache.hadoop.fs.azure.contract.TestAzureNativeContractOpen
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 23.406 sec - in org.apache.hadoop.fs.azure.contract.TestAzureNativeContractOpen
Running org.apache.hadoop.fs.azure.contract.TestAzureNativeContractRename
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 29.704 sec - in org.apache.hadoop.fs.azure.contract.TestAzureNativeContractRename
Running org.apache.hadoop.fs.azure.contract.TestAzureNativeContractSeek
Tests run: 18, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 56.787 sec - in org.apache.hadoop.fs.azure.contract.TestAzureNativeContractSeek
Running org.apache.hadoop.fs.azure.metrics.TestAzureFileSystemInstrumentation
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 142.165 sec - in org.apache.hadoop.fs.azure.metrics.TestAzureFileSystemInstrumentation
Running org.apache.hadoop.fs.azure.metrics.TestBandwidthGaugeUpdater
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.379 sec - in org.apache.hadoop.fs.azure.metrics.TestBandwidthGaugeUpdater
Running org.apache.hadoop.fs.azure.metrics.TestNativeAzureFileSystemMetricsSystem
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.734 sec - in org.apache.hadoop.fs.azure.metrics.TestNativeAzureFileSystemMetricsSystem
Running org.apache.hadoop.fs.azure.metrics.TestRollingWindowAverage
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.217 sec - in org.apache.hadoop.fs.azure.metrics.TestRollingWindowAverage
Running org.apache.hadoop.fs.azure.TestAzureConcurrentOutOfBandIo
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 39.986 sec - in org.apache.hadoop.fs.azure.TestAzureConcurrentOutOfBandIo
Running org.apache.hadoop.fs.azure.TestAzureFileSystemErrorConditions
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 6.949 sec - in org.apache.hadoop.fs.azure.TestAzureFileSystemErrorConditions
Running org.apache.hadoop.fs.azure.TestBlobDataValidation
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.96 sec - in org.apache.hadoop.fs.azure.TestBlobDataValidation
Running org.apache.hadoop.fs.azure.TestBlobMetadata
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.786 sec - in org.apache.hadoop.fs.azure.TestBlobMetadata
Running org.apache.hadoop.fs.azure.TestBlobTypeSpeedDifference
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 10.63 sec - in org.apache.hadoop.fs.azure.TestBlobTypeSpeedDifference
Running org.apache.hadoop.fs.azure.TestContainerChecks
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 11.44 sec - in org.apache.hadoop.fs.azure.TestContainerChecks
Running org.apache.hadoop.fs.azure.TestFileSystemOperationExceptionHandling
Tests run: 16, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 47.314 sec - in org.apache.hadoop.fs.azure.TestFileSystemOperationExceptionHandling
Running org.apache.hadoop.fs.azure.TestFileSystemOperationExceptionMessage
Tests run: 47, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 373.217 sec - in org.apache.hadoop.fs.azure.TestFileSystemOperationExceptionMessage
Running org.apache.hadoop.fs.azure.TestFileSystemOperationsExceptionHandlingMultiThreaded
Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 36.973 sec - in org.apache.hadoop.fs.azure.TestFileSystemOperationsExceptionHandlingMultiThreaded
Running org.apache.hadoop.fs.azure.TestFileSystemOperationsWithThreads
Tests run: 19, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 399.735 sec - in org.apache.hadoop.fs.azure.TestFileSystemOperationsWithThreads
Running org.apache.hadoop.fs.azure.TestNativeAzureFileSystemAppend
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 172.078 sec - in org.apache.hadoop.fs.azure.TestNativeAzureFileSystemAppend
Running org.apache.hadoop.fs.azure.TestNativeAzureFileSystemAtomicRenameDirList
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 7.393 sec - in org.apache.hadoop.fs.azure.TestNativeAzureFileSystemAtomicRenameDirList
Running org.apache.hadoop.fs.azure.TestNativeAzureFileSystemAuthorization
Tests run: 21, Failures: 0, Errors: 0, Skipped: 21, Time elapsed: 6.24 sec - in org.apache.hadoop.fs.azure.TestNativeAzureFileSystemAuthorization
Running org.apache.hadoop.fs.azure.TestNativeAzureFileSystemAuthorizationWithOwner
Tests run: 24, Failures: 0, Errors: 0, Skipped: 24, Time elapsed: 7.036 sec - in org.apache.hadoop.fs.azure.TestNativeAzureFileSystemAuthorizationWithOwner
Running org.apache.hadoop.fs.azure.TestNativeAzureFileSystemBlockLocations
Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.822 sec - in org.apache.hadoop.fs.azure.TestNativeAzureFileSystemBlockLocations
Running org.apache.hadoop.fs.azure.TestNativeAzureFileSystemClientLogging
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.837 sec - in org.apache.hadoop.fs.azure.TestNativeAzureFileSystemClientLogging
Running org.apache.hadoop.fs.azure.TestNativeAzureFileSystemConcurrency
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.999 sec - in org.apache.hadoop.fs.azure.TestNativeAzureFileSystemConcurrency
Running org.apache.hadoop.fs.azure.TestNativeAzureFileSystemConcurrencyLive
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.766 sec - in org.apache.hadoop.fs.azure.TestNativeAzureFileSystemConcurrencyLive
Running org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractEmulator
Tests run: 43, Failures: 0, Errors: 0, Skipped: 43, Time elapsed: 0.432 sec - in org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractEmulator
Running org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractLive
Tests run: 43, Failures: 0, Errors: 0, Skipped: 5, Time elapsed: 208.667 sec - in org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractLive
Running org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractMocked
Tests run: 43, Failures: 0, Errors: 0, Skipped: 5, Time elapsed: 1.151 sec - in org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractMocked
Running org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractPageBlobLive
Tests run: 43, Failures: 0, Errors: 0, Skipped: 5, Time elapsed: 218.064 sec - in org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractPageBlobLive
Running org.apache.hadoop.fs.azure.TestNativeAzureFileSystemFileNameCheck
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.811 sec - in org.apache.hadoop.fs.azure.TestNativeAzureFileSystemFileNameCheck
Running org.apache.hadoop.fs.azure.TestNativeAzureFileSystemLive
Tests run: 51, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 431.203 sec - in org.apache.hadoop.fs.azure.TestNativeAzureFileSystemLive
Running org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked
Tests run: 46, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 17.041 sec - in org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked
Running org.apache.hadoop.fs.azure.TestNativeAzureFileSystemOperationsMocked
Tests run: 50, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.346 sec - in org.apache.hadoop.fs.azure.TestNativeAzureFileSystemOperationsMocked
Running org.apache.hadoop.fs.azure.TestNativeAzureFileSystemUploadLogic
Tests run: 3, Failures: 0, Errors: 0, Skipped: 3, Time elapsed: 0.058 sec - in org.apache.hadoop.fs.azure.TestNativeAzureFileSystemUploadLogic
Running org.apache.hadoop.fs.azure.TestNativeAzureFSPageBlobLive
Tests run: 46, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 443.228 sec - in org.apache.hadoop.fs.azure.TestNativeAzureFSPageBlobLive
Running org.apache.hadoop.fs.azure.TestOutOfBandAzureBlobOperations
Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.843 sec - in org.apache.hadoop.fs.azure.TestOutOfBandAzureBlobOperations
Running org.apache.hadoop.fs.azure.TestOutOfBandAzureBlobOperationsLive
Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 13.627 sec - in org.apache.hadoop.fs.azure.TestOutOfBandAzureBlobOperationsLive
Running org.apache.hadoop.fs.azure.TestReadAndSeekPageBlobAfterWrite
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 299.048 sec - in org.apache.hadoop.fs.azure.TestReadAndSeekPageBlobAfterWrite
Running org.apache.hadoop.fs.azure.TestShellDecryptionKeyProvider
Tests run: 2, Failures: 0, Errors: 0, Skipped: 2, Time elapsed: 0.105 sec - in org.apache.hadoop.fs.azure.TestShellDecryptionKeyProvider
Running org.apache.hadoop.fs.azure.TestWasbFsck
Tests run: 2, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 0.728 sec - in org.apache.hadoop.fs.azure.TestWasbFsck
Running org.apache.hadoop.fs.azure.TestWasbRemoteCallHelper
Tests run: 8, Failures: 0, Errors: 0, Skipped: 8, Time elapsed: 2.237 sec - in org.apache.hadoop.fs.azure.TestWasbRemoteCallHelper
Running org.apache.hadoop.fs.azure.TestWasbUriAndConfiguration
Tests run: 18, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 10.17 sec - in org.apache.hadoop.fs.azure.TestWasbUriAndConfiguration

Results :

Tests run: 704, Failures: 0, Errors: 0, Skipped: 119
{code}

> WASB atomic rename should not throw exception if the file is neither in src 
> nor in dst when doing the rename
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-14512
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14512
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/azure
>    Affects Versions: 2.8.0
>            Reporter: Duo Xu
>            Assignee: Duo Xu
>             Fix For: 3.0.0-alpha4, 2.8.2
>
>         Attachments: HADOOP-14512.001.patch, HADOOP-14512.002.patch
>
>
> During an atomic rename operation, WASB creates a rename pending JSON file 
> that records which files need to be renamed and their destination. WASB then 
> reads this file and renames the files one by one.
> A recent customer incident in HBase revealed a potential bug in the atomic 
> rename implementation. For example, below is a rename pending JSON file:
> {code}
> {
>   FormatVersion: "1.0",
>   OperationUTCTime: "2017-04-29 06:08:57.465",
>   OldFolderName: "hbase\/data\/default\/abc",
>   NewFolderName: "hbase\/.tmp\/data\/default\/abc",
>   FileList: [
>     ".tabledesc",
>     ".tabledesc\/.tableinfo.0000000001",
>     ".tmp",
>     "08e698e0b7d4132c0456b16dcf3772af",
>     "08e698e0b7d4132c0456b16dcf3772af\/.regioninfo",
>     "08e698e0b7d4132c0456b16dcf3772af\/0\/617294e0737e4d37920e1609cf539a83",
>     "08e698e0b7d4132c0456b16dcf3772af\/recovered.edits\/185.seqid",
>     "08e698e0b7d4132c0456b16dcf3772af\/.regioninfo",
>     "08e698e0b7d4132c0456b16dcf3772af\/0",
>     "08e698e0b7d4132c0456b16dcf3772af\/0\/617294e0737e4d37920e1609cf539a83",
>     "08e698e0b7d4132c0456b16dcf3772af\/recovered.edits",
>     "08e698e0b7d4132c0456b16dcf3772af\/recovered.edits\/185.seqid"
>   ]
> }
> {code}  
> When the HBase regionserver process (which uses the WASB driver underneath) 
> was renaming "08e698e0b7d4132c0456b16dcf3772af\/.regioninfo", the process 
> crashed or the VM was rebooted for system maintenance. When the regionserver 
> process started again, it found the rename pending JSON file and tried to 
> redo the rename operation.
> However, when it read the first entry in the file list, ".tabledesc", it 
> could find the file in neither the src folder nor the destination folder. 
> The file was missing from the src folder because it had already been 
> renamed/moved to the destination folder. It was missing from the destination 
> folder because HBase cleans up all the files under /hbase/.tmp when it 
> starts.
> The current implementation throws an exception:
> {code}
> else {
>         throw new IOException(
>             "Attempting to complete rename of file " + srcKey + "/" + fileName
>             + " during folder rename redo, and file was not found in source "
>             + "or destination.");
>       }
> {code}
> This causes HBase HMaster initialization to fail, and restarting HMaster does 
> not help because the same exception is thrown again.
> My proposal is that if, during the redo, WASB finds a file that is in neither 
> src nor dst, WASB should skip this file and process the next one rather than 
> throw the error and leave the user to fix it manually. The reasons are:
> 1. Since the rename pending JSON file contains file A, if file A is not in 
> src, it must have been renamed.
> 2. If file A is in neither src nor dst, the upper-layer service must have 
> removed it. Note that the folder is locked during the atomic rename, so the 
> only situation in which the file can be deleted is when the VM reboots or 
> the service process crashes. When the service process restarts, some 
> operations may happen before the atomic rename redo, as in the HBase 
> example above.
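
The proposed redo behaviour can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the WASB implementation or the committed patch: the class RenameRedoSketch and its redo method are hypothetical, and an in-memory set of blob keys stands in for the Azure container.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the proposed redo logic; not the actual WASB code. */
class RenameRedoSketch {

    /**
     * Replays a pending folder rename against an in-memory set of blob keys
     * (an assumed stand-in for the real store). Returns the file list entries
     * that were found in neither folder and were therefore skipped.
     */
    static List<String> redo(Set<String> blobs, String srcDir, String dstDir,
                             List<String> fileList) {
        List<String> skipped = new ArrayList<>();
        for (String name : fileList) {
            String src = srcDir + "/" + name;
            String dst = dstDir + "/" + name;
            if (blobs.contains(src)) {
                // Still in src: the rename had not reached this file, so finish it.
                blobs.remove(src);
                blobs.add(dst);
            } else if (!blobs.contains(dst)) {
                // In neither src nor dst: the behaviour described above throws an
                // IOException here; the proposal is to skip and continue.
                skipped.add(name);
            }
            // Otherwise the file is already in dst and nothing needs to be done.
        }
        return skipped;
    }
}
```

In the HBase incident, ".tabledesc" would end up in the returned list because it had been moved to the destination and then removed by the /hbase/.tmp cleanup, while the remaining entries would still be renamed. A real implementation would presumably log each skipped entry so that the condition stays visible to operators.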


