[ https://issues.apache.org/jira/browse/HADOOP-13403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15398117#comment-15398117 ]

Hadoop QA commented on HADOOP-13403:
------------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 14s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 14s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 11s{color} | {color:orange} hadoop-tools/hadoop-azure: The patch generated 39 new + 43 unchanged - 1 fixed = 82 total (was 44) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m  9s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  0s{color} | {color:red} The patch has 55 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  2s{color} | {color:red} The patch has 4 line(s) with tabs. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 39s{color} | {color:red} hadoop-tools/hadoop-azure generated 4 new + 0 unchanged - 0 fixed = 4 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 31s{color} | {color:green} hadoop-azure in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 16s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 13m 27s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-tools/hadoop-azure |
|  |  Redundant nullcheck of ioThreadPool, which is known to be non-null in org.apache.hadoop.fs.azure.NativeAzureFileSystem.executeParallel(NativeAzureFileSystem$AzureFileSystemOperation, String, FileMetadata[], String, int, Configuration, String, NativeAzureFileSystem$AzureFileSystemThreadOperation)  Redundant null check at NativeAzureFileSystem.java:[line 910] |
|  |  Should org.apache.hadoop.fs.azure.NativeAzureFileSystem$AzureFileSystemThreadFactory be a _static_ inner class?  At NativeAzureFileSystem.java:[lines 717-737] |
|  |  new org.apache.hadoop.fs.azure.NativeAzureFileSystem$AzureFileSystemThreadRunnable(NativeAzureFileSystem, NativeAzureFileSystem$AzureFileSystemOperation, String, FileMetadata[], NativeAzureFileSystem$AzureFileSystemThreadOperation) may expose internal representation by storing an externally mutable object into NativeAzureFileSystem$AzureFileSystemThreadRunnable.files  At NativeAzureFileSystem.java:[line 796] |
|  |  Should org.apache.hadoop.fs.azure.NativeAzureFileSystem$AzureFileSystemThreadRunnable be a _static_ inner class?  At NativeAzureFileSystem.java:[lines 766-841] |
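The four FindBugs warnings above call for two kinds of fix:

* Make the nested helper classes `static`.
* Copy an array passed in by the caller instead of storing the caller's reference.

The sketch below shows both fixes on hypothetical classes (`NamedThreadFactory`, `FileWorker`), with `String` standing in for `FileMetadata`. It is an illustration only, not the code in the patch. The thread-name prefix and the daemon setting are assumptions.

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical classes, not the Hadoop code: they only illustrate the fixes
// FindBugs usually expects for "should be a static inner class" and
// "may expose internal representation" warnings.
final class FindBugsFixSketch {

    /** Static nested class: holds no hidden reference to an enclosing instance. */
    static final class NamedThreadFactory implements ThreadFactory {
        private final String prefix;
        private final AtomicInteger nextId = new AtomicInteger(1);

        NamedThreadFactory(String prefix) {
            this.prefix = prefix;
        }

        @Override
        public Thread newThread(Runnable r) {
            Thread t = new Thread(r, prefix + "-" + nextId.getAndIncrement());
            t.setDaemon(true); // assumption: pool threads should not keep the JVM alive
            return t;
        }
    }

    /** Static nested runnable that keeps its own copy of the caller's array. */
    static final class FileWorker implements Runnable {
        private final String[] files;
        private int processed;

        FileWorker(String[] files) {
            // Defensive (shallow) copy: later writes to the caller's array are not seen here.
            this.files = files.clone();
        }

        String fileAt(int i) {
            return files[i];
        }

        int processed() {
            return processed;
        }

        @Override
        public void run() {
            for (String f : files) {
                if (f != null) {
                    processed++; // placeholder for the real per-file work
                }
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        String[] names = {"dir/a", "dir/b"};
        FileWorker worker = new FileWorker(names);
        names[0] = "changed"; // does not affect the worker's copy
        Thread t = new NamedThreadFactory("azure-rename").newThread(worker);
        t.start();
        t.join();
        System.out.println(t.getName() + " processed " + worker.processed()
                + " files, first = " + worker.fileAt(0));
    }
}
```

Note that `clone()` on an array of mutable objects copies only the references. That is normally enough to satisfy this FindBugs check, but it does not deep-copy the elements.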
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12820673/HADOOP-13403-002.patch |
| JIRA Issue | HADOOP-13403 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  findbugs  checkstyle  |
| uname | Linux 52faf7262ecd 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 26de4f0 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| checkstyle | https://builds.apache.org/job/PreCommit-HADOOP-Build/10108/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-azure.txt |
| whitespace | https://builds.apache.org/job/PreCommit-HADOOP-Build/10108/artifact/patchprocess/whitespace-eol.txt |
| whitespace | https://builds.apache.org/job/PreCommit-HADOOP-Build/10108/artifact/patchprocess/whitespace-tabs.txt |
| findbugs | https://builds.apache.org/job/PreCommit-HADOOP-Build/10108/artifact/patchprocess/new-findbugs-hadoop-tools_hadoop-azure.html |
|  Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/10108/testReport/ |
| modules | C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/10108/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> AzureNativeFileSystem rename/delete performance improvements
> ------------------------------------------------------------
>
>                 Key: HADOOP-13403
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13403
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: azure
>    Affects Versions: 2.7.2
>            Reporter: Subramanyam Pattipaka
>             Fix For: 2.9.0
>
>         Attachments: HADOOP-13403-001.patch, HADOOP-13403-002.patch
>
>
> WASB Performance Improvements
> Problem
> -----------
> Azure Native File System operations such as rename and delete are experiencing
> performance issues when the source directory contains a large number of
> directories and/or files. Possible reasons:
> a)    We first list all files under the source directory hierarchically.
> This is a serial operation.
> b)    After collecting the entire list of files under a folder, we delete or
> rename the files one by one, serially.
> c)    No logging information is available for these costly operations, even
> in DEBUG mode, which makes WASB performance issues difficult to understand.
> Proposal
> -------------
> Step 1: Rename and delete operations generate a list of all files under the
> source folder. We use the Azure flat listing option to get this list with a
> single request to the Azure store. We have introduced the config
> fs.azure.flatlist.enable to enable this option. The default value is 'false',
> which means flat listing is disabled.
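To make the cost difference in Step 1 concrete, here is a sketch over a hypothetical in-memory store, not the Azure storage SDK, that counts simulated requests. A hierarchical listing issues one request per directory it visits. A flat listing returns every blob under the prefix in a single request. The `/`-delimited blob names and the request counter are assumptions of the model.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical in-memory "blob store" (not the Azure storage SDK) used to count
// simulated requests for a hierarchical listing versus a flat listing.
final class ListingSketch {
    private final TreeSet<String> blobs = new TreeSet<>();
    private int requests;

    void put(String blobName) { blobs.add(blobName); }
    int requests() { return requests; }
    void resetRequests() { requests = 0; }

    /** One simulated request: immediate children of dir; sub-directories end in '/'. */
    private List<String> listOneLevel(String dir) {
        requests++;
        TreeSet<String> children = new TreeSet<>();
        for (String b : blobs) {
            if (!b.startsWith(dir)) {
                continue;
            }
            String rest = b.substring(dir.length());
            int slash = rest.indexOf('/');
            children.add(slash < 0 ? b : dir + rest.substring(0, slash + 1));
        }
        return new ArrayList<>(children);
    }

    /** Hierarchical listing: one request per directory visited, walked serially. */
    List<String> listHierarchical(String dir) {
        List<String> files = new ArrayList<>();
        for (String child : listOneLevel(dir)) {
            if (child.endsWith("/")) {
                files.addAll(listHierarchical(child));
            } else {
                files.add(child);
            }
        }
        return files;
    }

    /** Flat listing: a single request returns every blob under the prefix. */
    List<String> listFlat(String dir) {
        requests++;
        List<String> files = new ArrayList<>();
        for (String b : blobs) {
            if (b.startsWith(dir)) {
                files.add(b);
            }
        }
        return files;
    }

    static ListingSketch sample() {
        ListingSketch store = new ListingSketch();
        store.put("src/a.txt");
        store.put("src/d1/b.txt");
        store.put("src/d1/d2/c.txt");
        store.put("src/d3/e.txt");
        return store;
    }

    public static void main(String[] args) {
        ListingSketch store = sample();
        List<String> hierarchical = store.listHierarchical("src/");
        System.out.println(hierarchical.size() + " files, " + store.requests() + " requests");
        store.resetRequests();
        List<String> flat = store.listFlat("src/");
        System.out.println(flat.size() + " files, " + store.requests() + " request");
    }
}
```

With the four sample blobs, the hierarchical walk visits `src/`, `src/d1/`, `src/d1/d2/` and `src/d3/`. That makes four requests, against one for the flat listing. The gap grows with the number of directories under the source folder.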
> Step 2: Create the thread pool and threads dynamically based on user
> configuration. The thread pools are deleted after the operation is over.
> We are introducing two new configs:
>       a)      fs.azure.rename.threads : Config to set number of rename 
> threads. Default value is 0 which means no threading.
>       b)      fs.azure.delete.threads: Config to set number of delete 
> threads. Default value is 0 which means no threading.
>       We log, in DEBUG mode, the number of threads not used for the
> operation, which can be useful for tuning.
>       Failure Scenarios:
>       If we fail to create the thread pool for ANY reason (for example, when
> trying to create it with a very large thread count such as 1000000), we fall
> back to the serial operation.
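Here is a minimal sketch of Step 2. It assumes `java.util.Properties` in place of Hadoop's `Configuration` and uses a hypothetical helper, `createPoolOrNull`. A count of 0 (the documented default) or any failure while building the pool yields `null`, which the caller treats as "run serially".

Note that `ThreadPoolExecutor` starts its threads lazily. A very large count such as 1000000 may therefore not fail here at all; it is more likely to fail once tasks are submitted, which is the case the Step 3 failure handling has to cover.

```java
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch only: java.util.Properties stands in for Hadoop's Configuration, and
// createPoolOrNull is a hypothetical helper, not a Hadoop API.
final class PoolSketch {
    static final String RENAME_THREADS = "fs.azure.rename.threads";
    static final String DELETE_THREADS = "fs.azure.delete.threads";

    /** Reads a thread count; a missing key yields 0, meaning "no threading". */
    static int threadCount(Properties conf, String key) {
        return Integer.parseInt(conf.getProperty(key, "0").trim());
    }

    /** Returns a per-operation pool, or null to tell the caller to run serially. */
    static ExecutorService createPoolOrNull(int threads) {
        if (threads <= 0) {
            return null; // threading disabled
        }
        try {
            // Threads are created lazily, when tasks are submitted.
            return new ThreadPoolExecutor(threads, threads, 2, TimeUnit.SECONDS,
                    new LinkedBlockingQueue<Runnable>());
        } catch (RuntimeException e) {
            return null; // ANY failure to build the pool: fall back to serial
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(RENAME_THREADS, "8");
        ExecutorService pool = createPoolOrNull(threadCount(conf, RENAME_THREADS));
        System.out.println(pool == null ? "serial rename" : "parallel rename");
        if (pool != null) {
            pool.shutdown(); // the pool lives only for this operation
        }
        System.out.println(createPoolOrNull(threadCount(conf, DELETE_THREADS)) == null
                ? "serial delete" : "parallel delete");
    }
}
```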
> Step 3: Blob operations can be done in parallel using multiple threads, each
> executing the following snippet:
>       int currentIndex;
>       while ((currentIndex = fileIndex.getAndIncrement()) < files.length) {
>               FileMetadata file = files[currentIndex];
>               renameOrDelete(file); // hypothetical helper: rename or delete this file
>       }
>       The above strategy relies on all files being stored in a final array,
> with each thread atomically obtaining the next index to work on. The
> advantage of this strategy is that even if the user configures a large number
> of threads that turn out to be unusable, we always ensure that work doesn't
> get serialized behind lagging threads.
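The Step 3 loop can be exercised with plain JDK types. This sketch is an illustration, not the patch:

* `String` stands in for `FileMetadata`.
* A `Consumer<String>` stands in for the rename or delete.
* A fixed-size pool runs one copy of the loop per thread.

The shared `AtomicInteger` guarantees that each index is claimed by exactly one thread.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Sketch of the Step 3 loop with plain JDK types; String stands in for
// FileMetadata and `action` stands in for the rename or delete of one file.
final class SharedIndexSketch {

    /** Runs `action` once per file using `threads` workers (threads must be > 0). */
    static void processAll(String[] files, int threads, Consumer<String> action)
            throws InterruptedException {
        final String[] work = files.clone(); // the "final array" of files
        final AtomicInteger fileIndex = new AtomicInteger(0);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            for (int t = 0; t < threads; t++) {
                pool.execute(() -> {
                    int currentIndex;
                    // getAndIncrement() hands each index to exactly one thread, so a
                    // slow thread never holds back files that others could take.
                    while ((currentIndex = fileIndex.getAndIncrement()) < work.length) {
                        action.accept(work[currentIndex]);
                    }
                });
            }
        } finally {
            pool.shutdown(); // accept no new tasks; queued workers still run
        }
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("workers did not finish in time");
        }
    }

    public static void main(String[] args) throws InterruptedException {
        String[] files = new String[100];
        for (int i = 0; i < files.length; i++) {
            files[i] = "src/file-" + i;
        }
        ConcurrentHashMap<String, String> seenBy = new ConcurrentHashMap<>();
        processAll(files, 8, f -> seenBy.put(f, Thread.currentThread().getName()));
        System.out.println(seenBy.size() + " files processed");
    }
}
```

Threads pull the next index rather than receiving a precomputed slice. A thread that stalls therefore delays only the file it is currently holding, and an extra thread that starts after the array is exhausted simply exits.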
>       We log the following information, which can be useful for tuning the
> number of threads:
>       a) Number of unusable threads
>       b) Time taken by each thread
>       c) Number of files processed by each thread
>       d) Total time taken for the operation
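One way to collect the numbers listed above is to have each worker return its own counters. In this sketch:

* `WorkerStats` is a hypothetical type.
* "Unusable" is taken to mean a thread that processed zero files.
* `System.out` stands in for DEBUG logging.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: each worker reports how many files it handled and how long it ran,
// so the caller can derive the four numbers listed above. WorkerStats is a
// hypothetical type; System.out stands in for a DEBUG logger.
final class ThreadStatsSketch {

    static final class WorkerStats {
        final int filesProcessed;
        final long elapsedNanos;

        WorkerStats(int filesProcessed, long elapsedNanos) {
            this.filesProcessed = filesProcessed;
            this.elapsedNanos = elapsedNanos;
        }
    }

    static List<WorkerStats> run(String[] files, int threads)
            throws InterruptedException, ExecutionException {
        AtomicInteger fileIndex = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<WorkerStats>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                futures.add(pool.submit(() -> {
                    long start = System.nanoTime();
                    int count = 0;
                    while (fileIndex.getAndIncrement() < files.length) {
                        count++; // placeholder for the rename/delete of one file
                    }
                    return new WorkerStats(count, System.nanoTime() - start);
                }));
            }
            List<WorkerStats> stats = new ArrayList<>();
            for (Future<WorkerStats> f : futures) {
                stats.add(f.get());
            }
            return stats;
        } finally {
            pool.shutdown();
        }
    }

    /** A thread counts as "unusable" here if it processed no files. */
    static long unusableThreads(List<WorkerStats> stats) {
        return stats.stream().filter(s -> s.filesProcessed == 0).count();
    }

    public static void main(String[] args) throws Exception {
        long start = System.nanoTime();
        List<WorkerStats> stats = run(new String[1000], 16);
        long totalMicros = (System.nanoTime() - start) / 1_000;
        for (int i = 0; i < stats.size(); i++) {
            System.out.printf("thread %d: %d files in %d us%n", i,
                    stats.get(i).filesProcessed, stats.get(i).elapsedNanos / 1_000);
        }
        System.out.println("unusable threads: " + unusableThreads(stats)
                + ", total time: " + totalMicros + " us");
    }
}
```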
>       Failure Scenarios:
>       Failure to queue a thread execute request shouldn't be an issue as long
> as at least one thread completes execution successfully. If we cannot
> schedule even one thread, we take the serial path.
> Exceptions raised while executing threads are still treated as regular
> exceptions and returned to the client as a failed operation. Exceptions raised
> while stopping threads and deleting the thread pool can be ignored if all
> files were processed without any issue.
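The failure rules above can be sketched as follows. The sketch assumes a hypothetical `FileOp` interface for the per-file rename/delete, and a `null` pool meaning "threading disabled".

* A `RejectedExecutionException` on submission is tolerated once at least one worker is queued, because queued workers keep pulling indexes until the array is exhausted.
* If nothing was queued, the files are processed serially.
* Worker exceptions surface through `Future.get()` and are rethrown as `IOException`.
* Failures during pool shutdown are swallowed.

A real implementation might also have to consider `OutOfMemoryError` ("unable to create new native thread"), which can be thrown when the JVM cannot start a thread. The sketch does not catch it.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the Step 3 failure rules. FileOp is a hypothetical stand-in for
// the per-file rename/delete; a null pool means "threading disabled".
final class FailureHandlingSketch {

    interface FileOp {
        void apply(String file) throws IOException;
    }

    static void execute(String[] files, ExecutorService pool, int threads, FileOp op)
            throws IOException, InterruptedException {
        AtomicInteger fileIndex = new AtomicInteger();
        Callable<Void> worker = () -> {
            int i;
            while ((i = fileIndex.getAndIncrement()) < files.length) {
                op.apply(files[i]);
            }
            return null;
        };
        List<Future<Void>> scheduled = new ArrayList<>();
        for (int t = 0; pool != null && t < threads; t++) {
            try {
                scheduled.add(pool.submit(worker));
            } catch (RejectedExecutionException e) {
                break; // tolerable if at least one worker was already queued
            }
        }
        try {
            if (scheduled.isEmpty()) {
                for (String f : files) {
                    op.apply(f); // nothing scheduled: take the serial path
                }
                return;
            }
            for (Future<Void> f : scheduled) {
                try {
                    f.get();
                } catch (ExecutionException e) {
                    // A worker failure is a regular failure of the whole operation.
                    Throwable cause = e.getCause();
                    throw cause instanceof IOException ? (IOException) cause : new IOException(cause);
                }
            }
        } finally {
            if (pool != null) {
                try {
                    pool.shutdownNow(); // the pool lives only for this operation
                } catch (RuntimeException ignored) {
                    // Errors while tearing down the pool do not fail the operation.
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        String[] files = {"src/a", "src/b", "src/c", "src/d"};
        execute(files, Executors.newFixedThreadPool(2), 2,
                f -> System.out.println("deleted " + f));
        execute(files, null, 0, f -> System.out.println("deleted serially " + f));
    }
}
```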



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
