[ 
https://issues.apache.org/jira/browse/HADOOP-10560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13991314#comment-13991314
 ] 

Andrei Savu commented on HADOOP-10560:
--------------------------------------

I had a quick look over the patch. Some improvement ideas:

* that property should be named fs.s3n.copyThreads to be consistent with the 
other s3n-related configs
* you need to shutdown() s3CopyExecutor in a finally {} block to avoid leaking 
threads 
* set recognizable thread names to make it easy to debug when stuck for 
whatever reason (see 
http://stackoverflow.com/questions/6113746/naming-threads-and-thread-pools-of-executorservice)
* fix the typo in the config description ("directy"); you should probably also 
add a note on why this is necessary as a workaround
* nice to have: a better test that actually exercises concurrency, not only 
correctness
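The thread-naming and shutdown-in-finally points above could look roughly like this. This is only a sketch: `S3CopyExecutorSketch`, `newCopyExecutor`, and the `s3n-copy-` name prefix are illustrative names I made up, not code from the patch.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class S3CopyExecutorSketch {

  // Hypothetical helper: a fixed-size pool whose threads carry a
  // recognizable name, so a jstack dump shows which pool is stuck.
  static ExecutorService newCopyExecutor(int copyThreads) {
    ThreadFactory namedFactory = new ThreadFactory() {
      private final AtomicInteger count = new AtomicInteger(0);
      @Override
      public Thread newThread(Runnable r) {
        Thread t = new Thread(r, "s3n-copy-" + count.incrementAndGet());
        t.setDaemon(true);
        return t;
      }
    };
    return Executors.newFixedThreadPool(copyThreads, namedFactory);
  }

  public static void main(String[] args) throws Exception {
    ExecutorService s3CopyExecutor = newCopyExecutor(4);
    try {
      s3CopyExecutor.submit(() -> System.out.println(
          "running on " + Thread.currentThread().getName()));
    } finally {
      // Shut the pool down even if a copy throws, to avoid leaking threads.
      s3CopyExecutor.shutdown();
      s3CopyExecutor.awaitTermination(1, TimeUnit.MINUTES);
    }
  }
}
```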

[[email protected]] just curious - is there a better way to get a handle on a 
Hadoop managed ExecutorService? 

> Update NativeS3FileSystem to issue copy commands for files within a 
> directory with a configurable number of threads
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-10560
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10560
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/s3
>            Reporter: Ted Malaska
>            Assignee: Ted Malaska
>            Priority: Minor
>              Labels: performance
>         Attachments: HADOOP-10560-1.patch, HADOOP-10560.patch
>
>
> In NativeS3FileSystem, copying a directory copies all the files to the new 
> location, but it does so with a single thread. The code is below. This jira 
> will allow a configurable number of threads to be used to issue the copy 
> commands to S3.
> do {
>   PartialListing listing = store.list(srcKey, S3_MAX_LISTING_LENGTH,
>       priorLastKey, true);
>   for (FileMetadata file : listing.getFiles()) {
>     keysToDelete.add(file.getKey());
>     store.copy(file.getKey(),
>         dstKey + file.getKey().substring(srcKey.length()));
>   }
>   priorLastKey = listing.getPriorLastKey();
> } while (priorLastKey != null);
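For reference, the single-threaded loop above could be fanned out across an executor roughly as follows. `NativeFileSystemStore` is stubbed here as a minimal `Store` interface so the sketch is self-contained; `ParallelCopySketch` and `copyAll` are illustrative names, not the patch itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

public class ParallelCopySketch {

  // Minimal stand-in for NativeFileSystemStore.copy(); hypothetical.
  public interface Store {
    void copy(String srcKey, String dstKey);
  }

  // Submit one copy task per key, then wait for all of them, rethrowing
  // the first failure. Mirrors the body of the do/while listing loop.
  public static void copyAll(Store store, List<String> keys,
                             String srcKey, String dstKey,
                             ExecutorService executor)
      throws InterruptedException, ExecutionException {
    List<Future<?>> pending = new ArrayList<>();
    for (String key : keys) {
      pending.add(executor.submit(() ->
          store.copy(key, dstKey + key.substring(srcKey.length()))));
    }
    for (Future<?> f : pending) {
      f.get(); // surfaces any copy failure as ExecutionException
    }
  }
}
```

Waiting on the futures before moving to the next listing page keeps the retry and delete bookkeeping simple, at the cost of a small barrier per page.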



--
This message was sent by Atlassian JIRA
(v6.2#6252)
