[ https://issues.apache.org/jira/browse/HADOOP-15492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16489073#comment-16489073 ]
Steve Loughran commented on HADOOP-15492:
-----------------------------------------

FWIW, I'm thinking this could be used for a fast update of a directory tree as maintenance, but I don't think it's efficient enough yet.

> increase performance of s3guard import command
> ----------------------------------------------
>
>                 Key: HADOOP-15492
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15492
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Steve Loughran
>            Priority: Major
>
> Some perf improvements which spring to mind having looked at the s3guard
> import command.
> Key points: it can handle the import of a tree with existing data better
> # if the bucket is already under s3guard, then the listing will return all
> listed files, which will then be put() again.
> # import calls {{putParentsIfNotPresent()}}, but DDBMetaStore.put() will do
> the parent creation anyway
> # For each entry in the store (i.e. a file), the full parent listing is
> created, then a batch write created to put all the parents and the actual file
> As a result, it's at risk of doing many more put calls than needed,
> especially for wide/deep directory trees.
> It would be much more efficient to put all files in a single directory as
> part of 1+ batch request, with 1 parent tree. Better yet: a get() of that
> parent could skip the put of parent entries.
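
To make the cost difference in the description concrete, here is a rough sketch of the two strategies. It is a toy model in Python, not the s3guard code: the function names (ancestors, naive_import, grouped_import, entries_written) and the known_dirs parameter are all hypothetical, and each "batch" is just a list of paths standing in for the entries a metadata store would be asked to write. known_dirs stands in for directories that a get() has shown to be in the store already.

```python
from collections import defaultdict
import posixpath


def ancestors(path):
    """Yield the ancestor directories of path, nearest first, excluding the root."""
    parent = posixpath.dirname(path)
    while parent not in ("", "/"):
        yield parent
        parent = posixpath.dirname(parent)


def naive_import(files):
    """Per-file import: every file is written along with its full parent chain."""
    return [[f, *ancestors(f)] for f in files]


def grouped_import(files, known_dirs=()):
    """Per-directory import: one batch per directory, each ancestor written at most once.

    known_dirs models directories already present in the store (hypothetical
    stand-in for a get() on the parent); they are never written again.
    """
    by_parent = defaultdict(list)
    for f in files:
        by_parent[posixpath.dirname(f)].append(f)

    seen = set(known_dirs)
    batches = []
    for _parent, children in sorted(by_parent.items()):
        # All children share the same parent chain, so compute it once.
        chain = [d for d in ancestors(children[0]) if d not in seen]
        seen.update(chain)
        batches.append(chain + children)
    return batches


def entries_written(batches):
    """Total number of entries across all batches."""
    return sum(len(batch) for batch in batches)


files = ["/a/b/f1", "/a/b/f2", "/a/b/f3", "/a/c/f4"]
print(entries_written(naive_import(files)))                              # 12
print(entries_written(grouped_import(files)))                            # 7
print(entries_written(grouped_import(files, {"/a", "/a/b", "/a/c"})))    # 4
```

For this four-file tree the per-file approach writes 12 entries, grouping by directory writes 7, and skipping parents already known to the store leaves only the 4 files. The gap grows with the depth of the tree and the number of files per directory. A real implementation would also need to split each batch to fit the store's per-request limit, which this sketch ignores.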