[
https://issues.apache.org/jira/browse/HADOOP-15492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Steve Loughran resolved HADOOP-15492.
-------------------------------------
Resolution: Won't Fix
> increase performance of s3guard import command
> ----------------------------------------------
>
> Key: HADOOP-15492
> URL: https://issues.apache.org/jira/browse/HADOOP-15492
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Reporter: Steve Loughran
> Priority: Minor
>
> Some performance improvements that spring to mind after looking at the
> s3guard import command.
> Key point: the import of a tree with existing data could be handled better.
> # if the bucket is already under s3guard, the listing will return every
> entry already in the table, each of which will then be put() again.
> # import calls {{putParentsIfNotPresent()}}, but DDBMetaStore.put() will do
> the parent creation anyway
> # For each entry in the store (i.e. a file), the full parent listing is
> created, then a batch write is created to put all the parents and the actual
> file.
> As a result, it's at risk of doing many more put() calls than needed,
> especially for wide/deep directory trees; a sketch of this pattern follows.
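> A simplified, hypothetical illustration of that per-file pattern (this is
> not the actual import/DDBMetaStore code; {{ancestorsOf()}} and
> {{batchWrite()}} are illustrative stand-ins) shows how every file re-writes
> its whole ancestor chain:
> {code:java}
> // Hypothetical sketch, not Hadoop source: every file re-puts its whole
> // ancestor chain in its own batch write.
> import java.util.*;
>
> class PerFileImportSketch {
>   // Build the chain of parent directories for a path, e.g.
>   // "/a/b/c/f1" -> ["/a/b/c", "/a/b", "/a"].
>   static List<String> ancestorsOf(String path) {
>     List<String> out = new ArrayList<>();
>     String p = path.substring(0, path.lastIndexOf('/'));
>     while (p.lastIndexOf('/') > 0) {
>       out.add(p);
>       p = p.substring(0, p.lastIndexOf('/'));
>     }
>     if (!p.isEmpty()) {
>       out.add(p);
>     }
>     return out;
>   }
>
>   // Stand-in for a batch put of metadata entries.
>   static void batchWrite(List<String> entries) {
>     System.out.println("batch write: " + entries);
>   }
>
>   public static void main(String[] args) {
>     List<String> listing = Arrays.asList("/a/b/c/f1", "/a/b/c/f2", "/a/b/c/f3");
>     for (String file : listing) {
>       // full parent listing computed and written for every single file
>       List<String> batch = new ArrayList<>(ancestorsOf(file));
>       batch.add(file);
>       batchWrite(batch);   // "/a", "/a/b", "/a/b/c" end up written three times
>     }
>   }
> }
> {code}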
> It would be much more efficient to put all the files in a single directory
> as part of one or more batch requests, with one parent tree. Better yet, a
> get() of that parent could skip the put of the parent entries entirely; a
> sketch of this grouped approach follows.
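> A minimal sketch of that grouping, assuming a hypothetical store interface
> with a get()-style existence check ({{exists()}}) and a batch put
> ({{putBatch()}}) rather than the real MetadataStore API:
> {code:java}
> // Hypothetical sketch, not Hadoop source: one batch write per directory,
> // with ancestor entries written at most once and skipped entirely when a
> // get() shows the parent is already in the store.
> import java.util.*;
>
> class GroupedImportSketch {
>   interface Store {
>     boolean exists(String path);              // stand-in for a metastore get()
>     void putBatch(Collection<String> paths);  // stand-in for one batch write
>   }
>
>   static void importListing(List<String> files, Store store) {
>     // group the listing by parent directory
>     Map<String, List<String>> byParent = new HashMap<>();
>     for (String f : files) {
>       String parent = f.substring(0, f.lastIndexOf('/'));
>       byParent.computeIfAbsent(parent, k -> new ArrayList<>()).add(f);
>     }
>     for (Map.Entry<String, List<String>> dir : byParent.entrySet()) {
>       List<String> batch = new ArrayList<>();
>       if (!store.exists(dir.getKey())) {
>         // parent not present: add the whole ancestor chain, once per directory
>         String p = dir.getKey();
>         while (true) {
>           batch.add(p);
>           int i = p.lastIndexOf('/');
>           if (i <= 0) {
>             break;
>           }
>           p = p.substring(0, i);
>         }
>       }
>       batch.addAll(dir.getValue());
>       store.putBatch(batch);   // one write per directory, not one per file
>     }
>   }
> }
> {code}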