[
https://issues.apache.org/jira/browse/HDFS-11786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mukul Kumar Singh updated HDFS-11786:
-------------------------------------
Attachment: HDFS-11786.002.patch
Thanks for the review [~anu]. I have modified copyFromLocal to make it
multithreaded.
The number of threads is an optional parameter (-nt in the run below) and
defaults to 1.
This change reduces copy time substantially, from 14m7s to 3m18s. Please note
that the test was done with 12,000 files with random sizes between 1 and 10 MB.
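For readers who want the general idea without opening the patch, here is a minimal sketch of one-task-per-file copying on a fixed-size thread pool. It is not the code from HDFS-11786.002.patch: the class ParallelCopySketch and the helper copyAll are hypothetical names, and plain java.nio local copies stand in for the HDFS upload so that the example runs on its own.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelCopySketch {

    // Hypothetical helper (not from the patch): copies each source file into
    // targetDir using a fixed pool of numThreads workers. A local java.nio
    // copy stands in for the HDFS upload here.
    static int copyAll(List<Path> sources, Path targetDir, int numThreads)
            throws IOException, InterruptedException {
        if (numThreads < 1) {
            throw new IllegalArgumentException("numThreads must be >= 1");
        }
        Files.createDirectories(targetDir);
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            List<Future<Path>> pending = new ArrayList<>();
            for (Path src : sources) {
                // One task per file; the files are independent of each other.
                pending.add(pool.submit(() -> Files.copy(
                        src,
                        targetDir.resolve(src.getFileName()),
                        StandardCopyOption.REPLACE_EXISTING)));
            }
            int copied = 0;
            for (Future<Path> f : pending) {
                try {
                    f.get(); // wait for the task and surface any failure
                    copied++;
                } catch (ExecutionException e) {
                    throw new IOException("copy failed", e.getCause());
                }
            }
            return copied;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Path srcDir = Files.createTempDirectory("src");
        Path dstDir = Files.createTempDirectory("dst");
        List<Path> sources = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            sources.add(Files.write(srcDir.resolve("file" + i + ".dat"), new byte[1024]));
        }
        // 10 workers, mirroring the thread count used in the test run.
        System.out.println("copied " + copyAll(sources, dstDir, 10) + " files");
    }
}
```

With numThreads set to 1 the sketch degenerates to a sequential copy, which matches the default described above.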
*Single threaded put with the put command*
{code}
[hdfs@y129 ~]$ time /opt/hadoop/hadoop-3.0.0-alpha4-SNAPSHOT/bin/hdfs dfs -put test /single2
17/06/30 12:06:48 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
real 14m7.093s
user 5m48.357s
sys 1m54.895s
{code}
*Multithreaded put with 10 threads using the copyFromLocal command*
{code}
[hdfs@y129 ~]$ time /opt/hadoop/hadoop-3.0.0-alpha4-SNAPSHOT/bin/hdfs dfs -copyFromLocal -nt 10 test /multi1
17/06/30 12:24:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
real 3m18.574s
user 3m42.582s
sys 1m18.718s
{code}
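As a quick check on the two timings, the wall-clock speedup can be worked out from the "real" values reported by time in the two runs. The class SpeedupSketch and the helper toSeconds are hypothetical names used only for this illustration.

```java
import java.util.Locale;

public class SpeedupSketch {

    // Converts a `time`-style reading (minutes, seconds) into seconds.
    static double toSeconds(int minutes, double seconds) {
        return minutes * 60 + seconds;
    }

    // Ratio of single-threaded to multithreaded wall-clock time.
    static double speedup(double singleSeconds, double multiSeconds) {
        return singleSeconds / multiSeconds;
    }

    public static void main(String[] args) {
        double single = toSeconds(14, 7.093); // real 14m7.093s
        double multi = toSeconds(3, 18.574);  // real 3m18.574s
        // prints "speedup: 4.27x"
        System.out.println(String.format(Locale.ROOT, "speedup: %.2fx", speedup(single, multi)));
    }
}
```

So 10 threads gave roughly a 4.3x reduction in wall-clock time for this particular data set, well short of 10x.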
> Add a new command for multi threaded Put/CopyFromLocal
> ------------------------------------------------------
>
> Key: HDFS-11786
> URL: https://issues.apache.org/jira/browse/HDFS-11786
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs
> Reporter: Mukul Kumar Singh
> Assignee: Mukul Kumar Singh
> Attachments: HDFS-11786.001.patch, HDFS-11786.002.patch
>
>
> CopyFromLocal/Put is not currently multithreaded.
> When multiple files need to be uploaded to HDFS, a single thread reads each
> file and then copies its data to the cluster.
> This copy to HDFS can be made faster by uploading multiple files in parallel.
> I am attaching the initial patch so that I can get some initial feedback.