Re: Hadoop optimization for Lustre FS

Robert Evans Wed, 16 May 2012 02:55:47 -0700

Zam,

http://wiki.apache.org/hadoop/HowToContribute is a wiki page that describes
the steps you need to take in more detail. In general, though, to push the
patch upstream you should file a Map/Reduce JIRA and attach your patch to it.
After that, several people from the community are likely to comment on the
JIRA. If you don't get feedback, you can bug us on the dev mailing list about
it. As part of this you will also need to port the patch to trunk, as we do
not want new features to go into any release line without also going into
trunk. This may sound complex, because trunk uses YARN instead of the previous
Map/Reduce-specific framework, but both 1.0 and trunk are in the process of
getting a pluggable shuffle service (MAPREDUCE-4049). It would probably be
best to port your patch to be a plugin for this. Then, hopefully, porting
between trunk and 1.0 will be relatively simple.

If this is the route you want to take, you should put 1.1 and 3.0.0 as the
target versions of the JIRA. 3.0.0 corresponds to trunk, and 1.1 is the next
release of the 1.x line that is accepting new major feature work. If you are
making it a plugin, you probably also want to link your JIRA to the
MAPREDUCE-4049 JIRA as a dependency.

In addition, because this is an optimization, it would be nice to have some
information in the JIRA showing the benchmarks you ran and the performance
improvements you got. Ultimately we are also going to want some documentation
for this, but that can come later, after you have locked down the code more.

--Bobby Evans

On 5/16/12 3:34 AM, "Alexander Zarochentsev"
<[email protected]> wrote:

Hello,

There is an optimization for Hadoop on Lustre FS, or any other
high-performance distributed filesystem.

The research paper with test results can be found here:
http://www.xyratex.com/pdfs/whitepapers/Xyratex_white_paper_MapReduce_1-4.pdf
and a presentation from LUG 2011 here:
http://www.olcf.ornl.gov/wp-content/events/lug2011/4-12-2011/1100-1130_Nathan_Rutman_MapReduce_Lug_2011.pptx

Basically, the optimization replaces the HTTP transport in the shuffle phase
with simply linking the target file to the source one. I have attached a draft
patch against hadoop-1.0.0 to illustrate the idea.
How can I push this patch upstream?
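
The linking idea can be sketched roughly as follows. This is not the attached
patch and does not use any Hadoop API: the class name ShuffleLinkSketch, the
method linkMapOutput, and the directory layout are hypothetical, and the use
of a hard link (rather than a symlink) is an assumption. It only shows that,
when the map output and the reducer's input directory live on the same shared
filesystem, the reducer can be given a link to the file instead of copying it
over HTTP.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ShuffleLinkSketch {

    /**
     * Exposes a map output file inside the reducer's input directory by
     * creating a hard link to it, so no bytes are copied.
     * Assumption: both paths are on the same filesystem (e.g. one Lustre
     * mount); hard links cannot cross filesystems.
     */
    static Path linkMapOutput(Path mapOutput, Path reduceInputDir) throws IOException {
        Files.createDirectories(reduceInputDir);
        Path link = reduceInputDir.resolve(mapOutput.getFileName());
        Files.deleteIfExists(link);          // allow a retried fetch
        Files.createLink(link, mapOutput);   // createLink(newLink, existingFile)
        return link;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical layout in a temp directory standing in for the shared FS.
        Path root = Files.createTempDirectory("shuffle-sketch");
        Path mapOut = Files.write(root.resolve("map_0.out"),
                "k1\tv1\n".getBytes(StandardCharsets.UTF_8));

        Path linked = linkMapOutput(mapOut, root.resolve("reduce_0"));

        // The reducer reads the map output through the link.
        System.out.print(new String(Files.readAllBytes(linked), StandardCharsets.UTF_8));
    }
}
```

A real implementation would still need to handle the per-reducer partition
offsets within each map output and cleanup of the links, which this sketch
leaves out.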

Thanks,
--

Alexander "Zam" Zarochentsev
[email protected]

