Hi Sriram,

>> The I-file concept could possibly be implemented here in a fairly
>> self-contained way. One could even colocate/embed a KFS filesystem
>> with such an alternate shuffle, like how MR task temporary space is
>> usually colocated with HDFS storage.

>  Exactly.

>> Does this seem reasonable in any way?

> Great. Where do we go from here?  How do we get a collaborative effort going?


Sounds like a JIRA issue should be opened, the approach briefly described, and 
the first implementation attempt made.  Then iterate.

I look forward to seeing this! :)

Otis
--

Performance Monitoring for Solr / ElasticSearch / HBase - 
http://sematext.com/spm 



>________________________________
> From: Sriram Rao <srirams...@gmail.com>
>To: common-dev@hadoop.apache.org 
>Sent: Tuesday, May 8, 2012 6:48 PM
>Subject: Re: Sailfish
> 
>Dear Andy,
>
>> From: Andrew Purtell <apurt...@apache.org>
>> ...
>
>> Do you intend this to be a joint project with the Hadoop community or
>> a technology competitor?
>
>As I said in my email, we are looking for folks to collaborate
>with us to help get us integrated with Hadoop.  To be explicit,
>we intend this to be a joint project with the community.
>
>> Regrettably, KFS is not a "drop in replacement" for HDFS.
>> Hypothetically: I have several petabytes of data in an existing HDFS
>> deployment, which is the norm, and a continuous MapReduce workflow.
>> How do you propose I, practically, migrate to something like Sailfish
>> without a major capital expenditure and/or downtime and/or data loss?
>
>Well, we are not asking for KFS to replace HDFS.  One path you could
>take is to experiment with Sailfish---use KFS just for the
>intermediate data and HDFS for everything else.  There is no major
>capex :).  While you get comfy with pushing intermediate data into a
>DFS, we get the ideas added to HDFS.  This simplifies deployment
>considerations.
>
>> However, can the Sailfish I-files implementation be plugged in as an
>> alternate Shuffle implementation in MRv2 (see MAPREDUCE-3060 and
>> MAPREDUCE-4049),
>
>This'd be great!
>
>> with necessary additional plumbing for dynamic
>> adjustment of reduce task population? And the workbuilder could be
>> part of an alternate MapReduce Application Manager?
>
>It should be part of the AM.  (Currently, with our implementation in
>Hadoop-0.20.2, the workbuilder serves the role of an AM).
>
>> The I-file concept could possibly be implemented here in a fairly
>> self-contained way. One could even colocate/embed a KFS filesystem
>> with such an alternate shuffle, like how MR task temporary space is
>> usually colocated with HDFS storage.
>
>Exactly.
>
>> Does this seem reasonable in any way?
>
>Great. Where do we go from here?  How do we get a collaborative effort going?
>
>Best,
>
>Sriram
>
>>>  From: Sriram Rao <srirams...@gmail.com>
>>> To: common-dev@hadoop.apache.org
>>> Sent: Tuesday, May 8, 2012 10:32 AM
>>> Subject: Project announcement: Sailfish (also, looking for colloborators)
>>>
>>> Hi,
>>>
>>> I'd like to announce the release of a new open source project, Sailfish.
>>>
>>> http://code.google.com/p/sailfish/
>>>
>>> Sailfish tries to improve Hadoop performance, particularly for large jobs
>>> that process terabytes of data and run for hours.  In building Sailfish, we
>>> modified how map output is handled and transported from map to reduce.
>>>
>>> The project pages provide more information about the project.
>>>
>>> We are looking for collaborators who can help get some of the ideas into
>>> Apache Hadoop. A possible step forward could be to make the "shuffle" phase
>>> of Hadoop pluggable.
>>>
>>> If you are interested in working with us, please get in touch with me.
>>>
>>> Sriram
>>
>
>
>
>-- 
>Best regards,
>
>   - Andy
>
>Problems worthy of attack prove their worth by hitting back. - Piet
>Hein (via Tom White)
>
>
>
