On Fri, Oct 07, 2011 at 10:17AM, Steve Loughran wrote:
> On 06/10/2011 17:49, [email protected] wrote:
>> Steve,
>>
>>> Summary: I'm not sure that HDFS is the right FS in this world, as it
>>> contains a lot of assumptions about system stability and HDD persistence
>>> that aren't valid any more. With the ability to plug in new placers you
>>> could do tricks like ensure 1 replica lives in a persistent blockstore
>>> (and rely on it always being there), and add other replicas in transient
>>> storage if the data is about to be needed in jobs.
>>
>> Can you please shed more light on the statement "... as it
>> contains a lot of assumptions about system stability and HDD persistence
>> that aren't valid any more..."?
>>
>> I know that you were doing some analysis of disk failure modes some time
>> ago. Is this the result of that research? I am very interested.
>
> no, it's unrelated - experience in hosting virtual hadoop
> infrastructures, which is how my short-lived clusters exist today:
>
> -you don't know the hostname of the master nodes until allocated, so you
> need to allocate them and dynamically push out configs to the workers

This of course is a big win for non-autodiscoverable architecture ;)
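To make the "dynamically push out configs" step concrete, here is a rough sketch of the kind of glue one ends up writing once the masters are allocated. The hostnames, ports and output file name below are illustrative placeholders, not anything from Steve's actual setup:

    import java.io.FileOutputStream;
    import java.io.OutputStream;

    import org.apache.hadoop.conf.Configuration;

    // Sketch: once the master VMs are allocated, bake their hostnames into
    // a generated client config and let the provisioning layer ship it to
    // the workers (scp, puppet, whatever).
    public class PushConfig {
        public static void main(String[] args) throws Exception {
            String namenodeHost = args[0];    // known only after allocation
            String jobtrackerHost = args[1];

            Configuration conf = new Configuration(false);
            conf.set("fs.default.name", "hdfs://" + namenodeHost + ":8020");
            conf.set("mapred.job.tracker", jobtrackerHost + ":8021");

            // Old-style single-file config, in the spirit of hadoop-site.xml.
            OutputStream out = new FileOutputStream("hadoop-site-generated.xml");
            try {
                conf.writeXml(out);
            } finally {
                out.close();
            }
        }
    }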
> -the Datanodes spin when the namenode goes down, forever, rather than
> checking somewhere to see if it's changed. HDFS HA may fix that.
..
> -again, the TaskTrackers spin when the JT goes down, rather than look to
> see if it's moved.
..
> -Blacklisting isn't the right way to deal with task tracker failures:
> termination of VM is.

See my above comment. Auto-discovery would solve a lot of these issues, and
many others such as shared distributed memory suitable for config
management, etc.

Cos
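P.S. A minimal sketch of what I mean by auto-discovery, assuming the active namenode address were published under a ZooKeeper znode - the znode path and quorum string are made up for illustration. A worker would look the address up (and could watch it) instead of spinning against a hostname that no longer answers:

    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    // Sketch: ask ZooKeeper where the active namenode currently lives,
    // instead of retrying a fixed, possibly dead hostname forever.
    public class NamenodeLocator {
        private static final String ZK_QUORUM = "zk1:2181,zk2:2181,zk3:2181";
        private static final String NN_ZNODE  = "/hadoop/namenode-address";

        private final ZooKeeper zk;

        public NamenodeLocator() throws Exception {
            zk = new ZooKeeper(ZK_QUORUM, 30000, new Watcher() {
                public void process(WatchedEvent event) {
                    // session/connection events; a real client handles expiry here
                }
            });
        }

        // Returns "host:port" of whichever namenode is registered right now.
        public String currentNamenode() throws Exception {
            byte[] data = zk.getData(NN_ZNODE, false, null);
            return new String(data, "UTF-8");
        }
    }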
