Not sure if this list is the right place for this discussion, but these are my thoughts based on what I've seen:
I think most people operating at scale are more in the 30-50 node range. One admin can handle several clusters of that size. You will have to replace disks occasionally, but the system is as fault tolerant as advertised. The main challenges at that scale are resource allocation and scheduling - running both long-running and ad hoc workloads on the same cluster. All of the schedulers have their deficiencies. I have never noticed a meaningful performance hit from a node going down - it was always a few percent at most.

As you approach 100 nodes and high utilization, things get harder. You have to stay on top of things or block loss can occur, even with triple replication. Administration is a two-man job at that point, or someone is always getting woken up. Only a few shops operate at this scale, however.

Generally speaking, the greatest headache in operating a Hadoop cluster that consumes logs/data is not Hadoop itself - it is the infrastructure needed to stuff the cluster with data. ETL processes can fail, etc. Hadoop operates as advertised, with very little trouble.

Operating Hadoop is still fairly primitive, however, and one has the sense of feeding punch cards through a mainframe. Debugging can be painful. I'm sure this area will improve.

Russ

On Fri, Apr 9, 2010 at 9:12 AM, HadoopNoob <[email protected]> wrote:
>
> All,
>
> Thanks for your feedback!
>
> Russ,
>
> You brought up the point that was on my mind re: hardware failures.
>
> What are your experiences on that front?
> How often did they occur? Were they predictable?
>
> Hadoop handles it, but what kind of performance hits did you see?
>
> Jim
>
>
> Additional functionality to support Hadoop development would be great, but
> Karmasphere seems to be focusing more on this area.
>
> A framework for deploying Hadoop-based applications is the stated goal of
> Cloudera Desktop, but it is not yet clear what a Cloudera Desktop 'app' is.
> Is that something that submits a jar and pops up a window? I'd like to see
> tighter integration with Hive and Pig around an intuitive interface - tools
> that enable normal users to use Hadoop, similar to:
> http://wiki.apache.org/hadoop/Hive/HiveWebInterface
> http://wiki.apache.org/pig/PigPen
>
> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.151.2353&rep=rep1&type=pdf
> http://vimeo.com/6032078 and
> http://www-01.ibm.com/software/ebusiness/jstart/bigsheets/index.html
>
> Speculative execution a la ILLUSTRATE is critical for
> developing/parameterizing long-running jobs, and the user interface should
> include this feature as an integral part.
>
> On the operational front, I want a GUI that requires minimal configuration
> to discover a bunch of machines and that will automatically install,
> configure, and optimize Hadoop on an entire cluster with minimal involvement
> from me. I want logs and auditing of usage, latency, etc., per user and per
> queue. I want the ability to automatically move computation to EC2
> when load spikes and when the scale of the data makes it possible to do so.
> I want a cluster that builds and runs itself, and only bothers me when
> hardware is broken.
>
> Russ
>
> <SNIP>
> --
> View this message in context:
> http://old.nabble.com/Cloudera-desktop-tp28181265p28191274.html
> Sent from the HBase User mailing list archive at Nabble.com.
>
>
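
P.S. Jim, to make "staying on top of things" a little more concrete: a lot of it is just periodically asking the NameNode whether any blocks have fewer replicas than they should. hadoop fsck will report under-replicated and missing blocks, or you can script a cruder check against the FileSystem API. Below is a rough, hypothetical sketch of the latter - the class name, the path argument, and the idea of cron'ing it are my own assumptions, not a description of any particular shop's tooling:

// BlockCheck.java - hypothetical sketch, not production monitoring.
// Walks an HDFS directory tree and flags files whose blocks are reported
// on fewer hosts than the file's expected replication factor.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockCheck {
  public static void main(String[] args) throws Exception {
    // Picks up the cluster's core-site.xml / hdfs-site.xml from the classpath.
    FileSystem fs = FileSystem.get(new Configuration());
    check(fs, new Path(args.length > 0 ? args[0] : "/"));
  }

  static void check(FileSystem fs, Path dir) throws Exception {
    for (FileStatus stat : fs.listStatus(dir)) {
      if (stat.isDir()) {
        check(fs, stat.getPath());            // recurse into subdirectories
        continue;
      }
      short expected = stat.getReplication(); // replication factor set on the file
      BlockLocation[] blocks =
          fs.getFileBlockLocations(stat, 0, stat.getLen());
      for (BlockLocation block : blocks) {
        int actual = block.getHosts().length; // hosts currently holding this block
        if (actual < expected) {
          System.out.println("UNDER-REPLICATED: " + stat.getPath()
              + " block@" + block.getOffset()
              + " has " + actual + "/" + expected + " replicas");
        }
      }
    }
  }
}

Compile it against the Hadoop core jar and launch it with the hadoop script so it sees your cluster configuration. Anything it prints is a file the NameNode will eventually re-replicate on its own, but if the list keeps growing you want to know before the next disk dies.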
