Functional programming paradigms, the map/reduce pattern, and to a lesser extent distributed and parallel processing in general are subjects not widely understood by most quasi-technical management. Further, the notion of using commodity machines with a guaranteed lack of reliability as a means of achieving high performance and high scalability is essentially counterintuitive. Even when I refer newcomers to what I regard as the seminal papers on these topics (the papers written by Ghemawat, Dean et al. at Google, and yes, I know it all started with LISP, but my management wasn't even alive in the 1970s, although I was :-)), people steeped in a long tradition of "shared everything" database architectures still don't quite get it.

I spend considerable amounts of time in what amounts to management de-programming: no, MySQL can't do this, and Oracle can't either, except that with Oracle it will cost you a lot more to find that out.

Hadoop, hTable, PIG and the like offer adopters a competitive edge which, in my mind, is so great that list participants may not wish their company identities to be known. On the other hand, a good list of "Company X is solving general problem Y using N nodes of configuration K" is extremely helpful in advancing the "cause" of this technology. Any success stories, even those carefully disguised to protect identity, product, and process, are extremely helpful.

In our case I am planning to deploy Hadoop to process substantial quantities of data generated by our application's users. Our current plan is to deploy on a 32 or 64 node cluster, with machines which contain 4 cores, 4G memory, and 2T local disk (JBOD). A 32 node implementation with replication set to 3 will yield 18-20T of usable space. We are also actively experimenting with and researching EC2, as a substantially larger grid of EC2 machines may yield acceptable performance at a price far lower than building out and maintaining our own grid.
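
The capacity figure above can be sanity-checked with a little arithmetic. The following is a minimal sketch, not anything taken from the original message: the function name `usable_capacity_tb` and the `reserved_fraction` parameter are hypothetical, introduced only for illustration. It assumes usable space is roughly raw disk, minus whatever share is set aside for non-HDFS use (OS, logs, temporary job output), divided by the replication factor.

```python
def usable_capacity_tb(nodes, disk_tb_per_node, replication, reserved_fraction=0.0):
    """Rough usable capacity in TB for a replicated cluster.

    reserved_fraction is a hypothetical knob: the share of raw disk assumed
    to be unavailable to the filesystem (OS, logs, temporary job output).
    """
    if nodes <= 0 or disk_tb_per_node <= 0 or replication < 1:
        raise ValueError("nodes and disk must be positive, replication >= 1")
    if not 0.0 <= reserved_fraction < 1.0:
        raise ValueError("reserved_fraction must be in [0, 1)")
    raw_tb = nodes * disk_tb_per_node
    # Every block is stored `replication` times, so divide what is left.
    return raw_tb * (1.0 - reserved_fraction) / replication


# Cluster sizes and disk size from the message; reserve values are assumptions.
for nodes in (32, 64):
    for reserved in (0.0, 0.10, 0.15):
        tb = usable_capacity_tb(nodes, 2, 3, reserved)
        print(f"{nodes} nodes, {reserved:.0%} reserved: {tb:.1f}T usable")
```

With no reserve at all, 32 nodes of 2T at replication 3 come to about 21.3T, so the quoted 18-20T is consistent with setting aside roughly 6 to 16 percent of raw disk. That range is an inference from the numbers, not something the message states.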

Prototyping to date has yielded results that I can only describe as astounding :-).

More stories most welcome....

C G

Konstantin Shvachko <[EMAIL PROTECTED]> wrote:

This is exactly the reason I am proposing to create a testimonial wiki page for hadoop.
See HADOOP-1754.
https://issues.apache.org/jira/browse/HADOOP-1754
As C G says, the company name might not always be relevant, as in "powered by hadoop", but applications, problems, and tasks are.

--Konstantin

Eric Baldeschwieler wrote:
> Responses to the list welcome. I know of several companies not on
> that list that are using it.
>
> It would be great to hear from you guys.
>
> E14
>
> On Sep 4, 2007, at 6:59 AM, C G wrote:
>
>> All:
>>
>> I am interested in hearing any success stories around deploying
>> Hadoop in a commercial/non-academic environment. My interest is
>> mostly around generating collateral for justifying our own
>> deployment of Hadoop. Any stories would be great...if you can't
>> name your company, if you could at least describe the application
>> and/or business that would be incredibly useful.
>>
>> Thanks very much,
>> C G
