I definitely agree with Scott. As a user of the Hadoop open source stack for building our banking-focused big data analytics applications, I speak on behalf of our clients and the emerging Hadoop ecosystem when I say that open and honest conversations on this thread/group should be encouraged, irrespective of whether one represents a company or Apache.
For instance, given that Cloudera, MapR and soon Hortonworks will all be offering competing Hadoop distros for enterprises, it is important for all of us (and prospective users) to understand what they are doing to address critical gaps in the platform, and how the Hadoop ecosystem benefits from it. From our perspective, it doesn't matter whether one is better than the other (which is not the point I saw Ted or MC making), but that companies, startups, Apache and everybody else are:

1. thinking about the right issues,
2. willing to solve them (and ideally contributing the solutions back), and
3. informing the exploding Hadoop user base of what not to do.

I see it benefiting all of us, especially as Hadoop rapidly crosses over and becomes the platform of choice for data management in industries like banking, retail and healthcare, just as it has for social media and the web. Isn't that what we are launching our business plans around anyway?

And in that sense we all owe the ASF and the Hadoop community (and not any one company) an equal amount of gratitude, humility and respect.

On Jul 1, 2011, at 1:22 PM, Scott Carey wrote:

> Although this thread is wandering a bit, I disagree strongly that it is
> inappropriate to discuss other vendor specific features (or competing
> compute platform features) on general@. The topic has become the factors
> that influence hardware purchase choices, and one of those is how the
> system deals with disk failure. Compare/contrast with other platforms is
> healthy for the Hadoop project! +1
>
> On 6/30/11 9:47 PM, "Ian Holsman" <[email protected]> wrote:
>
>>
>> On Jul 1, 2011, at 2:08 PM, M. C. Srivas wrote:
>>
>>> On Thu, Jun 30, 2011 at 5:24 PM, Todd Lipcon <[email protected]> wrote:
>>>
>>>>
>>>> I'd advise you to look at "stock hadoop" again. This used to be true,
>>>> but was fixed a long while back by HDFS-457 and several followup JIRAs.
>>>>
>>>> If MapR does something fancier, I'm sure we'd be interested to hear
>>>> about it so we can compare the approaches.
>>>>
>>>> -Todd
>>>>
>>>>
>>> MapR tracks disk responsiveness. In other words, a moving histogram of
>>> IO-completion times is maintained internally, and if a disk starts
>>> getting really slow, it is pre-emptively taken offline so it does not
>>> create long tails for running jobs (and the data on the disk is
>>> re-replicated using whatever re-replication policy is in place). One of
>>> the benefits of managing the disks directly instead of through ext3 /
>>> xfs / or other ...
>>>
>>> All these stats can be fed into Ganglia (or pushed out centrally via a
>>> text file that can be pulled out using NFS) if historical info about
>>> disk behavior (and failures) needs to be preserved.
>>>
>>> - Srivas.
>>
>> While I am intrigued about how MapR performs internally, I don't think
>> this is the forum for it.
>> please keep MapR (and other vendor specific discussions) on their
>> respective support forums.
>>
>> Thanks!
>>
>> Ian.
>>
>
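
For anyone following the quoted thread, the disk-responsiveness idea Srivas describes could be sketched roughly as below. This is only an illustration and not MapR's implementation: the class name, the window size, the latency threshold and the minimum sample count are all hypothetical, and a sliding-window median stands in for the "moving histogram" he mentions. Re-replication of the disk's data is left out entirely.

```python
from collections import deque
from statistics import median


class DiskLatencyTracker:
    """Hypothetical tracker of recent IO-completion times for a single disk."""

    def __init__(self, window=100, slow_ms=500.0, min_samples=20):
        # Keep only the most recent `window` completion times (assumed size).
        self.samples = deque(maxlen=window)
        self.slow_ms = slow_ms          # assumed "really slow" threshold
        self.min_samples = min_samples  # avoid deciding on too little data
        self.online = True

    def record(self, completion_ms):
        """Record one IO-completion time and return whether the disk is still online."""
        self.samples.append(completion_ms)
        enough_data = len(self.samples) >= self.min_samples
        if self.online and enough_data and median(self.samples) > self.slow_ms:
            # A real system would also trigger re-replication at this point.
            self.online = False
        return self.online


tracker = DiskLatencyTracker(window=10, slow_ms=100.0, min_samples=5)
for _ in range(10):
    tracker.record(10.0)      # healthy disk: fast completions
for _ in range(5):
    tracker.record(1000.0)    # disk degrades: slow completions
print(tracker.online)
```

Using the median over a bounded window means a single slow IO does not take a disk offline; only a sustained shift in completion times does. A high percentile would react sooner to long tails, at the cost of more false positives.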
