The WAL is a major issue, but another one coming up fast is the namenode as a single point of failure (SPOF).
Right now, namenode aside, I can do a rolling restart of my entire cluster, including rebooting the machines if I need to. But not so with the namenode, because if it goes AWOL, all sorts of bad things can happen.

I hope that HDFS 0.21 addresses both of these issues. Can we get positive confirmation that this is being worked on?

-ryan

On Thu, Aug 6, 2009 at 10:25 AM, Andrew Purtell<[email protected]> wrote:
> I updated the roadmap up on the wiki:
>
>
> * Data integrity
>   * Ensure that proper append() support in HDFS actually closes the
>     WAL last block write hole
>   * HBase-FSCK (HBASE-7) -- Suggest making this a blocker for 0.21
>
> I have had several recent conversations on my travels with people in
> Fortune 100 companies (based on this list:
> http://www.wageproject.org/content/fortune/index.php).
>
> You and I know we can set up well engineered HBase 0.20 clusters that
> will be operationally solid for a wide range of use cases, but given
> those aforementioned discussions there are certain sectors which would
> say HBASE-7 is #1 before HBase is "bank ready". Not until we can say:
>
> - Yes, when the client sees data has been committed, it actually has
>   been written and replicated on spinning or solid state media in all
>   cases.
>
> - Yes, we go to great lengths to recover data if ${deity} forbid you
>   crush some underprovisioned cluster with load or some bizarre bug or
>   system fault happens.
>
> HBASE-1295 is also required for business continuity reasons, but this
> is already a priority item for some HBase committers.
>
> The question is, I think, does the above align with project goals?
> Making HBase-FSCK a blocker will probably knock something someone
> wants for the 0.21 timeframe off the list.
>
> - Andy
>
>
>
