You'd probably know better than I Kevin but I'd worry about the 1000*1000*32 case, where HDFS is as (over)committed as the HBase tier.
On Tue, Apr 15, 2014 at 9:26 AM, Kevin O'dell <[email protected]>wrote: > In general I have never seen nor heard of Federated Namespaces in the wild, > so I would be hesitant to go down that path. But you know for "Science" I > would be interested in seeing how that worked out. Would we be looking at > 32 WALs per region? At a large cluster with 1000nodes, 100 regions per > node, and a WAL per region(I like easy math): > > 1000*100*32= 3.2 million files for WALs This is not ideal, but it is not > horrible if we are using 128MB block sizes etc. > > I feel like I am missing something above though. Thoughts? > > > On Tue, Apr 15, 2014 at 12:20 PM, Andrew Purtell <[email protected] > >wrote: > > > # of WALs as roughly spindles / replication factor seems intuitive. Would > > be interesting to benchmark. > > > > As for one WAL per region, the BigTable paper IIRC says they didn't > because > > of concerns about the number of seeks in the filesystems underlying GFS > and > > because it would reduce the effectiveness of group commit throughput > > optimization. If WALs are backed by SSD certainly the first consideration > > no longer holds. We also had a global HDFS file limit to contend with. I > > know HDFS is incrementally improving the scalabilty of a namespace, but > > this is still an active consideration. (Or we could try partitioning a > > deploy over a federated namespace? Could be "interesting". Has anyone > tried > > that? I haven't heard.) > > > > > > > > On Tue, Apr 15, 2014 at 7:11 AM, Jonathan Hsieh <[email protected]> > wrote: > > > > > It makes sense to have as many wals as # of spindles / replication > factor > > > per machine. This should be decoupled from the number of regions on a > > > region server. So for a cluster with 12 spindles we should likely have > > at > > > least 4 wals (12 spindles / 3 replication factor), and need to do > > > experiments to see if going to 8 or some higher number makes sense (new > > wal > > > uses a disruptor pattern which avoids much contention on individual > > > writes). So with your example, your 1000 regions would get sharded > into > > > the 4 wals which would maximize io throughput, disk utilization, and > > reduce > > > time for recovery in the face of failure. > > > > > > In the case of an SSD world, it makes more sense to have one wal per > node > > > once we have decent HSM support in HDFS. The key win here will be in > > > recovery time -- if any RS goes down we only have to replay a regions > > edits > > > and not have to split or demux different region's edits. > > > > > > Jon. > > > > > > > > > On Mon, Apr 14, 2014 at 10:37 PM, Vladimir Rodionov > > > <[email protected]>wrote: > > > > > > > Todd, how about 300 regions with 3x replication? Or 1000 regions? > This > > > is > > > > going to be 3000 files. on HDFS. per one RS. When I said that it does > > not > > > > scale, I meant that exactly that. > > > > > > > > > > > > > > > > -- > > > // Jonathan Hsieh (shay) > > > // HBase Tech Lead, Software Engineer, Cloudera > > > // [email protected] // @jmhsieh > > > > > > > > > > > -- > > Best regards, > > > > - Andy > > > > Problems worthy of attack prove their worth by hitting back. - Piet Hein > > (via Tom White) > > > > > > -- > Kevin O'Dell > Systems Engineer, Cloudera > -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
