Even with 100 regions times 1000 region servers, we're talking about potentially
having 100,000 open files instead of 1,000 (and we also have to count every
replica).
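To make that arithmetic concrete, here's a rough back-of-the-envelope sketch
(the counts and the replication factor are illustrative assumptions, not
measured values):

    // Rough estimate of WAL files with one log per server vs. one per region.
    // All inputs are illustrative assumptions from this thread.
    public class WalFileCount {
        public static void main(String[] args) {
            int servers = 1000;
            int regionsPerServer = 100;
            int replication = 3; // assumed HDFS replication factor

            long perServer = servers;                           // one HLog per RS
            long perRegion = (long) servers * regionsPerServer; // one HLog per region

            System.out.printf("per-server WALs: %,d (%,d block replicas)%n",
                perServer, perServer * replication);
            System.out.printf("per-region WALs: %,d (%,d block replicas)%n",
                perRegion, perRegion * replication);
        }
    }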
I guess that an OS that was configured for such usage would be able to
sustain it... You would have to watch that metric cluster-wide, get new
nodes when needed, etc. Then you need to make sure that GC pauses won't
block for too long, so that the unavailability window stays very short.

J-D

On Tue, Jan 12, 2010 at 1:07 PM, Kannan Muthukkaruppan <kan...@facebook.com> wrote:
>> I presume you intend to run HBase region servers
>> colocated with HDFS DataNodes.
>
> Yes.
>
> ---
>
> Seems like we all generally agree that a large number of regions per
> region server may not be the way to go.
>
> So coming back to Dhruba's question on having one commit log per region
> instead of one commit log per region server: is the number of open HDFS
> files still a major concern?
>
> Is my understanding correct that the unavailability window during region
> server failover is large due to the time it takes to split the shared
> commit log into per-region logs? Instead, if we always had per-region
> commit logs, even in the normal mode of operation, then the
> unavailability window would be minimized? It does limit the extent of
> batch/group commit you can do, though -- since you can only batch updates
> going to the same region. Any other gotchas/issues?
>
> regards,
> Kannan
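Right, the failover window with a shared log is dominated by the split
step: on recovery, the single commit log has to be demultiplexed into
per-region edit files before the regions can be reopened elsewhere. A
schematic sketch of that step (the entry format and names are made up for
illustration; this is not the actual HBase code):

    // Schematic WAL split: demultiplex one shared commit log into
    // per-region edit files. Illustrative only.
    import java.io.*;
    import java.util.*;

    public class LogSplitSketch {
        // Each shared-log entry carries the region it belongs to.
        record Entry(String region, byte[] edit) {}

        static void split(List<Entry> sharedLog, File outDir) throws IOException {
            Map<String, DataOutputStream> perRegion = new HashMap<>();
            try {
                for (Entry e : sharedLog) {
                    DataOutputStream out = perRegion.computeIfAbsent(e.region(), r -> {
                        try {
                            return new DataOutputStream(new BufferedOutputStream(
                                new FileOutputStream(new File(outDir, r + ".edits"))));
                        } catch (IOException ex) {
                            throw new UncheckedIOException(ex);
                        }
                    });
                    out.writeInt(e.edit().length); // length-prefixed edit
                    out.write(e.edit());
                }
            } finally {
                for (DataOutputStream out : perRegion.values()) {
                    out.close();
                }
            }
        }
    }

Every affected region stays unavailable until its slice of the shared log
has been written out and replayed, which is why per-region logs would
shrink that window (at the cost of the group-commit batching you mention).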
> -----Original Message-----
> From: Andrew Purtell [mailto:apurt...@apache.org]
> Sent: Tuesday, January 12, 2010 12:50 PM
> To: hbase-dev@hadoop.apache.org
> Subject: Re: commit semantics
>
>> But would having a
>> smaller number of regions per region server (say ~50) be really bad?
>
> Not at all.
>
> There are some (test) HBase deployments I know of that go pretty
> vertical, with multiple TBs of disk on each node, therefore wanting a
> high number of regions per region server to match that density. That may
> meet with operational success, but it is architecturally suspect. I ran
> a test cluster once with more than 1,000 regions per server on 25
> servers, in the 0.19 timeframe. 0.20 is much better in terms of resource
> demand (less) and liveness (enormously improved), but I still wouldn't
> recommend it, unless your clients can wait for up to several minutes on
> blocked reads and writes to affected regions should a node go down. With
> that many regions per server, it stands to reason just about every
> client would be affected.
>
> The numbers I have for Google's canonical BigTable deployment are
> several years out of date, but they go pretty far in the other direction
> -- about 100 regions per server is the target.
>
> I think it also depends on whether you intend to colocate TaskTrackers
> with the region servers. I presume you intend to run HBase region
> servers colocated with HDFS DataNodes. After you have had an HBase
> cluster up for some number of hours, certainly ~24, background
> compaction will generally bring the HDFS blocks backing region data
> local to the server. MapReduce tasks backed by HBase tables will then
> see data-locality advantages similar to those you are probably
> accustomed to when working with files in HDFS. If you mix storage and
> computation this way, it makes sense to seek a balance between the
> amount of data stored on each node (the number of regions being served)
> and the available computational resources (available CPU cores, time
> constraints (if any) on task execution).
>
> Even if you don't intend to do the above, it's possible that an overly
> high region density can negatively impact performance if too much I/O
> load is placed on average on each region server. Adding more servers to
> spread the load would then likely help**.
>
> These considerations bias against hosting a very large number of regions
> per region server.
>
> - Andy
>
> **: I say "likely" because this presumes query and edit patterns have
> been guided as necessary through engineering to be widely distributed in
> the key space. You have to take some care to avoid hot regions.
>
>
> ----- Original Message ----
>> From: Kannan Muthukkaruppan <kan...@facebook.com>
>> To: "hbase-dev@hadoop.apache.org" <hbase-dev@hadoop.apache.org>
>> Sent: Tue, January 12, 2010 11:40:00 AM
>> Subject: RE: commit semantics
>>
>> Btw, is there much gain in having a large number of regions -- i.e., to
>> the tune of 500 -- per region server?
>>
>> I understand that having multiple regions per region server allows
>> finer-grained rebalancing when new nodes are added or a node goes down.
>> But would having a smaller number of regions per region server (say
>> ~50) be really bad? If a region server goes down, 50 other nodes would
>> pick up ~1/50 of its work. Not as good as 500 other nodes picking up
>> 1/500 of its work each -- but it seems acceptable still. Are there
>> other advantages of having a large number of regions per region server?
>>
>> regards,
>> Kannan
>> -----Original Message-----
>> From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of
>> Jean-Daniel Cryans
>> Sent: Tuesday, January 12, 2010 9:42 AM
>> To: hbase-dev@hadoop.apache.org
>> Subject: Re: commit semantics
>>
>> wrt 1 HLog per region server, this is from the Bigtable paper. Their
>> main concern is the number of open files: if you have 1000 region
>> servers * 100 regions, then you may have 100,000 HLogs to manage. Also,
>> you can have more than one file per HLog, so if you have on average 5
>> log files per HLog, that's 500,000 files on HDFS.
>>
>> J-D
>>
>> On Tue, Jan 12, 2010 at 12:24 AM, Dhruba Borthakur wrote:
>> > Hi Ryan,
>> >
>> > thanks for your response.
>> >
>> >> Right now each regionserver has 1 log, so if 2 puts on different
>> >> tables hit the same RS, they hit the same HLog.
>> >
>> > I understand. My point was that the application could insert the same
>> > record into two different tables on two different HBase instances on
>> > two different pieces of hardware.
>> >
>> > On a related note, can somebody explain what the tradeoff is if each
>> > region has its own hlog? Are you worried about the number of files in
>> > HDFS? Or maybe the number of sync threads in the region server? Can
>> > multiple hlog files provide faster region splits?
>> >
>> >
>> >> I've thought about this issue quite a bit, and I think the sync every
>> >> N rows combined with optional no-sync and low-time sync() is the way
>> >> to go. If you want to discuss this more in person, maybe we can meet
>> >> up for brews or something.
>> >>
>> >
>> > The group-commit thing I can understand. HDFS does a very similar
>> > thing. But can you explain your alternative "sync every N rows
>> > combined with optional no-sync and low-time sync"? For those
>> > applications that have the natural characteristic of updating only
>> > one row per logical operation, how can they be sure that their data
>> > has reached some-sort-of-stable-storage unless they sync after every
>> > row update?
>> >
>> > thanks,
>> > dhruba
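On the "sync every N rows combined with optional no-sync and low-time
sync" idea: a minimal sketch of that policy might look like the following
(class and parameter names are invented; this is not HBase's HLog
implementation). Callers append edits, and the log is synced after every N
appends or after a short timeout, whichever comes first; an application
that needs per-row durability would still call sync() itself after each
row, which is exactly Dhruba's concern.

    // Minimal group-commit sketch: sync the log every N appends or every
    // maxDelayMs milliseconds, whichever comes first. Illustrative only.
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class GroupCommitLog {
        private final int syncEveryN;   // e.g. 100 rows per forced sync
        private int pendingEdits = 0;
        private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();

        public GroupCommitLog(int syncEveryN, long maxDelayMs) {
            this.syncEveryN = syncEveryN;
            // The time-based sync bounds how long an unsynced edit can sit
            // in the buffer (the "low-time sync").
            timer.scheduleAtFixedRate(this::syncIfPending,
                maxDelayMs, maxDelayMs, TimeUnit.MILLISECONDS);
        }

        // Append an edit; it is durable only after the next sync().
        public synchronized void append(byte[] edit) {
            writeBuffered(edit);
            if (++pendingEdits >= syncEveryN) {
                sync();
            }
        }

        private synchronized void syncIfPending() {
            if (pendingEdits > 0) {
                sync();
            }
        }

        // Force pending edits to stable storage; callers needing per-row
        // durability invoke this after every append.
        public synchronized void sync() {
            // In real code: flush the buffer and hsync() the HDFS stream.
            pendingEdits = 0;
        }

        private void writeBuffered(byte[] edit) {
            // Buffered append to the log stream (omitted).
        }
    }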