On Tue, Jan 12, 2010 at 1:07 PM, Kannan Muthukkaruppan <kan...@facebook.com> wrote:
> Seems like we all generally agree that large number of regions per region
> server may not be the way to go.
>

What Andrew says. You could make regions bigger, so there is more data per regionserver but the same rough (small) number of regions to redeploy on crash. The logs to replay will be correspondingly bigger, though, and will take longer to process.

> So coming back to Dhruba's question on having one commit log per region
> instead of one commit log per region server. Is the number of HDFS files
> open still a major concern?
>

Yes. From the "Commit-log implementation" section of the BT paper:

"If we kept the commit log for each tablet in a separate log file, a very large number of files would be written concurrently in GFS. Depending on the underlying file system implementation on each GFS server, these writes could cause a large number of disk seeks to write to the different physical log files. In addition, having separate log files per tablet also reduces the effectiveness of the group commit optimization, since groups would tend to be smaller. To fix these issues, we append mutations to a single commit log per tablet server, co-mingling mutations for different tablets in the same physical log file."

Not knowing any better, we presume hdfs is kinda-like gfs.

> Is my understanding correct that unavailability window during region server
> failover is large due to the time it takes to split the shared commit log
> into a per region log?

Yes, though truth be told, this area of hbase performance has had very little attention paid to it. There are things that we could do much better -- e.g. a distributed split instead of a threaded split inside a single process -- and ideas for making it so we can take on writes much sooner than we currently do; e.g. open regions immediately on the new server before the split completes.

> Instead, if we always had per-region commit logs even in the normal mode of
> operation, then the unavailability window would be minimized? It does
> minimize the extent of batch/group commits you can do though-- since you can
> only batch updates going to the same region. Any other gotchas/issues?

Just those listed above.

St.Ack
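For illustration only, here is a minimal sketch of what "splitting the shared commit log into a per region log" amounts to. It is not HBase code: the function name, the `(region, sequence_id, edit)` tuple layout, and the sample entries are all assumptions made up for this example.

```python
from collections import defaultdict


def split_shared_log(entries):
    """Group interleaved log entries by region, keeping their original order.

    `entries` is an iterable of (region, sequence_id, edit) tuples, a
    hypothetical stand-in for the records in one shared commit log.
    """
    per_region = defaultdict(list)
    for region, sequence_id, edit in entries:
        # Entries arrive in log order, so appending preserves per-region order.
        per_region[region].append((sequence_id, edit))
    return dict(per_region)


# One shared log with edits for two regions co-mingled, as on a regionserver.
shared_log = [
    ("region-a", 1, "put row1"),
    ("region-b", 2, "put row9"),
    ("region-a", 3, "delete row1"),
]

per_region_logs = split_shared_log(shared_log)
```

In this sketch no region's edits are ready for replay until the whole shared log has been read, which is why the split sits in the failover path; with per-region logs the grouping step disappears, at the cost of more open files and smaller group commits.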