On Tue, Jan 12, 2010 at 12:24 AM, Dhruba Borthakur <dhr...@gmail.com> wrote:
> Hi Ryan,
>
> thanks for your response.
>
>>Right now each regionserver has 1 log, so if 2 puts on different
>>tables hit the same RS, they hit the same HLog.
>
> I understand. My point was that the application could insert the same record
> into two different tables on two different HBase instances on two different
> pieces of hardware.

Ah yes, of course, I thought you meant 2 tables in the same cluster.

>
> On a related note, can somebody explain what the tradeoff is if each region
> has its own hlog? are you worried about the number of files in HDFS? or
> maybe the number of sync-threads in the region server? Can multiple hlog
> files provide faster region splits?

Each hlog needs to be treated as a stream of edits for log recovery. So
adding more logs requires the code to still treat the pool as 1 log and
keep an overall ordering across all logs as a merged set. It just adds
complexity, and I'd like to put it off as long as possible. Initially, when
I was worried about performance issues, adding a pool only extended the
performance by a linear amount, and I was looking for substantially more
than that.

>
>
>> I've thought about this issue quite a bit, and I think the sync every
>> 1 rows combined with optional no-sync and low time sync() is the way
>> to go. If you want to discuss this more in person, maybe we can meet
>> up for brews or something.
>>
>
> The group-commit thing I can understand. HDFS does a very similar thing. But
> can you explain your alternative "sync every 1 rows combined with optional
> no-sync and low time sync"? For those applications that have the natural
> characteristics of updating only one row per logical operation, how can they
> be sure that their data has reached some-sort-of-stable-storage unless they
> sync after every row update?

Normally this would be the case, but consider the call
'incrementColumnValue', which essentially maintains a counter. Losing some
edits means losing counter values. If we are talking about a counter that is
incremented 100 million times a day, then speed is more important than
potentially losing some extremely small number of updates when a server
crashes.

-ryan

>
> thanks,
> dhruba
>
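
To make the point above about per-region hlogs concrete: if there were several logs, recovery would still have to replay them as one ordered stream. Here is a minimal sketch of that merge. It is not HBase's recovery code; the `Edit` type, the `seq_id` field and the `merged_replay` helper are hypothetical names for illustration, and it assumes every edit carries a sequence number that is unique across logs and that each log is already sorted by it.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Edit(NamedTuple):
    seq_id: int   # hypothetical global sequence number assigned at write time
    region: str   # which region's log the edit came from
    payload: str  # stand-in for the actual edit contents


def merged_replay(logs: Iterable[Iterable[Edit]]) -> Iterator[Edit]:
    """Yield edits from several logs in one overall seq_id order.

    Assumes each individual log is already sorted by seq_id.
    """
    return heapq.merge(*logs, key=lambda edit: edit.seq_id)


# Two per-region logs whose edits interleave in time.
log_a = [Edit(1, "region-a", "put row1"), Edit(4, "region-a", "put row2")]
log_b = [Edit(2, "region-b", "put row9"), Edit(3, "region-b", "delete row9")]

replayed = list(merged_replay([log_a, log_b]))
replayed_ids = [edit.seq_id for edit in replayed]
print(replayed_ids)
```

The merge itself is short, but every consumer of the log (recovery, splitting, rolling) would need to go through something like it, which is the extra complexity referred to above.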