Hi Stack,

Can you take some time to look at this?

On Fri, May 20, 2011 at 1:04 PM, Anty <[email protected]> wrote:
> Hi all,
>
> After reading the source code of HLog, I'm wondering whether this is a bug.
>
> For example, suppose only one region is active and the max log size is a
> fraction of the region size.
>
> The flush begins, and region A acquires a sequence number, say N.
> Insert operations can continue while we flush the cache.
> The flush operation completes and deletes region A's entry in
> lastSeqWritten (Map of regions to most recent sequence/edit id in their
> memstore).
> When the flush completes, the current sequence number may be N+5, because
> five log messages were added to the log for region A during the flush
> operation.
> Region A goes on to accept updates and inserts a new entry into
> lastSeqWritten for region A, but in the current HLog implementation the
> value is N+6.
>
> I think the value corresponding to Region A in lastSeqWritten should be N,
> not N+6.
> N+6 means that all edits in Region A with a sequence number smaller than
> N+6 are already persisted on disk, but that is not the case.
> Edits N+1, N+2, N+3, N+4 and N+5, the five new edits, may still be in the
> memstore of Region A.
> So the value should be N, the sequence number with which the flush begins
> and completes.
>
> The above procedure leaves a chance of data loss, though in the current
> implementation the chance of data loss is small.
> So I think it's a bug.
> The fix is easy: when the flush completes, just set the value for Region A
> in lastSeqWritten to N instead of removing the entry.
>
> If you want a data loss scenario, I can give you one.
>
> If I'm missing something, please let me know.
>
> --
> Best Regards
> Anty Rao

--
Best Regards
Anty Rao
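
The scenario in the quoted message can be sketched as a small toy model. This is not the actual HLog code: the class LastSeqWrittenSketch, its methods, the starting sequence number and the keepFlushSeq switch are all hypothetical, and it assumes that an edit is recorded in lastSeqWritten only when the region has no entry yet. With N = 100, it compares removing the entry on flush completion against the proposed fix of setting it to N.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Toy model of the bookkeeping described in the message; NOT HBase's HLog.
final class LastSeqWrittenSketch {
    // Region name -> sequence id tracked for that region's unflushed edits.
    private final ConcurrentMap<String, Long> lastSeqWritten = new ConcurrentHashMap<>();
    private final AtomicLong logSeqNum;

    LastSeqWrittenSketch(long lastUsedSeq) {
        this.logSeqNum = new AtomicLong(lastUsedSeq);
    }

    // Append one edit. Assumption: the map is only updated when the region
    // has no entry, which is why a fresh entry appears after a removal.
    long append(String region) {
        long seq = logSeqNum.incrementAndGet();
        lastSeqWritten.putIfAbsent(region, seq);
        return seq;
    }

    // The flush acquires its own sequence number (N in the message).
    long startCacheFlush() {
        return logSeqNum.incrementAndGet();
    }

    // keepFlushSeq == false: remove the entry (behaviour described as current).
    // keepFlushSeq == true:  set the entry to N (the fix proposed in the message).
    void completeCacheFlush(String region, long flushSeq, boolean keepFlushSeq) {
        if (keepFlushSeq) {
            lastSeqWritten.put(region, flushSeq);
        } else {
            lastSeqWritten.remove(region);
        }
    }

    // Replays the scenario and returns the value tracked for region A afterwards.
    static long run(boolean keepFlushSeq) {
        LastSeqWrittenSketch log = new LastSeqWrittenSketch(89);
        for (int i = 0; i < 10; i++) {
            log.append("A");                  // edits 90..99, flushed below
        }
        long n = log.startCacheFlush();       // N = 100
        for (int i = 0; i < 5; i++) {
            log.append("A");                  // edits N+1..N+5 arrive during the flush
        }
        log.completeCacheFlush("A", n, keepFlushSeq);
        log.append("A");                      // first edit after the flush: N+6
        return log.lastSeqWritten.get("A");
    }

    public static void main(String[] args) {
        System.out.println("remove entry on completion: " + run(false)); // 106 (N+6)
        System.out.println("set entry to N on completion: " + run(true)); // 100 (N)
    }
}
```

In this model, removing the entry leaves region A tracked at N+6 even though edits N+1 to N+5 were never part of the flushed snapshot, while keeping N covers them. Whether the real HLog behaves this way depends on details the model leaves out.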
