Just want to add some comments based on our experience. We've tried out Hypertable 0.9.0.* and 0.9.2.*, and HBase 0.18.* and 0.19.*.
It seems that Hypertable's usability and robustness have not improved as much as HBase's recently. -SimonZ

On Mar 26, 2:01 am, Schubert Zhang <[email protected]> wrote:
> Regarding this duplicated-assignment issue: in my view, both the interim
> fix and the persistence fix may not be robust enough.
>
> The following MSC charts are my proposal. I am not familiar with the
> latest Hypertable code (I had studied 0.9.0.7), so if I am wrong, please
> point it out.
>
> Chart 1: successful assignment case -- we should design an acknowledge
> mechanism.
>
>   origRS                  Master                      RS1
>     |---split range notify-->|                         |
>     |                  (select a RS)                   |
>     |                        |-----assign to RS1------>|
>     |                        |<-------succ ack---------|
>     |<------succ ack---------|                         |
>
> Chart 2: failed/timed-out assignment case
>
>   origRS                  Master                  RS1           RS2
>     |---split range notify-->|                     |             |
>     |                  (select a RS)               |             |
>     |                        |---assign to RS1---->|             |
>     |                        | (timeout or failed) |             |
>     |                        |--retry assign (x2)->|             |
>     |                        | (still timeout or failed)         |
>     |                 (select another RS)          |             |
>     |                        |------deassign------>|             |
>     |                        |--------assign to RS2------------->|
>     |                        | (still timeout or failed)         |
>     |<----report failure-----|                     |             |
>     |        ...(another round)...                 |             |
>
> Chart 3: a mechanism to avoid duplicated or wrong assignment
>
>   origRS                  Master                      RS1
>     |                        |<-------succ ack---------|
>     |     (check, but find the range is in RS2)        |
>     |                        |-------deassign--------->|
>     |                        |<-------succ ack---------|
>
> On Mar 21, 1:41 am, Doug Judd <[email protected]> wrote:
> > P.S. The memory exhaustion problem will be fixed in the 0.9.2.4 release.
> >
> > On Fri, Mar 20, 2009 at 10:37 AM, Doug Judd <[email protected]> wrote:
> > > With the help of Earle Ady, we've found and fixed the large-load
> > > corruption problem with the 0.9.2.2 release.
> > > To get the fixed version, please pull the latest code from the git
> > > repository <http://code.google.com/p/hypertable/wiki/SourceCode?tm=4>.
> > > We'll be releasing 0.9.2.3 soon.
> > >
> > > Here's a summary of the problem:
> > >
> > > With the fix of issue 246
> > > <http://code.google.com/p/hypertable/issues/detail?id=246>, compactions
> > > are now happening regularly, as they should. However, this has added
> > > substantial load on the system. When a range split and the Master was
> > > notified of the newly split-off range, the Master selected (round-robin)
> > > a new RangeServer to own the range. However, due to the increased load
> > > on the system and a 30-second hardcoded timeout in the Master, the
> > > RangeServer::load_range() command was timing out (it was taking 32 to
> > > 37 seconds). This timeout was reported back to the originating
> > > RangeServer, which paused fifteen seconds and tried again. But on the
> > > second attempt to notify the Master of the newly split-off range, the
> > > Master would (round-robin) select another RangeServer and invoke
> > > RangeServer::load_range() on that (different) server. This had the
> > > effect of the same range being loaded by three different RangeServers,
> > > which was wreaking havoc on the system. There were two fixes for this
> > > problem:
> > >
> > > 1. The hardcoded timeout was removed, and (almost) all timeouts in the
> > > system are now based on the "Hypertable.Request.Timeout" property,
> > > which has a default value of 180 seconds.
> > >
> > > 2. An interim fix was put in place in the Master: upon
> > > RangeServer::load_range() failure, the Master will remember which
> > > RangeServer it attempted the load on. The next time it gets notified
> > > and attempts to load the same range, it will choose the same
> > > RangeServer. If it gets back the error RANGE_ALREADY_LOADED, it will
> > > interpret that as success.
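[Editor's aside: the interim fix Doug describes in (2) above could be sketched roughly as follows. All names here (Master, LoadResult, handle_split) are illustrative and not the actual Hypertable Master code; the point is the "retry the same server, and treat RANGE_ALREADY_LOADED as success" logic.]

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Sketch of the interim fix: remember which RangeServer a failed
// load_range() targeted, retry the *same* server on the next split
// notification, and treat RANGE_ALREADY_LOADED as success.
enum class LoadResult { OK, TIMEOUT, RANGE_ALREADY_LOADED };

class Master {
public:
  explicit Master(std::vector<std::string> servers)
      : servers_(std::move(servers)) {}

  // Called when the originating RangeServer reports a newly split-off
  // range.  Returns the server the range ended up on, or "" on failure.
  std::string handle_split(const std::string &range,
                           LoadResult (*load_range)(const std::string &,
                                                    const std::string &)) {
    std::string target;
    auto it = pending_.find(range);
    if (it != pending_.end())
      target = it->second;                           // retry same server
    else
      target = servers_[next_++ % servers_.size()];  // round-robin pick

    switch (load_range(target, range)) {
    case LoadResult::OK:
    case LoadResult::RANGE_ALREADY_LOADED:  // already loaded => success
      pending_.erase(range);
      return target;
    case LoadResult::TIMEOUT:
      pending_[range] = target;  // remember the attempted server
      return "";
    }
    return "";
  }

private:
  std::vector<std::string> servers_;
  std::size_t next_ = 0;
  std::map<std::string, std::string> pending_;  // range -> attempted server
};
```

Without the `pending_` map, the second notification would round-robin to a different server, which is exactly how the same range ended up loaded on three RangeServers.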
> > > The reason this fix is interim is that it does not persist the
> > > Range-to-RangeServer mapping information, so if the Master were to fail
> > > at an inopportune time and come back up, we'd be subject to the same
> > > failure. This will get fixed with Issue 74 - Master directed
> > > RangeServer recovery
> > > <http://code.google.com/p/hypertable/issues/detail?id=79>, since the
> > > Master will have a meta-log and will be able to persist this mapping as
> > > re-constructible state information.
> > >
> > > After we fixed this problem, the next problem Earle ran into was that
> > > the RangeServer was exhausting memory and crashing. To fix this, we
> > > added the following property to the hypertable.cfg file on the machine
> > > that was doing the LOAD DATA INFILE:
> > >
> > > Hypertable.Lib.Mutator.FlushDelay=100
> > >
> > > Keep this in mind if you encounter the same problem.
> > >
> > > - Doug
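[Editor's aside: the duplicate-assignment check in Schubert's chart 3 could be sketched as the following Master-side bookkeeping. The class and method names (AssignmentTable, on_success_ack) are hypothetical illustrations, not Hypertable API; the point is that a success ack from a server that is not the intended owner should trigger a deassign.]

```cpp
#include <map>
#include <string>

// Hypothetical sketch of the acknowledge mechanism from chart 1 plus the
// wrong-assignment check from chart 3: the Master records where it intended
// each range to live, and an ack from any other server is rejected.
class AssignmentTable {
public:
  // Record the Master's intent when it assigns a range to a server.
  void assign(const std::string &range, const std::string &server) {
    intended_[range] = server;
  }

  // Handle a "succ ack" from a RangeServer.  Returns true if the ack is
  // accepted; returns false if the Master believes the range belongs to a
  // different server (chart 3), in which case it should send a deassign
  // to the acking server.
  bool on_success_ack(const std::string &range, const std::string &server) {
    auto it = intended_.find(range);
    if (it != intended_.end() && it->second != server)
      return false;  // duplicated/wrong assignment: deassign from `server`
    loaded_[range] = server;
    return true;
  }

private:
  std::map<std::string, std::string> intended_;  // Master's chosen owner
  std::map<std::string, std::string> loaded_;    // acknowledged owners
};
```

A persistent version of `intended_` (written to the Master's meta-log) is essentially the re-constructible state Doug mentions for Issue 74.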
