Just want to add some comments based on our experience. We've tried out Hypertable 0.9.0.* and 0.9.2.*, and HBase 0.18.* and 0.19.*.
It seems that Hypertable's usability and robustness have not improved as much as HBase's recently. -SimonZ

On Mar 26, 2:01 am, Schubert Zhang <[email protected]> wrote:
> Regarding this duplicated-assignment issue: in my view, both the interim
> fix and the persistence fix may not be robust enough.
>
> The following MSC charts are my proposal. I am not familiar with the
> latest Hypertable code (I had studied 0.9.0.7), so if I am wrong, please
> point it out.
>
> Chart 1: successful assignment case -- we should design an acknowledge
> mechanism.
>
>   origRS                  Master                      RS1
>     |---split range notify-->|                         |
>     |                  (select a RS)                   |
>     |                        |-----assign to RS1------>|
>     |                        |<-------succ ack---------|
>     |<------succ ack---------|                         |
>
> Chart 2: failed/timed-out assignment case
>
>   origRS                  Master                  RS1           RS2
>     |---split range notify-->|                     |             |
>     |                  (select a RS)               |             |
>     |                        |---assign to RS1---->|             |
>     |                        | (timeout or failed) |             |
>     |                        |--retry assign (x2)->|             |
>     |                        | (still timeout or failed)         |
>     |                 (select another RS)          |             |
>     |                        |------deassign------>|             |
>     |                        |--------assign to RS2------------->|
>     |                        | (still timeout or failed)         |
>     |<----report failure-----|                     |             |
>     |        ...(another round)...                 |             |
>
> Chart 3: a mechanism to avoid duplicated or wrong assignment
>
>   origRS                  Master                      RS1
>     |                        |<-------succ ack---------|
>     |     (check, but find the range is in RS2)        |
>     |                        |-------deassign--------->|
>     |                        |<-------succ ack---------|
>
> On Mar 21, 1:41 am, Doug Judd <[email protected]> wrote:
> > P.S. The memory exhaustion problem will be fixed in the 0.9.2.4 release.
> >
> > On Fri, Mar 20, 2009 at 10:37 AM, Doug Judd <[email protected]> wrote:
> > > With the help of Earle Ady, we've found and fixed the large-load
> > > corruption problem with the 0.9.2.2 release.
> > > To get the fixed version, please pull the latest code from the git
> > > repository <http://code.google.com/p/hypertable/wiki/SourceCode?tm=4>.
> > > We'll be releasing 0.9.2.3 soon.
> > >
> > > Here's a summary of the problem:
> > >
> > > With the fix of issue 246
> > > <http://code.google.com/p/hypertable/issues/detail?id=246>, compactions
> > > are now happening regularly, as they should. However, this has added
> > > substantial load on the system. When a range split and the Master was
> > > notified of the newly split-off range, the Master selected (round-robin)
> > > a new RangeServer to own the range. However, due to the increased load
> > > on the system and a 30-second hardcoded timeout in the Master, the
> > > RangeServer::load_range() command was timing out (it was taking 32 to
> > > 37 seconds). This timeout was reported back to the originating
> > > RangeServer, which paused fifteen seconds and tried again. But on the
> > > second attempt to notify the Master of the newly split-off range, the
> > > Master would (round-robin) select another RangeServer and invoke
> > > RangeServer::load_range() on that (different) server. This had the
> > > effect of the same range being loaded by three different RangeServers,
> > > which was wreaking havoc on the system. There were two fixes for this
> > > problem:
> > >
> > > 1. The hardcoded timeout was removed, and (almost) all timeouts in the
> > > system are now based on the "Hypertable.Request.Timeout" property,
> > > which has a default value of 180 seconds.
> > >
> > > 2. An interim fix was put in place in the Master: upon
> > > RangeServer::load_range() failure, the Master will remember which
> > > RangeServer it attempted the load on. The next time it gets notified
> > > and attempts to load the same range, it will choose the same
> > > RangeServer. If it gets back the error RANGE_ALREADY_LOADED, it will
> > > interpret that as success.
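[Editor's aside: the interim fix Doug describes in (2) above could be sketched roughly as follows. All names here (Master, LoadResult, handle_split) are illustrative and not the actual Hypertable Master code; the point is the "retry the same server, and treat RANGE_ALREADY_LOADED as success" logic.]

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Sketch of the interim fix: remember which RangeServer a failed
// load_range() targeted, retry the *same* server on the next split
// notification, and treat RANGE_ALREADY_LOADED as success.
enum class LoadResult { OK, TIMEOUT, RANGE_ALREADY_LOADED };

class Master {
public:
  explicit Master(std::vector<std::string> servers)
      : servers_(std::move(servers)) {}

  // Called when the originating RangeServer reports a newly split-off
  // range.  Returns the server the range ended up on, or "" on failure.
  std::string handle_split(const std::string &range,
                           LoadResult (*load_range)(const std::string &,
                                                    const std::string &)) {
    std::string target;
    auto it = pending_.find(range);
    if (it != pending_.end())
      target = it->second;                           // retry same server
    else
      target = servers_[next_++ % servers_.size()];  // round-robin pick

    switch (load_range(target, range)) {
    case LoadResult::OK:
    case LoadResult::RANGE_ALREADY_LOADED:  // already loaded => success
      pending_.erase(range);
      return target;
    case LoadResult::TIMEOUT:
      pending_[range] = target;  // remember the attempted server
      return "";
    }
    return "";
  }

private:
  std::vector<std::string> servers_;
  std::size_t next_ = 0;
  std::map<std::string, std::string> pending_;  // range -> attempted server
};
```

Without the `pending_` map, the second notification would round-robin to a different server, which is exactly how the same range ended up loaded on three RangeServers.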
> > > The reason this fix is interim is that it does not persist the
> > > Range-to-RangeServer mapping information, so if the Master were to fail
> > > at an inopportune time and come back up, we'd be subject to the same
> > > failure. This will get fixed with Issue 74 - Master directed
> > > RangeServer recovery
> > > <http://code.google.com/p/hypertable/issues/detail?id=79>, since the
> > > Master will have a meta-log and will be able to persist this mapping as
> > > re-constructible state information.
> > >
> > > After we fixed this problem, the next problem Earle ran into was that
> > > the RangeServer was exhausting memory and crashing. To fix this, we
> > > added the following property to the hypertable.cfg file on the machine
> > > that was doing the LOAD DATA INFILE:
> > >
> > > Hypertable.Lib.Mutator.FlushDelay=100
> > >
> > > Keep this in mind if you encounter the same problem.
> > >
> > > - Doug
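[Editor's aside: the duplicate-assignment check in Schubert's chart 3 could be sketched as the following Master-side bookkeeping. The class and method names (AssignmentTable, on_success_ack) are hypothetical illustrations, not Hypertable API; the point is that a success ack from a server that is not the intended owner should trigger a deassign.]

```cpp
#include <map>
#include <string>

// Hypothetical sketch of the acknowledge mechanism from chart 1 plus the
// wrong-assignment check from chart 3: the Master records where it intended
// each range to live, and an ack from any other server is rejected.
class AssignmentTable {
public:
  // Record the Master's intent when it assigns a range to a server.
  void assign(const std::string &range, const std::string &server) {
    intended_[range] = server;
  }

  // Handle a "succ ack" from a RangeServer.  Returns true if the ack is
  // accepted; returns false if the Master believes the range belongs to a
  // different server (chart 3), in which case it should send a deassign
  // to the acking server.
  bool on_success_ack(const std::string &range, const std::string &server) {
    auto it = intended_.find(range);
    if (it != intended_.end() && it->second != server)
      return false;  // duplicated/wrong assignment: deassign from `server`
    loaded_[range] = server;
    return true;
  }

private:
  std::map<std::string, std::string> intended_;  // Master's chosen owner
  std::map<std::string, std::string> loaded_;    // acknowledged owners
};
```

A persistent version of `intended_` (written to the Master's meta-log) is essentially the re-constructible state Doug mentions for Issue 74.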
