Comments inline below:
> -----Original Message-----
> From: Cosmin Lehene [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, September 09, 2008 7:25 AM
> To: [email protected]
> Subject: Re: Hbase corrupts data after reporting MSG_REPORT_CLOSE to master
> during compaction and split process
>
> Hi,
>
> I managed to reproduce the corruption and also have full debug logs, but
> first I'll explain the whys and hows of the bug and also how I think it can
> be fixed.
> (I'm going to send my takeaways on how we managed to insert 300GB in less
> than 6 hours on a 5-node cluster, plus some advice/issues, in another
> mail.)
>
> The following assumptions are based on my reading of the actual code (don't
> worry if I didn't get them all right; please read the entire mail).
>
> - The master _assigns_ a region to a server by sending a MSG_REGION_OPEN
> - On heartbeat, region servers report their current load and a list of MLR -
> most loaded regions (in fact just a list of the first N online regions).
> - Upon opening a newly assigned region, a region server will try to compact
> and split that region.
> - The region is NOT marked offline when compaction starts
> - The region is marked OFFLINE:true, SPLIT:true during a SPLIT
>
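
Condensing the open path described above into code, it looks roughly like
this (a sketch only, with hypothetical names rather than the actual
HRegionServer code):

    // A sketch only: hypothetical names, not the actual HRegionServer code.
    class OpenPathSketch {
        void handleMsgRegionOpen(String region) {
            openRegion(region);          // the region comes online
            compactIfNeeded(region);     // the region is NOT marked offline here
            if (needsSplit(region)) {
                // The parent's .META. row is updated to OFFLINE:true,
                // SPLIT:true and two child rows are written.
                splitRegion(region);
            }
        }

        void openRegion(String region) { }
        void compactIfNeeded(String region) { }
        boolean needsSplit(String region) { return false; }
        void splitRegion(String region) { }
    }
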
> Our scenario goes this way:
>
> Master (M) assigns region A to region server R1
> R1 starts compaction and split of A
> R1, on heartbeat, sends its load and an MLR list that contains A

This list should contain only fully-open regions and should not include any
regions in the process of being opened. In addition, the region server should
attach a MSG_REPORT_PROCESS_OPEN to the heartbeat for each region being
opened. This should prevent the master from reassigning those regions.
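
A sketch of what that could look like on the region server side (hypothetical
names and types, not the actual HRegionServer code):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Set;

    // A sketch only: hypothetical names, not the actual HRegionServer code.
    class HeartbeatSketch {
        static final int N = 10;          // how many regions to advertise

        List<String> onlineRegions;       // regions this server is serving
        Set<String> regionsBeingOpened;   // assigned but not yet fully open
        List<String> outboundMsgs;        // messages piggybacked on the heartbeat

        List<String> buildMostLoadedList() {
            List<String> mlr = new ArrayList<String>();
            for (String region : onlineRegions) {
                if (regionsBeingOpened.contains(region)) {
                    // Still opening: report progress to the master instead
                    // of advertising the region as reassignable load.
                    outboundMsgs.add("MSG_REPORT_PROCESS_OPEN " + region);
                    continue;
                }
                mlr.add(region);
                if (mlr.size() == N) break;   // first N fully-open regions
            }
            return mlr;
        }
    }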

> M decides to reassign the extra regions and sends MSG_CLOSE_REGION for A to R1
> R1 finishes the compaction and splits A into A1 and A2 (A1 has the same start
> key as A)

If, in fact, the region server is including regions that are not completely 
open in the load list, this is a bug.

> M assigns A to R2
> R2 starts compaction and split of A
> R2 finishes the compaction and splits A into A_clone_1 and A_clone_2
> (A_clone_1 has the same start key as A and, importantly, the same start key
> as A1)

Whenever two region servers start working on the same region, chaos ensues. It 
is rare that corruption *will not* happen in this case.

> Now we get A1 and A_clone_1, almost identical and starting with the same
> key. The cluster is corrupted; what happens next matters less. The point is
> that both of them end up in .META.
>
> I identified several places where this could be avoided, and I'm going to
> state a few disjoint questions. In my opinion both the master and the region
> server could be held responsible, but I guess it's a matter of architectural
> philosophy. Please note that any of these questions could be a starting point
> for the fix.
>
> - Why, when it gets a MSG_CLOSE_REGION for A, doesn't the region server
> abort the current compact/split operation, leaving A in the original state,
> and close A immediately?

MSG_CLOSE_REGION is sent for several different purposes. Maybe, if the master 
has timed out the region server, it should send something like MSG_ABORT_OPEN.
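
A sketch of how the region server could tell the two apart (MSG_ABORT_OPEN is
a proposed message; the names here are hypothetical, not the actual code):

    // A sketch only: MSG_ABORT_OPEN is a proposed message type and the
    // names are hypothetical, not the actual HBase code.
    class RegionMsgSketch {
        void dispatch(String msgType, String region) {
            if ("MSG_CLOSE_REGION".equals(msgType)) {
                // Normal close of an already-open region: flush and close.
                closeRegion(region);
            } else if ("MSG_ABORT_OPEN".equals(msgType)) {
                // The master timed out our open: abandon the in-flight
                // compaction/split and leave the region files untouched.
                abortOpen(region);
            }
        }

        void closeRegion(String region) { }
        void abortOpen(String region) { }
    }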

> - Why doesn't a region server DELETE a region after a SPLIT? (I guess the
> region could be offline by then and it's not the region server's decision to
> make, but still...)

The reason splits are fast is that the two children use the parent's files 
until they each do a compaction. Thus the parent region must remain around 
until neither child is using it any longer. The master then garbage collects 
the parent.
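
Roughly, on the master side (a sketch with hypothetical names, not the actual
master code):

    // A sketch only: hypothetical names for the master's cleanup of a split
    // parent.
    class SplitParentGcSketch {
        void maybeCollectParent(String parent, String childA, String childB) {
            // A child keeps reference files pointing into the parent until
            // its first compaction rewrites that data into its own files.
            boolean childADone = !hasReferencesTo(childA, parent);
            boolean childBDone = !hasReferencesTo(childB, parent);
            if (childADone && childBDone) {
                deleteRegionFiles(parent);   // the parent's files are unused now
                deleteMetaRow(parent);       // drop the OFFLINE/SPLIT parent row
            }
        }

        boolean hasReferencesTo(String child, String parent) { return false; }
        void deleteRegionFiles(String region) { }
        void deleteMetaRow(String region) { }
    }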

> - Why doesn't the master check the region's status when assigning it to a new
> region server? It might be splitting or already split. I guess this would
> need a new state.

The master does check whether a region is split or offline and will not 
assign it. However, this information only becomes available after the split 
is complete.
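
In code terms the check amounts to something like this (a sketch; the flags
mirror OFFLINE:true / SPLIT:true in the region's .META. row):

    // A sketch only: hypothetical names mirroring the OFFLINE/SPLIT flags.
    class AssignmentCheckSketch {
        boolean isAssignable(RegionInfoLike info) {
            // A split parent is marked offline+split only once the split has
            // been committed to .META.; a split still in progress on the
            // region server is invisible to the master at this point.
            return !info.isOffline() && !info.isSplit();
        }
    }

    interface RegionInfoLike {
        boolean isOffline();
        boolean isSplit();
    }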

> - Why doesn't a region server check whether the region is OFFLINE:true or
> SPLIT:true when opening/compacting/splitting it?

A region server should never receive an open message for a split or offline 
region. When the region server is told to open a region, it assumes it has 
exclusive rights to all the files of the region.

> I have the logs available. They are pretty large and I might need to clean
> them up a little, but I can make them available if that's really needed.
> However, I think the scenario and the questions might be enough for a bug
> report and a fix.
>
> Thanks,
> Cosmin
