Hello Stack,
Thank you for your explanation; it's very helpful.
If I find anything new, I'll contact you.
Regards,
LvZheng
2010/3/24 Stack <[email protected]>
> On Tue, Mar 23, 2010 at 8:42 PM, Zheng Lv <[email protected]> wrote:
> > Hello Stack,
> > >So, for sure ugly stuff is going on. I filed
> > >https://issues.apache.org/jira/browse/HBASE-2365. It looks like we're
> > >doubly assigning a region.
> > Can you tell me how this happened in detail? Thanks a lot.
> >
>
> Yes.
>
> Splits are run by the regionserver. It figures a region needs to be
> split and goes ahead closing the parent and creating the daughter
> regions. It then adds edits to the meta table offlining the parent
> and inserting the two new daughter regions. Next it sends a message
> to the master telling it that a region has been split. The message
> contains names of the daughter regions. On receipt of the message,
> the master adds the new daughter regions to the unassigned regions
> list so they'll be passed out the next time a regionserver checks in.
>
> Concurrently, the master is running a scan of the meta table every
> minute making sure all is in order. One thing it does is if it finds
> unassigned regions, it'll add them to the unassigned regions list (this
> process is what gets all regions assigned after a startup).
>
> In your case, what's happening is that there is a long period between
> the add of the new split regions to the meta table and the report of
> split to the master. During this time, the master meta scan ran,
> found one of the daughters and went and assigned it. Then the split
> message came in and the daughter was assigned again!
>
> There was supposed to be protection against this happening IIRC.
> Looking at the responsible code, we are trying to defend against this
> happening in ServerManager:
>
> /*
>  * Assign new daughter-of-a-split UNLESS it's already been assigned.
>  * It could have been assigned already in the rare case where there was a large
>  * gap between insertion of the daughter region into .META. by the
>  * splitting regionserver and receipt of the split message in master (See
>  * HBASE-1784).
>  * @param hri Region to assign.
>  */
> private void assignSplitDaughter(final HRegionInfo hri) {
>   MetaRegion mr =
>     this.master.regionManager.getFirstMetaRegionForRegion(hri);
>   Get g = new Get(hri.getRegionName());
>   g.addFamily(HConstants.CATALOG_FAMILY);
>   try {
>     HRegionInterface server =
>       master.connection.getHRegionConnection(mr.getServer());
>     Result r = server.get(mr.getRegionName(), g);
>     // If size >= 3 -- presume regioninfo, startcode and server -- then presume
>     // that this daughter already assigned and return.
>     if (r.size() >= 3) return;
>   } catch (IOException e) {
>     LOG.warn("Failed get on " + HConstants.CATALOG_FAMILY_STR +
>       "; possible double-assignment?", e);
>   }
>   this.master.regionManager.setUnassigned(hri, false);
> }
>
> So, the above is not working in your case for some reason. I'll take
> a look but I'm not sure I can figure it w/o DEBUG (thanks for letting
> me know about the out-of-sync clocks... Now I can have more faith in
> what the logs are telling me).
>
> >
> > >With DEBUG enabled have you been able to reproduce?
> > The exception has not appeared again these days; if it does, I'll show you
> > the logs.
> >
>
> For sure, if you come across it again, I'm interested.
>
> Thanks Zheng,
> St.Ack
>
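
For readers following the thread, the race described in the quoted message can be sketched as a toy model. This is not HBase code: the class DoubleAssignmentSketch, its methods, and the region name "daughterA" are all hypothetical, and a plain set stands in for the .META. rows that assignSplitDaughter inspects. It only illustrates the ordering problem (meta scan first, split message second) and how a "check meta before assigning" guard is meant to prevent the second assignment.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of the master's bookkeeping. Hypothetical; not HBase code. */
class DoubleAssignmentSketch {
    /** Regions already recorded as assigned (stand-in for server info in .META.). */
    private final Set<String> assignedInMeta = new HashSet<>();
    /** Every assignment the toy master hands out, in order. */
    private final List<String> assignments = new ArrayList<>();
    /** Whether the split-message path checks meta before assigning. */
    private final boolean checkMetaFirst;

    DoubleAssignmentSketch(boolean checkMetaFirst) {
        this.checkMetaFirst = checkMetaFirst;
    }

    /** Periodic meta scan: assigns a daughter it finds without a server. */
    void metaScanFinds(String region) {
        if (assignedInMeta.add(region)) {
            assignments.add(region);
        }
    }

    /** Split report from the regionserver, possibly arriving after the scan. */
    void splitMessageArrives(String region) {
        if (checkMetaFirst && assignedInMeta.contains(region)) {
            return; // already assigned by the meta scan; do nothing
        }
        assignedInMeta.add(region);
        assignments.add(region);
    }

    /** Number of times the given region was handed out. */
    int timesAssigned(String region) {
        return Collections.frequency(assignments, region);
    }

    public static void main(String[] args) {
        for (boolean guarded : new boolean[] {false, true}) {
            DoubleAssignmentSketch master = new DoubleAssignmentSketch(guarded);
            master.metaScanFinds("daughterA");       // scan wins the race
            master.splitMessageArrives("daughterA"); // late split report
            System.out.println("guard=" + guarded
                + " assignments=" + master.timesAssigned("daughterA"));
        }
    }
}
```

In this sketch the unguarded master assigns the daughter twice and the guarded one assigns it once. The real guard reads the daughter's row from .META. over RPC, so it can still fail in ways this model does not capture, which may be what is happening in the case discussed above.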