The log shows that the first time the region was transitioned to SPLITTING
even then it was not populated in the Master's memory.

On Tue, Nov 6, 2012 at 11:29 AM, ramkrishna vasudevan <
[email protected]> wrote:

> Could you attach the master logs at this time
> 2012-11-05 00:24:55?
>
> Regards
> Ram
>
> On Tue, Nov 6, 2012 at 11:15 AM, lars hofhansl <[email protected]>wrote:
>
>> Took a brief look through all SPLIT related commits since 0.94.0... Found
>> these:
>>
>> HBASE-6854 *
>> HBASE-6713
>> HBASE-6329 *
>>
>> HBASE-6088
>>
>> HBASE-5986
>> HBASE-6070 *
>>
>>
>> The ones marked with * are (IMHO) more likely to be related.
>>
>> -- Lars
>>
>> ________________________________
>> From: Matt Corgan <[email protected]>
>> To: dev <[email protected]>; lars hofhansl <[email protected]>
>> Sent: Monday, November 5, 2012 9:28 PM
>> Subject: Re: infinite loop of RS_ZK_REGION_SPLIT on .94.2
>>
>> Yeah - we were running .94.0 since it came out but never saw it there.
>> I'll keep trying to narrow it down.  The only harm it's causing is log
>> spam and failing to move daughters to a new regionserver, which are
>> definitely problems, but it's not bringing down the cluster.
>>
>>
>> On Mon, Nov 5, 2012 at 9:17 PM, lars hofhansl <[email protected]>
>> wrote:
>>
>> > So it seems you can repeat this to some extend in 0.94.2, but you have
>> > never seen this before?
>> >
>> >
>> > -- Lars
>> >
>> >
>> >
>> > ________________________________
>> >  From: Matt Corgan <[email protected]>
>> > To: dev <[email protected]>
>> > Sent: Monday, November 5, 2012 9:10 PM
>> > Subject: Re: infinite loop of RS_ZK_REGION_SPLIT on .94.2
>> >
>> > It happened in this new table that I have all the logs for.  The region
>> in
>> > question this time is 6839663e4f8f79be3d7469784c21cbc2, and the first
>> trace
>> > of this region is on the regionserver with the "Intantiated
>> tableName..."
>> > message
>> >
>> > 2012-11-05 22:24:21,162 DEBUG
>> org.apache.hadoop.hbase.regionserver.HRegion:
>> > Instantiated
>> >
>> >
>> StatAreaModelLink,\x00\x00\x07\xD9\x00\x00\x00\x0C\x00\x00\x00\x004H\xC4\xB5\x00\x00\x00\x02\x00\x00\x00\x05\x00\x00\x00\x00G.l\x9B,1352172257535.6839663e4f8f79be3d74
>> > 9784c21cbc2.
>> >
>> > I also know this region came from a recent split, but I can't find any
>> log
>> > messages show the parent finishing the split that created this daughter
>> > region.  So my guess now is that the split is actually finishing and
>> > letting clients continue to write data, but something is failing to
>> print
>> > the log line and correctly tell the master about the new region.
>> >
>> > I've noticed that these regions are showing up on the clients in calls
>> to
>> > HTable.getRegionLocations(), so the clients continue to function, but
>> if I
>> > call HBaseAdmin.move() i get an UnknownRegionException.
>> >
>> >
>> > On Mon, Nov 5, 2012 at 7:07 PM, Ted Yu <[email protected]> wrote:
>> >
>> > > I think what Matt encountered was this:
>> > > https://issues.apache.org/jira/browse/HBASE-7101
>> > >
>> > > On Mon, Nov 5, 2012 at 5:33 PM, Matt Corgan <[email protected]>
>> wrote:
>> > >
>> > > > Beats me - i can't find any record of it before that.
>> > > >
>> > > > I'm importing data into another table now.  I disabled/enabled the
>> > table
>> > > > first to make sure we have the original 4 region locations logged
>> > > > everywhere.  Will report back...
>> > > >
>> > > >
>> > > > On Mon, Nov 5, 2012 at 2:15 PM, Jean-Daniel Cryans <
>> > [email protected]
>> > > > >wrote:
>> > > >
>> > > > > Wow, and can you figure how this happened?
>> > > > >
>> > > > > 2012-11-05 00:24:54,538 DEBUG
>> > > > > org.apache.hadoop.hbase.regionserver.HRegion: Instantiated
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>> ActiveListingRecord16,\x86\x07\xDC\x03\x17RealtyTrac\x0044737383,1352093084141.22f8fa73d8af837410ca270f344f6bb8.
>> > > > >
>> > > > > Split? Else how did the master told that RS to open a region that
>> > > doesn't
>> > > > > exist?
>> > > > >
>> > > > > On Mon, Nov 5, 2012 at 2:02 PM, Matt Corgan <[email protected]>
>> > > wrote:
>> > > > > > Here's one from last night with the master logs at the bottom.
>> > > > > > http://pastebin.com/cSdMbA2a
>> > > > > >
>> > > > > > I don't see that region in the master logs for 4 days back.  I'm
>> > > going
>> > > > to
>> > > > > > import some new data from scratch soon and will be sure to keep
>> all
>> > > the
>> > > > > > master logs.
>> > > > > >
>> > > > > >
>> > > > > > On Mon, Nov 5, 2012 at 9:52 AM, Jean-Daniel Cryans <
>> > > > [email protected]
>> > > > > >wrote:
>> > > > > >
>> > > > > >> This reminds me a lot of
>> > > > > https://issues.apache.org/jira/browse/HBASE-4792
>> > > > > >>
>> > > > > >> What I'd like to see is the log from the first time the master
>> > > > > >> receives the split message. I guess it says the region doesn't
>> > exist
>> > > > > >> anymore because the split was processed already in the master
>> but
>> > > > > >> there's a failure mode similar to 4792.
>> > > > > >>
>> > > > > >> I saw this on another cluster last week but the logs were
>> rolling
>> > > > > >> based on size so the original split message was lost.
>> > > > > >>
>> > > > > >> J-D
>> > > > > >>
>> > > > > >> On Sun, Nov 4, 2012 at 5:16 PM, Matt Corgan <
>> [email protected]>
>> > > > > wrote:
>> > > > > >> > I've had a few successful splits today without any errors.
>> I'll
>> > > > turn
>> > > > > up
>> > > > > >> > the importer speed tomorrow to start stressing the cluster
>> more.
>> > > > > >> >
>> > > > > >> >
>> > > > > >> > On Sun, Nov 4, 2012 at 2:07 PM, Ted Yu <[email protected]>
>> > > wrote:
>> > > > > >> >
>> > > > > >> >> Matt:
>> > > > > >> >> Was there any region server crash before you noticed the
>> > repeated
>> > > > log
>> > > > > >> for
>> > > > > >> >> bc62a8a72124a4ba3f6b9f302587903c ?
>> > > > > >> >>
>> > > > > >> >> I wonder if the following JIRA might be related:
>> > > > > >> >> HBASE-7083 SSH#fixupDaughter should force re-assign missing
>> > > > daughter
>> > > > > >> >>
>> > > > > >> >> Thanks
>> > > > > >> >>
>> > > > > >> >> On Sat, Nov 3, 2012 at 7:27 PM, Matt Corgan <
>> > [email protected]
>> > > >
>> > > > > >> wrote:
>> > > > > >> >>
>> > > > > >> >> > Strangely, I don't see any record of that region in the
>> > master
>> > > > > before
>> > > > > >> >> what
>> > > > > >> >> > I already pasted even though I have logs back to 10/30.
>> Next
>> > > > time
>> > > > > it
>> > > > > >> >> > happens I'll gather a full log record and try to debug
>> while
>> > > it's
>> > > > > >> >> > occurring.
>> > > > > >> >> >
>> > > > > >> >> >
>> > > > > >> >> > On Sat, Nov 3, 2012 at 7:10 PM, rajesh babu chintaguntla <
>> > > > > >> >> > [email protected]> wrote:
>> > > > > >> >> >
>> > > > > >> >> > > Hi Matt,
>> > > > > >> >> > > can you paste some more master logs of region
>> > > > > >> >> > > bc62a8a72124a4ba3f6b9f30258790 before split.
>> > > > > >> >> > > I think Its not problem with splitting.
>> > > > > >> >> > > We are getting
>> > > > > >> >> > >       LOG.warn("Region " + encodedName + " not found on
>> > > server
>> > > > "
>> > > > > +
>> > > > > >> >> > > serverName +
>> > > > > >> >> > >         "; failed processing");
>> > > > > >> >> > > this log means no entry in servers map(not fully
>> assigned).
>> > > > > >> >> > >     Set<HRegionInfo> hris = this.servers.get(sn);
>> > > > > >> >> > >     HRegionInfo foundHri = null;
>> > > > > >> >> > >     for (HRegionInfo hri: hris) {
>> > > > > >> >> > >       if (hri.getEncodedName().equals(encodedName)) {
>> > > > > >> >> > >         foundHri = hri;
>> > > > > >> >> > >         break;
>> > > > > >> >> > >       }
>> > > > > >> >> > >     }
>> > > > > >> >> > >     return foundHri;
>> > > > > >> >> > >
>> > > > > >> >> > >
>> > > > > >> >> > >
>> > > > > >> >> > >
>> > > > > >> >> > > On Sun, Nov 4, 2012 at 6:07 AM, lars hofhansl <
>> > > > > [email protected]>
>> > > > > >> >> > wrote:
>> > > > > >> >> > >
>> > > > > >> >> > > > CC'ing dev list...
>> > > > > >> >> > > >
>> > > > > >> >> > > > Is anybody aware of any changes that went in recently
>> > that
>> > > > > could
>> > > > > >> >> cause
>> > > > > >> >> > > > this?
>> > > > > >> >> > > > I looked around a bit, but could not find anything
>> > obvious.
>> > > > > >> >> > > >
>> > > > > >> >> > > > -- Lars
>> > > > > >> >> > > >
>> > > > > >> >> > > >
>> > > > > >> >> > > >
>> > > > > >> >> > > > ________________________________
>> > > > > >> >> > > >  From: Matt Corgan <[email protected]>
>> > > > > >> >> > > > To: user <[email protected]>
>> > > > > >> >> > > > Sent: Saturday, November 3, 2012 5:27 PM
>> > > > > >> >> > > > Subject: Re: infinite loop of RS_ZK_REGION_SPLIT on
>> .94.2
>> > > > > >> >> > > >
>> > > > > >> >> > > > I think the cluster is ok without running hbck, as
>> > > restarting
>> > > > > the
>> > > > > >> >> > > > regionserver process stops the warnings and everything
>> > > looks
>> > > > ok
>> > > > > >> >> > > otherwise.
>> > > > > >> >> > > >
>> > > > > >> >> > > > here's the regionserver right after the split happens:
>> > > > > >> >> > > > ------------------------
>> > > > > >> >> > > > 2012-11-01 22:45:28,726 DEBUG
>> > > > > >> >> > org.apache.hadoop.hbase.zookeeper.ZKAssign:
>> > > > > >> >> > > > regionserver:60020-0x13ab46479832953 Attempting to
>> > > transition
>> > > > > node
>> > > > > >> >> > > > bc62a8a72124a4ba3f6b9f302587903c from *RS_ZK_R*
>> > > > > >> >> > > > *EGION_SPLITTING to RS_ZK_REGION_SPLIT*
>> > > > > >> >> > > > 2012-11-01 22:45:28,730 DEBUG
>> > > > > >> >> > org.apache.hadoop.hbase.zookeeper.ZKAssign:
>> > > > > >> >> > > > regionserver:60020-0x13ab46479832953 Successfully
>> > > > transitioned
>> > > > > >> node
>> > > > > >> >> > > > bc62a8a72124a4ba3f6b9f302587903c from RS_ZK_
>> > > > > >> >> > > > REGION_SPLITTING to RS_ZK_REGION_SPLIT
>> > > > > >> >> > > > 2012-11-01 22:45:28,730 DEBUG
>> > > > > >> >> > > > org.apache.hadoop.hbase.regionserver.SplitTransaction:
>> > > Still
>> > > > > >> waiting
>> > > > > >> >> on
>> > > > > >> >> > > the
>> > > > > >> >> > > > master to process the split for
>> > > > > bc62a8a72124a4ba3f6b9f302587903c
>> > > > > >> >> > > > 2012-11-01 22:45:28,832 DEBUG
>> > > > > >> >> > org.apache.hadoop.hbase.zookeeper.ZKAssign:
>> > > > > >> >> > > > regionserver:60020-0x13ab46479832953 Attempting to
>> > > transition
>> > > > > node
>> > > > > >> >> > > > bc62a8a72124a4ba3f6b9f302587903c from RS_ZK_R
>> > > > > >> >> > > > EGION_SPLIT to RS_ZK_REGION_SPLIT
>> > > > > >> >> > > > 2012-11-01 22:45:28,837 DEBUG
>> > > > > >> >> > org.apache.hadoop.hbase.zookeeper.ZKAssign:
>> > > > > >> >> > > > regionserver:60020-0x13ab46479832953 Successfully
>> > > > transitioned
>> > > > > >> node
>> > > > > >> >> > > > bc62a8a72124a4ba3f6b9f302587903c from RS_ZK_
>> > > > > >> >> > > > REGION_SPLIT to RS_ZK_REGION_SPLIT
>> > > > > >> >> > > > -----------------------------
>> > > > > >> >> > > >
>> > > > > >> >> > > > The "transitioned node from RS_ZK_REGION_SPLIT to
>> > > > > >> RS_ZK_REGION_SPLIT"
>> > > > > >> >> > > > continues for 15 or so hours and finally settles
>> without
>> > > > manual
>> > > > > >> >> > > > intervention with these regionserver log messages:
>> > > > > >> >> > > > -----------------------
>> > > > > >> >> > > > 2012-11-02 13:55:00,906 DEBUG
>> > > > > >> >> > org.apache.hadoop.hbase.zookeeper.ZKAssign:
>> > > > > >> >> > > > regionserver:60020-0x13ab46479832953 Attempting to
>> > > transition
>> > > > > >> node *
>> > > > > >> >> > > > bc62a8a72124a4ba3f6b9f302587903c* from
>> RS_ZK_REGION_SPLIT
>> > > to
>> > > > > >> >> > > > RS_ZK_REGION_SPLIT
>> > > > > >> >> > > >
>> > > > > >> >> > > > 2012-11-02 13:55:00,916 DEBUG
>> > > > > >> >> > org.apache.hadoop.hbase.zookeeper.ZKAssign:
>> > > > > >> >> > > > regionserver:60020-0x13ab46479832953 Successfully
>> > > > transitioned
>> > > > > >> node *
>> > > > > >> >> > > > bc62a8a72124a4ba3f6b9f302587903c* from
>> RS_ZK_REGION_SPLIT
>> > > to
>> > > > > >> >> > > > RS_ZK_REGION_SPLIT
>> > > > > >> >> > > >
>> > > > > >> >> > > > 2012-11-02 13:55:00,916 INFO
>> > > > > >> >> > > > org.apache.hadoop.hbase.regionserver.SplitRequest:
>> Region
>> > > > > split,
>> > > > > >> META
>> > > > > >> >> > > > updated, and report to master.
>> > > > > >> >> > > >
>> > > > > >> >> > > >
>> > > > > >> >> > >
>> > > > > >> >> >
>> > > > > >> >>
>> > > > > >>
>> > > > >
>> > > >
>> > >
>> >
>> Parent=ActiveListingRecord16,\x83\x07\xDC\x07\x01Obeo\x00690461,1351816858693.
>> > > > > >> >> > > > *bc62a8a72124a4ba3f6b9f302587903c*., new regions:
>> > > > > >> >> > > >
>> > > > > >> >> > > >
>> > > > > >> >> > >
>> > > > > >> >> >
>> > > > > >> >>
>> > > > > >>
>> > > > >
>> > > >
>> > >
>> >
>> ActiveListingRecord16,\x83\x07\xDC\x07\x01Obeo\x00690461,1351824327023.22c3fa48d17aa7312ca53566c680f0fd.,
>> > > > > >> >> > > >
>> > > > > >> >> > > >
>> > > > > >> >> > >
>> > > > > >> >> >
>> > > > > >> >>
>> > > > > >>
>> > > > >
>> > > >
>> > >
>> >
>> ActiveListingRecord16,\x83\x07\xDC\x07\x11WebsiteIDX\x009024215,1351824327023.b0e0a488c711e5c7f74ee6198a9755a2..
>> > > > > >> >> > > > Split took 15hrs, 9mins, 33sec
>> > > > > >> >> > > >
>> > > > > >> >> > > > 2012-11-02 13:55:00,945 INFO
>> > > > > >> >> > > > org.apache.hadoop.hbase.regionserver.SplitRequest:
>> Region
>> > > > > split,
>> > > > > >> META
>> > > > > >> >> > > > updated, and report to master.
>> > > > > >> >> > > >
>> > > > > >> >> > > >
>> > > > > >> >> > >
>> > > > > >> >> >
>> > > > > >> >>
>> > > > > >>
>> > > > >
>> > > >
>> > >
>> >
>> Parent=ActiveListingRecord16,\x83\x07\xDC\x06\x0EThreeWide\x00SWMRIC-11001540,1351790329631.f720436a6f8fd412d76fe3255f24e3b3.,
>> > > > > >> >> > > > new regions:
>> > > > > >> >> > > >
>> > > > > >> >> > > >
>> > > > > >> >> > >
>> > > > > >> >> >
>> > > > > >> >>
>> > > > > >>
>> > > > >
>> > > >
>> > >
>> >
>> ActiveListingRecord16,\x83\x07\xDC\x06\x0EThreeWide\x00SWMRIC-11001540,1351816858693.2880cf893175d2a852947c63ee8554a3.,
>> > > > > >> >> > > >
>> > > > > >> >> >
>> > > > > >>
>> > > >
>> > ActiveListingRecord16,\x83\x07\xDC\x07\x01Obeo\x00690461,1351816858693.*
>> > > > > >> >> > > > bc62a8a72124a4ba3f6b9f302587903c*.. Split took 17hrs,
>> > > 14mins,
>> > > > > 2sec
>> > > > > >> >> > > > ----------------------
>> > > > > >> >> > > >
>> > > > > >> >> > > > And the master finally logs this:
>> > > > > >> >> > > > -----------------------
>> > > > > >> >> > > > 2012-11-02 13:55:00,783 WARN
>> > > > > >> >> > > > org.apache.hadoop.hbase.master.AssignmentManager:
>> Region
>> > > > > >> >> > > > bc62a8a72124a4ba3f6b9f302587903c not found on server
>> > > > > >> >> > > > HadoopNode162.hotpads.srv,60020,1351788248279; failed
>> > > > > processing
>> > > > > >> >> > > >
>> > > > > >> >> > > > 2012-11-02 13:55:00,783 WARN
>> > > > > >> >> > > > org.apache.hadoop.hbase.master.AssignmentManager:
>> > Received
>> > > > > SPLIT
>> > > > > >> for
>> > > > > >> >> > > region
>> > > > > >> >> > > > bc62a8a72124a4ba3f6b9f302587903c from server
>> > > > > >> >> > > > HadoopNode162.hotpads.srv,60020,1351788248279 but it
>> > > doesn't
>> > > > > exist
>> > > > > >> >> > > anymore,
>> > > > > >> >> > > > probably already processed its split
>> > > > > >> >> > > > ------------------------
>> > > > > >> >> > > >
>> > > > > >> >> > > > I can't find any evidence that the region left the
>> node
>> > > > during
>> > > > > >> that
>> > > > > >> >> > time.
>> > > > > >> >> > > > I'll have to catch it in action next time and see what
>> > the
>> > > > > region
>> > > > > >> is
>> > > > > >> >> up
>> > > > > >> >> > > to
>> > > > > >> >> > > > during the problem period.
>> > > > > >> >> > > >
>> > > > > >> >> > > > What does it mean that it successfully transitioned
>> from
>> > > > SPLIT
>> > > > > to
>> > > > > >> >> > SPLIT?
>> > > > > >> >> > > > Is that a valid transition?
>> > > > > >> >> > > >
>> > > > > >> >> > > > Matt
>> > > > > >> >> > > >
>> > > > > >> >> > > >
>> > > > > >> >> > > >
>> > > > > >> >> > > > On Sat, Nov 3, 2012 at 2:55 PM, Ted Yu <
>> > > [email protected]>
>> > > > > >> wrote:
>> > > > > >> >> > > >
>> > > > > >> >> > > > > Matt:
>> > > > > >> >> > > > > From the following we can see that region
>> > > > > >> >> > > > bc62a8a72124a4ba3f6b9f302587903c
>> > > > > >> >> > > > > cannot be found:
>> > > > > >> >> > > > >
>> > > > > >> >> > > > > 2012-11-02 00:00:02,909 DEBUG
>> > > > > >> >> > > > > org.apache.hadoop.hbase.master.AssignmentManager:
>> > > Handling
>> > > > > >> >> > > > > transition=RS_ZK_REGION_SPLIT,
>> > > > > >> >> > > > >
>> server=HadoopNode162.hotpads.srv,60020,1351788248279,
>> > > > > >> >> > > > > region=bc62a8a72124a4ba3f6b9f302587903c
>> > > > > >> >> > > > > 2012-11-02 00:00:02,909 WARN
>> > > > > >> >> > > > > org.apache.hadoop.hbase.master.AssignmentManager:
>> > Region
>> > > > > >> >> > > > > bc62a8a72124a4ba3f6b9f302587903c *not found on
>> server
>> > > > > >> >> > > > >
>> HadoopNode162.hotpads*.srv,60020,1351788248279;failed
>> > > > > >> >> > > > > processing
>> > > > > >> >> > > > > 2012-11-02 00:00:02,909 WARN
>> > > > > >> >> > > > > org.apache.hadoop.hbase.master.AssignmentManager:
>> > > Received
>> > > > > SPLIT
>> > > > > >> >> for
>> > > > > >> >> > > > region
>> > > > > >> >> > > > > bc62a8a72124a4ba3f6b9f302587903c from server
>> > > > > >> >> > > > > HadoopNode162.hotpads.srv,60020,1351788248279 but it
>> > > > doesn't
>> > > > > >> exist
>> > > > > >> >> > > > anymore,
>> > > > > >> >> > > > > probably already processed its split
>> > > > > >> >> > > > >
>> > > > > >> >> > > > > Have you run hbck to repair your cluster ?
>> > > > > >> >> > > > >
>> > > > > >> >> > > > > Thanks
>> > > > > >> >> > > > >
>> > > > > >> >> > > > > On Sat, Nov 3, 2012 at 2:29 PM, Matt Corgan <
>> > > > > >> [email protected]>
>> > > > > >> >> > > wrote:
>> > > > > >> >> > > > >
>> > > > > >> >> > > > > > Here's a sample of the master's logs from
>> yesterday.
>> > > >  It's
>> > > > > not
>> > > > > >> >> > > > correlated
>> > > > > >> >> > > > > > exactly with the other pastebin log, but there's
>> 3GB
>> > of
>> > > > > this
>> > > > > >> from
>> > > > > >> >> > > > > > yesterday: http://pastebin.com/wP2rNN1t
>> > > > > >> >> > > > > >
>> > > > > >> >> > > > > > I'm am pushing the cluster a bit with importing
>> data
>> > so
>> > > > > >> testing
>> > > > > >> >> the
>> > > > > >> >> > > > split
>> > > > > >> >> > > > > > code harder than normal.  The regions are 500-1GB
>> > > > gzipped.
>> > > > >  I
>> > > > > >> can
>> > > > > >> >> > > look
>> > > > > >> >> > > > > into
>> > > > > >> >> > > > > > it more but trying to figure out what to look for.
>> > > > > >> >> > > > > >
>> > > > > >> >> > > > > > Thanks Ted,
>> > > > > >> >> > > > > > Matt
>> > > > > >> >> > > > > >
>> > > > > >> >> > > > > >
>> > > > > >> >> > > > > > On Sat, Nov 3, 2012 at 2:03 PM, Ted Yu <
>> > > > > [email protected]>
>> > > > > >> >> > wrote:
>> > > > > >> >> > > > > >
>> > > > > >> >> > > > > > > Matt:
>> > > > > >> >> > > > > > > This is the method which made the logging:
>> > > > > >> >> > > > > > >   private static int
>> > tickleNodeSplit(ZooKeeperWatcher
>> > > > > zkw,
>> > > > > >> >> > > > > > >       HRegionInfo parent, HRegionInfo a,
>> > HRegionInfo
>> > > b,
>> > > > > >> >> > ServerName
>> > > > > >> >> > > > > > > serverName,
>> > > > > >> >> > > > > > >       final int znodeVersion)
>> > > > > >> >> > > > > > >   throws KeeperException, IOException {
>> > > > > >> >> > > > > > >     byte [] payload = Writables.getBytes(a, b);
>> > > > > >> >> > > > > > >     return ZKAssign.transitionNode(zkw, parent,
>> > > > > serverName,
>> > > > > >> >> > > > > > >       EventType.RS_ZK_REGION_SPLIT,
>> > > > > >> >> EventType.RS_ZK_REGION_SPLIT,
>> > > > > >> >> > > > > > >       znodeVersion, payload);
>> > > > > >> >> > > > > > >   }
>> > > > > >> >> > > > > > >
>> > > > > >> >> > > > > > > transitionZKNode() calls tickleNodeSplit() when
>> > > waiting
>> > > > > for
>> > > > > >> >> > master
>> > > > > >> >> > > to
>> > > > > >> >> > > > > > split
>> > > > > >> >> > > > > > > the region. Obviously something caused the
>> master
>> > not
>> > > > > able
>> > > > > >> to
>> > > > > >> >> > > split.
>> > > > > >> >> > > > > > >
>> > > > > >> >> > > > > > > How large is the region ?
>> > > > > >> >> > > > > > >
>> > > > > >> >> > > > > > > Can you pastebin master log for that period of
>> > time ?
>> > > > > >> >> > > > > > >
>> > > > > >> >> > > > > > > Thanks
>> > > > > >> >> > > > > > >
>> > > > > >> >> > > > > > > On Sat, Nov 3, 2012 at 1:54 PM, Matt Corgan <
>> > > > > >> >> [email protected]
>> > > > > >> >> > >
>> > > > > >> >> > > > > wrote:
>> > > > > >> >> > > > > > >
>> > > > > >> >> > > > > > > > We upgraded from .94.0 to .94.2 last week and
>> > have
>> > > > > >> started to
>> > > > > >> >> > > > > encounter
>> > > > > >> >> > > > > > > > infinite loops of region-transition on splits.
>> > I'm
>> > > > not
>> > > > > >> sure
>> > > > > >> >> > yet
>> > > > > >> >> > > if
>> > > > > >> >> > > > > > it's
>> > > > > >> >> > > > > > > > all splits nor if it's related to load.
>> Solution
>> > > so
>> > > > > far
>> > > > > >> has
>> > > > > >> >> > been
>> > > > > >> >> > > > to
>> > > > > >> >> > > > > > > > restart the regionserver process.
>> > > > > >> >> > > > > > > >
>> > > > > >> >> > > > > > > > log snippet:
>> > > > > >> >> > > > > > > > http://pastebin.com/LpienZ7B
>> > > > > >> >> > > > > > > >
>> > > > > >> >> > > > > > > > It's repeating these two lines:
>> > > > > >> >> > > > > > > > 2012-11-02 01:35:33,312 DEBUG
>> > > > > >> >> > > > > > org.apache.hadoop.hbase.zookeeper.ZKAssign:
>> > > > > >> >> > > > > > > > regionserver:60020-0x13ab46479832b76
>> Attempting
>> > to
>> > > > > >> transition
>> > > > > >> >> > > node
>> > > > > >> >> > > > > > > > cf3e9bc069e1888983c06dc8e053ffcf from
>> > > > > RS_ZK_REGION_SPLIT
>> > > > > >> to
>> > > > > >> >> > > > > > > > RS_ZK_REGION_SPLIT
>> > > > > >> >> > > > > > > > 2012-11-02 01:35:33,364 DEBUG
>> > > > > >> >> > > > > > org.apache.hadoop.hbase.zookeeper.ZKAssign:
>> > > > > >> >> > > > > > > > regionserver:60020-0x13ab46479832b76
>> Successfully
>> > > > > >> >> transitioned
>> > > > > >> >> > > node
>> > > > > >> >> > > > > > > > cf3e9bc069e1888983c06dc8e053ffcf from
>> > > > > RS_ZK_REGION_SPLIT
>> > > > > >> to
>> > > > > >> >> > > > > > > > RS_ZK_REGION_SPLIT
>> > > > > >> >> > > > > > > >
>> > > > > >> >> > > > > > > > with the occasional:
>> > > > > >> >> > > > > > > > 2012-11-02 01:35:34,476 DEBUG
>> > > > > >> >> > > > > > > >
>> > > > org.apache.hadoop.hbase.regionserver.SplitTransaction:
>> > > > > >> Still
>> > > > > >> >> > > > waiting
>> > > > > >> >> > > > > on
>> > > > > >> >> > > > > > > the
>> > > > > >> >> > > > > > > > master to process the split for
>> > > > > >> >> > cf3e9bc069e1888983c06dc8e053ffcf
>> > > > > >> >> > > > > > > >
>> > > > > >> >> > > > > > > > Should the region transition from
>> > > RS_ZK_REGION_SPLIT
>> > > > to
>> > > > > >> >> itself?
>> > > > > >> >> > >  It
>> > > > > >> >> > > > > > looks
>> > > > > >> >> > > > > > > > wrong, but I'm not familiar with the code at
>> all.
>> > > > > >> >> > > > > > > >
>> > > > > >> >> > > > > > > > Thanks,
>> > > > > >> >> > > > > > > > Matt
>> > > > > >> >> > > > > > > >
>> > > > > >> >> > > > > > >
>> > > > > >> >> > > > > >
>> > > > > >> >> > > > >
>> > > > > >> >> > > >
>> > > > > >> >> > >
>> > > > > >> >> >
>> > > > > >> >>
>> > > > > >>
>> > > > >
>> > > >
>> > >
>> >
>>
>
>

Reply via email to