[https://issues.apache.org/jira/browse/HBASE-3408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
Matt Corgan updated HBASE-3408:
-------------------------------
Description:
If AssignmentManager tries to move a region to an invalid destination server,
it throws an NPE instead of choosing a random server as intended.
The condition at line 1009 should check that existingPlan.getDestination() != null
before calling equals() on it:
if (existingPlan == null || forceNewPlan ||
    (existingPlan.getDestination() != null &&
     existingPlan.getDestination().equals(serverToExclude))) {
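A minimal, self-contained sketch of the condition, for illustration only. It does not use HBase classes: Plan is a hypothetical stand-in for RegionPlan, server identities are plain strings, and originalCondition is only presumed to match line 1009 before the fix (inferred from the proposed guard, not copied from the HBase source). It shows that the unguarded equals() call throws an NPE for a plan whose destination is null, and that the proposed guard avoids the NPE. Note that with the guard as written, a null-destination plan makes the condition false, so the existing plan is kept rather than replaced; treating a null destination as a reason to build a new plan (nullMeansNewPlan) would be one way to get the random-server fallback described above.

```java
/**
 * Hypothetical, simplified model of the condition in
 * AssignmentManager.getRegionPlan(). Server identities are plain strings
 * here; the real code uses HBase's own server types.
 */
public class RegionPlanCheck {

  // Hypothetical stand-in for RegionPlan: only the destination matters here.
  static final class Plan {
    private final String destination; // may be null, e.g. after a bad manual move
    Plan(String destination) { this.destination = destination; }
    String getDestination() { return destination; }
  }

  // Presumed pre-fix condition: throws NPE when getDestination() is null.
  static boolean originalCondition(Plan existingPlan, boolean forceNewPlan, String serverToExclude) {
    return existingPlan == null || forceNewPlan
        || existingPlan.getDestination().equals(serverToExclude);
  }

  // Guard proposed in this issue: no NPE, but a null-destination plan is kept.
  static boolean proposedCondition(Plan existingPlan, boolean forceNewPlan, String serverToExclude) {
    return existingPlan == null || forceNewPlan
        || (existingPlan.getDestination() != null
            && existingPlan.getDestination().equals(serverToExclude));
  }

  // Alternative: a null destination also triggers a new (random) plan.
  static boolean nullMeansNewPlan(Plan existingPlan, boolean forceNewPlan, String serverToExclude) {
    return existingPlan == null || forceNewPlan
        || existingPlan.getDestination() == null
        || existingPlan.getDestination().equals(serverToExclude);
  }

  public static void main(String[] args) {
    Plan badPlan = new Plan(null);
    try {
      originalCondition(badPlan, false, "rs1");
      System.out.println("original: no exception");
    } catch (NullPointerException e) {
      System.out.println("original: NullPointerException");
    }
    System.out.println("proposed: " + proposedCondition(badPlan, false, "rs1"));
    System.out.println("null means new plan: " + nullMeansNewPlan(badPlan, false, "rs1"));
  }
}
```

Running main() prints "original: NullPointerException", then "proposed: false", then "null means new plan: true".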
I triggered it while manually moving regions around, probably to an invalid
destination server. I can't currently build the project to test whether this
is the whole problem, so here is some more information:
It leaves a region stranded in transition until the master and/or regionserver
is restarted, causing problems like the following. "hbck -fix" was unable to
repair it.
2011-01-04 00:14:10,948 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Scanned 4287 catalog row(s) and gc'd 0 unreferenced parent region(s)
2011-01-04 00:14:18,574 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because 1 region(s) in transition: {23ebce9a5d174f87bfb96ed1da387fdc=RandomValue,,1291219068335.23ebce9a5d174f87bfb96ed1da387fdc. state=OFFLINE, ts=1294118046139}
2011-01-04 00:14:36,142 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: RandomValue,,1291219068335.23ebce9a5d174f87bfb96ed1da387fdc. state=OFFLINE, ts=1294118046139
2011-01-04 00:14:36,142 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been OFFLINE for too long, reassigning RandomValue,,1291219068335.23ebce9a5d174f87bfb96ed1da387fdc. to a random server
2011-01-04 00:14:36,142 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=RandomValue,,1291219068335.23ebce9a5d174f87bfb96ed1da387fdc. state=OFFLINE, ts=1294118046139
2011-01-04 00:14:36,142 ERROR org.apache.hadoop.hbase.master.AssignmentManager$TimeoutMonitor: Caught exception
java.lang.NullPointerException
    at org.apache.hadoop.hbase.master.AssignmentManager.getRegionPlan(AssignmentManager.java:934)
    (I think this is 0.90.0RC1, so it is the same bug at a different line number.)
    at org.apache.hadoop.hbase.master.AssignmentManager.getRegionPlan(AssignmentManager.java:909)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:822)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:663)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:643)
    at org.apache.hadoop.hbase.master.AssignmentManager$TimeoutMonitor.chore(AssignmentManager.java:1481)
    at org.apache.hadoop.hbase.Chore.run(Chore.java:66)
was:
If AssignmentManager tries to move a region to an invalid destination server,
it throws an NPE instead of choosing a random server as intended.
Line 1009 should check whether getDestination() == null:
if (existingPlan == null || forceNewPlan ||
    (existingPlan.getDestination() != null &&
     existingPlan.getDestination().equals(serverToExclude))) {
I triggered it while manually moving regions around, probably to an invalid
destination server. I can't currently build the project to test whether this
is the whole problem, so here is some more information:
It leaves a region stranded in transition until the master and/or regionserver
is restarted, causing problems like the following. "hbck -fix" was unable to
repair it.
2011-01-04 00:14:10,948 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Scanned 4287 catalog row(s) and gc'd 0 unreferenced parent region(s)
2011-01-04 00:14:18,574 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because 1 region(s) in transition: {23ebce9a5d174f87bfb96ed1da387fdc=RandomValue,,1291219068335.23ebce9a5d174f87bfb96ed1da387fdc. state=OFFLINE, ts=1294118046139}
2011-01-04 00:14:36,142 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: RandomValue,,1291219068335.23ebce9a5d174f87bfb96ed1da387fdc. state=OFFLINE, ts=1294118046139
2011-01-04 00:14:36,142 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been OFFLINE for too long, reassigning RandomValue,,1291219068335.23ebce9a5d174f87bfb96ed1da387fdc. to a random server
2011-01-04 00:14:36,142 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=RandomValue,,1291219068335.23ebce9a5d174f87bfb96ed1da387fdc. state=OFFLINE, ts=1294118046139
2011-01-04 00:14:36,142 ERROR org.apache.hadoop.hbase.master.AssignmentManager$TimeoutMonitor: Caught exception
java.lang.NullPointerException
    at org.apache.hadoop.hbase.master.AssignmentManager.getRegionPlan(AssignmentManager.java:934)
    (I think this is 0.90.0RC1, so it is the same bug at a different line number.)
    at org.apache.hadoop.hbase.master.AssignmentManager.getRegionPlan(AssignmentManager.java:909)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:822)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:663)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:643)
    at org.apache.hadoop.hbase.master.AssignmentManager$TimeoutMonitor.chore(AssignmentManager.java:1481)
    at org.apache.hadoop.hbase.Chore.run(Chore.java:66)
> AssignmentManager NullPointerException
> --------------------------------------
>
> Key: HBASE-3408
> URL: https://issues.apache.org/jira/browse/HBASE-3408
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 0.90.0
> Reporter: Matt Corgan
> Fix For: 0.90.0
>
> Original Estimate: 24h
> Remaining Estimate: 24h
--
This message is automatically generated by JIRA.