[ 
https://issues.apache.org/jira/browse/HBASE-3408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Corgan updated HBASE-3408:
-------------------------------

    Attachment: HBASE-3408[0.90.0].patch

> AssignmentManager NullPointerException
> --------------------------------------
>
>                 Key: HBASE-3408
>                 URL: https://issues.apache.org/jira/browse/HBASE-3408
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.0
>            Reporter: Matt Corgan
>             Fix For: 0.90.0
>
>         Attachments: HBASE-3408[0.90.0].patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> If AssignmentManager tries to move a region to an invalid destination server,
> it throws an NPE rather than choosing a random server as intended.
> Line 1009 should check that existingPlan.getDestination() != null:
>  if (existingPlan == null || forceNewPlan ||
>           (existingPlan.getDestination() != null &&
>            existingPlan.getDestination().equals(serverToExclude))) {
> I triggered it by manually moving regions around, probably to an invalid
> destination server.  I'm not currently able to build the project to test
> whether that's the extent of the problem, so here's a little more info.
> It leaves a stranded region-in-transition until the master and/or
> regionserver are restarted, and causes problems like the following.
> "hbck -fix" was unable to repair it.
> 2011-01-04 00:14:10,948 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Scanned 4287 catalog row(s) and gc'd 0 unreferenced parent region(s)
> 2011-01-04 00:14:18,574 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because 1 region(s) in transition: {23ebce9a5d174f87bfb96ed1da387fdc=RandomValue,,1291219068335.23ebce9a5d174f87bfb96ed1da387fdc. state=OFFLINE, ts=1294118046139}
> 2011-01-04 00:14:36,142 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  RandomValue,,1291219068335.23ebce9a5d174f87bfb96ed1da387fdc. state=OFFLINE, ts=1294118046139
> 2011-01-04 00:14:36,142 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been OFFLINE for too long, reassigning RandomValue,,1291219068335.23ebce9a5d174f87bfb96ed1da387fdc. to a random server
> 2011-01-04 00:14:36,142 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=RandomValue,,1291219068335.23ebce9a5d174f87bfb96ed1da387fdc. state=OFFLINE, ts=1294118046139
> 2011-01-04 00:14:36,142 ERROR org.apache.hadoop.hbase.master.AssignmentManager$TimeoutMonitor: Caught exception
> java.lang.NullPointerException
>         at org.apache.hadoop.hbase.master.AssignmentManager.getRegionPlan(AssignmentManager.java:934)
>  (I think this is 0.90.0RC1, so same bug on a different line number)
>         at org.apache.hadoop.hbase.master.AssignmentManager.getRegionPlan(AssignmentManager.java:909)
>         at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:822)
>         at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:663)
>         at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:643)
>         at org.apache.hadoop.hbase.master.AssignmentManager$TimeoutMonitor.chore(AssignmentManager.java:1481)
>         at org.apache.hadoop.hbase.Chore.run(Chore.java:66)
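
The null guard proposed in the description can be illustrated with a small standalone sketch. This is not the HBase source: the Plan class, the String server names, and both method names are hypothetical stand-ins, and the "unguarded" variant is only an assumption about what the condition looked like before the proposed check was added.

```java
public class RegionPlanCheck {

    // Hypothetical stand-in for a region plan; only the destination is modeled.
    static final class Plan {
        private final String destination; // null models an invalid/unresolved destination

        Plan(String destination) {
            this.destination = destination;
        }

        String getDestination() {
            return destination;
        }
    }

    // Assumed shape of the condition without the guard:
    // dereferences getDestination() even when it is null.
    static boolean needsNewPlanUnguarded(Plan existingPlan, boolean forceNewPlan,
                                         String serverToExclude) {
        return existingPlan == null || forceNewPlan
                || existingPlan.getDestination().equals(serverToExclude);
    }

    // Condition as proposed in the report: the destination is checked
    // for null before equals() is called on it.
    static boolean needsNewPlan(Plan existingPlan, boolean forceNewPlan,
                                String serverToExclude) {
        return existingPlan == null || forceNewPlan
                || (existingPlan.getDestination() != null
                    && existingPlan.getDestination().equals(serverToExclude));
    }

    public static void main(String[] args) {
        Plan broken = new Plan(null);
        try {
            needsNewPlanUnguarded(broken, false, "rs1");
        } catch (NullPointerException e) {
            System.out.println("unguarded check threw NullPointerException");
        }
        System.out.println("guarded check returned "
                + needsNewPlan(broken, false, "rs1"));
    }
}
```

Note that in this sketch the guarded condition evaluates to false for a plan with a null destination unless forceNewPlan is set, so the guard removes the exception but does not by itself cause a new plan to be chosen.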

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
