[ 
https://issues.apache.org/jira/browse/HBASE-495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-495:
------------------------

    Attachment: 495-0.1.patch

Here is a patch against 0.1.  Will make others if this passes muster.

My thought on this issue is that the cluster is so messy w/ millions of log 
lines, it's hard to debug.  Suggest that we commit this patch against this issue 
and open another when we see duplicate regions next time.

What seems to be happening is regions are failing to open out on the 
regionservers because dfs is corrupt.  Was thinking we could shut down on an 
IOE out of HDFS, but looking at where the exception is coming up, we actually 
do do a filesystem check and it must be succeeding.  Also, a failed compaction 
may not always be worthy of our shutting down the regionserver -- in this case, 
on region startup, it probably is, but later as part of normal operation it 
probably is not.  DFS health seems to be a tad more involved.

HBASE-495 No server address listed in .META.
M src/java/org/apache/hadoop/hbase/HMaster.java
  (regionServerStartup): Refactor.  Create lease BEFORE scheduling shutdown
  process.  We used to do things the other way round; meant that we'd schedule
  a shutdown process for every report the regionserver made.  Could be many
  if an old lease was hanging around.
  (registerRegionServer): Added.  This is the body of what used to be in
  regionServerStartup, moved here so it is easy to have a finally in the
  calling method (there should never be an exception out of this method, so
  the finally should never have to run).

  Removed some useless DEBUG level logs; with thousands of rows in .META.,
  that was at least one DEBUG line per row, multiplied by the number of
  shutdown processes queued.
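
To illustrate why the ordering in regionServerStartup matters, here is a toy sketch.  It is not the HMaster code: the class, its fields and its methods are all made up for illustration, and it assumes that creating a lease fails while an old lease for the same server is still held.  Under that assumption, scheduling the shutdown first queues one shutdown per startup report; creating the lease first lets the failure stop us before anything is queued.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of the master's bookkeeping; every name here is hypothetical. */
class LeaseOrderingSketch {
    final Set<String> knownServers = new HashSet<>();
    final Set<String> leases = new HashSet<>();
    final List<String> shutdownQueue = new ArrayList<>();

    /** Assumption: lease creation throws while an old lease is still held. */
    void createLease(String server) {
        if (!leases.add(server)) {
            throw new IllegalStateException("lease still held for " + server);
        }
    }

    /** Queue shutdown processing for a previous instance of this server. */
    void scheduleShutdownOfPreviousInstance(String server) {
        if (knownServers.contains(server)) {
            shutdownQueue.add(server);
        }
    }

    /** Old ordering: the shutdown is queued even when the lease check fails. */
    void startupOldOrder(String server) {
        scheduleShutdownOfPreviousInstance(server);
        createLease(server);
        knownServers.add(server);
    }

    /** New ordering: a held lease aborts the call before anything is queued. */
    void startupNewOrder(String server) {
        createLease(server);
        scheduleShutdownOfPreviousInstance(server);
        knownServers.add(server);
    }

    /** Replays the given number of startup reports against a stale lease. */
    static int queuedAfterReports(boolean newOrder, int reports) {
        LeaseOrderingSketch master = new LeaseOrderingSketch();
        master.knownServers.add("rs1"); // master remembers the old instance
        master.leases.add("rs1");       // old lease hanging around
        for (int i = 0; i < reports; i++) {
            try {
                if (newOrder) {
                    master.startupNewOrder("rs1");
                } else {
                    master.startupOldOrder("rs1");
                }
            } catch (IllegalStateException e) {
                // The regionserver would simply report in again.
            }
        }
        return master.shutdownQueue.size();
    }

    public static void main(String[] args) {
        System.out.println("old order queued: " + queuedAfterReports(false, 5));
        System.out.println("new order queued: " + queuedAfterReports(true, 5));
    }
}
```

In this model the stale lease never expires, so it only shows the pile-up of queued shutdowns, not how the old instance eventually gets cleaned up.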

> No server address listed in .META.
> ----------------------------------
>
>                 Key: HBASE-495
>                 URL: https://issues.apache.org/jira/browse/HBASE-495
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.16.0
>            Reporter: stack
>             Fix For: 0.1.0, 0.2.0
>
>         Attachments: 495-0.1.patch
>
>
> Michael Bieniosek manufactured the following in a 0.16.0 install:
> {code}
> 08/03/06 17:52:02 DEBUG hbase.HTable: Advancing internal scanner to startKey 
> g80Fi5WZHlzLqGzErrAd7V==
> 08/03/06 17:52:02 DEBUG hbase.HConnectionManager$TableServers: reloading 
> table servers because: No server address listed in .META. for region 
> enwiki_080103,g80Fi5WZHlzLqGzErrAd7V==,1204768636421
> 08/03/06 17:52:12 DEBUG hbase.HConnectionManager$TableServers: reloading 
> table servers because: No server address listed in .META. for region 
> enwiki_080103,g80Fi5WZHlzLqGzErrAd7V==,1204768636421
> 08/03/06 17:52:22 DEBUG hbase.HConnectionManager$TableServers: reloading 
> table servers because: No server address listed in .META. for region 
> enwiki_080103,g80Fi5WZHlzLqGzErrAd7V==,1204768636421
> org.apache.hadoop.hbase.NoServerForRegionException: No server address listed 
> in .META. for region enwiki_080103,g80Fi5WZHlzLqGzErrAd7V==,1204768636421
>         at 
> org.apache.hadoop.hbase.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:449)
>         at 
> org.apache.hadoop.hbase.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:346)
>         at 
> org.apache.hadoop.hbase.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:309)
>         at org.apache.hadoop.hbase.HTable.getRegionLocation(HTable.java:103)
>         at 
> org.apache.hadoop.hbase.HTable$ClientScanner.nextScanner(HTable.java:854)
>         at org.apache.hadoop.hbase.HTable$ClientScanner.next(HTable.java:915)
>         at 
> org.apache.hadoop.hbase.hql.SelectCommand.scanPrint(SelectCommand.java:233)
>         at 
> org.apache.hadoop.hbase.hql.SelectCommand.execute(SelectCommand.java:100)
>         at 
> org.apache.hadoop.hbase.hql.HQLClient.executeQuery(HQLClient.java:50)
>         at org.apache.hadoop.hbase.Shell.main(Shell.java:114)
> {code}
> When I look in the .META., I see that the above region range has multiple 
> mentions... : one offlined, two that have startcodes and servers associated 
> and about 5 others that are just HRIs.  Table is broke.  At least need the 
> merge of overlapping regions tool to fix.  Digging more....

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
