Re: Batch puts interrupted ... Requested row out of range for HRegion filestore ...org.apache.hadoop.hbase.client.RetriesExhaustedException:

2010-08-06 Thread Ryan Rawson
Hi,

When you run into this problem, it's usually a sign of a META problem,
specifically you have a 'hole' in the META table.

The META table contains a series of keys like so:
table,start_row1,timestamp    [data]
table,start_row2,timestamp    [data]

etc

When we search for a region for a given row, we build a key like so:
'table,my_row,9*19' and do a search called 'closestRowBefore'.  This
finds the region that contains this row.

Now notice that we only put the start row in the key.  Each region
has a start_row and end_row, all the regions are mutually exclusive,
and together they form complete coverage of the key space.  If the
row for a region were missing, we'd consistently find the wrong
region and the regionserver would reject the request (correctly so).

That is probably what is happening here.  Check the table dump in the
master web-ui and see if you can find a 'hole'... where one region's
end-key doesn't match up with the next region's start-key.

If that is the case, there is a script add_table.rb which is used to
fix these things.

-ryan

On Fri, Aug 6, 2010 at 2:59 PM, Stuart Smith stu24m...@yahoo.com wrote:
 Hello,

  I'm running hbase 0.20.5, and seeing Puts() fail repeatedly when trying to 
 insert a specific item into the database.

 Client side I see:

 org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact 
 region server Some server, retryOnlyOne=true, index=0, islastrow=true, 
 tries=9, numtries=10, i=0, listsize=1, 
 region=filestore,bdfa9f217300cfae81ece08f75f0002bf3f3a54cde6bbf9192f0187e275b,1279604506836
  for region filestore,

 I then looked up which node was hosting the given region 
 (filestore,bdfa9f217300cfae81ece08f75f0002bf3f3a54cde6bbf9192f0187e275b) 
 on the GUI, and found the following debug message in the regionserver log:

 2010-08-06 14:23:47,414 DEBUG 
 org.apache.hadoop.hbase.regionserver.HRegionServer: Batch puts interrupted at 
 index=0 because:Requested row out of range for HRegion 
 filestore,bdfa9f217300cfae81ece08f75f0002bf3f3a54cde6bbf9192f0187e275b,1279604506836,
  startKey='bdfa9f217300cfae81ece08f75f0002bf3f3a54cde6bbf9192f0187e275b', 
 getEndKey()='be0bc7b3f8bc2a30910b9c758b47cdb730a4691e93f92abb857a2dcc7aefa633',
  row='be1681910b02db5da061659c2cb08f501a135c2f065559a37a1761bf6e203d1d'


 Which appears to be coming from:

 /regionserver/HRegionServer.java:1786:      LOG.debug("Batch puts interrupted 
 at index=" + i + " because: " +

 Which is coming from:

 ./java/org/apache/hadoop/hbase/regionserver/HRegion.java:1658:      throw new 
 WrongRegionException("Requested row out of range for " +

 This happens repeatedly on a specific item over at least a day or so, even 
 when not much is happening with the cluster.

 As far as I can tell, it looks like the logic to select the correct region 
 for a given row is wrong. The row is indeed not in the correct range (at 
 least from what I can tell of the exception thrown), and the check in 
 HRegion.java:1658:

  /** Make sure this is a valid row for the HRegion */
  private void checkRow(final byte [] row) throws IOException {
    if(!rowIsInRange(regionInfo, row)) {

 Is correctly rejecting the Put().
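 The check is easy to reason about from the log line itself: plugging the 
 startKey, getEndKey(), and row from the exception into a minimal sketch of 
 the range test (simplified to Strings here; the real rowIsInRange() compares 
 byte arrays, but these hex keys compare the same way) confirms the reject:

```java
public class RangeCheckSketch {
    // A region covers [startKey, endKey): start inclusive, end exclusive.
    // An empty end key means "last region of the table".
    static boolean rowIsInRange(String startKey, String endKey, String row) {
        return row.compareTo(startKey) >= 0
            && (endKey.isEmpty() || row.compareTo(endKey) < 0);
    }

    public static void main(String[] args) {
        // Values copied from the WrongRegionException above.
        String start = "bdfa9f217300cfae81ece08f75f0002bf3f3a54cde6bbf9192f0187e275b";
        String end   = "be0bc7b3f8bc2a30910b9c758b47cdb730a4691e93f92abb857a2dcc7aefa633";
        String row   = "be1681910b02db5da061659c2cb08f501a135c2f065559a37a1761bf6e203d1d";
        // The row sorts after the region's end key ("be1..." > "be0..."),
        // so the regionserver is right to throw.
        System.out.println(rowIsInRange(start, end, row)); // prints false
    }
}
```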

 So it appears the error would be somewhere in:
 HRegion.java:1550:
  private void put(final Map<byte [], List<KeyValue>> familyMap,
      boolean writeToWAL) throws IOException {

 Which appears to be the actual guts of the insert operation.
 However, I don't know enough about the design of HRegions to really decipher 
 this method. I'll dig into it more, but I thought it might be more efficient 
 just to ask you guys first.

 Any ideas?

 I can update to 0.20.6, but I don't see any fixed JIRAs in 0.20.6 that seem 
 related... I could be wrong. I'm not sure what I should do next. Is there any 
 more information you need?

 Note that I am inserting files into the database, using each file's sha256sum 
 as the key. The file that is failing does indeed have a SHA that corresponds 
 to the key in the message above (and is out of range).
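 For reference, the row-key scheme described above amounts to the following 
 (an illustrative helper, since the actual client code isn't shown in this 
 thread): the SHA-256 hex digest of the file's contents becomes the row key, 
 so rows are uniform 64-character hex strings spread evenly across the key 
 space.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Sha256Key {
    // Hex-encoded SHA-256 digest of the given bytes, used as the row key.
    static String sha256Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException(e); // SHA-256 is always available
        }
    }

    public static void main(String[] args) {
        System.out.println(sha256Hex("hello".getBytes()));
    }
}
```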

 Take care,
  -stu

Re: Batch puts interrupted ... Requested row out of range for HRegion filestore ...org.apache.hadoop.hbase.client.RetriesExhaustedException:

2010-08-06 Thread Stuart Smith
Hello Ryan,

  Yup. There's a hole, exactly where it should be.

I used add_table.rb once before, and am no expert on it.
All I have is a note written down:

To recover lost tables:
./hbase org.jruby.Main add_table.rb /hbase/filestore

Anything else I need to know? Do I just run the script like so?
Does anything need to be shut down before I do?

Thanks!

Take care,
  -stu


--- On Fri, 8/6/10, Ryan Rawson ryano...@gmail.com wrote:


Re: Batch puts interrupted ... Requested row out of range for HRegion filestore ...org.apache.hadoop.hbase.client.RetriesExhaustedException:

2010-08-06 Thread Stuart Smith
Just to follow up - I ran add_table.rb as I had done when I lost a table 
before, and it fixed the error.

Thanks!

Take care,
  -stu

--- On Fri, 8/6/10, Stuart Smith stu24m...@yahoo.com wrote:
