Hi,

When you run into this problem, it's usually a sign of a META problem; specifically, you have a 'hole' in the META table.
The META table contains a series of keys like so:

  table,start_row1,<timestamp> [data]
  table,start_row2,<timestamp> [data]
  etc.

When we search for the region for a given row, we build a key like 'table,my_row,9*19' and do a search called 'closestRowBefore'. This finds the region that contains the row. Notice that we only put the start row in the key: each region has a start_row and an end_row, all the regions are mutually exclusive, and together they form complete coverage of the table's key space.

Now imagine the META row for one region were missing. We'd consistently find the wrong region, and the regionserver would reject the request (correctly so). That is probably what is happening here.

Check the table dump in the master web-ui and see if you can find a 'hole', i.e. a place where one region's end-key doesn't match up with the next region's start-key. If that is the case, there is a script, add_table.rb, which is used to fix these things.
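To make that concrete, here is a toy sketch of the lookup (plain Java with a TreeMap, not the actual HBase client code; the table name, keys and timestamps are made up for illustration). With a complete META, the floor/closestRowBefore search finds the right region; remove one META row and the same search lands on the previous region, whose range no longer contains the row:

  import java.util.Map;
  import java.util.TreeMap;

  public class MetaHoleDemo {

    // Toy META: key = "table,<startKey>,<timestamp>", value = {startKey, endKey}.
    static TreeMap<String, String[]> meta = new TreeMap<>();

    // Roughly what the client's closestRowBefore lookup does: take the greatest
    // META key that sorts <= "table,row,<huge timestamp>".
    static String[] locateRegion(String table, String row) {
      Map.Entry<String, String[]> e = meta.floorEntry(table + "," + row + ",9999999999999");
      return e == null ? null : e.getValue();
    }

    // The [startKey, endKey) test; an empty endKey means "to the end of the table".
    static boolean rowInRange(String[] region, String row) {
      return row.compareTo(region[0]) >= 0
          && (region[1].isEmpty() || row.compareTo(region[1]) < 0);
    }

    public static void main(String[] args) {
      // Three regions covering [aaa,ccc), [ccc,fff), [fff,end).
      meta.put("filestore,aaa,1", new String[] {"aaa", "ccc"});
      meta.put("filestore,ccc,2", new String[] {"ccc", "fff"});
      meta.put("filestore,fff,3", new String[] {"fff", ""});

      String row = "dddd";
      System.out.println(rowInRange(locateRegion("filestore", row), row)); // true

      // Simulate a hole: the META row for [ccc,fff) goes missing.
      meta.remove("filestore,ccc,2");

      // closestRowBefore now lands on [aaa,ccc), which does not contain "dddd",
      // so the regionserver correctly rejects the Put with a WrongRegionException.
      System.out.println(rowInRange(locateRegion("filestore", row), row)); // false
    }
  }

With a hole in place, every retry resolves to the same wrong region, which is why the Put fails consistently rather than intermittently.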
-ryan

On Fri, Aug 6, 2010 at 2:59 PM, Stuart Smith <[email protected]> wrote:
> Hello,
>
> I'm running hbase 0.20.5, and seeing Puts() fail repeatedly when trying to
> insert a specific item into the database.
>
> Client side I see:
>
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact
> region server Some server, retryOnlyOne=true, index=0, islastrow=true,
> tries=9, numtries=10, i=0, listsize=1,
> region=filestore,bdfa9f2173033330cfae81ece08f75f0002bf3f3a54cde6bbf9192f0187e275b,1279604506836
> for region filestore,
>
> I then looked up which node was hosting the given region
> (filestore,bdfa9f2173033330cfae81ece08f75f0002bf3f3a54cde6bbf9192f0187e275b)
> on the gui, and found the following debug message in the regionserver log:
>
> 2010-08-06 14:23:47,414 DEBUG
> org.apache.hadoop.hbase.regionserver.HRegionServer: Batch puts interrupted at
> index=0 because:Requested row out of range for HRegion
> filestore,bdfa9f2173033330cfae81ece08f75f0002bf3f3a54cde6bbf9192f0187e275b,1279604506836,
> startKey='bdfa9f2173033330cfae81ece08f75f0002bf3f3a54cde6bbf9192f0187e275b',
> getEndKey()='be0bc7b3f8bc2a30910b9c758b47cdb730a4691e93f92abb857a2dcc7aefa633',
> row='be1681910b02db5da061659c2cb08f501a135c2f065559a37a1761bf6e203d1d'
>
> Which appears to be coming from:
>
> /regionserver/HRegionServer.java:1786:
>   LOG.debug("Batch puts interrupted at index=" + i + " because:" +
>
> Which is coming from:
>
> ./java/org/apache/hadoop/hbase/regionserver/HRegion.java:1658:
>   throw new WrongRegionException("Requested row out of range for " +
>
> This happens repeatedly on a specific item over at least a day or so, even
> when not much is happening with the cluster.
>
> As far as I can tell, it looks like the logic to select the correct region
> for a given row is wrong. The row is indeed not in the correct range (at
> least from what I can tell of the exception thrown), and the check in
> HRegion.java:1658:
>
>   /** Make sure this is a valid row for the HRegion */
>   private void checkRow(final byte [] row) throws IOException {
>     if(!rowIsInRange(regionInfo, row)) {
>
> is correctly rejecting the Put().
>
> So it appears the error would be somewhere in HRegion.java:1550:
>
>   private void put(final Map<byte [],List<KeyValue>> familyMap,
>       boolean writeToWAL) throws IOException {
>
> which appears to be the actual guts of the insert operation.
> However, I don't know enough about the design of HRegions to really decipher
> this method. I'll dig into it more, but I thought it might be more efficient
> just to ask you guys first.
>
> Any ideas?
>
> I can update to 0.20.6, but I don't see any fixed jiras on 0.20.6 that seem
> related. I could be wrong. I'm not sure what I should do next. Any more
> information you guys need?
>
> Note that I am inserting files into the database, using each file's sha256sum
> as the key. And the file that is failing does indeed have a sha that
> corresponds to the key in the message above (and is out of range).
>
> Take care,
> -stu
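For what it's worth, plugging the startKey, getEndKey() and row values from that regionserver log into the [startKey, endKey) test confirms the rejection itself is correct; the row sorts after the region's end key, so it simply does not belong to that region. A quick standalone check (plain Java, not the HRegion code):

  public class RangeCheck {
    public static void main(String[] args) {
      // Keys copied from the regionserver log quoted above.
      String startKey = "bdfa9f2173033330cfae81ece08f75f0002bf3f3a54cde6bbf9192f0187e275b";
      String endKey   = "be0bc7b3f8bc2a30910b9c758b47cdb730a4691e93f92abb857a2dcc7aefa633";
      String row      = "be1681910b02db5da061659c2cb08f501a135c2f065559a37a1761bf6e203d1d";

      // The keys are ASCII hex strings, so String comparison orders them the
      // same way the regionserver's byte[] comparison does.
      boolean inRange = row.compareTo(startKey) >= 0 && row.compareTo(endKey) < 0;
      System.out.println(inRange); // false: the row sorts after the end key
    }
  }

So checkRow()/rowIsInRange() are doing their job; the open question is why the client's META lookup keeps routing the row to the region one step to the left, which is exactly what a hole in META would explain.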
