Report errors better when async flushCommits() fail
---------------------------------------------------

                 Key: HBASE-3452
                 URL: https://issues.apache.org/jira/browse/HBASE-3452
             Project: HBase
          Issue Type: Improvement
    Affects Versions: 0.89.20100924
            Reporter: Lars George
            Priority: Minor
             Fix For: 0.90.1


We hit an issue where a MapReduce job would fail with the following error:

{code}
org.apache.hadoop.hbase.client.RetriesExhaustedException: Still had 3913 puts left after retrying 10 times.
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfPuts(HConnectionManager.java:1526)
    at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:664)
    at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:549)
    at org.apache.hadoop.hbase.client.HTable.put(HTable.java:535)
    at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:104)
    at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:65)
    at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:512)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at com.lp.sessionized.mapred.HbaseIndexerReducer.reduce(HbaseIndexerReducer.java:91)
    at com.lp.sessionized.mapred.HbaseIndexerReducer.reduce(HbaseIndexerReducer.java:1)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:570)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:412)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1063)
    at org.apache.hadoop.mapred.Child.main(Child.java:211)
{code}

which sent us on a wild goose chase trying to figure out why we could read from a table but not write back into it. We finally checked the server logs and found this:

{code}
2011-01-18 13:47:56,479 WARN org.apache.hadoop.hbase.regionserver.HRegion: No such column family in batch put
org.apache.hadoop.hbase.regionserver.NoSuchColumnFamilyException: Column family dim does not exist in region sessions6,,1295358051638.272a4ba588438119f9f866f491a4428c. in table {NAME => 'foo', FAMILIES => [{NAME => 'dims', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'LZO', VERSIONS => '1', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, ...
    at org.apache.hadoop.hbase.regionserver.HRegion.checkFamily(HRegion.java:2931)
    at org.apache.hadoop.hbase.regionserver.HRegion.checkFamilies(HRegion.java:1683)
    at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:1356)
    at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1321)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1814)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.multiPut(HRegionServer.java:2479)
{code}

So we had a typo in the column family name, and that was only reported server-side. That error never made it into the task logs, but it should have, to make problems like this much easier to track down.
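One way to surface the server-side cause on the client, sketched here as a minimal standalone example (the class and method names below are hypothetical illustrations, not HBase's actual API): when retries are exhausted, collect the last exception seen for the failed operations and fold them into the message of the exception thrown to the caller, so the task log shows the real problem instead of only a retry count.

{code}
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a "retries exhausted" exception that carries the
// server-side causes, so the client sees e.g. NoSuchColumnFamilyException
// instead of just "Still had N puts left after retrying M times."
class RetriesExhaustedWithCausesException extends RuntimeException {
    private final List<Throwable> causes;

    RetriesExhaustedWithCausesException(int putsLeft, int retries,
            List<Throwable> causes) {
        super(buildMessage(putsLeft, retries, causes));
        this.causes = causes;
    }

    private static String buildMessage(int putsLeft, int retries,
            List<Throwable> causes) {
        StringBuilder sb = new StringBuilder();
        sb.append("Still had ").append(putsLeft)
          .append(" puts left after retrying ").append(retries)
          .append(" times.");
        // Append each server-side cause so it lands in the task log.
        for (Throwable t : causes) {
            sb.append("\nCaused by: ").append(t);
        }
        return sb.toString();
    }

    List<Throwable> getCauses() {
        return causes;
    }
}

public class Demo {
    public static void main(String[] args) {
        List<Throwable> causes = new ArrayList<Throwable>();
        causes.add(new IllegalArgumentException(
            "Column family dim does not exist in region sessions6,..."));
        RetriesExhaustedWithCausesException e =
            new RetriesExhaustedWithCausesException(3913, 10, causes);
        System.out.println(e.getMessage());
    }
}
{code}

With something like this in place, the client-side stack trace above would have included the "Column family dim does not exist" message directly, and no server log digging would have been needed.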
