Maybe the following is related?

11/11/08 18:50:04 WARN hdfs.DFSClient: DataStreamer Exception: java.io.IOException: File /hbase/splitlog/domU-12-31-39-09-E8-31.compute-1.internal,60020,1320792889412_hdfs%3A%2F%2Fip-10-46-114-25.ec2.internal%3A17020%2Fhbase%2F.logs%2Fip-10-245-191-239.ec2.internal%2C60020%2C1320792860210-splitting%2Fip-10-245-191-239.ec2.internal%252C60020%252C1320792860210.1320796004063/TestLoadAndVerify_1320795370905/d76a246e81525444beeea99200b3e9a4/recovered.edits/0000000000000048149 could only be replicated to 0 nodes, instead of 1
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1646)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:829)
        at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
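The "could only be replicated to 0 nodes, instead of 1" message comes from the NameNode when it cannot find even one DataNode it considers eligible for a new block, which usually means the DNs are reporting little or no usable remaining space (or are being excluded for some other reason). Before digging further it may be worth dumping what each DN is actually reporting to the NameNode. Below is a minimal sketch against the HDFS client API, roughly equivalent to running `hadoop dfsadmin -report`; the class name is just a placeholder and it assumes the cluster's core-site.xml/hdfs-site.xml are on the classpath:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.hdfs.DistributedFileSystem;
    import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

    // Hypothetical helper: prints per-DataNode capacity as the NameNode sees it.
    public class DnSpaceReport {
      public static void main(String[] args) throws Exception {
        // Picks up fs.default.name and the rest of the cluster config from the classpath.
        Configuration conf = new Configuration();
        DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);
        long mb = 1024L * 1024L;
        for (DatanodeInfo dn : dfs.getDataNodeStats()) {
          System.out.println(dn.getName()
              + " capacity=" + (dn.getCapacity() / mb) + "MB"
              + " used=" + (dn.getDfsUsed() / mb) + "MB"
              + " remaining=" + (dn.getRemaining() / mb) + "MB");
        }
      }
    }

If every DN shows remaining space near zero (or below the block size), that alone would explain the DataStreamer exception above and the aborted split tasks below.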
On Tue, Nov 8, 2011 at 4:10 PM, Roman Shaposhnik <[email protected]> wrote:
> +Konstantin (there's something weird in append handling)
>
> Some more updates. Hope this will help. I had this hunch that I was seeing
> those weird issues when HDFS DN was at 80% capacity (but nowhere near full!).
> So I quickly spun off a cluster that had 5 DNs with modest (and unbalanced!)
> amount of storage. Here's what started happening towards the end of loading
> 2M records into HBase:
>
> On the master:
>
> {"statustimems":-1,"status":"Waiting for distributed tasks to finish. scheduled=4 done=0 error=3","starttimems":1320796207862,"description":"Doing distributed log split in [hdfs://ip-10-46-114-25.ec2.internal:17020/hbase/.logs/ip-10-245-191-239.ec2.internal,60020,1320792860210-splitting]","state":"RUNNING","statetimems":-1},
> {"statustimems":1320796275317,"status":"Waiting for distributed tasks to finish. scheduled=4 done=0 error=1","starttimems":1320796206563,"description":"Doing distributed log split in [hdfs://ip-10-46-114-25.ec2.internal:17020/hbase/.logs/ip-10-245-191-239.ec2.internal,60020,1320792860210-splitting]","state":"ABORTED","statetimems":1320796275317},
> {"statustimems":1320796275317,"status":"Waiting for distributed tasks to finish. scheduled=4 done=0 error=2","starttimems":1320796205304,"description":"Doing distributed log split in [hdfs://ip-10-46-114-25.ec2.internal:17020/hbase/.logs/ip-10-245-191-239.ec2.internal,60020,1320792860210-splitting]","state":"ABORTED","statetimems":1320796275317},
> {"statustimems":1320796275317,"status":"Waiting for distributed tasks to finish. scheduled=4 done=0 error=3","starttimems":1320796203957,"description":"Doing distributed log split in [hdfs://ip-10-46-114-25.ec2.internal:17020/hbase/.logs/ip-10-245-191-239.ec2.internal,60020,1320792860210-splitting]","state":"ABORTED","statetimems":1320796275317}]
>
> 11/11/08 18:51:15 WARN monitoring.TaskMonitor: Status Doing distributed log split in [hdfs://ip-10-46-114-25.ec2.internal:17020/hbase/.logs/ip-10-245-191-239.ec2.internal,60020,1320792860210-splitting]: status=Waiting for distributed tasks to finish. scheduled=4 done=0 error=3, state=RUNNING, startTime=1320796203957, completionTime=-1 appears to have been leaked
> 11/11/08 18:51:15 WARN monitoring.TaskMonitor: Status Doing distributed log split in [hdfs://ip-10-46-114-25.ec2.internal:17020/hbase/.logs/ip-10-245-191-239.ec2.internal,60020,1320792860210-splitting]: status=Waiting for distributed tasks to finish. scheduled=4 done=0 error=2, state=RUNNING, startTime=1320796205304, completionTime=-1 appears to have been leaked
> 11/11/08 18:51:15 WARN monitoring.TaskMonitor: Status Doing distributed log split in [hdfs://ip-10-46-114-25.ec2.internal:17020/hbase/.logs/ip-10-245-191-239.ec2.internal,60020,1320792860210-splitting]: status=Waiting for distributed tasks to finish. scheduled=4 done=0 error=1, state=RUNNING, startTime=1320796206563, completionTime=-1 appears to have been leaked
>
> And the behavior on the DNs was even weirder. I'm attaching a log from one of the DNs.
> The last exception is a shocker to me:
>
> 11/11/08 18:51:07 WARN regionserver.SplitLogWorker: log splitting of hdfs://ip-10-46-114-25.ec2.internal:17020/hbase/.logs/ip-10-245-191-239.ec2.internal,60020,1320792860210-splitting/ip-10-245-191-239.ec2.internal%2C60020%2C1320792860210.1320796004063 failed, returning error
> java.io.IOException: Failed to open hdfs://ip-10-46-114-25.ec2.internal:17020/hbase/.logs/ip-10-245-191-239.ec2.internal,60020,1320792860210-splitting/ip-10-245-191-239.ec2.internal%2C60020%2C1320792860210.1320796004063 for append
>
> But perhaps it is cascading from some of the earlier ones.
>
> Anyway, take a look at the attached log.
>
> Now, this is a tricky issue to reproduce. Just before it started failing again
> I had a completely clean run over here:
>
> http://bigtop01.cloudera.org:8080/view/Hadoop%200.22/job/Bigtop-trunk-smoketest-22/33/testReport/
>
> Which makes me believe it is NOT configuration related.
>
> Thanks,
> Roman.
>
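On the "Failed to open ... for append" error: HBase's distributed log splitting opens the region server's WAL for append/sync during recovery, and on HDFS builds of this era that path is commonly gated by a config flag. I realize Roman believes this is not configuration related (and the clean smoke-test run supports that), but it may still be worth ruling out the two settings below, since the reserved-space one can also make a DataNode refuse new blocks well before its disks report full, which would fit the "80% capacity but nowhere near full" hunch. A hedged sketch of the hdfs-site.xml entries to double-check; the values shown are illustrative, not recommendations:

    <!-- hdfs-site.xml: settings worth verifying; values here are only examples -->
    <property>
      <name>dfs.support.append</name>
      <value>true</value>
      <!-- WAL recovery needs working append/sync; if this is disabled, the
           "Failed to open ... for append" path has no chance of succeeding. -->
    </property>
    <property>
      <name>dfs.datanode.du.reserved</name>
      <value>1073741824</value>
      <!-- Bytes per volume the DataNode keeps free for non-DFS use; on small,
           unbalanced disks a large value can trigger "replicated to 0 nodes"
           long before the volume is actually full. -->
    </property>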
