On Mon, Jan 31, 2011 at 10:04 PM, Ted Yu <[email protected]> wrote:
> J-D:
> Thanks for your kind offer.
> Full output from a reducer contains CIQ proprietary code information which I
> cannot disclose.
>
> Also the size of data node logs would be big.
>
> It would be nice if people from Cloudera can take a look. Or they can
> outline their hadoop release schedule which covers hdfs-724 and hdfs-895.
>

Like JD said, you have to provide a lot more data than you're providing.
"Retrying connect" indicates likely network issues, but who knows past that?
Doesn't look like HDFS-724, and we've had HDFS-895 in our build for months
and months.

-Todd

> On Mon, Jan 31, 2011 at 11:23 AM, Jean-Daniel Cryans <[email protected]> wrote:
>
> > The timestamps from those logs don't correlate to the issue you pasted
> > earlier, would it be possible to see all logs from a single instance
> > of the issue? It would make our life much easier helping you.
> >
> > In fact, I would like to see all the logs from all the datanodes plus
> > the full output from a reducer. You could compress it and leave that
> > on a webserver or send it to me directly. What you pasted only gives a
> > very restricted view of what happened.
> >
> > J-D
> >
> > On Sun, Jan 30, 2011 at 7:40 AM, Ted Yu <[email protected]> wrote:
> > > Datanode log snippet can be found here:
> > > http://pastebin.com/Q555XdVU
> > >
> > > Here is reducer log snippet:
> > > http://pastebin.com/a7RBq5aa
> > >
> > > Since cdh3b2 doesn't contain hdfs-724, I am not sure whether Hairong's patch
> > > (https://issues.apache.org/jira/secure/attachment/12459664/hbAckReply.patch)
> > > should be applied.
> > >
> > > If someone can share how hadoop-core-0.20-append-r1056497.jar (with fixed
> > > hdfs-724) is used with their hadoop cluster, that would be great.
> > >
> > > On Mon, Jan 24, 2011 at 4:58 PM, Ted Yu <[email protected]> wrote:
> > >
> > >> Hi,
> > >> Running 0.90 in dev cluster where I used cdh3b2 hadoop jar, I frequently
> > >> saw the following in reduce task log:
> > >>
> > >> INFO [2011-01-24 15:27:39] (ExecUtil.java:258) - 2011-01-24 22:55:39,009 INFO com.carrieriq.m2m.platform.mmp3.output.DimensionMapper: Total requets=15523640 cache hit ratio=0.84543097 avg time=90.1465879780713
> > >> INFO [2011-01-24 15:27:39] (ExecUtil.java:258) - 2011-01-24 23:17:03,216 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_8207645655823156697_2836871java.io.IOException: Bad response 1 for block blk_8207645655823156697_2836871 from datanode 10.202.50.71:50010
> > >> INFO [2011-01-24 15:27:39] (ExecUtil.java:258) - at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2497)
> > >> INFO [2011-01-24 15:27:39] (ExecUtil.java:258) -
> > >> INFO [2011-01-24 15:27:39] (ExecUtil.java:258) - 2011-01-24 23:17:03,217 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_8207645655823156697_2836871 bad datanode[1] 10.202.50.71:50010
> > >> INFO [2011-01-24 15:27:39] (ExecUtil.java:258) - 2011-01-24 23:17:03,217 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_8207645655823156697_2836871 in pipeline 10.202.50.78:50010, 10.202.50.71:50010: bad datanode 10.202.50.71:50010
> > >> INFO [2011-01-24 15:27:39] (ExecUtil.java:258) - 2011-01-24 23:17:03,252 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.202.50.78:50020. Already tried 0 time(s).
> > >> INFO [2011-01-24 15:27:39] (ExecUtil.java:258) - 2011-01-24 23:27:27,931 WARN org.apache.hadoop.mapred.TaskRunner: Parent died. Exiting
> > >>
> > >> HDFS-895 <https://issues.apache.org/jira/browse/HDFS-895> is in
> > >> http://archive.cloudera.com/cdh/3/hadoop-0.20.2+320.releasenotes.html
> > >>
> > >> Expert opinion on what I saw is appreciated.
> > >>
> > >
> >

--
Todd Lipcon
Software Engineer, Cloudera
