Hi all,
I am using hadoop-0.18.0-core.jar and nutch-2008-08-18_04-01-55.jar, and running Hadoop with one namenode and 4 slaves.
Attached is my hadoop-site.xml; I didn't change hadoop-default.xml.
When the data in the segments is large, this kind of error occurs:
java.io.IOException: Could not obtain block: blk_-2634319951074439134_1129 file=/user/root/crawl_debug/segments/20080825053518/content/part-00002/data
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1462)
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1312)
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1417)
        at java.io.DataInputStream.readFully(DataInputStream.java:178)
        at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:64)
        at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:102)
        at org.apache.hadoop.io.SequenceFile$Reader.readBuffer(SequenceFile.java:1646)
        at org.apache.hadoop.io.SequenceFile$Reader.seekToCurrentValue(SequenceFile.java:1712)
        at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1787)
        at org.apache.hadoop.mapred.SequenceFileRecordReader.getCurrentValue(SequenceFileRecordReader.java:104)
        at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:79)
        at org.apache.hadoop.mapred.join.WrappedRecordReader.next(WrappedRecordReader.java:112)
        at org.apache.hadoop.mapred.join.WrappedRecordReader.accept(WrappedRecordReader.java:130)
        at org.apache.hadoop.mapred.join.CompositeRecordReader.fillJoinCollector(CompositeRecordReader.java:398)
        at org.apache.hadoop.mapred.join.JoinRecordReader.next(JoinRecordReader.java:56)
        at org.apache.hadoop.mapred.join.JoinRecordReader.next(JoinRecordReader.java:33)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:165)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:45)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)
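If it helps with diagnosis, I guess the status of that block could be checked with fsck on the part file named in the exception, something like this (assuming the standard fsck options in this Hadoop version):

        bin/hadoop fsck /user/root/crawl_debug/segments/20080825053518/content/part-00002/data -files -blocks -locations

Is that the right way to see whether the block is really missing or corrupt on the datanodes?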
How can I correct this?
Thanks.
Xu
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapred.map.tasks</name>
    <value>41</value>
    <description>The default number of map tasks per job. Typically set
    to a prime several times greater than number of available hosts.
    Ignored when mapred.job.tracker is "local".
    </description>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>8</value>
    <description>The default number of reduce tasks per job. Typically set
    to a prime close to the number of available hosts. Ignored when
    mapred.job.tracker is "local".
    </description>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/mnt/nutch</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode:50001/</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>namenode:50002</value>
  </property>
  <property>
    <name>tasktracker.http.threads</name>
    <value>80</value>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>2</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
  <property>
    <name>mapred.output.compress</name>
    <value>true</value>
  </property>
  <property>
    <name>mapred.output.compression.type</name>
    <value>BLOCK</value>
  </property>
  <property>
    <name>dfs.client.block.write.retries</name>
    <value>3</value>
  </property>
</configuration>
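One more note: since hadoop-default.xml is unchanged, I believe dfs.replication is still at its default of 3. If overriding it in hadoop-site.xml would help here, I assume it would just be another property entry like the ones above, for example:

  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>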