We run two flavors of jobs through Hadoop. The first is a simple merge sort, where very little happens in the mapper or the reducer. The second is very compute intensive.

With the first type, each map task consumes its (default-sized) 64 MB input split in a few seconds, so quite a bit of the elapsed time is spent in job setup and shutdown.
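One thing we considered, as an alternative to changing the DFS block size, is raising the minimum split size in the job config so each mapper gets more data. A sketch of what that would look like (assuming the mapred.min.split.size property behaves this way in our version):

```xml
<!-- hypothetical hadoop-site.xml fragment -->
<property>
  <name>mapred.min.split.size</name>
  <!-- 256 MB in bytes, so each map task processes ~4 blocks -->
  <value>268435456</value>
</property>
```

This would cut the number of map tasks (and hence the per-task setup/shutdown overhead) roughly 4x without touching the stored block size, though it trades away some data locality.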

We have tried reducing the number of splits by increasing the block size to 5x and 10x the 64 MB default, but then we constantly hit out-of-memory errors and timeouts. At this point each JVM gets 768 MB, and I can't readily allocate more without dipping into swap.
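For reference, this is the sort of setting we're using to size the task JVMs (assuming mapred.child.java.opts is the right knob in our version):

```xml
<!-- hypothetical hadoop-site.xml fragment -->
<property>
  <name>mapred.child.java.opts</name>
  <!-- max heap per map/reduce child JVM -->
  <value>-Xmx768m</value>
</property>
```

Raising -Xmx further would push total memory (heap times concurrent tasks per node) past physical RAM on our boxes.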

What suggestions do people have for this case?

07/12/25 11:49:59 INFO mapred.JobClient: Task Id : task_200712251146_0001_m_000002_0, Status : FAILED
java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:52)
        at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:90)
        at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1763)
        at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1663)
        at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1709)
        at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:79)
        at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:174)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)

07/12/25 11:51:35 INFO mapred.JobClient: Task Id : task_200712251146_0001_r_000038_0, Status : FAILED
java.net.SocketTimeoutException: timed out waiting for rpc response
        at org.apache.hadoop.ipc.Client.call(Client.java:484)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:184)
        at org.apache.hadoop.dfs.$Proxy1.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:269)
        at org.apache.hadoop.dfs.DFSClient.createNamenode(DFSClient.java:147)
        at org.apache.hadoop.dfs.DFSClient.<init>(DFSClient.java:161)
        at org.apache.hadoop.dfs.DistributedFileSystem.initialize(DistributedFileSystem.java:65)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:159)
        at org.apache.hadoop.fs.FileSystem.getNamed(FileSystem.java:118)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:90)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1759)
