The heap size in my environment varies from 2GB to 18GB.
In mapred-site.xml, mapred.child.java.opts = -Xmx512M.
System: Ubuntu 10.04 LTS, java-6-sun-1.6.0.26, latest Cloudera version of Hadoop.
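For reference, that heap setting lives in mapred-site.xml as a standard property entry, i.e. something like:

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512M</value>
</property>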
This is the log from the tasklog:
Original exception was:
java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:376)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:572)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:136)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
    at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.typedbytes.TypedBytesInput.readRawBytes(TypedBytesInput.java:212)
    at org.apache.hadoop.typedbytes.TypedBytesInput.readRaw(TypedBytesInput.java:152)
    at org.apache.hadoop.streaming.io.TypedBytesOutputReader.readKeyValue(TypedBytesOutputReader.java:51)
    at org.apache.hadoop.streaming.PipeMapRed$MROutputThread.run(PipeMapRed.java:418)
I don't have a recursive loop, a while loop, or anything like that.
My dumbo code is below. multi_tree() is just a simple function whose only error handling is a bare try/except that passes:

def mapper(key, value):
    v = value.split(" ")[0]
    yield multi_tree(v), 1

if __name__ == "__main__":
    import dumbo
    dumbo.run(mapper)
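For context, the hdfs put that seems to trigger the failure is done with the os.system call from my first mail (same example paths). The subprocess variant underneath is only a sketch of an alternative I am considering, not code I have tested: it checks the exit code and captures the command's output so nothing leaks into the task's own stdout, which the streaming framework reads. Only one of the two calls would actually be used.

import os
import subprocess

src = "/home/havard/bio_sci/file.json"
dst = "/tmp/bio_sci/file.json"

# current approach: fire and forget, child output inherited from the task
os.system("hadoop dfs -put %s %s" % (src, dst))

# sketch of an alternative (untested): capture output and check the exit code
proc = subprocess.Popen(["hadoop", "dfs", "-put", src, dst],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = proc.communicate()
if proc.returncode != 0:
    raise RuntimeError("hadoop dfs -put failed: %s" % err)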
-Håvard
On Mon, Feb 13, 2012 at 8:52 PM, Rohit <[email protected]> wrote:
> Hi,
>
> What threw the heap error? Was it the Java VM, or the shell environment?
>
> It would be good to look at free RAM on your system before and after
> you run the script as well, to see if your system is running low on memory.
>
> Are you using a recursive loop in your script?
>
> Thanks,
> Rohit
>
>
> Rohit Bakhshi
>
> www.hortonworks.com (http://www.hortonworks.com/)
>
> On Monday, February 13, 2012 at 10:39 AM, Håvard Wahl Kongsgård wrote:
>
>> Hi, I originally posted this on the dumbo forum, but it's more a
>> general scripting hadoop issue.
>>
>> When testing a simple script that created some local files and then
>> copied them to hdfs with
>> os.system("hadoop dfs -put /home/havard/bio_sci/file.json /tmp/bio_sci/file.json"),
>> the tasks fail with an out-of-heap-memory error. The files are tiny, and I
>> have tried increasing the heap size. When I skip the hadoop dfs -put, the
>> tasks do not fail.
>>
>> Is it wrong to call hadoop dfs -put from inside a script that is run by
>> hadoop? Should I instead transfer the files at the end with a combiner,
>> or simply mount hdfs locally and write directly to hdfs? Any general
>> suggestions?
>>
>>
>> --
>> Håvard Wahl Kongsgård
>> NTNU
>>
>> http://havard.security-review.net/
>
--
Håvard Wahl Kongsgård
NTNU
http://havard.security-review.net/