Yeah, that was the problem. And Hama can surely be useful for large-scale matrix operations.
But for this problem, I have modified the code to pass only the ID information and to read the vector data only when it is needed; in this case, it is needed only in the reduce phase. This way, the out-of-memory error is avoided and the job is also faster now. (A minimal sketch of this ID-only approach is appended at the end of this message, below the quoted thread.)

Thanks,
Pallavi

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Edward J. Yoon
Sent: Friday, September 19, 2008 10:35 AM
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]; [email protected]
Subject: Re: OutOfMemory Error

> The key is of the form "ID :DenseVector Representation in mahout with

I guess the vector size is too large, so it will need a distributed vector architecture (or 2D partitioning strategies) for large-scale matrix operations. The Hama team is investigating these problem areas, so it will be improved if Hama can be used for Mahout in the future.

/Edward

On Thu, Sep 18, 2008 at 12:28 PM, Pallavi Palleti <[EMAIL PROTECTED]> wrote:
>
> Hadoop Version - 17.1
> io.sort.factor = 10
> The key is of the form "ID :DenseVector Representation in mahout with
> dimensionality size = 160k"
> For example: C1:[,0.00111111, 3.002, ...... 1.001,....]
> So, the typical size of the key of the mapper output can be 160K*6 (assuming
> a double represented as a string takes 5 bytes) + 5 (bytes for "C1:[]") + the size
> required to record that the object is of type Text.
>
> Thanks
> Pallavi
>
>
> Devaraj Das wrote:
>>
>> On 9/17/08 6:06 PM, "Pallavi Palleti" <[EMAIL PROTECTED]> wrote:
>>
>>> Hi all,
>>>
>>> I am getting an out-of-memory error, as shown below, when I run map-red on a huge
>>> amount of data:
>>> java.lang.OutOfMemoryError: Java heap space
>>>     at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:52)
>>>     at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:90)
>>>     at org.apache.hadoop.io.SequenceFile$Reader.nextRawKey(SequenceFile.java:1974)
>>>     at org.apache.hadoop.io.SequenceFile$Sorter$SegmentDescriptor.nextRawKey(SequenceFile.java:3002)
>>>     at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.merge(SequenceFile.java:2802)
>>>     at org.apache.hadoop.io.SequenceFile$Sorter.merge(SequenceFile.java:2511)
>>>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1040)
>>>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:698)
>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:220)
>>>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
>>> The above error comes almost at the end of the map job. I have set the heap size
>>> to 1GB, but the problem persists. Can someone please help me avoid this error?
>>
>> What is the typical size of your key? What is the value of io.sort.factor?
>> Hadoop version?
>>
>
> --
> View this message in context:
> http://www.nabble.com/OutOfMemory-Error-tp19531174p19545298.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>

--
Best regards, Edward J. Yoon
[EMAIL PROTECTED]
http://blog.udanax.org
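
P.S. For anyone hitting the same issue, here is a minimal sketch of the pattern described above, assuming the full vectors can be looked up by ID from a side store on HDFS: the mapper emits only the small ID as the key, and the reducer loads the full vector only when it is actually needed. VectorStore.lookup is a hypothetical placeholder (not a real Hadoop or Mahout API), and the classes use the 0.17-era org.apache.hadoop.mapred interfaces.

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Mapper: emit only the small ID as the key, never the serialized
// 160k-dimensional vector, so the map-side sort/merge buffers stay small.
public class IdOnlyMapper extends MapReduceBase
    implements Mapper<Text, Text, Text, Text> {

  public void map(Text id, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    // 'value' is whatever small per-record payload the job needs;
    // the heavy DenseVector is NOT serialized into the key or the value.
    output.collect(id, value);
  }
}

// Reducer: fetch the full vector only here, where it is actually needed.
class VectorLookupReducer extends MapReduceBase
    implements Reducer<Text, Text, Text, Text> {

  public void reduce(Text id, Iterator<Text> values,
                     OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    // Hypothetical helper: reads the vector for this ID from a side store
    // (e.g. a SequenceFile on HDFS keyed by ID). Placeholder only.
    double[] vector = VectorStore.lookup(id.toString());

    while (values.hasNext()) {
      values.next();
      // ... per-ID work that needs 'vector' goes here ...
    }
    output.collect(id, new Text("dim=" + vector.length));
  }
}

Whether this pays off depends on how many reduce-side lookups are needed; it trades the huge shuffled keys for a per-group read in the reducer, which was the better trade-off in this case.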
