Yeah, that was the problem. And Hama can surely be useful for large-scale matrix operations.
But for this problem, I have modified the code to pass only the ID information and to read the vector data only when it is needed; in this case, it is needed only in the reduce phase. This way, the out-of-memory error is avoided and the job is also faster now. (A minimal sketch of this ID-only approach is appended at the end of this message, below the quoted thread.)

Thanks,
Pallavi

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Edward J. Yoon
Sent: Friday, September 19, 2008 10:35 AM
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]; [email protected]
Subject: Re: OutOfMemory Error

> The key is of the form "ID :DenseVector Representation in mahout with

I guess the vector size is too large, so it will need a distributed vector architecture (or 2D partitioning strategies) for large-scale matrix operations. The Hama team is investigating these problem areas, so it will be improved if Hama can be used for Mahout in the future.

/Edward

On Thu, Sep 18, 2008 at 12:28 PM, Pallavi Palleti <[EMAIL PROTECTED]> wrote:
>
> Hadoop Version - 17.1
> io.sort.factor = 10
> The key is of the form "ID :DenseVector Representation in mahout with
> dimensionality size = 160k"
> For example: C1:[,0.00111111, 3.002, ...... 1.001,....]
> So, the typical size of the key of the mapper output can be 160K*6 (assuming
> a double represented as a string takes 5 bytes) + 5 (bytes for "C1:[]") + the size
> required to record that the object is of type Text.
>
> Thanks
> Pallavi
>
>
> Devaraj Das wrote:
>>
>> On 9/17/08 6:06 PM, "Pallavi Palleti" <[EMAIL PROTECTED]> wrote:
>>
>>> Hi all,
>>>
>>> I am getting an out-of-memory error, as shown below, when I run map-red on a huge
>>> amount of data:
>>> java.lang.OutOfMemoryError: Java heap space
>>>     at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:52)
>>>     at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:90)
>>>     at org.apache.hadoop.io.SequenceFile$Reader.nextRawKey(SequenceFile.java:1974)
>>>     at org.apache.hadoop.io.SequenceFile$Sorter$SegmentDescriptor.nextRawKey(SequenceFile.java:3002)
>>>     at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.merge(SequenceFile.java:2802)
>>>     at org.apache.hadoop.io.SequenceFile$Sorter.merge(SequenceFile.java:2511)
>>>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1040)
>>>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:698)
>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:220)
>>>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
>>> The above error comes almost at the end of the map job. I have set the heap size
>>> to 1GB, but the problem persists. Can someone please help me avoid this error?
>>
>> What is the typical size of your key? What is the value of io.sort.factor?
>> Hadoop version?
>>
>
> --
> View this message in context:
> http://www.nabble.com/OutOfMemory-Error-tp19531174p19545298.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>

--
Best regards, Edward J. Yoon
[EMAIL PROTECTED]
http://blog.udanax.org
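
P.S. For anyone hitting the same issue, here is a minimal sketch of the pattern described above, assuming the full vectors can be looked up by ID from a side store on HDFS: the mapper emits only the small ID as the key, and the reducer loads the full vector only when it is actually needed. VectorStore.lookup is a hypothetical placeholder (not a real Hadoop or Mahout API), and the classes use the 0.17-era org.apache.hadoop.mapred interfaces.

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Mapper: emit only the small ID as the key, never the serialized
// 160k-dimensional vector, so the map-side sort/merge buffers stay small.
public class IdOnlyMapper extends MapReduceBase
    implements Mapper<Text, Text, Text, Text> {

  public void map(Text id, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    // 'value' is whatever small per-record payload the job needs;
    // the heavy DenseVector is NOT serialized into the key or the value.
    output.collect(id, value);
  }
}

// Reducer: fetch the full vector only here, where it is actually needed.
class VectorLookupReducer extends MapReduceBase
    implements Reducer<Text, Text, Text, Text> {

  public void reduce(Text id, Iterator<Text> values,
                     OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    // Hypothetical helper: reads the vector for this ID from a side store
    // (e.g. a SequenceFile on HDFS keyed by ID). Placeholder only.
    double[] vector = VectorStore.lookup(id.toString());

    while (values.hasNext()) {
      values.next();
      // ... per-ID work that needs 'vector' goes here ...
    }
    output.collect(id, new Text("dim=" + vector.length));
  }
}

Whether this pays off depends on how many reduce-side lookups are needed; it trades the huge shuffled keys for a per-group read in the reducer, which was the better trade-off in this case.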
