If the vector size is too large, the current Hama will also run out of memory. So, I would like to add the 2D layout version to the 0.1 release plan for parallel matrix multiplication (a rough sketch of the blocked-layout idea is below).
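For anyone who hasn't seen the 2D layout idea, here is a minimal single-machine sketch of what blocked (2D) partitioning of a multiplication looks like. The class and method names are mine, for illustration only, and are not the planned Hama classes.

// Minimal single-process sketch of a 2D (block) layout for C = A * B.
// Each block product touches only blockSize x blockSize pieces of A and B,
// so no worker would ever need a full row or column of a huge matrix in memory.
public class BlockLayoutSketch {

  public static void main(String[] args) {
    int n = 4;          // full matrix dimension (toy value)
    int blockSize = 2;  // side length of one block
    int blocks = n / blockSize;

    double[][] a = identity(n);
    double[][] b = identity(n);
    double[][] c = new double[n][n];

    // Iterate over block coordinates instead of single cells; in a
    // distributed job, each (bi, bj, bk) triple would be one task.
    for (int bi = 0; bi < blocks; bi++) {
      for (int bj = 0; bj < blocks; bj++) {
        for (int bk = 0; bk < blocks; bk++) {
          multiplyBlock(a, b, c,
              bi * blockSize, bj * blockSize, bk * blockSize, blockSize);
        }
      }
    }
    System.out.println(java.util.Arrays.deepToString(c));
  }

  // C[i0..i0+s)[j0..j0+s) += A[i0..i0+s)[k0..k0+s) * B[k0..k0+s)[j0..j0+s)
  static void multiplyBlock(double[][] a, double[][] b, double[][] c,
                            int i0, int j0, int k0, int s) {
    for (int i = i0; i < i0 + s; i++) {
      for (int j = j0; j < j0 + s; j++) {
        for (int k = k0; k < k0 + s; k++) {
          c[i][j] += a[i][k] * b[k][j];
        }
      }
    }
  }

  static double[][] identity(int n) {
    double[][] m = new double[n][n];
    for (int i = 0; i < n; i++) {
      m[i][i] = 1.0;
    }
    return m;
  }
}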
Therefore, I'll rename some classes:

MultiplicationMap.java -> Mult1DLayoutMap.java
MultiplicationReduce.java -> Mult1DLayoutReduce.java

/Edward

On Fri, Sep 19, 2008 at 5:41 PM, Edward J. Yoon <[EMAIL PROTECTED]> wrote:
> Great experience!
>
> /Edward
>
> On Fri, Sep 19, 2008 at 2:50 PM, Palleti, Pallavi
> <[EMAIL PROTECTED]> wrote:
>> Yeah. That was the problem. And Hama can surely be useful for large-scale
>> matrix operations.
>>
>> But for this problem, I have modified the code to pass just the ID
>> information and read the vector information only when it is needed. In this
>> case, it was needed only in the reducer phase. This way, it avoided the
>> out-of-memory error and is also faster now.
>>
>> Thanks
>> Pallavi
>>
>> -----Original Message-----
>> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Edward J. Yoon
>> Sent: Friday, September 19, 2008 10:35 AM
>> To: [EMAIL PROTECTED]; [EMAIL PROTECTED]; [email protected]
>> Subject: Re: OutOfMemory Error
>>
>>> The key is of the form "ID:DenseVector representation in Mahout with
>>
>> I guess the vector size is too large, so a distributed vector architecture
>> (or 2D partitioning strategies) will be needed for large-scale matrix
>> operations. The Hama team is investigating these problem areas, so things
>> should improve if Hama can be used for Mahout in the future.
>>
>> /Edward
>>
>> On Thu, Sep 18, 2008 at 12:28 PM, Pallavi Palleti <[EMAIL PROTECTED]> wrote:
>>>
>>> Hadoop version - 17.1
>>> io.sort.factor = 10
>>> The key is of the form "ID:DenseVector representation in Mahout with
>>> dimensionality size = 160k"
>>> For example: C1:[,0.00111111, 3.002, ...... 1.001,....]
>>> So, the typical size of the key of the mapper output can be 160K*6 (assuming
>>> a double in string form takes about 5 bytes) + 5 (bytes for C1:[]) + the size
>>> required to record that the object is of type Text.
>>>
>>> Thanks
>>> Pallavi
>>>
>>> Devaraj Das wrote:
>>>>
>>>> On 9/17/08 6:06 PM, "Pallavi Palleti" <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I am getting an out-of-memory error, as shown below, when I run map-red
>>>>> on a huge amount of data:
>>>>> java.lang.OutOfMemoryError: Java heap space
>>>>> at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:52)
>>>>> at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:90)
>>>>> at org.apache.hadoop.io.SequenceFile$Reader.nextRawKey(SequenceFile.java:1974)
>>>>> at org.apache.hadoop.io.SequenceFile$Sorter$SegmentDescriptor.nextRawKey(SequenceFile.java:3002)
>>>>> at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.merge(SequenceFile.java:2802)
>>>>> at org.apache.hadoop.io.SequenceFile$Sorter.merge(SequenceFile.java:2511)
>>>>> at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1040)
>>>>> at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:698)
>>>>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:220)
>>>>> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
>>>>> The above error comes almost at the end of the map job. I have set the
>>>>> heap size to 1GB, but the problem persists. Can someone please help me
>>>>> figure out how to avoid this error?
>>>> What is the typical size of your key? What is the value of io.sort.factor?
>>>> Hadoop version?
--
Best regards, Edward J. Yoon
[EMAIL PROTECTED]
http://blog.udanax.org
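For reference, here is a rough sketch (mine, not Pallavi's actual code, using the old org.apache.hadoop.mapred API) of the fix described in the thread above: the mapper emits only the vector ID, so the serialized 160k-dimensional vectors, roughly 160,000 * 6 = 960,000 bytes of text per key, never pass through the map-output sort and merge that was running out of memory, and the reducer loads each vector only when it actually needs it. The loadVector() helper is a hypothetical placeholder for whatever store the real job reads from.

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class IdOnlyShuffleSketch {

  // Emits only the small vector ID, never the serialized dense vector,
  // so map-output keys stay a few bytes instead of ~1 MB.
  public static class IdMapper extends MapReduceBase
      implements Mapper<Text, Text, Text, Text> {
    public void map(Text clusterId, Text vectorId,
                    OutputCollector<Text, Text> out, Reporter reporter)
        throws IOException {
      out.collect(clusterId, vectorId);
    }
  }

  // Fetches each dense vector lazily, only when the reducer needs it.
  public static class LazyLoadReducer extends MapReduceBase
      implements Reducer<Text, Text, Text, Text> {
    public void reduce(Text clusterId, Iterator<Text> vectorIds,
                       OutputCollector<Text, Text> out, Reporter reporter)
        throws IOException {
      while (vectorIds.hasNext()) {
        String id = vectorIds.next().toString();
        String vector = loadVector(id);
        out.collect(clusterId, new Text(id + ":" + vector));
      }
    }

    // Hypothetical lookup; the real job would read from its own store
    // (HDFS side files, HBase, etc.).
    private String loadVector(String id) {
      return "...";
    }
  }
}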
