We have done this by blocking, but without using BlockMatrix: we rolled our
own blocking mechanism because BlockMatrix didn't exist in Spark 1.2. What
block size are you using, and how much memory are you giving the executors?
I assume you are running on YARN; if so, make sure
spark.yarn.executor.memoryOverhead is set higher than the default.
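
For example, something like this when building the context (a minimal
sketch; the 8g and 2048 values are illustrative starting points, not
recommendations):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("MatrixMultiply")
      .set("spark.executor.memory", "8g")
      // Off-heap/JVM overhead for the YARN container, in MB. The
      // default (a few hundred MB) is often too small for
      // shuffle-heavy jobs like a distributed matrix multiply.
      .set("spark.yarn.executor.memoryOverhead", "2048")
    val sc = new SparkContext(conf)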

Just curious: could you also explain why you need to multiply the matrix by
its transpose? It smells like a similarity computation.
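
If it is, note that the full product gets big fast: even 10,000 by 10,000
is 800 MB dense, and at 200,000 rows A * A^T is a 200,000 by 200,000 dense
result (~320 GB at double precision). MLlib's RowMatrix.columnSimilarities
(DIMSUM) computes approximate cosine similarities between columns without
materialising the full product, so if similarities between rows are what
you're after, you may be able to skip the multiply by feeding it the
transpose of your data. A rough sketch (the HDFS path and the
one-dense-row-per-line CSV format are assumptions):

    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.linalg.distributed.RowMatrix

    // Assumed input: one dense matrix row per CSV line.
    val rows = sc.textFile("hdfs:///path/to/matrix.csv")
      .map(line => Vectors.dense(line.split(',').map(_.toDouble)))
    val mat = new RowMatrix(rows)

    // DIMSUM sampling: keep only column pairs whose cosine
    // similarity is estimated to be above the threshold.
    val sims = mat.columnSimilarities(0.1)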

Regards
Sab

On Thu, Nov 12, 2015 at 7:27 PM, Eilidh Troup <e.tr...@epcc.ed.ac.uk> wrote:

> Hi,
>
> I’m trying to multiply a large, roughly square matrix by its transpose.
> Eventually I’d like to work with matrices of size 200,000 by 500,000. I
> started with 100 by 100, which was fine, and then 10,000 by 10,000, which
> failed with an out-of-memory exception.
>
> I used MLlib’s BlockMatrix, tried various block sizes, and also tried
> switching disk serialisation on.
>
> We are running on a small cluster, using a CSV file in HDFS as the input
> data.
>
> Would anyone with experience of multiplying large, dense matrices in
> Spark be able to comment on what to try to make this work?
>
> Thanks,
> Eilidh
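
For reference, the BlockMatrix attempt described above would look roughly
like the sketch below (assuming an "i,j,value" triple per CSV line and a
hypothetical path; adjust the parsing for dense rows). Each 1024 x 1024
dense block is about 8 MB, and multiply() shuffles many blocks at once,
which is usually where the memory overhead bites:

    import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, MatrixEntry}

    // Assumed input: "i,j,value" per line.
    val entries = sc.textFile("hdfs:///path/to/matrix.csv").map { line =>
      val Array(i, j, v) = line.split(',')
      MatrixEntry(i.toLong, j.toLong, v.toDouble)
    }

    // cache() matters: multiply() reads each block many times.
    val a = new CoordinateMatrix(entries).toBlockMatrix(1024, 1024).cache()
    val product = a.multiply(a.transpose)   // A * A^T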


-- 

Architect - Big Data
Ph: +91 99805 99458

Manthan Systems | Company of the Year - Analytics (2014 Frost and Sullivan India ICT)