I'm not sure; can you try using smaller datasets as input and do some rough
benchmarking?
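Something like the following could drive a rough scaling check. It is only a sketch: `process` is a hypothetical stand-in for the expensive step (in the real job you would call SimilarityAnalysis on a sampled subset of the input instead), and the sample sizes are arbitrary. The point is just to time the same computation at several input sizes and see how wall-clock time grows before committing to the full 21M records.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class RoughBenchmark {

    // Hypothetical stand-in for the expensive computation; swap in the
    // actual SimilarityAnalysis call on a sampled dataset in the real job.
    static long process(List<long[]> records) {
        long acc = 0;
        for (long[] r : records) {
            for (long v : r) {
                acc += v;
            }
        }
        return acc;
    }

    // Generate n synthetic (user, item) style records for timing purposes.
    static List<long[]> sample(int n, Random rnd) {
        List<long[]> out = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            out.add(new long[] { rnd.nextInt(1000), rnd.nextInt(1000) });
        }
        return out;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        // Time the same step at increasing sizes to estimate scaling.
        for (int n : new int[] { 10_000, 100_000, 1_000_000 }) {
            List<long[]> records = sample(n, rnd);
            long t0 = System.nanoTime();
            process(records);
            long elapsedMs = (System.nanoTime() - t0) / 1_000_000;
            System.out.println(n + " records -> " + elapsedMs + " ms");
        }
    }
}
```

If the time grows much faster than linearly between sample sizes, that points at the algorithm or partitioning rather than raw data volume.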

On Saturday, February 13, 2016, Ram VISWANADHA <
[email protected]> wrote:

> The set has 21,367,781 records. Would it take 17+ hours for 21M records?
>
>
> Best Regards,
> Ram
> --
>
> On 2/13/16, 11:31 AM, "Ram VISWANADHA" <[email protected]> wrote:
>
> >Hi,
> >I am calling SimilarityAnalysis.cooccurrencesIDS api from Java. Here is
> the code https://gist.github.com/ramv-dailymotion/38a32f379865e8ee5a58
> >I am running this on a Spark cluster with 3 worker nodes and 1 master
> node. Each machine has 108GB RAM and 32 CPUs. What am I doing wrong?
> Thanks in advance.
> >
> >Best Regards,
> >Ram
> >
>
