I'm not sure; can you try using smaller datasets as input and do some rough benchmarking?
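For the rough benchmarking suggested above, one option is a plain-Java timing harness: run the job at increasing sample sizes and see how runtime scales before extrapolating to 21M records. This is a minimal sketch; `dummyWork` is a stand-in (an assumption, not the real workload) — in practice you would time `SimilarityAnalysis.cooccurrencesIDS` on subsampled input, and the sample sizes here are arbitrary.

```java
public class RoughBenchmark {
    // Time a workload and return elapsed milliseconds.
    static long timeMillis(Runnable work) {
        long start = System.nanoTime();
        work.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Stand-in workload; replace with a call into your Spark job
    // on a sampled subset of the input.
    static long dummyWork(long n) {
        long acc = 0;
        for (long i = 0; i < n; i++) acc += i % 7;
        return acc;
    }

    public static void main(String[] args) {
        // Grow the input size by 10x each step; if runtime grows much
        // faster than 10x, the full 21M-record run will not scale.
        for (long n : new long[]{100_000L, 1_000_000L, 10_000_000L}) {
            final long size = n;
            long ms = timeMillis(() -> dummyWork(size));
            System.out.println(size + " records -> " + ms + " ms");
        }
    }
}
```

Comparing the timings across sample sizes gives a rough idea of whether the job is roughly linear in input size or blowing up superlinearly.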
On Saturday, February 13, 2016, Ram VISWANADHA <[email protected]> wrote:

> The set has 21,367,781 records. Would it take 17+ hours for 21M records?
>
> Best Regards,
> Ram
>
> On 2/13/16, 11:31 AM, "Ram VISWANADHA" <[email protected]> wrote:
>
> >Hi,
> >I am calling the SimilarityAnalysis.cooccurrencesIDS API from Java. Here is the code: https://gist.github.com/ramv-dailymotion/38a32f379865e8ee5a58
> >I am running this on a Spark cluster with 3 worker nodes and 1 master node. Each machine has 108 GB RAM and 32 CPUs. What am I doing wrong? Thanks in advance.
> >
> >Best Regards,
> >Ram
