Thank you very much, Sean! If you would like, this could serve as an answer to the Stack Overflow question: [Is Spark's kMeans unable to handle bigdata?](http://stackoverflow.com/questions/39260820/is-sparks-kmeans-unable-to-handle-bigdata).

Enjoy your weekend,
George

On Sat, Sep 3, 2016 at 1:22 AM, Sean Owen <so...@cloudera.com> wrote:
> I opened https://issues.apache.org/jira/browse/SPARK-17389 to track
> some improvements, but by far the big one is that the init steps
> defaults to 5, when the paper says that 2 is pretty much optimal here.
> It's much faster with that setting.
>
> On Fri, Sep 2, 2016 at 6:45 PM, Georgios Samaras
> <georgesamaras...@gmail.com> wrote:
> > I am not using the "runs" parameter anyway, but I see your point. If you
> > could point out any modifications in the minimal example I posted, I would
> > be more than interested to try them!
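
For anyone reading this thread in the archive, here is a minimal sketch of the setting Sean describes. It assumes the RDD-based PySpark API (`pyspark.mllib.clustering.KMeans.train`) and its `initializationSteps` parameter. The helper names `build_kmeans_args` and `train_kmeans` are hypothetical and are not taken from the minimal example discussed above.

```python
# Sketch only: build_kmeans_args and train_kmeans are hypothetical helper names.

def build_kmeans_args(k, init_steps=2):
    """Keyword arguments for pyspark.mllib.clustering.KMeans.train.

    init_steps=2 follows the suggestion quoted above; the thread reports
    that the default at the time was 5.
    """
    return {
        "k": k,
        "initializationMode": "k-means||",
        "initializationSteps": init_steps,
    }


def train_kmeans(rdd, k):
    """Train on an RDD of feature vectors with the reduced init steps."""
    # Imported here so build_kmeans_args can be inspected without Spark installed.
    from pyspark.mllib.clustering import KMeans

    return KMeans.train(rdd, **build_kmeans_args(k))
```

With a live SparkContext, something like `train_kmeans(points_rdd, k=8)` would run the training, where `points_rdd` stands for whatever RDD of vectors you already have. Whether 2 steps is the best choice for a given dataset is worth checking by timing both settings.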