Re: Is Spark's KMeans unable to handle bigdata?

Georgios Samaras Sat, 03 Sep 2016 10:32:51 -0700

Thank you very much, Sean! If you like, this could serve as an answer to
the Stack Overflow question
[Is Spark's kMeans unable to handle bigdata?](http://stackoverflow.com/questions/39260820/is-sparks-kmeans-unable-to-handle-bigdata).


Enjoy your weekend,
George

On Sat, Sep 3, 2016 at 1:22 AM, Sean Owen <[email protected]> wrote:

> I opened https://issues.apache.org/jira/browse/SPARK-17389 to track
> some improvements, but by far the biggest one is that the number of
> init steps defaults to 5, whereas the paper says that 2 is pretty much
> optimal here. It's much faster with that setting.
>
> On Fri, Sep 2, 2016 at 6:45 PM, Georgios Samaras
> <[email protected]> wrote:
> > I am not using the "runs" parameter anyway, but I see your point. If you
> > could point out any modifications in the minimal example I posted, I
> > would be more than interested to try them!
> >
>
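
For anyone who wants to try the setting Sean describes, here is a minimal sketch in PySpark, assuming the RDD-based `pyspark.mllib.clustering.KMeans.train` API and its `initializationSteps` keyword. The helper names `kmeans_train_kwargs` and `train_kmeans` are hypothetical, and the `max_iterations` value is an arbitrary placeholder rather than anything taken from the thread.

```python
def kmeans_train_kwargs(k, init_steps=2, max_iterations=20):
    """Build keyword arguments for pyspark.mllib.clustering.KMeans.train.

    init_steps=2 is the value suggested in the thread in place of the
    default of 5; max_iterations=20 is only an illustrative assumption.
    """
    if init_steps < 1:
        raise ValueError("init_steps must be a positive integer")
    return {
        "k": k,
        "maxIterations": max_iterations,
        "initializationMode": "k-means||",   # the parallel k-means++ variant
        "initializationSteps": init_steps,   # rounds of k-means|| sampling
    }


def train_kmeans(rdd, k, init_steps=2):
    """Train a model on an RDD of feature vectors with fewer init steps."""
    # Imported lazily so the helper above can be inspected without Spark.
    from pyspark.mllib.clustering import KMeans

    return KMeans.train(rdd, **kmeans_train_kwargs(k, init_steps))
```

With an existing RDD of vectors, `train_kmeans(rdd, k)` would run the clustering with 2 initialization steps; passing `init_steps=5` gives the old default for a timing comparison.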
