I am not using the "runs" parameter anyway, but I see your point. If you
could point out any modifications in the minimal example I posted, I would
be more than interested to try them!
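
For anyone following the thread, here is a minimal sketch of the two knobs discussed below: the initialization mode of the mllib KMeans, and the YARN executor memory overhead that Sean suspects is getting executors killed. The object name, the helper names, and the 1024 MB value in the usage note are placeholders and not taken from the posted example, and the overhead property name is the one used by Spark 2.0 on YARN, so please treat this as an assumption-laden illustration rather than a fix.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.mllib.clustering.KMeans

// Hypothetical helper object, for illustration only.
object KMeansInitSketch {

  // Builds an untrained mllib KMeans with an explicit initialization mode.
  // initMode is expected to be KMeans.RANDOM or KMeans.K_MEANS_PARALLEL.
  def build(k: Int, maxIterations: Int, initMode: String): KMeans =
    new KMeans()
      .setK(k)
      .setMaxIterations(maxIterations)
      .setInitializationMode(initMode)

  // Returns a SparkConf that requests extra off-heap headroom (in MB) per
  // executor on YARN. The right value depends on the job and the cluster;
  // the caller has to pick it, nothing here is a recommendation.
  def confWithOverhead(overheadMb: Int): SparkConf =
    new SparkConf()
      .set("spark.yarn.executor.memoryOverhead", overheadMb.toString)
}
```

With an RDD[Vector] called data (a hypothetical name), KMeansInitSketch.build(k, 20, KMeans.K_MEANS_PARALLEL).run(data) would exercise the default k-means|| init, and swapping in KMeans.RANDOM gives the random init that is reported to work; KMeansInitSketch.confWithOverhead(1024) would be passed to the SparkContext when it is created.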

On Fri, Sep 2, 2016 at 10:43 AM, Sean Owen <so...@cloudera.com> wrote:

> Eh... more specifically, since Spark 2.0 the "runs" parameter in the
> KMeans mllib implementation has been ignored and is always 1. This
> means a lot of code that wraps this stuff up in arrays could be
> simplified quite a lot. I'll take a shot at optimizing this code and
> see if I can measure an effect.
>
> On Fri, Sep 2, 2016 at 6:33 PM, Sean Owen <so...@cloudera.com> wrote:
> > Yes it works fine, though each iteration of the parallel init step is
> > slow indeed -- about 5 minutes on my cluster. Given your question I
> > think you are actually 'hanging' because resources are being killed.
> >
> > I think this init may need some love and optimization. For example, I
> > think treeAggregate might work better. An Array[Float] may be just
> > fine and cut down memory usage, etc.
> >
> > On Fri, Sep 2, 2016 at 5:47 PM, Georgios Samaras
> > <georgesamaras...@gmail.com> wrote:
> >> So you were able to execute the minimal example I posted?
> >>
> >> I mean that the application doesn't progress; it hangs (I would be OK
> >> if it were just slower). It doesn't seem to me to be a configuration
> >> issue.
> >>
> >> On Fri, Sep 2, 2016 at 1:07 AM, Sean Owen <so...@cloudera.com> wrote:
> >>>
> >>> Hm, what do you mean? k-means|| init is certainly slower because it's
> >>> making passes over the data in order to pick better initial centroids.
> >>> The idea is that you might then spend fewer iterations converging
> >>> later, and converge to a better clustering.
> >>>
> >>> Your problem doesn't seem to be related to scale. You aren't even
> >>> running out of memory it seems. Your memory settings are causing YARN
> >>> to kill the executors for using more memory than they advertise. That
> >>> could mean it never proceeds if this happens a lot.
> >>>
> >>> I don't have any problems with it.
> >>>
> >>> On Thu, Sep 1, 2016 at 11:35 PM, Georgios Samaras
> >>> <georgesamaras...@gmail.com> wrote:
> >>> > Dear all,
> >>> >
> >>> >   the random initialization works well, but the default
> >>> > initialization is k-means|| and I have been struggling with it.
> >>> > Also, I heard of people struggling with it a year ago; everybody
> >>> > would just skip it and use random, but I cannot let it go!
> >>> >
> >>> >   I have posted a minimal example here..
> >>> >
> >>> > Please advise,
> >>> > George Samaras
> >>
> >>
>
