Hi Evan,
Sorry, I forgot to mention it: I set K to 10 for the benchmark study.

On Friday 19 September 2014 11:24 PM, Evan R. Sparks wrote:
Hey Meethu - what are you setting "K" to in the benchmarks you show? This can greatly affect the runtime.

On Thu, Sep 18, 2014 at 10:38 PM, Meethu Mathew <meethu.mat...@flytxt.com> wrote:

    Hi all,
    Please find attached an image of the benchmark results; the table in
    the previous mail got garbled. Thanks.



    On Friday 19 September 2014 10:55 AM, Meethu Mathew wrote:
    Hi all,

    We have come up with an initial distributed implementation of a Gaussian
    Mixture Model in PySpark, where the parameters are estimated using the
    Expectation-Maximization algorithm. Our current implementation uses a
    diagonal covariance matrix for each component.
    We did an initial benchmark study on a 2-node Spark standalone cluster,
    where each node has 8 cores and 8 GB RAM; the Spark version used is
    1.0.0. We also evaluated the Python version of k-means available in Spark
    on the same datasets. Below are the results from this benchmark study.
    The reported stats are averages over 10 runs. Tests were done on multiple
    datasets with varying numbers of features and instances.
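    For readers unfamiliar with how EM for a diagonal-covariance GMM distributes, here is a minimal NumPy sketch of one common pattern. It is an illustration under stated assumptions, not the code attached to SPARK-3588: each chunk (standing in for a Spark partition, e.g. via RDD.mapPartitions) computes per-component sufficient statistics, the statistics are summed with an associative reduce, and the driver does the M-step. The helper names (partition_stats, combine, m_step, em) and the random-row initialisation are hypothetical choices made for this sketch.

```python
from functools import reduce

import numpy as np


def log_gaussian_diag(X, means, variances):
    """(n, k) log densities of rows of X under diagonal-covariance components."""
    d = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]                   # (n, k, d)
    quad = np.sum(diff ** 2 / variances[None, :, :], axis=2)   # (n, k)
    log_det = np.sum(np.log(variances), axis=1)                # (k,)
    return -0.5 * (d * np.log(2 * np.pi) + log_det[None, :] + quad)


def partition_stats(X, weights, means, variances):
    """E-step on one chunk (hypothetical stand-in for one Spark partition)."""
    log_p = np.log(weights)[None, :] + log_gaussian_diag(X, means, variances)
    # Normalise responsibilities in log space (log-sum-exp) for stability.
    m = log_p.max(axis=1, keepdims=True)
    log_norm = m + np.log(np.exp(log_p - m).sum(axis=1, keepdims=True))
    resp = np.exp(log_p - log_norm)                            # (n, k)
    return (resp.sum(axis=0),        # N_k
            resp.T @ X,              # sum_i r_ik * x_i,   shape (k, d)
            resp.T @ (X ** 2),       # sum_i r_ik * x_i^2, shape (k, d)
            float(log_norm.sum()))   # log-likelihood contribution of the chunk


def combine(a, b):
    """Element-wise sum of two statistic tuples; associative, so usable in a reduce."""
    return tuple(x + y for x, y in zip(a, b))


def m_step(stats, n_total, min_var=1e-6):
    """Closed-form M-step for weights, means and diagonal variances."""
    Nk, Sx, Sxx, loglik = stats
    Nk_safe = np.maximum(Nk, 1e-10)          # guard against empty components
    weights = Nk / n_total
    means = Sx / Nk_safe[:, None]
    variances = np.maximum(Sxx / Nk_safe[:, None] - means ** 2, min_var)
    return weights, means, variances, loglik


def em(chunks, k, n_iter=50, seed=0):
    """Run EM over a list of data chunks; returns the final parameters and the
    log-likelihood computed in the last E-step."""
    rng = np.random.default_rng(seed)
    first = chunks[0]
    # Assumed initialisation: k random rows of the first chunk as means.
    means = first[rng.choice(len(first), size=k, replace=False)].copy()
    variances = np.tile(first.var(axis=0), (k, 1))
    weights = np.full(k, 1.0 / k)
    n_total = sum(len(c) for c in chunks)
    loglik = float("-inf")
    for _ in range(n_iter):
        stats = reduce(combine, (partition_stats(c, weights, means, variances)
                                 for c in chunks))
        weights, means, variances, loglik = m_step(stats, n_total)
    return weights, means, variances, loglik
```

    Because combine is just a sum, the per-chunk statistics give the same result as a single pass over all the data, which is what lets the E-step run in parallel with only O(k·d) numbers shipped back per partition.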


                              Gaussian mixture model            K-means (Python)
    Instances    Dimensions   Avg time per    Time for 100      Avg time per    Time for 100
                              iteration       iterations        iteration       iterations
    0.7 million  13           7 s             12 min            13 s            26 min
    1.8 million  11           17 s            29 min            33 s            53 min
    10 million   16           1.6 min         2.7 hr            1.2 min         2 hr


    We are interested in contributing this implementation as a patch to
    Spark. Does MLlib accept Python implementations? If not, can we
    contribute to the PySpark component?
    I have created a JIRA for this:
    https://issues.apache.org/jira/browse/SPARK-3588
    How do I get the ticket assigned to myself?

    Please review and suggest how to take this forward.




    ---------------------------------------------------------------------
    To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
    For additional commands, e-mail: dev-h...@spark.apache.org



--

Regards,

*Meethu Mathew*

*Engineer*

*Flytxt*

www.flytxt.com | Visit our blog <http://blog.flytxt.com/> | Follow us <http://www.twitter.com/flytxt> | Connect on LinkedIn <http://www.linkedin.com/home?trk=hb_tab_home_top>
