yannis ats created MAHOUT-1431:
----------------------------------

             Summary: Comparison of Mahout 0.8 vs mahout 0.9 in EMR
                 Key: MAHOUT-1431
                 URL: https://issues.apache.org/jira/browse/MAHOUT-1431
             Project: Mahout
          Issue Type: Question
          Components: Clustering
    Affects Versions: 0.9, 0.8
            Reporter: yannis ats


Hi all,
I tested Mahout 0.8 and 0.9 on Amazon EMR, running k-means experiments with 
both versions on the same large dataset.
I found that Mahout 0.8 is faster than Mahout 0.9.
In particular, Mahout 0.8 performs fewer iterations, and each k-means 
iteration in 0.8 is roughly twice as fast as the corresponding iteration 
in 0.9.
The Hadoop version was 1.0.x and the input was roughly 2 million data points 
with a dimensionality of 1800.
The input parameters in both experiments were exactly the same, apart from the 
initialization, which was random in both cases. I understand that this may 
affect convergence (the number of iterations), but I am baffled by the fact 
that every iteration takes almost twice as long in 0.9 as in 0.8.
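
To illustrate why I would expect the per-iteration cost to be independent of 
the random initialization, here is a minimal sketch of one iteration of plain 
Lloyd's k-means. This is not Mahout's code: the function names and the tiny 
synthetic dataset are hypothetical and only stand in for the real job. It 
counts the point-to-centroid distance evaluations in a single iteration for 
several different random initializations.

```python
import random

def squared_distance(a, b):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def lloyd_iteration(points, centroids):
    """One assignment + update step of Lloyd's k-means.

    Returns (new_centroids, number_of_distance_evaluations).
    """
    k = len(centroids)
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    evaluations = 0
    for p in points:
        best, best_d = 0, float("inf")
        for j, c in enumerate(centroids):
            d = squared_distance(p, c)
            evaluations += 1
            if d < best_d:
                best, best_d = j, d
        counts[best] += 1
        for i, x in enumerate(p):
            sums[best][i] += x
    # An empty cluster keeps its previous centroid (an assumption of this sketch).
    new_centroids = [
        [s / counts[j] for s in sums[j]] if counts[j] else list(centroids[j])
        for j in range(k)
    ]
    return new_centroids, evaluations

def evaluations_for_seed(points, k, seed):
    # Pick k random points as initial centroids, then run one iteration.
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    _, evaluations = lloyd_iteration(points, centroids)
    return evaluations

# Hypothetical small dataset: 200 points in 5 dimensions.
data_rng = random.Random(0)
points = [[data_rng.random() for _ in range(5)] for _ in range(200)]

counts = [evaluations_for_seed(points, k=4, seed=s) for s in (1, 2, 3)]
print(counts)  # [800, 800, 800]
```

In this sketch every iteration performs n * k distance evaluations of d 
dimensions each, whatever the starting centroids are, so the random 
initialization alone should not explain a 2x difference per iteration.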

Is this normal? Is this expected?

Thank you in advance for your time.




--
This message was sent by Atlassian JIRA
(v6.2#6252)
