[ 
https://issues.apache.org/jira/browse/SPARK-12703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15092699#comment-15092699
 ] 

Joseph K. Bradley commented on SPARK-12703:
-------------------------------------------

Oh, no problem.  I just sent a Pull Request (PR) which you can view here: 
[https://github.com/apache/spark/pull/10707/files]
Could you please check it out and make sure the updated code works for you?  
Thanks!

If you do want to get involved in contributing more, you can find (a lot) more 
info here: 
[https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark]

> Spark KMeans Documentation Python Api
> -------------------------------------
>
>                 Key: SPARK-12703
>                 URL: https://issues.apache.org/jira/browse/SPARK-12703
>             Project: Spark
>          Issue Type: Documentation
>          Components: MLlib
>            Reporter: Anton
>            Priority: Minor
>   Original Estimate: 5m
>  Remaining Estimate: 5m
>
> In the documentation of Spark's Kmeans - Python api:
> http://spark.apache.org/docs/latest/mllib-clustering.html#k-means
> the cost of the final result is calculated using the 'error()' function, 
> which returns:
> {quote}
> return sqrt(sum([x**2 for x in (point - center)]))
> {quote}
> As I understand it, using sqrt() here is wrong and it should be omitted:
> {quote}
> return sum([x**2 for x in (point - center)])
> {quote}
> Please refer to :
> https://en.wikipedia.org/wiki/K-means_clustering#Description
> There you can see that the squaring cancels the square root.
> What do you think? It's minor, but it cost me a few minutes to understand 
> why the result wasn't what I expected.
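
To make the difference concrete, here is a minimal sketch in plain Python 
(no Spark). The helper names and the sample points and centers are made up 
for illustration; this is not the code from the documentation or from the 
PR. It compares the sum of squared distances (the usual k-means cost) with 
the sum of plain Euclidean distances that the sqrt() version produces.

```python
from math import sqrt

def squared_error(point, center):
    # Squared Euclidean distance: the per-point term of the k-means cost.
    return sum((p - c) ** 2 for p, c in zip(point, center))

def euclidean_error(point, center):
    # What the sqrt() variant returns: the plain Euclidean distance.
    return sqrt(squared_error(point, center))

def closest_center(point, centers):
    # Assign the point to its nearest center.
    return min(centers, key=lambda c: squared_error(point, c))

# Hypothetical data, chosen so the arithmetic is easy to check by hand.
points = [(0.0, 0.0), (3.0, 4.0), (9.0, 9.0), (9.0, 12.0)]
centers = [(0.0, 0.0), (9.0, 9.0)]

# Within-set sum of squared errors: 0 + 25 + 0 + 9
wssse = sum(squared_error(p, closest_center(p, centers)) for p in points)

# Sum of distances (the sqrt() variant): 0 + 5 + 0 + 3
sum_of_distances = sum(euclidean_error(p, closest_center(p, centers)) for p in points)

print(wssse)             # 34.0
print(sum_of_distances)  # 8.0
```

The two totals differ, so a value computed with sqrt() should not be 
expected to match a cost defined as a sum of squared distances.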



