[ 
https://issues.apache.org/jira/browse/SPARK-12844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15466916#comment-15466916
 ] 

Jagadeesan A S commented on SPARK-12844:
----------------------------------------

The algebraic properties have already been taken care of by this 
[commit|https://github.com/apache/spark/commit/fb7e21797ed618d9754545a44f8f95f75b66757a]
 on [SPARK-13339|https://issues.apache.org/jira/browse/SPARK-13339]
cc [~srowen]

> Spark documentation should be more precise about the algebraic properties of 
> functions in various transformations
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-12844
>                 URL: https://issues.apache.org/jira/browse/SPARK-12844
>             Project: Spark
>          Issue Type: Documentation
>          Components: Documentation
>            Reporter: Jimmy Lin
>            Priority: Minor
>
> Spark documentation should be more precise about the algebraic properties of 
> functions in various transformations. The way the current documentation is 
> written is potentially confusing. For example, in Spark 1.6, the scaladoc for 
> reduce in RDD says:
> > Reduces the elements of this RDD using the specified commutative and 
> > associative binary operator.
> This is precise and accurate. In the documentation of reduceByKey in 
> PairRDDFunctions, on the other hand, it says:
> > Merge the values for each key using an associative reduce function.
> To be more precise, this function must also be commutative in order for the 
> computation to be correct. Writing commutative for reduce and not reduceByKey 
> gives the false impression that the function in the latter does not need to 
> be commutative.
> The same applies to aggregateByKey. To be precise, both seqOp and combOp need 
> to be associative (mentioned) AND commutative (not mentioned) in order for 
> the computation to be correct. It would be desirable to fix these 
> inconsistencies throughout the documentation.
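
The point in the quoted description can be illustrated with a small sketch.
This is a toy model in plain Python, not Spark code: it assumes that each
partition is reduced locally and that the partial results may then be merged
in any order. The helper name simulated_reduce and the sample data are
hypothetical and chosen only for illustration.

```python
from functools import reduce
from itertools import permutations


def simulated_reduce(partitions, op):
    """Return every result obtainable when per-partition partial results
    are merged in an arbitrary order (toy model, not Spark's scheduler)."""
    # Reduce each partition locally, in element order.
    partials = [reduce(op, part) for part in partitions]
    # Merge the partial results in every possible arrival order.
    return {reduce(op, order) for order in permutations(partials)}


# Addition is associative and commutative: one possible result.
sum_results = simulated_reduce([[1, 2], [3], [4, 5]], lambda x, y: x + y)

# String concatenation is associative but NOT commutative:
# the result depends on the order in which partials are merged.
concat_results = simulated_reduce([["a", "b"], ["c"], ["d", "e"]],
                                  lambda x, y: x + y)

print(sum_results)
print(sorted(concat_results))
```

Under this model an operator that is associative but not commutative can
produce a different answer on each run, which is why the issue asks for
commutativity to be stated explicitly alongside associativity.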



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

