[
https://issues.apache.org/jira/browse/SPARK-12844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sean Owen resolved SPARK-12844.
-------------------------------
Resolution: Duplicate
> Spark documentation should be more precise about the algebraic properties of
> functions in various transformations
> -----------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-12844
> URL: https://issues.apache.org/jira/browse/SPARK-12844
> Project: Spark
> Issue Type: Documentation
> Components: Documentation
> Reporter: Jimmy Lin
> Priority: Minor
>
> Spark documentation should be more precise about the algebraic properties of
> functions in various transformations. The way the current documentation is
> written is potentially confusing. For example, in Spark 1.6, the scaladoc for
> reduce in RDD says:
> > Reduces the elements of this RDD using the specified commutative and
> > associative binary operator.
> This is precise and accurate. In the documentation of reduceByKey in
> PairRDDFunctions, on the other hand, it says:
> > Merge the values for each key using an associative reduce function.
> To be more precise, this function must also be commutative in order for the
> computation to be correct. Stating commutativity for reduce but not for
> reduceByKey gives the false impression that the function passed to the latter
> need not be commutative.
> The same applies to aggregateByKey: to be precise, both seqOp and combOp need
> to be associative (which is mentioned) AND commutative (which is not) in order
> for the computation to be correct. It would be desirable to fix these
> inconsistencies throughout the documentation.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]