[
https://issues.apache.org/jira/browse/SPARK-9819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
François Garillot updated SPARK-9819:
-------------------------------------
Comment: was deleted
(was: https://github.com/apache/spark/pull/8103)
> reduceBy(KeyAnd)Window should specify which is the accumulator argument in
> invReduceFunc
> ----------------------------------------------------------------------------------------
>
> Key: SPARK-9819
> URL: https://issues.apache.org/jira/browse/SPARK-9819
> Project: Spark
> Issue Type: Improvement
> Components: Documentation, Streaming
> Affects Versions: 1.4.1
> Environment: All
> Reporter: François Garillot
> Priority: Minor
> Labels: documentation, streaming
>
> {{reduceByWindow}} has an optional {{invReduceFunc}} argument which allows
> the reduction to be performed incrementally.
> The incremental reduction [performed in
> {{ReducedWindowedDStream}}|https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/ReducedWindowedDStream.scala#L157]
> only depends on the reduction and its inverse function being associative (as
> shown by the reduce applied to {{oldValues}}), but does not require those
> functions to be commutative.
> In particular, if the inverse reduction is the non-commutative, but
> associative substraction (e.g. what you're computing is a running sum), it's
> necessary to know that the intermediate result (to be substracted from) is
> the first argument of {{invReduceFunc}} and that the second argument is the
> old value to substract.
> It's only in the commutative case that we don't care which is which.
> The Scaladoc for the various overloads of {{reduceByWindow}} should let the
> user know which is the accumulator, and which is the old value. A concise,
> unambiguous way to state this is to write an inversion law in the Scaladoc:
> {{invReduceFunc(reduceFunc(x, y), y) = x}}
> We should also remind the user that he should use associative reduction (&
> inverse reduction) functions, since the computation makes that assumption.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]