[
https://issues.apache.org/jira/browse/SPARK-9819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
François Garillot updated SPARK-9819:
-------------------------------------
Description:
{{reduceByWindow}} has an optional {{invReduceFunc}} argument which allows the
reduction to be performed incrementally.
The incremental reduction [performed in
{{ReducedWindowedDStream}}|https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/ReducedWindowedDStream.scala#L157]
only depends on the reduction being associative (as shown by the reduce
applied to {{oldValues}}), but does not require those functions to be
commutative.
In particular, if the inverse reduction is the non-commutative, non-associative
subtraction (e.g. what you're computing is a running sum), it's necessary to
know that the intermediate result (to be subtracted from) is the first
argument of {{invReduceFunc}} and that the second argument is the old value to
subtract.
It's only in the commutative case that we don't care which is which.
The Scaladoc for the various overloads of {{reduceByWindow}} should let the
user know which is the accumulator, and which is the old value. A concise,
unambiguous way to state this is to write an inversion law in the Scaladoc:
{{invReduceFunc(reduceFunc(x, y), y) = x}}
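To make the argument-order point concrete, here is a plain-Python sketch of the incremental update for the running-sum case (this is not the Spark API; the function and variable names below are illustrative, and the real computation lives in the Scala source linked above):

```python
# Sketch of the incremental window reduction: reduceFunc is addition,
# invReduceFunc is subtraction (non-commutative), so argument order matters.

def reduce_func(acc, value):
    return acc + value

def inv_reduce_func(acc, old_value):
    # The intermediate result (previous window's reduction) is the FIRST
    # argument; the value leaving the window is subtracted from it.
    return acc - old_value

def slide_window(prev_result, old_values, new_values):
    """Incrementally compute the new window's reduction: remove the
    values that fell out of the window, then fold in the new values."""
    result = prev_result
    for old in old_values:
        result = inv_reduce_func(result, old)
    for new in new_values:
        result = reduce_func(result, new)
    return result

# Window [1, 2, 3] slides to [2, 3, 4]: 1 drops out, 4 arrives.
prev = 1 + 2 + 3                            # 6
assert slide_window(prev, [1], [4]) == 9    # 2 + 3 + 4

# The inversion law the Scaladoc should state:
#   inv_reduce_func(reduce_func(x, y), y) == x
x, y = 10, 4
assert inv_reduce_func(reduce_func(x, y), y) == x
```

Swapping the arguments of {{invReduceFunc}} here would compute {{old - acc}} and silently produce wrong window sums, which is exactly why the Scaladoc should state the law explicitly.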
was:
{{reduceByWindow}} has an optional {{invReduceFunc}} argument which allows the
reduction to be performed incrementally.
The incremental reduction [performed in
{{ReducedWindowedDStream}}|https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/ReducedWindowedDStream.scala#L157]
only depends on the reduction and its inverse function being associative (as
shown by the reduce applied to {{oldValues}}), but does not require those
functions to be commutative.
In particular, if the inverse reduction is the non-commutative, but associative
subtraction (e.g. what you're computing is a running sum), it's necessary to
know that the intermediate result (to be subtracted from) is the first
argument of {{invReduceFunc}} and that the second argument is the old value to
subtract.
It's only in the commutative case that we don't care which is which.
The Scaladoc for the various overloads of {{reduceByWindow}} should let the
user know which is the accumulator, and which is the old value. A concise,
unambiguous way to state this is to write an inversion law in the Scaladoc:
{{invReduceFunc(reduceFunc(x, y), y) = x}}
We should also remind the user to use associative reduction (and inverse
reduction) functions, since the computation makes that assumption.
> reduceBy(KeyAnd)Window should specify which is the accumulator argument in
> invReduceFunc
> ----------------------------------------------------------------------------------------
>
> Key: SPARK-9819
> URL: https://issues.apache.org/jira/browse/SPARK-9819
> Project: Spark
> Issue Type: Improvement
> Components: Documentation, Streaming
> Affects Versions: 1.4.1
> Environment: All
> Reporter: François Garillot
> Priority: Minor
> Labels: documentation, streaming
>
> {{reduceByWindow}} has an optional {{invReduceFunc}} argument which allows
> the reduction to be performed incrementally.
> The incremental reduction [performed in
> {{ReducedWindowedDStream}}|https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/ReducedWindowedDStream.scala#L157]
> only depends on the reduction being associative (as shown by the reduce
> applied to {{oldValues}}), but does not require those functions to be
> commutative.
> In particular, if the inverse reduction is the non-commutative,
> non-associative subtraction (e.g. what you're computing is a running sum),
> it's necessary to know that the intermediate result (to be subtracted from)
> is the first argument of {{invReduceFunc}} and that the second argument is
> the old value to subtract.
> It's only in the commutative case that we don't care which is which.
> The Scaladoc for the various overloads of {{reduceByWindow}} should let the
> user know which is the accumulator, and which is the old value. A concise,
> unambiguous way to state this is to write an inversion law in the Scaladoc:
> {{invReduceFunc(reduceFunc(x, y), y) = x}}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]