Bo Jin created SPARK-12353:
------------------------------

             Summary: wrong output for countByValue and countByValueAndWindow
                 Key: SPARK-12353
                 URL: https://issues.apache.org/jira/browse/SPARK-12353
             Project: Spark
          Issue Type: Bug
          Components: Documentation, Input/Output, PySpark, Streaming
    Affects Versions: 1.5.2
         Environment: Ubuntu 14.04, Python 2.7.6
            Reporter: Bo Jin


Originally reported on Stack Overflow: http://stackoverflow.com/q/34114585/4698425

In PySpark Streaming, the functions countByValue and countByValueAndWindow each
return a single number (the count of distinct elements in the batch) instead of
a DStream of (k, v) pairs.

This is inconsistent with the documentation:

countByValue: When called on a DStream of elements of type K, return a new 
DStream of (K, Long) pairs where the value of each key is its frequency in each 
RDD of the source DStream.

countByValueAndWindow: When called on a DStream of (K, V) pairs, returns a new 
DStream of (K, Long) pairs where the value of each key is its frequency within 
a sliding window. Like in reduceByKeyAndWindow, the number of reduce tasks is 
configurable through an optional argument.
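The documented behavior versus the observed behavior can be illustrated in
plain Python (a minimal sketch with hypothetical sample data; `batch` stands
in for the contents of one RDD in the source DStream):

```python
from collections import Counter

# One hypothetical batch of elements from the source DStream.
batch = ["a", "b", "a", "c", "a", "b"]

# Documented behavior: countByValue should yield (K, Long) pairs,
# mapping each distinct element to its frequency within the batch.
expected_pairs = sorted(Counter(batch).items())
print(expected_pairs)  # [('a', 3), ('b', 2), ('c', 1)]

# Reported behavior: PySpark instead emits a single number per batch,
# the count of distinct elements.
observed = len(set(batch))
print(observed)  # 3
```

For this sample batch, the documentation implies an output of
[('a', 3), ('b', 2), ('c', 1)], while the reported PySpark behavior
produces only the number 3.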




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
