Bo Jin created SPARK-12353:
------------------------------
Summary: wrong output for countByValue and countByValueAndWindow
Key: SPARK-12353
URL: https://issues.apache.org/jira/browse/SPARK-12353
Project: Spark
Issue Type: Bug
Components: Documentation, Input/Output, PySpark, Streaming
Affects Versions: 1.5.2
Environment: Ubuntu 14.04, Python 2.7.6
Reporter: Bo Jin
Original Stack Overflow report: http://stackoverflow.com/q/34114585/4698425
In PySpark Streaming, countByValue and countByValueAndWindow return a single
number (the count of distinct elements) instead of a DStream of (K, Long)
pairs. This is inconsistent with the documentation:
countByValue: When called on a DStream of elements of type K, return a new
DStream of (K, Long) pairs where the value of each key is its frequency in each
RDD of the source DStream.
countByValueAndWindow: When called on a DStream of (K, V) pairs, returns a new
DStream of (K, Long) pairs where the value of each key is its frequency within
a sliding window. Like in reduceByKeyAndWindow, the number of reduce tasks is
configurable through an optional argument.
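To make the discrepancy concrete, here is a minimal pure-Python sketch (not PySpark code; the function names are hypothetical) contrasting the documented per-batch semantics of countByValue with the behavior observed in this report:

```python
from collections import Counter

# Documented semantics: for each RDD of the source DStream, produce
# (value, frequency) pairs -- i.e. (K, Long) pairs per the Spark docs.
def count_by_value_documented(elements):
    return sorted(Counter(elements).items())

# Observed (buggy) behavior per this report: a single number equal to
# the count of DISTINCT elements, not a list of (K, Long) pairs.
def count_by_value_observed(elements):
    return len(set(elements))

batch = ["a", "b", "a", "c", "a"]
print(count_by_value_documented(batch))  # [('a', 3), ('b', 1), ('c', 1)]
print(count_by_value_observed(batch))    # 3
```

For the batch above, the documentation promises the (K, Long) pairs on the first line, but PySpark 1.5.2 reportedly yields only the single number on the second.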