[ https://issues.apache.org/jira/browse/SPARK-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15728710#comment-15728710 ]

Saisai Shao commented on SPARK-18743:
-------------------------------------

From my understanding, {{FileInputDStream}} is not a record-based streaming 
connector; it is a file-based connector. How to define an "event" for a 
file-based connector is an open question, so in the Spark Streaming 
implementation the event count is deliberately set to 0 for this input DStream.

> StreamingContext.textFileStream(directory) has no events shown in Web UI
> ------------------------------------------------------------------------
>
>                 Key: SPARK-18743
>                 URL: https://issues.apache.org/jira/browse/SPARK-18743
>             Project: Spark
>          Issue Type: Bug
>          Components: Web UI
>    Affects Versions: 1.6.0
>         Environment: Cloudera
>            Reporter: Viktor Vojnovski
>            Priority: Minor
>         Attachments: screenshot-1.png
>
>
> StreamingContext.textFileStream input is not reflected in the Web UI, i.e. the 
> Input rate stays at 0 events/sec (see attached screenshot).
> Please find below a reproduction scenario, and a link to the same issue being 
> reported on the spark user/developer lists.
> http://mail-archives.apache.org/mod_mbox/spark-user/201604.mbox/%3CCAEko17iCNeeOzEbwqH9vGAkgXEpH3Rw=bwmkdfoozcx30zj...@mail.gmail.com%3E
> http://apache-spark-developers-list.1001551.n3.nabble.com/Re-Streaming-textFileStream-has-no-events-shown-in-web-UI-td17101.html
> [vvojnovski@machine:~] % cat a.py
> from __future__ import print_function
> from pyspark import SparkContext, SparkConf
> from pyspark.streaming import StreamingContext
> SparkContext.setSystemProperty('spark.executor.instances', '3')
> conf = (SparkConf()
>         .setMaster("yarn-client")
>         .setAppName("My app")
>         .set("spark.executor.memory", "1g"))
> sc = SparkContext(conf=conf)
> ssc = StreamingContext(sc, 5)
> lines = ssc.textFileStream("testin")
> counts = lines.flatMap(lambda line: line.split(" "))\
>               .map(lambda x: (x, 1))\
>               .reduceByKey(lambda a, b: a+b)
> counts.pprint()
> ssc.start()
> ssc.awaitTermination()
> [vvojnovski@machine:~] % cat testin.input 
> 1 2
> 3 4
> 5 6
> 7 8
> 9 10
> 11 12
> [vvojnovski@machine:~] % hdfs dfs -mkdir testin
> [vvojnovski@machine:~] % spark-submit a.py &
> [vvojnovski@machine:~] % hdfs dfs -put testin.input testin/testin.input.1
> [vvojnovski@machine:~] % hdfs dfs -put testin.input testin/testin.input.2
> [vvojnovski@machine:~] % hdfs dfs -put testin.input testin/testin.input.3
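For reference, the output that counts.pprint() should print for one batch containing a single copy of testin.input can be checked in plain Python, without a Spark cluster. This is only a sketch of the same flatMap/map/reduceByKey logic from the reproduction script above, applied to the six sample lines; it does not exercise the Web UI behaviour being reported:

```python
# Plain-Python equivalent of the PySpark pipeline in a.py, applied to the
# contents of the sample testin.input file (no Spark installation required).
from collections import Counter

lines = ["1 2", "3 4", "5 6", "7 8", "9 10", "11 12"]

# flatMap(lambda line: line.split(" "))
tokens = [tok for line in lines for tok in line.split(" ")]

# map(lambda x: (x, 1)) followed by reduceByKey(lambda a, b: a + b)
counts = Counter(tokens)

# Each of the 12 tokens appears exactly once in one copy of the file.
for word, count in sorted(counts.items()):
    print((word, count))
```

Since the repro puts three copies of the file into HDFS (typically landing in separate batches), each batch with one new file should print counts of 1 per token, confirming that records are in fact processed even while the Input rate graph stays at 0 events/sec.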



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
