[ https://issues.apache.org/jira/browse/SPARK-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15728710#comment-15728710 ]
Saisai Shao commented on SPARK-18743:
-------------------------------------

From my understanding, {{FileInputDStream}} is not a record-based streaming connector; it is a file-based connector. How to define an event in a file-based connector is an open question, so in the Spark Streaming implementation the event number is deliberately set to 0 for this input DStream.

> StreamingContext.textFileStream(directory) has no events shown in Web UI
> ------------------------------------------------------------------------
>
>                 Key: SPARK-18743
>                 URL: https://issues.apache.org/jira/browse/SPARK-18743
>             Project: Spark
>          Issue Type: Bug
>          Components: Web UI
>    Affects Versions: 1.6.0
>         Environment: Cloudera
>            Reporter: Viktor Vojnovski
>            Priority: Minor
>        Attachments: screenshot-1.png
>
>
> StreamingContext.textFileStream input is not reflected in the Web UI, i.e. the
> Input rate stays at 0 events/sec (see attached screenshot).
> Please find below a reproduction scenario, and a link to the same issue being
> reported on the spark user/developer lists.
> http://mail-archives.apache.org/mod_mbox/spark-user/201604.mbox/%3CCAEko17iCNeeOzEbwqH9vGAkgXEpH3Rw=bwmkdfoozcx30zj...@mail.gmail.com%3E
> http://apache-spark-developers-list.1001551.n3.nabble.com/Re-Streaming-textFileStream-has-no-events-shown-in-web-UI-td17101.html
>
> [vvojnovski@machine:~] % cat a.py
> from __future__ import print_function
> from pyspark import SparkContext, SparkConf
> from pyspark.streaming import StreamingContext
> SparkContext.setSystemProperty('spark.executor.instances', '3')
> conf = (SparkConf()
>         .setMaster("yarn-client")
>         .setAppName("My app")
>         .set("spark.executor.memory", "1g"))
> sc = SparkContext(conf=conf)
> ssc = StreamingContext(sc, 5)
> lines = ssc.textFileStream("testin")
> counts = lines.flatMap(lambda line: line.split(" "))\
>               .map(lambda x: (x, 1))\
>               .reduceByKey(lambda a, b: a+b)
> counts.pprint()
> ssc.start()
> ssc.awaitTermination()
> [vvojnovski@machine:~] % cat testin.input
> 1 2
> 3 4
> 5 6
> 7 8
> 9 10
> 11 12
> [vvojnovski@machine:~] % hdfs dfs -mkdir testin
> [vvojnovski@machine:~] % spark-submit a.py &
> [vvojnovski@machine:~] % hdfs dfs -put testin.input testin/testin.input.1
> [vvojnovski@machine:~] % hdfs dfs -put testin.input testin/testin.input.2
> [vvojnovski@machine:~] % hdfs dfs -put testin.input testin/testin.input.3
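For reference, the per-batch transformation in a.py (flatMap over split words, map to (word, 1), reduceByKey by addition) can be sketched in plain Python without a Spark cluster. This is only a local sketch of what counts.pprint() should emit for one batch containing one copy of testin.input; the batch_word_count helper is my own name, not part of the repro or of Spark:

```python
from collections import Counter

def batch_word_count(lines):
    # Plain-Python equivalent of the DStream pipeline:
    #   flatMap(lambda line: line.split(" "))
    #   .map(lambda x: (x, 1))
    #   .reduceByKey(lambda a, b: a + b)
    tokens = [tok for line in lines for tok in line.split(" ")]
    return dict(Counter(tokens))

# One batch holding a single copy of testin.input
batch = ["1 2", "3 4", "5 6", "7 8", "9 10", "11 12"]
counts = batch_word_count(batch)
# 12 distinct tokens "1".."12", each seen exactly once
```

So even though the Web UI shows 0 events/sec for this file-based source, the computation itself does process records: each batch that picks up one testin.input file should print twelve (word, 1) pairs.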