[ https://issues.apache.org/jira/browse/SPARK-13707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15182257#comment-15182257 ]
Jatin Kumar commented on SPARK-13707:
-------------------------------------

Ideally, all of the 2-second batches should be linked to the final 120-second batch, and one should be able to browse them from the UI. I am not aware of the design decisions taken here, as it can get quite complex in the case of multiple window operations. I would like to work on a fix for this if we can decide on what the behavior should be :)

> Streaming UI tab misleading for window operations
> -------------------------------------------------
>
>                 Key: SPARK-13707
>                 URL: https://issues.apache.org/jira/browse/SPARK-13707
>             Project: Spark
>          Issue Type: Bug
>          Components: Web UI
>    Affects Versions: 1.6.0
>            Reporter: Jatin Kumar
>
> The 'Streaming' tab on the Spark UI is misleading when the job has a window operation that changes the batch duration from the original streaming context batch duration.
> For instance, consider:
> {code:java}
> val streamingContext = new StreamingContext(sparkConfig, Seconds(2))
> val totalVideoImps = streamingContext.sparkContext.accumulator(0, "TotalVideoImpressions")
> val totalImps = streamingContext.sparkContext.accumulator(0, "TotalImpressions")
>
> val stream = KafkaReader.KafkaDirectStream(streamingContext)
> stream.map(KafkaAdLogParser.parseAdLogRecord)
>   .filter(record => {
>     totalImps += 1
>     KafkaAdLogParser.isVideoRecord(record)
>   })
>   .map(record => {
>     totalVideoImps += 1
>     record.url
>   })
>   .window(Seconds(120))
>   .count().foreachRDD((rdd, time) => {
>     println("Timestamp: " + ImpressionAggregator.millsToDate(time.milliseconds))
>     println("Count: " + rdd.collect()(0))
>     println("Total Impressions: " + totalImps.value)
>     totalImps.setValue(0)
>     println("Total Video Impressions: " + totalVideoImps.value)
>     totalVideoImps.setValue(0)
>   })
>
> streamingContext.start()
> streamingContext.awaitTermination()
> {code}
> The batch duration before the window operation is 2 seconds; after the window, batches are 120 seconds each.
> The above code printed the following for my application:
> {noformat}
> Timestamp: 2016-03-06 12:02:56,000
> Count: 362195
> Total Impressions: 16882431
> Total Video Impressions: 362195
> Timestamp: 2016-03-06 12:04:56,000
> Count: 367168
> Total Impressions: 19480293
> Total Video Impressions: 367168
> Timestamp: 2016-03-06 12:06:56,000
> Count: 177711
> Total Impressions: 10196677
> Total Video Impressions: 177711
> {noformat}
> The Spark UI, however, shows different numbers, as attached in the image. Also, if we check the start and end Kafka partition offsets reported by subsequent batch entries on the UI, they do not form an overall continuous range. All numbers are fine if we remove the window operation, though.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
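To make the linkage suggested in the comment concrete, here is a minimal sketch in plain Scala (not Spark code; the object and method names are hypothetical) of which 2-second parent batch times would need to be associated with a single 120-second window batch in the UI:

```scala
// Hypothetical helper, for illustration only: a window batch ending at time T
// covers the batchDuration-spaced batch times in (T - windowDuration, T].
object WindowBatches {
  def parentBatchTimes(windowEndMs: Long,
                       batchDurationMs: Long,
                       windowDurationMs: Long): Seq[Long] =
    ((windowEndMs - windowDurationMs + batchDurationMs) to windowEndMs
      by batchDurationMs)

  def main(args: Array[String]): Unit = {
    // 2 s batches, 120 s window, window batch ending at t = 120000 ms
    val parents = parentBatchTimes(120000L, 2000L, 120000L)
    println(parents.size)  // 60 parent batches feed one window batch
    println(parents.head)  // 2000
    println(parents.last)  // 120000
  }
}
```

Each window batch would thus link back to 60 parent batches (120 s / 2 s), which is the relationship the Streaming tab currently collapses into a single 120-second entry.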