I am forwarding this email from the user mailing list; it contains my questions about Spark Streaming. Sorry for the repost, but I haven't received any reply there yet.
thanks,
dachuan.

---------- Forwarded message ----------
From: dachuan <hdc1...@gmail.com>
Date: Fri, Jan 24, 2014 at 10:28 PM
Subject: real world streaming code
To: u...@spark.incubator.apache.org

Hello, community,

I have three questions about Spark Streaming.

1, I noticed an interesting phenomenon in one streaming example (StatefulNetworkWordCount). This workload only prints the first 10 rows of the final RDD. If the data influx rate is high enough (much faster than typing by hand on a keyboard), the final RDD has more than one partition. Assume it has 2 partitions: the second partition won't be computed at all, because the first partition suffices to serve the first 10 rows. However, the workload must still checkpoint that RDD. This leads to a very time-consuming checkpoint process, because the checkpoint of the second partition can only start after that partition has been computed.

So, is this workload designed only for demonstration purposes, for example, only for a one-partition RDD?

(I have attached a figure to illustrate what I've said; please tell me if the mailing list doesn't welcome attachments. A short description of the experiment:
Hardware specs: 4 cores
Software specs: Spark local cluster, 5 executors (workers), each with one core and 1G memory
Data influx speed: 3MB/s
Data source: one ServerSocket in local file
Streaming app's name: StatefulNetworkWordCount
Job generation frequency: one job per second
Checkpoint time: once per 10s
JobManager.numThreads = 2)

(Another workload might have the same problem: PageViewStream's slidingPageCounts.)

2, Does anybody have Top-K word count streaming source code?

3, Can anybody share a real-world streaming example, including source code and cluster configuration details?

thanks,
dachuan.

--
Dachuan Huang
Cellphone: 614-390-7234
2015 Neil Avenue
Ohio State University
Columbus, Ohio
U.S.A.
43210
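
P.S. To make question 1 concrete, here is a minimal plain-Python sketch of the behavior I am describing. It is not Spark code: `LazyPartition`, `take`, and `checkpoint` are hypothetical stand-ins that only mimic my understanding of lazy partition evaluation, and the two partitions of 20 rows each are made-up data.

```python
from typing import Callable, List, Optional


class LazyPartition:
    """Hypothetical stand-in for an RDD partition: rows are built on first access."""

    def __init__(self, compute: Callable[[], List[str]]) -> None:
        self._compute = compute
        self._rows: Optional[List[str]] = None
        self.computed = False

    def rows(self) -> List[str]:
        # Compute the partition only once, the first time it is needed.
        if self._rows is None:
            self._rows = self._compute()
            self.computed = True
        return self._rows


def take(partitions: List[LazyPartition], n: int) -> List[str]:
    """Return the first n rows, stopping before any partition that is not needed."""
    out: List[str] = []
    for p in partitions:
        if len(out) >= n:
            break  # enough rows already; later partitions stay uncomputed
        out.extend(p.rows()[: n - len(out)])
    return out


def checkpoint(partitions: List[LazyPartition]) -> List[List[str]]:
    """Materialize every partition, as saving the whole dataset requires."""
    return [list(p.rows()) for p in partitions]


# Two partitions of 20 rows each (made-up data).
parts = [
    LazyPartition(lambda: [f"word{i}" for i in range(20)]),
    LazyPartition(lambda: [f"word{i}" for i in range(20, 40)]),
]

first_ten = take(parts, 10)
computed_after_take = [p.computed for p in parts]

saved = checkpoint(parts)
computed_after_checkpoint = [p.computed for p in parts]
```

In this sketch, printing the first 10 rows touches only the first partition, so the second partition is computed for the first time inside `checkpoint`. That is the extra cost I believe I am seeing in the checkpoint step.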