This email, which contains my questions about Spark Streaming, is forwarded
from the user@ mailing list. Sorry for the cross-post; I haven't received
any reply there yet.

thanks,
dachuan.

---------- Forwarded message ----------
From: dachuan <hdc1...@gmail.com>
Date: Fri, Jan 24, 2014 at 10:28 PM
Subject: real world streaming code
To: u...@spark.incubator.apache.org


Hello, community,

I have three questions about Spark Streaming.

1,
I noticed an interesting phenomenon in one of the streaming examples
(StatefulNetworkWordCount). This workload only prints the first 10 rows of
the final RDD. If data arrives fast enough (much faster than typing by
hand), the final RDD will have more than one partition. Assume it has 2
partitions: the second partition is never computed, because the first
partition alone is enough to supply the first 10 rows. However, such
stateful workloads must checkpoint that RDD. This makes checkpointing very
time-consuming, because checkpointing the second partition can only start
after that partition has been computed. So, is this workload designed only
for demonstration purposes, for example, only for a single-partition RDD?

(I have attached a figure to illustrate this; please tell me if the mailing
list doesn't accept attachments.
A short description of the experiment:
Hardware: 4 cores
Software: Spark local cluster, 5 executors (workers), each with one core
and 1 GB of memory
Data influx rate: 3 MB/s
Data source: one ServerSocket fed from a local file
Streaming app: StatefulNetworkWordCount
Job generation frequency: one job per second
Checkpoint interval: once every 10 s
JobManager.numThreads = 2)



(Another workload may have the same problem: slidingPageCounts in
PageViewStream.)
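
To make the effect in question 1 concrete, here is a toy model in plain
Python. It is not Spark's implementation: `ToyRDD`, `take` and `checkpoint`
are hypothetical names, and the model assumes, as described above, that
partitions are computed lazily, that a computed partition is kept, and that
writing a checkpoint needs every partition. Printing the first 10 rows
touches only partition 0, so partition 1 is first computed by the
checkpoint itself.

```python
# Toy model (plain Python, NOT Spark's implementation) of the behaviour
# described above: partitions are computed lazily and kept once computed,
# take(n) stops as soon as it has n rows, and checkpoint() needs every
# partition, so a partition skipped by take(n) is computed at checkpoint time.

class ToyRDD:
    """Hypothetical stand-in for a partitioned, lazily evaluated RDD."""

    def __init__(self, partition_sources):
        # Each source is a zero-argument function producing that partition's rows.
        self._sources = partition_sources
        self._cache = {}          # partition index -> computed rows
        self.compute_log = []     # (partition index, which operation computed it)

    def _compute(self, index, reason):
        if index not in self._cache:
            self.compute_log.append((index, reason))
            self._cache[index] = list(self._sources[index]())
        return self._cache[index]

    def take(self, n):
        # Scan partitions in order and stop once n rows are collected.
        rows = []
        for i in range(len(self._sources)):
            if len(rows) >= n:
                break
            rows.extend(self._compute(i, "take")[: n - len(rows)])
        return rows

    def checkpoint(self):
        # Writing a checkpoint needs the rows of every partition.
        return [row for i in range(len(self._sources))
                for row in self._compute(i, "checkpoint")]


# Two partitions of (word, count) pairs, 50 rows each.
rdd = ToyRDD([
    lambda: [("w%d" % i, 1) for i in range(0, 50)],
    lambda: [("w%d" % i, 1) for i in range(50, 100)],
])
first_ten = rdd.take(10)     # like the example printing 10 rows
saved = rdd.checkpoint()     # like the periodic checkpoint
print(rdd.compute_log)       # [(0, 'take'), (1, 'checkpoint')]
```

In this model the checkpoint pays for computing partition 1 on its own,
which is the extra cost described in question 1.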

2,
Does anybody have source code for a streaming top-K word count?
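
For reference, a minimal sketch of what a streaming top-K word count
computes, written in plain Python rather than against the Spark Streaming
API. The function name `top_k_word_counts` is hypothetical; the list of
batches stands in for the per-interval RDDs of a DStream, and a running
`Counter` stands in for the per-word state.

```python
import heapq
from collections import Counter

def top_k_word_counts(batches, k):
    """Yield the k most frequent words seen so far after each micro-batch.

    Plain-Python sketch, not the Spark Streaming API: `batches` stands in for
    the per-interval input, and `totals` stands in for the running state.
    """
    totals = Counter()
    for lines in batches:
        # Per-batch word count (split lines into words, count each word).
        batch_counts = Counter(word for line in lines for word in line.split())
        # Merge this batch's counts into the running totals.
        totals.update(batch_counts)
        # Highest count first; ties broken alphabetically for stable output.
        yield heapq.nsmallest(k, totals.items(), key=lambda kv: (-kv[1], kv[0]))

batches = [
    ["a b a", "c a"],
    ["b b", "d"],
]
for top in top_k_word_counts(batches, 2):
    print(top)
# [('a', 3), ('b', 1)]
# [('a', 3), ('b', 3)]
```

In Spark Streaming itself, the running-count part corresponds to what
StatefulNetworkWordCount already does with updateStateByKey; selecting the
top K from each state RDD would be an extra per-batch step that this sketch
does not show.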

3,
Could anybody share a real-world streaming example, including the source
code and cluster configuration details?

thanks,
dachuan.

-- 
Dachuan Huang
Cellphone: 614-390-7234
2015 Neil Avenue
Ohio State University
Columbus, Ohio
U.S.A.
43210
