Thanks, Mayur. I will read this workload's code first.

On Mon, Jan 27, 2014 at 12:37 PM, Mayur Rustagi <mayur.rust...@gmail.com> wrote:

> Now add these jars to the lib folder of the streaming project, as well as to
> the StreamingContext's jar list:
> https://www.dropbox.com/sh/00sy9mv8qsefwc1/vsEXF0aHsJ
> These are the Algebird jars.
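>
> For reference, a minimal sketch of passing those jars to the StreamingContext
> (the jar path, version, and app name below are placeholders; adjust them to
> the jars from the Dropbox link):
>
>   import org.apache.spark.streaming.{Seconds, StreamingContext}
>
>   // Hypothetical local path to the downloaded Algebird jar.
>   val algebirdJars = Seq("lib/algebird-core_2.10-0.3.0.jar")
>
>   // Constructor from the Spark 0.8/0.9 streaming API that takes a jar list;
>   // the jars are shipped to the executors when the context starts.
>   val ssc = new StreamingContext("local[4]", "AlgebirdDemo", Seconds(1),
>     System.getenv("SPARK_HOME"), algebirdJars)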
>
> This also contains the Algebird Scala code for streaming uniques:
> https://www.dropbox.com/s/ydyn7kd75hhnnpo/Algebird.scala
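>
> (That file isn't reproduced here; just to illustrate the idea, here is a
> rough sketch of counting approximate uniques on a DStream with Algebird's
> HyperLogLog. The DStream `ids` and the bit size are made up, and method names
> such as create/plus may differ slightly between Algebird versions.)
>
>   import com.twitter.algebird.HyperLogLogMonoid
>
>   // 12 bits of precision gives a standard error of roughly 1.6%.
>   val hll = new HyperLogLogMonoid(12)
>
>   // `ids` is an assumed DStream[String] of user ids (or words).
>   val approxUniques = ids
>     .mapPartitions(it => it.map(id => hll.create(id.getBytes("UTF-8"))))
>     .reduce(hll.plus(_, _))
>
>   approxUniques.foreachRDD { rdd =>
>     rdd.take(1).foreach(h => println("approx uniques: " + h.estimatedSize))
>   }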
>
>
> Mayur Rustagi
> Ph: +919632149971
> http://www.sigmoidanalytics.com
> https://twitter.com/mayur_rustagi
>
>
>
> On Mon, Jan 27, 2014 at 11:00 PM, Mayur Rustagi <mayur.rust...@gmail.com>
> wrote:
>
> > I can help you set up streaming with Algebird for uniques. I suppose you
> > can extend that to top-K using Algebird functions (see the sketch below).
> > First, why don't you set up Spark Streaming on your machine using this
> > guide:
> > http://docs.sigmoidanalytics.com/index.php/Running_A_Simple_Streaming_Job_in_Local_Machine
> > Then let me rummage around for my Algebird codebase.
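> >
> > As a starting point for the top-K part, here is a rough sketch using the
> > heavy hitters of a Count-Min Sketch. It assumes the CountMinSketchMonoid
> > API of Algebird releases from around this time (which counts Long keys;
> > newer Algebird moved this to TopPctCMS), and `words` is an assumed
> > DStream[String]:
> >
> >   import com.twitter.algebird.CountMinSketchMonoid
> >
> >   // Arguments: eps, delta, seed, heavy-hitter percentage; items above 1%
> >   // of the total count are tracked as heavy hitters.
> >   val cms = new CountMinSketchMonoid(0.001, 1e-4, 1, 0.01)
> >
> >   // This CMS counts Long keys, so words are crudely mapped to their hashes.
> >   val approxCounts = words
> >     .map(w => w.hashCode.toLong)
> >     .mapPartitions(it => it.map(id => cms.create(id)))
> >     .reduce(cms.plus(_, _))
> >
> >   approxCounts.foreachRDD { rdd =>
> >     rdd.take(1).foreach(sk => println("heavy hitters: " + sk.heavyHitters))
> >   }
> >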
> > Regards
> > Mayur
> >
> >
> > Mayur Rustagi
> > Ph: +919632149971
> > http://www.sigmoidanalytics.com
> > https://twitter.com/mayur_rustagi
> >
> >
> >
> > On Mon, Jan 27, 2014 at 10:52 PM, dachuan <hdc1...@gmail.com> wrote:
> >
> >> This email, which includes my questions about Spark Streaming, is
> >> forwarded from the user@ mailing list; sorry about the repost, but I
> >> haven't gotten any reply there yet.
> >>
> >> thanks,
> >> dachuan.
> >>
> >>
> >> ---------- Forwarded message ----------
> >> From: dachuan <hdc1...@gmail.com>
> >> Date: Fri, Jan 24, 2014 at 10:28 PM
> >> Subject: real world streaming code
> >> To: u...@spark.incubator.apache.org
> >>
> >>
> >> Hello, community,
> >>
> >> I have three questions about Spark Streaming.
> >>
> >> 1,
> >> I noticed an interesting phenomenon in one streaming example
> >> (StatefulNetworkWordCount):
> >> since this workload only prints the first 10 rows of the final RDD, if the
> >> data influx rate is fast enough (much faster than typing by hand), the
> >> final RDD will have more than one partition. Assume it has 2 partitions:
> >> the second partition won't be computed at all, because the first partition
> >> suffices to serve the first 10 rows. However, such workloads must also
> >> checkpoint that RDD. This leads to a very time-consuming checkpoint,
> >> because the checkpoint of the second partition can only start after it has
> >> been computed. So, is this workload designed only for demonstration
> >> purposes, for example, only for a single-partition RDD?
> >>
> >> (I have attached a figure to illustrate what I've said; please tell me if
> >> the mailing list doesn't welcome attachments.
> >> A short description of the experiment:
> >> Hardware specs: 4 cores
> >> Software specs: Spark local cluster, 5 executors (workers), each with one
> >> core and 1 GB of memory
> >> Data influx speed: 3 MB/s
> >> Data source: one ServerSocket serving a local file
> >> Streaming app's name: StatefulNetworkWordCount
> >> Job generation frequency: one job per second
> >> Checkpoint interval: once per 10 s
> >> JobManager.numThreads = 2)
> >>
> >>
> >>
> >> (And another workload might have the same problem:
> >> PageViewStream's slidingPageCounts)
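> >>
> >> (For reference, the pattern I am asking about is roughly the following; a
> >> trimmed sketch of the StatefulNetworkWordCount idea, not the exact example
> >> source. print() only shows the first 10 rows, while the periodic
> >> checkpoint has to materialize the whole state RDD.)
> >>
> >>   import org.apache.spark.streaming.{Seconds, StreamingContext}
> >>   import org.apache.spark.streaming.StreamingContext._
> >>
> >>   // Running count per word: new batch counts plus the previous state.
> >>   val updateFunc = (values: Seq[Int], state: Option[Int]) =>
> >>     Some(values.sum + state.getOrElse(0))
> >>
> >>   val ssc = new StreamingContext("local[4]", "StatefulWordCount", Seconds(1))
> >>   ssc.checkpoint(".")   // stateful DStreams require a checkpoint directory
> >>
> >>   val words = ssc.socketTextStream("localhost", 9999).flatMap(_.split(" "))
> >>   val state = words.map(w => (w, 1)).updateStateByKey[Int](updateFunc)
> >>   state.print()         // only the first 10 rows of each state RDD are shown
> >>   ssc.start()
> >>   ssc.awaitTermination()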
> >>
> >> 2,
> >> Does anybody have Top-K word-count streaming source code?
> >>
> >> 3,
> >> Can anybody share a real-world streaming example, for instance with
> >> source code and cluster configuration details?
> >>
> >> thanks,
> >> dachuan.
> >>
> >> --
> >> Dachuan Huang
> >> Cellphone: 614-390-7234
> >> 2015 Neil Avenue
> >> Ohio State University
> >> Columbus, Ohio
> >> U.S.A.
> >> 43210
> >>
> >>
> >
> >
>



-- 
Dachuan Huang
Cellphone: 614-390-7234
2015 Neil Avenue
Ohio State University
Columbus, Ohio
U.S.A.
43210
