I've seen a ton of examples for Storm so far (I'm a noob), but what I don't understand is how the spouts do parallelism. Suppose I want to process a giant file in Storm, where each spout instance reads and processes a 64MB slice of the input file. I can't envision a topology like this yet (because I'm ignorant).

Q1: How does each spout know which part of the giant input file to read?
Q2: How does each spout get told which file to read?
Q3: How do I know when the input file is completely processed? In their emit logic, could the bolts all report to one final bolt which piece of the source they've processed, so that the final bolt checks off the done messages and, once everything is accounted for, signals the topology owner that it's done?

Also: is there an online forum that's easier to use than this email list server, where I can ask and browse questions? This email list server is so early-1990s it's shocking...
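To make Q1 and Q2 concrete, here is roughly what I'm imagining (a minimal sketch only; the FileSliceSpout name and the "input.file" config key are my own invention, and it assumes each spout task can derive its slice from its task index via TopologyContext):

    import java.io.RandomAccessFile;
    import java.util.Map;
    import org.apache.storm.spout.SpoutOutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.base.BaseRichSpout;
    import org.apache.storm.tuple.Fields;
    import org.apache.storm.tuple.Values;

    // Sketch: every task of this spout claims one 64MB slice of the same
    // file, chosen purely by its task index. Assumes the spout parallelism
    // is high enough to cover the whole file (parallelism >= length/64MB).
    public class FileSliceSpout extends BaseRichSpout {
        private static final long SLICE = 64L * 1024 * 1024;
        private SpoutOutputCollector collector;
        private RandomAccessFile file;
        private long end;
        private boolean done;

        @Override
        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            this.collector = collector;
            int taskIndex = context.getThisTaskIndex();  // 0..parallelism-1
            try {
                // Q2: the file name arrives through the topology config.
                file = new RandomAccessFile((String) conf.get("input.file"), "r");
                // Q1: my slice is simply taskIndex * 64MB onward.
                long start = taskIndex * SLICE;
                end = Math.min(start + SLICE, file.length());
                file.seek(start);
                // A real version must skip ahead to the next newline when
                // start > 0, so no line is emitted twice or cut in half.
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }

        @Override
        public void nextTuple() {
            if (done) return;
            try {
                if (file.getFilePointer() >= end) { done = true; return; }
                String line = file.readLine();  // may run past `end` to finish a line
                if (line == null) { done = true; return; }
                collector.emit(new Values(line));
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("line"));
        }
    }

Is that task-index trick the idiomatic way to partition input across spout instances, or is there a better mechanism?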
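And for Q3, this is the kind of bookkeeping bolt I have in mind, if someone can confirm it's the right approach (again a sketch: the "worker" component name and the "marker" field are made up by me, and it assumes this bolt runs with parallelism 1 behind a globalGrouping so every upstream task's DONE tuple reaches it):

    import java.util.Map;
    import org.apache.storm.task.OutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.base.BaseRichBolt;
    import org.apache.storm.tuple.Tuple;

    // Sketch: a single bookkeeping bolt that checks off one DONE marker
    // per upstream task and fires once everything is accounted for.
    public class CompletionBolt extends BaseRichBolt {
        private OutputCollector collector;
        private int expected;  // number of upstream tasks to wait for
        private int finished;

        @Override
        public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
            this.collector = collector;
            // "worker" is my made-up name for the upstream component.
            expected = context.getComponentTasks("worker").size();
        }

        @Override
        public void execute(Tuple tuple) {
            if ("DONE".equals(tuple.getStringByField("marker"))) {
                finished++;
                if (finished == expected) {
                    // Every slice is checked off. Storm won't stop itself,
                    // so signal out-of-band: write a marker file, or let
                    // whoever submitted the topology notice and kill it.
                    System.out.println("all " + expected + " slices done");
                }
            }
            collector.ack(tuple);
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // terminal bolt, emits nothing
        }
    }

As far as I can tell, Storm has no built-in way for a topology to stop itself, so the final "we're done" signal would have to be out-of-band. Is that right?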
All the online examples I've read about Storm have spouts that generate essentially random data forever, which makes them near-useless to me. Processing a giant file, or processing data from a live source of real data, would make for much better examples. I hope I find some decent ones this weekend. Thanks!
