I've seen a ton of examples for Storm so far (I'm a noob), but what I don't understand is how the spouts do parallelism. Suppose I want to process a giant file in Storm, where each spout instance reads and processes a 64MB slice of the input file. I can't envision a topology like this yet (because I'm ignorant).

Q1: How does each spout know which part of the giant input file to read?
Q2: How does each spout get told which file to read?
Q3: How do I know when the input file is completely processed? In their emit logic, could the bolts all report to one final bolt which piece of the source they've processed, so that the final bolt checks off the done messages and, once everything is accounted for, signals the topology owner that it's done?

Also: is there an online forum that's easier to use than this email list server, where I can ask and browse questions? This email list server is so early-1990s it's shocking...
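To make Q1 and Q2 concrete, here is roughly what I'm imagining (a minimal sketch only; the FileSliceSpout name and the "input.file" config key are my own invention, and it assumes each spout task can derive its slice from its task index via TopologyContext):

    import java.io.RandomAccessFile;
    import java.util.Map;
    import org.apache.storm.spout.SpoutOutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.base.BaseRichSpout;
    import org.apache.storm.tuple.Fields;
    import org.apache.storm.tuple.Values;

    // Sketch: every task of this spout claims one 64MB slice of the same
    // file, chosen purely by its task index. Assumes the spout parallelism
    // is high enough to cover the whole file (parallelism >= length/64MB).
    public class FileSliceSpout extends BaseRichSpout {
        private static final long SLICE = 64L * 1024 * 1024;
        private SpoutOutputCollector collector;
        private RandomAccessFile file;
        private long end;
        private boolean done;

        @Override
        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            this.collector = collector;
            int taskIndex = context.getThisTaskIndex();  // 0..parallelism-1
            try {
                // Q2: the file name arrives through the topology config.
                file = new RandomAccessFile((String) conf.get("input.file"), "r");
                // Q1: my slice is simply taskIndex * 64MB onward.
                long start = taskIndex * SLICE;
                end = Math.min(start + SLICE, file.length());
                file.seek(start);
                // A real version must skip ahead to the next newline when
                // start > 0, so no line is emitted twice or cut in half.
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }

        @Override
        public void nextTuple() {
            if (done) return;
            try {
                if (file.getFilePointer() >= end) { done = true; return; }
                String line = file.readLine();  // may run past `end` to finish a line
                if (line == null) { done = true; return; }
                collector.emit(new Values(line));
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("line"));
        }
    }

Is that task-index trick the idiomatic way to partition input across spout instances, or is there a better mechanism?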
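And for Q3, this is the kind of bookkeeping bolt I have in mind, if someone can confirm it's the right approach (again a sketch: the "worker" component name and the "marker" field are made up by me, and it assumes this bolt runs with parallelism 1 behind a globalGrouping so every upstream task's DONE tuple reaches it):

    import java.util.Map;
    import org.apache.storm.task.OutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.base.BaseRichBolt;
    import org.apache.storm.tuple.Tuple;

    // Sketch: a single bookkeeping bolt that checks off one DONE marker
    // per upstream task and fires once everything is accounted for.
    public class CompletionBolt extends BaseRichBolt {
        private OutputCollector collector;
        private int expected;  // number of upstream tasks to wait for
        private int finished;

        @Override
        public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
            this.collector = collector;
            // "worker" is my made-up name for the upstream component.
            expected = context.getComponentTasks("worker").size();
        }

        @Override
        public void execute(Tuple tuple) {
            if ("DONE".equals(tuple.getStringByField("marker"))) {
                finished++;
                if (finished == expected) {
                    // Every slice is checked off. Storm won't stop itself,
                    // so signal out-of-band: write a marker file, or let
                    // whoever submitted the topology notice and kill it.
                    System.out.println("all " + expected + " slices done");
                }
            }
            collector.ack(tuple);
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // terminal bolt, emits nothing
        }
    }

As far as I can tell, Storm has no built-in way for a topology to stop itself, so the final "we're done" signal would have to be out-of-band. Is that right?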
All the online examples I've read about Storm have spouts that generate essentially random data forever, which makes them near-useless to me. Processing a giant file, or processing data from a live source of real data, would make for much better examples. I hope I find some decent ones this weekend. Thanks!
