Hi Nathan, yes, that is about right; the detector readout is designed to handle up to 40 Tbit/s, so the rest of the system needs to be able to cope with this as well. Our tolerance for data loss in the network system is basically at the couple-of-permille level; our total data loss due to dead time and other problems throughout the system is (and needs to be) 1-2% in stable operation.
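As a back-of-the-envelope cross-check against the nominal figures in Tim's
mail below (these are design numbers, not measurements):

    40e6 events/s x 100 kB/event = 4 TB/s ~ 32 Tbit/s

so the 4 TB/s estimate is right, and it sits within the 40 Tbit/s readout
budget; a 2 permille network loss tolerance then corresponds to roughly
8 GB/s of lost data.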
Hope that's useful. Thanks,
Vava

On Jun 23, 2015, at 3:34 PM, Nathan Leung <[email protected]> wrote:

> Forgive me if my math is off, but are you pushing 4 TB of raw data every
> second? I can't say whether Storm would work or not, but I'm pretty sure
> that if it does, the configuration will look very different from what most
> people are using. What are your tolerances for data loss?
>
> On Jun 23, 2015 3:56 AM, "Tim Head" <[email protected]> wrote:
>
> Hello Storm devs,
>
> I work for one of the big four experiments at CERN. We are currently
> thinking about "the future of our real time data processing". I tweeted
> about it <https://twitter.com/betatim/status/611503830161862657> and
> @ptgoetz replied saying "post to dev/usr". This is that post.
>
> Below is a brief introduction to what LHCb is, how we process data, and
> some constraints/context. I tried to think of all the relevant info and
> invariably will have forgotten some...
>
> We are currently surveying the landscape of real time stream processing.
> The question we are trying to answer is: "What toolkit could we use as a
> starting point for the LHCb experiment's real time data processing if we
> started from scratch today?"
>
> Storm looks attractive; however, I know nobody who has real-world
> experience with it. So my first question would be: is it worth
> investigating Storm further for this kind of use-case?
>
> LHCb is one of the four big experiments at CERN
> <http://home.web.cern.ch/about>, the home of the Large Hadron Collider
> (you might have heard of us when we discovered the Higgs boson).
>
> A brief outline of what LHCb does when processing data:
>
> The LHCb trigger reduces/filters the stream of messages (or events) from
> the experimental hardware. It runs on a large farm of about 27000
> physical cores with about 1-2 GB of RAM per core. Its task is to decide
> which messages are worth recording to disk and which to reject. Most
> messages/events are rejected.
>
> Each message contains low-level features (which pixels in the silicon
> sensor were hit, how much energy was deposited in readout channel X,
> etc.). Part of the decision process is to construct higher-level features
> from these. Usually, the higher-level a feature, the more expensive it is
> to compute and the more discriminative it is. We routinely use machine
> learning algorithms like random forests, gradient boosted decision trees
> and NNs as part of the decision making.
>
> In the current setup each message is delivered to exactly one computing
> node, which processes it in full. If the message is accepted, it is sent
> onwards for storing to disk.
>
> One avenue we are investigating is whether it is feasible to instead pass
> a message around several "specialised" nodes that each do part of the
> processing. This would be attractive as it means you could easily
> integrate GPUs into the flow, or spin up more instances for a certain
> part of the processing if it is taking too long. Such a scheme would need
> extremely cheap serialisation.
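>
> Purely to make that last idea concrete, here is roughly how I imagine
> such a chain could look as a Storm topology. This is only a sketch pieced
> together from the Storm docs, not from any running system: EventSpout,
> FeatureBolt, ClassifierBolt and EventFragment are made-up placeholders,
> the field names are invented, and the parallelism hints are arbitrary.
>
>     import backtype.storm.Config;
>     import backtype.storm.StormSubmitter;
>     import backtype.storm.generated.StormTopology;
>     import backtype.storm.topology.BasicOutputCollector;
>     import backtype.storm.topology.IRichSpout;
>     import backtype.storm.topology.OutputFieldsDeclarer;
>     import backtype.storm.topology.TopologyBuilder;
>     import backtype.storm.topology.base.BaseBasicBolt;
>     import backtype.storm.tuple.Fields;
>     import backtype.storm.tuple.Tuple;
>     import backtype.storm.tuple.Values;
>
>     public class TriggerTopology {
>
>         // Cheap first stage: sees every event, rejects most of them.
>         public static class CheapFilterBolt extends BaseBasicBolt {
>             @Override
>             public void execute(Tuple event, BasicOutputCollector out) {
>                 long id = event.getLongByField("event-id");
>                 byte[] raw = event.getBinaryByField("raw");
>                 if (passesLowLevelCuts(raw)) {      // placeholder cut
>                     out.emit(new Values(id, raw));  // survivors move on
>                 }                                   // rejects die here
>             }
>
>             @Override
>             public void declareOutputFields(OutputFieldsDeclarer d) {
>                 d.declare(new Fields("event-id", "raw"));
>             }
>
>             // Stand-in for the real low-level selection.
>             private boolean passesLowLevelCuts(byte[] raw) {
>                 return raw.length > 0;
>             }
>         }
>
>         // FeatureBolt (higher-level features, possibly GPU-backed) and
>         // ClassifierBolt (the BDT/NN accept/reject) would follow the
>         // same BaseBasicBolt pattern; they are omitted here.
>         public static StormTopology build(IRichSpout eventSource) {
>             TopologyBuilder b = new TopologyBuilder();
>             b.setSpout("events", eventSource, 4);
>             b.setBolt("filter", new CheapFilterBolt(), 64)
>              .shuffleGrouping("events");
>             // fieldsGrouping keeps everything belonging to one event
>             // on the same task of the next stage.
>             b.setBolt("features", new FeatureBolt(), 32)
>              .fieldsGrouping("filter", new Fields("event-id"));
>             b.setBolt("classify", new ClassifierBolt(), 16)
>              .fieldsGrouping("features", new Fields("event-id"));
>             return b.createTopology();
>         }
>
>         public static void main(String[] args) throws Exception {
>             Config conf = new Config();
>             conf.setNumWorkers(400);  // arbitrary
>             // Storm moves tuples between workers with Kryo; payload
>             // classes would be registered to keep serialisation cheap.
>             conf.registerSerialization(EventFragment.class);
>             StormSubmitter.submitTopology("lhcb-trigger", conf,
>                                           build(new EventSpout()));
>         }
>     }
>
> The appeal would be that each stage's parallelism can be scaled
> independently, which matches the "spin up more instances for a slow part"
> idea; whether Storm's per-tuple overhead can be anywhere near cheap
> enough at these rates is exactly what I can't judge from the docs.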
>
> A good (longer) introduction written up for a NIPS workshop [pdf]:
>
> <https://www.dropbox.com/s/fcq3v28oxdvp6iw/RealTimeAnalysisLHC.pdf?dl=0>
>
> Some boundary conditions/context:
>
> - input rate of ~40 million messages per second
> - 100 kB per message, without higher-level features
> - ~O(60) cores per box (extrapolating the tick-tock of Intel's roadmap
>   to 2020)
> - ~O(2) GB RAM per core (very hard to tell where this will be in 2020)
> - network design is throughput-driven, bound to the connectivity
>   requirements. Events are distributed for filtering via one-to-one
>   communication, usually implemented with a "cheap" Clos-like network
>   topology. The exact topology doesn't matter that much as long as it is
>   non-blocking, that is, all nodes can send to the next stage of the
>   pipeline with a guaranteed throughput.
> - every message has to be processed, and no message may be duplicated at
>   the output of the processing chain
> - most messages are rejected, so the output rate is (much) lower than
>   the input rate
> - fault tolerance: when the LHC is delivering collisions to the
>   experiment we have to be ready, otherwise valuable beam time is
>   wasted; there is no pausing or going back
> - need to be able to re-run the processing pipeline on a single machine
>   after the fact for debugging (see the sketch right after this list)
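>
> For that last point, Storm's local mode looks (again, judging only from
> the docs) like it would cover the single-machine re-run. A minimal
> sketch, where RecordedEventSpout is another made-up placeholder that
> reads events back from disk, and TriggerTopology.build is the sketch
> from above:
>
>     import backtype.storm.Config;
>     import backtype.storm.LocalCluster;
>
>     public class ReplayRun {
>         public static void main(String[] args) throws Exception {
>             Config conf = new Config();
>             conf.setDebug(true);  // log every tuple while debugging
>
>             // LocalCluster runs the whole topology inside one JVM, so
>             // a recorded batch of events can be replayed anywhere.
>             LocalCluster cluster = new LocalCluster();
>             cluster.submitTopology("replay", conf,
>                 TriggerTopology.build(
>                     new RecordedEventSpout("run1234.raw")));
>
>             Thread.sleep(60 * 1000);  // let the replay run
>             cluster.shutdown();
>         }
>     }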
>
> If this isn't the right place for discussing this, or some other medium
> is more efficient, let me know. Did I miss some obvious constraint/bit
> of context?
>
> Thanks a lot for your time; you'd think LHCb has a lot of experts on
> this topic, but I suspect there is even more (real-world, modern)
> expertise out there :-)
>
> T
>
> ps. I CC'ed some of the other guys from LHCb who are interested in this.