> I was thinking along the lines of spreading the workload of reading packets and/or computing the aggregate stats (timer, counter, ...) for all metric keys (stataccum) across different cores, because in my experience those parts are the most CPU-intensive and could, in theory, be fairly easily parallelized (especially the latter). But anyway, there's no point in going deeper into this without benchmarking the current code first.
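One lock-free way to parallelize the aggregation step mentioned above is to shard metric keys across worker goroutines by a hash of the key, so each worker owns a private map and no shared locking is needed. The sketch below is a minimal illustration of that idea, not Heka's actual StatAccum code; the `metric` type, channel sizes, and counter-only aggregation are all assumptions for the example.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

// metric is a hypothetical parsed statsd-style sample.
type metric struct {
	key   string
	value int64
}

// shardByKey fans metrics out to per-worker channels selected by a hash of
// the metric key, so each worker owns its own map and needs no locks. The
// same key always hashes to the same shard, keeping per-key totals correct.
func shardByKey(in <-chan metric, shards int) []map[string]int64 {
	chans := make([]chan metric, shards)
	results := make([]map[string]int64, shards)
	var wg sync.WaitGroup
	for i := range chans {
		chans[i] = make(chan metric, 64)
		results[i] = make(map[string]int64)
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			for m := range chans[i] {
				// Counter aggregation only; timers would accumulate a slice.
				results[i][m.key] += m.value
			}
		}(i)
	}
	for m := range in {
		h := fnv.New32a()
		h.Write([]byte(m.key))
		chans[h.Sum32()%uint32(shards)] <- m
	}
	for _, c := range chans {
		close(c)
	}
	wg.Wait()
	return results
}

func main() {
	in := make(chan metric)
	go func() {
		for i := 0; i < 1000; i++ {
			in <- metric{key: fmt.Sprintf("k%d", i%4), value: 1}
		}
		close(in)
	}()
	shards := shardByKey(in, 2)
	var total int64
	for _, m := range shards {
		for _, v := range m {
			total += v
		}
	}
	fmt.Println(total)
}
```

Whether the channel handoff is cheaper than just taking a lock depends entirely on packet rate and per-key work, which is exactly why benchmarking the current code first is the right call.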
Be careful here. There was a post on goroutines a while back that said unless the work to be done takes many milliseconds, the setup and teardown will outweigh the benefit of actually using a goroutine. This might have changed, but for processing UDP, there's no way. Parallelizing the UDP receive is almost certainly not going to be faster in a concurrent setup; the number of UDP packets you'd need to be processing per second to make this worthwhile is staggeringly high.

Putting those parsed data sets onto a channel for the math to be done might show some benefit, if only to allow the network code to receive packets more quickly, giving the appearance of lower latency. But I suspect the cost of disassembling packets is higher than the math you're going to do, all the more so because you need shared (locked) structures for your counts.

Were these connections TCP, there would be more value. But since it's UDP, which has no concept of a stream, I suspect it'll be a mostly wasted effort. Better to speed up the net library than to try to distribute the work and potentially have to drag an sk_buff across NUMA nodes.

> > > > Sorry to not be much help,
> >
> > thanks either way!

_______________________________________________
Heka mailing list
[email protected]
https://mail.mozilla.org/listinfo/heka
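The overhead argument above is easy to demonstrate: when the per-item work is tiny, an unbuffered-channel handoff to another goroutine costs far more than doing the work inline. This is a rough sketch under that assumption (the trivial `work` function and iteration count are illustrative, not a rigorous benchmark; `testing.B` would be the proper tool).

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	const n = 100000
	// Deliberately tiny unit of work, standing in for cheap per-packet math.
	work := func(x int) int { return x * 2 }

	// Inline: do the work directly in the loop.
	start := time.Now()
	sum := 0
	for i := 0; i < n; i++ {
		sum += work(i)
	}
	inline := time.Since(start)

	// Handoff: ship each item to a worker goroutine over unbuffered channels.
	in := make(chan int)
	out := make(chan int)
	go func() {
		for x := range in {
			out <- work(x)
		}
		close(out)
	}()
	start = time.Now()
	go func() {
		for i := 0; i < n; i++ {
			in <- i
		}
		close(in)
	}()
	sum2 := 0
	for v := range out {
		sum2 += v
	}
	handoff := time.Since(start)

	fmt.Println(sum == sum2) // both paths compute the same result
	// The handoff version is typically much slower here, because two channel
	// operations per item dwarf a single multiplication.
	fmt.Printf("inline: %v  channel handoff: %v\n", inline, handoff)
}
```

With real packet parsing the balance shifts, which is the point: only if the per-packet work is large enough does the handoff pay for itself.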

