> I was thinking along the lines of spreading the workload of reading
> packets and/or computing the aggregate stats (timer, counter, ...) for all
> metric keys (stataccum) across different cores, since in my experience
> those parts are the most CPU-intensive and could (in theory) be fairly
> easily parallelized (especially the latter). But anyway, there's no point
> going deeper into this without benchmarking the current code first.

Be careful here. There was a post on goroutines a while back which said
that unless the work to be done takes many milliseconds, the setup and
teardown overhead will outweigh any benefit from using a goroutine. This
might have changed, but for processing UDP there's no way. Parallelizing
the UDP receive is almost certainly not going to be faster in a concurrent
setup. The number of UDP packets per second you'd need to be processing to
make this worthwhile is staggeringly high.
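You can see the overhead argument for yourself with a rough sketch like the
one below (not a real benchmark; `work` is just a stand-in for parsing one
small packet, and the function names are mine):

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// work is a tiny unit of computation, comparable to parsing one small packet.
func work(n int) int { return n * 31 % 97 }

// inline does n units of work in a single goroutine.
func inline(n int) (int, time.Duration) {
	start := time.Now()
	sum := 0
	for i := 0; i < n; i++ {
		sum += work(i)
	}
	return sum, time.Since(start)
}

// perGoroutine spawns one goroutine per unit of work; for work this small,
// the spawn/synchronization cost dominates the work itself.
func perGoroutine(n int) (int, time.Duration) {
	start := time.Now()
	var (
		mu  sync.Mutex
		wg  sync.WaitGroup
		sum int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			v := work(i)
			mu.Lock()
			sum += v
			mu.Unlock()
		}(i)
	}
	wg.Wait()
	return sum, time.Since(start)
}

func main() {
	const n = 100000
	s1, d1 := inline(n)
	s2, d2 := perGoroutine(n)
	fmt.Printf("inline:             sum=%d in %v\n", s1, d1)
	fmt.Printf("goroutine-per-item: sum=%d in %v\n", s2, d2)
}
```

Both variants compute the same sums; on work this cheap, the per-item
goroutine version is normally the slower of the two.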

Putting those parsed data sets onto a channel for the math to be done might
show some benefit, if only by allowing the network code to receive packets
more quickly, giving the appearance of lower latency. But I suspect the
cost of disassembling packets is higher than that of the math you're going
to do. This is all the more true because you need shared (locked)
structures for your counts.
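For what it's worth, the channel version can at least avoid the locked
shared structures: if a single goroutine owns the stats map and every
parsed sample reaches it over a channel, no mutex is needed at all. A
minimal sketch (the `metric` and `parse` names and the simplified
"key:value|c" line format are mine, not Heka's):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// metric is one parsed statsd-style sample.
type metric struct {
	key   string
	value int
}

// parse decodes a simplified "key:value|c" line; it reports ok=false
// for anything it cannot decode.
func parse(line string) (metric, bool) {
	head, _, ok := strings.Cut(line, "|")
	if !ok {
		return metric{}, false
	}
	key, val, ok := strings.Cut(head, ":")
	if !ok {
		return metric{}, false
	}
	v, err := strconv.Atoi(val)
	if err != nil {
		return metric{}, false
	}
	return metric{key, v}, true
}

// aggregate owns the counts map exclusively: all updates arrive over the
// channel from the parsing side, so no lock is required.
func aggregate(in <-chan metric) map[string]int {
	counts := make(map[string]int)
	for m := range in {
		counts[m.key] += m.value
	}
	return counts
}

func main() {
	packets := []string{"hits:1|c", "hits:2|c", "errs:1|c", "garbage"}
	ch := make(chan metric, 16)
	done := make(chan map[string]int)
	go func() { done <- aggregate(ch) }()
	for _, p := range packets {
		if m, ok := parse(p); ok {
			ch <- m
		}
	}
	close(ch)
	counts := <-done
	fmt.Println(counts["hits"], counts["errs"]) // prints "3 1"
}
```

Whether this actually wins still depends on the parse cost dominating, as
above; the point is only that the aggregation side need not take locks.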

Were these connections TCP, there would be more value. But since it's
UDP, which has no concept of a stream, I suspect it'll be mostly wasted
effort. Better to speed up the net library than to try to distribute the
work and potentially have to drag an sk_buff across NUMA nodes.

>
>
> > Sorry to not be much help,
>
> thanks either way!
>
_______________________________________________
Heka mailing list
[email protected]
https://mail.mozilla.org/listinfo/heka
