On Wed, Jul 27, 2011 at 4:13 PM, Jesse Gross <[email protected]> wrote:
> On Wed, Jul 27, 2011 at 2:24 PM, Pravin Shelar <[email protected]> wrote:
>> On Wed, Jul 27, 2011 at 12:17 PM, Jesse Gross <[email protected]> wrote:
>>> On Wed, Jul 27, 2011 at 11:14 AM, Ethan Jackson <[email protected]> wrote:
>>>>> One strategy that I have considered is to be able to ask only for
>>>>> flows that have a non-zero packet count. That would help with the
>>>>> common case where, when there is a large number of flows, they are
>>>>> caused by a port scan or some other activity with 1-packet flows.
>>>>> It wouldn't help at all in your case.
>>>>
>>>> You could also have the kernel pass down to userspace what logically
>>>> amounts to a list of the flows whose statistics have changed in the
>>>> past 10 seconds. A Bloom filter would be a sensible approach. Again,
>>>> it probably won't help at all in Simon's case, and may or may not be
>>>> a useful optimization over simply not pushing down statistics for
>>>> flows that have a zero packet count.
>>>
>>> I don't think that you could implement a Bloom filter like this in a
>>> manner that wouldn't cause cache contention. You would probably still
>>> need to iterate over every flow in the kernel; you would just be
>>> comparing the last-used time to "current time - 10" instead of
>>> checking for a non-zero packet count.
>>>
>> CPU cache contention can be fixed by partitioning all flows by
>> something (e.g. port number) and assigning cache replacement
>> processing to a CPU. The replacement algorithm could be as simple as
>> active and inactive LRU lists; this is how kernel page cache
>> replacement looks from a high level.
>
> This isn't really a cache replacement problem, though. Maybe that's
> the high-level goal being solved, but I wouldn't want to bake that
> assumption into the kernel, as it would likely impose too many
> restrictions on what userspace can do if it wants to implement
> something completely different in the future.
> Anything the kernel provides should just be a simple primitive,
> potentially analogous to the referenced bit that you would find in a
> page table.
>
> You also can't impose a CPU partitioning scheme on flows, because we
> don't control the CPU that packets are processed on. That is
> determined by the originator of the packet (such as RSS on the NIC),
> and then we just handle it on the same CPU. However, you can use a
> per-CPU data structure to store information regardless of flow and
> then merge the results later. This actually works well enough for
> something like a Bloom filter, because you can superimpose the
> results on top of each other without a problem.
I am not sure why the CPU that processes a packet cannot be controlled
using interrupt affinity/RSS. I think partitioning on the basis of CPU
or port number is good for scalability.

Thanks,
Pravin.

_______________________________________________
dev mailing list
[email protected]
http://openvswitch.org/mailman/listinfo/dev
