On 17 Jan 2012, at 17:41, Коньков Евгений wrote:
> Loads only netisr3.
> and a question: IP works over ethernet. How can you distinguish IP and ether???
netstat -Q shows per-protocol (per-layer) processing statistics. An IP
packet arriving via ethernet will typically be counted twice: once for ethernet
input/decapsulation, and once for IP-layer processing. Netisr dispatch serves a
number of purposes, not least preventing excessive stack depth/recursion and
enabling load balancing.
There has been a historic tension between deferred (queued) dispatch to a
separate worker and direct dispatch ("process to completion"). The former
offers more opportunities for parallelism and reduces time spent in
interrupt-layer processing. The latter, however, reduces per-packet overhead
and overall packet latency by avoiding queueing/scheduling costs, and avoids
migrating packets between CPU caches, reducing cache coherency traffic. Our
general experience is that many common configurations, especially lower-end
systems *and* systems with multi-queue 10gbps cards, prefer direct dispatch.
However, there are forwarding scenarios, or ones in which CPU count
significantly outnumbers NIC input queue count, where queueing to additional
workers can markedly improve performance.
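To make the tradeoff concrete, here is a schematic sketch in C (my own
illustration, not the kernel code -- all names are invented): direct dispatch
runs the protocol handler synchronously in the input path, while deferred
dispatch enqueues the packet for a separate netisr worker thread.

```c
#include <stdbool.h>

static int handled_inline;	/* packets processed to completion */
static int handled_deferred;	/* packets queued for a worker thread */

static void proto_input(int pkt) { (void)pkt; handled_inline++; }
static void enqueue_for_worker(int pkt) { (void)pkt; handled_deferred++; }

static void
netisr_dispatch_sketch(int pkt, bool direct)
{
	if (direct)
		proto_input(pkt);	 /* no handoff: low overhead, cache-warm */
	else
		enqueue_for_worker(pkt); /* pay queue/schedule cost, gain parallelism */
}
```

The real decision is of course per-protocol and tunable, but the shape of the
choice -- run now in this context, or hand off and pay for the handoff -- is
exactly this.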
In FreeBSD 9.0 we've attempted to improve the vocabulary of expressible
policies in netisr so that we can explore which work best in various scenarios,
giving users more flexibility but also attempting to determine a better
longer-term model. Ideally, as with the VM system, these features would be to
some extent self-tuning, but we don't have enough information and experience to
decide how best to do that yet.
> NETISR_POLICY_FLOW netisr should maintain flow ordering as defined by
> the mbuf header flow ID field. If the protocol
> implements nh_m2flow, then netisr will query the
> protocol in the event that the mbuf doesn't have a
> flow ID, falling back on source ordering.
>
> NETISR_POLICY_CPU netisr will entirely delegate all work placement
> decisions to the protocol, querying nh_m2cpuid for
> each packet.
>
> _FLOW: the description says that the cpuid is discovered from the flow.
> _CPU: here the decision to choose a CPU is delegated to the protocol. Maybe
> it would be clearer to name it NETISR_POLICY_PROTO ???
The name has to do with the nature of the information returned by the netisr
protocol handler -- in the former case, the protocol returns a flow identifier,
which is used by netisr to calculate an affinity. In the latter case, the
protocol returns a CPU affinity directly.
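A hypothetical sketch of that distinction (the helper names below are mine,
not the actual KPI): under NETISR_POLICY_FLOW the protocol hands back only a
flow identifier and netisr itself maps it to a CPU, whereas under
NETISR_POLICY_CPU the protocol's nh_m2cpuid-style handler returns a CPU id
directly.

```c
#include <stdint.h>

#define NWORKERS 4

/* POLICY_FLOW: netisr owns the flow-ID-to-CPU mapping. */
static uint32_t
netisr_cpuid_from_flowid(uint32_t flowid)
{
	return (flowid % NWORKERS);
}

/* POLICY_CPU: a toy protocol that chooses the CPU itself -- here it
 * pins all of its traffic to CPU 0, for better or worse. */
static uint32_t
toy_proto_m2cpuid(uint32_t flowid)
{
	(void)flowid;
	return (0);
}
```

In both cases the mapping is deterministic per flow, which is what preserves
per-flow ordering.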
> and a BIG QUESTION: why do you allow somebody (flow, proto) to make any
> decisions??? That is wrong: a bad implementation/decision
> may cause packets to be scheduled only to some CPUs.
> So one CPU will be overloaded (0% idle) while others are free (100% idle).
I think you're confusing policy and mechanism. The above KPIs are about
providing the mechanism to implement a variety of policies. Many of the
policies we are interested in are not yet implemented, or available only as
patches. Keep in mind that workloads and systems are highly variable, with
variable costs for work dispatch, etc. We run on high-end Intel servers, where
individual CPUs tend to be very powerful but not all that plentiful, but also
on embedded multi-threaded MIPS devices with many hardware threads, each
individually quite weak. Deferred dispatch is a better choice for the latter,
where there are optimised handoff primitives to help avoid queueing overhead,
whereas in the former case you really want NIC-backed work dispatch, which
generally means direct dispatch with multiple ithreads (one per queue) rather
than multiple netisr threads. Using deferred dispatch in Intel-style
environments is generally unproductive, since high-end configurations already
support multi-queue input, and the CPUs are quite powerful.
>> * Enforcing ordering limits the opportunity for concurrency, but maintains
>> * the strong ordering requirements found in some protocols, such as TCP.
> TCP does not require strong ordering requirements!!! Maybe you mean UDP?
I think most people would disagree with this. Reordering TCP segments leads to
extremely poor TCP behaviour -- there is an extensive research literature on
this, and maintaining ordering for TCP flows is a critical network stack design
goal.
> To get full concurrency you must put a new flowid on a free CPU and
> remember the cpuid for that flow.
Stateful assignment of flows to CPUs is of significant interest to us,
although currently we only support hash-based assignment without state. In
large part that is a good decision, as multi-queue network cards vary widely
in the size of the state tables they provide for offloading flow-specific
affinity policies. For example, lower-end 10gbps cards may support state
tables with 32 entries, while high-end cards may support state tables with
tens of thousands of entries.
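To sketch why table size matters (hypothetical code, not anything shipped): a
stateful scheme keeps an explicit flow-to-CPU entry and falls back to a
stateless hash when the small table cannot answer, and with only 32 entries
the fallback dominates under any real flow count.

```c
#include <stdint.h>

#define NCPUS            4
#define STATE_TABLE_SIZE 32	/* e.g. a low-end 10gbps NIC */

struct flow_entry {
	uint32_t	flowid;
	uint32_t	cpu;
	int		valid;
};

static struct flow_entry state_table[STATE_TABLE_SIZE];

/* Record an explicit placement decision for one flow. */
static void
pin_flow(uint32_t flowid, uint32_t cpu)
{
	uint32_t slot = flowid % STATE_TABLE_SIZE;

	state_table[slot].flowid = flowid;
	state_table[slot].cpu = cpu;
	state_table[slot].valid = 1;
}

/* Stateful lookup with a stateless hash fallback. */
static uint32_t
cpu_for_flow(uint32_t flowid)
{
	uint32_t slot = flowid % STATE_TABLE_SIZE;

	if (state_table[slot].valid && state_table[slot].flowid == flowid)
		return (state_table[slot].cpu);	/* stateful hit */
	return (flowid % NCPUS);		/* stateless fallback */
}
```

The stateless fallback still preserves per-flow ordering, since the same flow
always hashes the same way; what it cannot do is make load-aware exceptions.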
> Just hash the packet flow to the number of threads: net.isr.numthreads
> nws_array[flowid] = hash( flowid, sourceid, ifp->if_index, source )
> if( cpuload( nws_array[flowid] ) > 99 )
>     nws_array[flowid]++; // queue packet to other CPU
>
> that would be just ten lines of code instead of 50 in your case.
We support a more complex KPI because we need to support future policies that
are themselves more complex. For example, there are out-of-tree changes that
align TCP-level and netisr-level per-CPU data structures and affinity with NIC
RSS support. The algorithm you've suggested above explicitly introduces
reordering, which would significantly damage network performance, even though
it appears to balance CPU load better.
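A toy deterministic simulation (my own, not FreeBSD code) shows the problem
with the load-based bump: packets 1, 2, 3 of one flow normally land on queue
A, but packet 2 hits the "CPU loaded" check and is pushed to queue B. Nothing
orders B's worker after A's, so if B runs first the stack sees 2 before 1 --
the flow has been reordered.

```c
#define QLEN 8

static int qa[QLEN], qb[QLEN], out[QLEN];
static int na, nb, nout;

static void enqueue(int *q, int *n, int seq) { q[(*n)++] = seq; }

/* Deliver everything on one queue, in queue order. */
static void
drain(int *q, int *n)
{
	int i;

	for (i = 0; i < *n; i++)
		out[nout++] = q[i];
	*n = 0;
}

/* Returns the sequence number delivered first in this scenario. */
static int
first_delivered_seq(void)
{
	na = nb = nout = 0;
	enqueue(qa, &na, 1);
	enqueue(qb, &nb, 2);	/* "overloaded" check bumped packet 2 to B */
	enqueue(qa, &na, 3);
	drain(qb, &nb);		/* B's worker happens to run first */
	drain(qa, &na);
	return (out[0]);	/* 2 arrives before 1 */
}
```

For TCP, that single transposition is enough to trigger duplicate ACKs and,
with enough of them, a spurious fast retransmit.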
> Also notice you have:
> /*
> * Utility routines for protocols that implement their own mapping of flows
> * to CPUs.
> */
> u_int
> netisr_get_cpucount(void)
> {
>
> return (nws_count);
> }
>
> but you do not use it! that breaks encapsulation.
This is a public symbol for use outside of the netisr framework -- for example,
in the uncommitted RSS code.
> Also I want to ask you: help me please, where can I find documentation
> about netisr scheduling and the full packet flow through the kernel:
> packetinput->kernel->packetoutput
> but with more description of what is going on with a packet while it is
> passing through a router.
Unfortunately, this code is currently largely self-documenting. The Stevens
books are getting quite outdated, as is McKusick/Neville-Neil; however, they
at least offer structural guides which may be of use to you. Refreshes of
these books would be extremely helpful.
Robert

_______________________________________________
[email protected] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "[email protected]"