Vlad,

Please pull the following patches from
git.openfabrics.org/~mmarciniszyn/scm/linux-2.6.to_ofed
The kernel.org patches are just now hitting the lists.

Thanks!

Mike

commit a7f7c501f9cf0a1454323c5f6f8668be2ddf2b1d
Author: Mike Marciniszyn <[email protected]>
Date:   Fri Sep 23 12:04:24 2011 -0400

    IB/qib: Add logic for affinity hint

    Call irq_set_affinity_hint() to give user mode programs such as
    irqbalance the information they need to distribute qib interrupts
    appropriately.

    The logic allocates all non-receive interrupts to the first CPU
    local to the HCA. Receive interrupts are allocated round-robin
    starting with the second CPU local to the HCA, wrapping back to
    the second CPU if necessary.

    Signed-off-by: Mike Marciniszyn <[email protected]>

commit 96ecdac8d78e2d20b32eb4ec50643bcb1c03f828
Author: Mike Marciniszyn <[email protected]>
Date:   Fri Sep 23 11:10:49 2011 -0400

    IB/qib: Add irq name refinements

    This patch refines the name registered with MSI-X interrupts so
    that user level scripts can determine the device associated with
    the IRQs when there are multiple HCAs with potentially different
    sets of local CPUs.

    Signed-off-by: Mike Marciniszyn <[email protected]>

commit d3d7ea0a142f1545f9bfbcd9e3bd47b7b2f9ce79
Author: Mike Marciniszyn <[email protected]>
Date:   Fri Sep 23 10:59:03 2011 -0400

    IB/qib: remove s_lock around header validation

    Observation of qib_ruc_check_hdr() shows that the s_lock is not
    required in the normal case. The r_lock is held in all cases and
    protects the qp fields that are read.

    The s_lock is still needed around the call to qib_migrate_qp() to
    ensure that the send engine sees a consistent set of fields.

    Signed-off-by: Mike Marciniszyn <[email protected]>

commit 6aeceea336753dfc6b2987594b228e65b3a46982
Author: Mike Marciniszyn <[email protected]>
Date:   Fri Sep 23 10:46:13 2011 -0400

    IB/qib: memcpy optimizations

    The default memcpy used by qib_copy_sge() ends up being a
    rep movsb on x86_64, which is pretty slow.

    This fix adds an x86_64 specific routine that 1) probes for
    X86_FEATURE_REP_GOOD and 2) uses an inline asm routine built on
    rep movsq that testing has shown is better than the builtin
    memcpy for all cases up to 4K.

    The probing routine is now called when the qib module is loaded
    to enable the optimization. When X86_FEATURE_REP_GOOD is not set,
    the routine uses the kernel's unrolled __memcpy when the length
    is more than 64 and the builtin memcpy otherwise.

    This patch also adds the cache bypass copies from older releases.
    Testing has shown a 40% improvement in netperf/ipoib on AMD CPUs.
    The cache_bypass_copy module parameter can be used to enable the
    cache bypass copies on non-AMD CPUs.

    qib_verbs_send_dma() and qib_copy_from_sge() are also changed to
    use memcpy_string_op() to improve packet delivery performance to
    the send engine.

    The existing copy as well as a new stub probe routine are
    maintained as weak symbols for other architectures.

    Signed-off-by: Mike Marciniszyn <[email protected]>

commit 9ecf3abd1d880255e2f9be0e9714dcd49c97fede
Author: Mike Marciniszyn <[email protected]>
Date:   Fri Sep 23 09:44:07 2011 -0400

    IB/qib: precompute timeout jiffies to optimize latency

    A new field called timeout_jiffies is added to qib_qp. It is
    initialized upon create and modify. The field is now used in
    place of the computation based on qp->timeout.

    Signed-off-by: Mike Marciniszyn <[email protected]>

commit b718e7f17cc9fc9d31148ec046bf071bdd2c105d
Author: Mike Marciniszyn <[email protected]>
Date:   Thu Sep 22 14:43:48 2011 -0400

    IB/qib: qpn lookup optimizations

    The heavyweight spinlock in qib_lookup_qpn() is replaced with the
    RCU locking mechanism. The hash list itself is now accessed via
    jhash functions instead of the mod.

    The changes should benefit multiple receive contexts on different
    processors, which no longer contend for the lock just to read the
    hash structures.

    The patch also adds a lookaside_qp (pointer) and a lookaside_qpn
    in the context. The interrupt handler will test the current
    packet's qpn against lookaside_qpn if the lookaside_qp pointer is
    non-NULL. The pointer is set to NULL when the interrupt handler
    exits.

    Signed-off-by: Mike Marciniszyn <[email protected]>

commit dd2e48e7c02c0925a5e603d1a9752d20b7e15ed3
Author: Mike Marciniszyn <[email protected]>
Date:   Thu Sep 22 11:50:04 2011 -0400

    IB/qib: Eliminate divide/mod in converting idx to egr buf pointer

    The context init now computes a shift from rcvegrbufs_perchunk
    using ilog2 and saves it in rcvegrbufs_perchunk_shift. A BUG_ON
    protects the power of 2 assumption.

    Signed-off-by: Mike Marciniszyn <[email protected]>

commit 3eef5499db4d04aae5d10580b3735696957cd99d
Author: Mike Marciniszyn <[email protected]>
Date:   Thu Sep 22 11:40:13 2011 -0400

    IB/qib: decode path mtu optimization

    Store both the encoded and decoded mtu in the qp structure as a
    minor optimization for the UC/RC receive routines.

    Signed-off-by: Mike Marciniszyn <[email protected]>

commit 2871fc8a45df95c52cac9707e886383ec712a20e
Author: Mike Marciniszyn <[email protected]>
Date:   Thu Sep 22 11:24:39 2011 -0400

    IB/qib: Optimize RC/UC code by IB operation

    The memset for zeroing the work completion had been
    unconditional. This patch removes the memset and moves the
    zeroing to where the work completion is generated, using an
    explicit field-by-field set.

    With this patch, non-ONLY/non-LAST packets avoid the overhead
    since they do not generate a completion.

    Signed-off-by: Mike Marciniszyn <[email protected]>
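
For anyone reading the "Eliminate divide/mod in converting idx to egr buf pointer" commit without the source at hand, the idea can be sketched in user-space C roughly as follows. This is only an illustration, not the driver code: struct ctx_sketch, ilog2_sketch(), the rcvegrbuf_size field and the helper names are made up for the example, and assert() stands in for the kernel's BUG_ON(). Only rcvegrbufs_perchunk and rcvegrbufs_perchunk_shift are names taken from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-context fields named in the commit. */
struct ctx_sketch {
    unsigned rcvegrbufs_perchunk;       /* buffers per chunk, must be 2^n */
    unsigned rcvegrbufs_perchunk_shift; /* log2 of the above, saved at init */
    size_t   rcvegrbuf_size;            /* bytes per buffer (assumed field) */
};

/* User-space substitute for the kernel's ilog2() on a nonzero value. */
static unsigned ilog2_sketch(unsigned v)
{
    unsigned shift = 0;

    while ((v >>= 1) != 0)
        shift++;
    return shift;
}

/* Done once at context init, so the receive path never divides. */
static void ctx_init_sketch(struct ctx_sketch *c, unsigned perchunk,
                            size_t bufsize)
{
    /* assert() plays the role of BUG_ON(): perchunk must be a power of 2. */
    assert(perchunk != 0 && (perchunk & (perchunk - 1)) == 0);
    c->rcvegrbufs_perchunk = perchunk;
    c->rcvegrbufs_perchunk_shift = ilog2_sketch(perchunk);
    c->rcvegrbuf_size = bufsize;
}

/* Chunk number: same result as idx / perchunk, computed with a shift. */
static unsigned egr_chunk(const struct ctx_sketch *c, unsigned idx)
{
    return idx >> c->rcvegrbufs_perchunk_shift;
}

/* Byte offset in the chunk: (idx % perchunk) * bufsize, using a mask. */
static size_t egr_offset(const struct ctx_sketch *c, unsigned idx)
{
    return (size_t)(idx & (c->rcvegrbufs_perchunk - 1)) * c->rcvegrbuf_size;
}
```

With 32 buffers per chunk, index 70 lands in chunk 2 at slot 6, the same answer the divide and mod would give. The shift and mask are only valid because of the power of 2 check at init.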

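The CPU selection described in the "Add logic for affinity hint" commit can be modelled in a few lines of user-space C. Again this is a sketch under assumptions rather than the patch itself: struct affinity_sketch and both functions are hypothetical names, the list of local CPUs is passed in as a plain array, and the single-CPU fallback (everything on the one local CPU) is my guess at a sensible behaviour, not something the commit message states. In the driver the chosen CPU would then be handed to irq_set_affinity_hint().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping for handing out CPUs to one HCA's interrupts. */
struct affinity_sketch {
    const int *local_cpus; /* CPU ids local to the HCA, in order */
    size_t     ncpus;      /* number of entries, at least 1 */
    size_t     next_rcv;   /* index of the CPU for the next receive IRQ */
};

static void affinity_init(struct affinity_sketch *a, const int *cpus,
                          size_t n)
{
    assert(cpus != NULL && n >= 1);
    a->local_cpus = cpus;
    a->ncpus = n;
    /* Receive IRQs start at the second local CPU when there is one. */
    a->next_rcv = (n > 1) ? 1 : 0;
}

/* Return the CPU to suggest for the next interrupt being registered. */
static int affinity_pick(struct affinity_sketch *a, int is_receive)
{
    int cpu;

    /* All non-receive interrupts share the first local CPU. */
    if (!is_receive)
        return a->local_cpus[0];

    cpu = a->local_cpus[a->next_rcv];
    /* Round robin, wrapping back to the second CPU, not the first. */
    if (++a->next_rcv >= a->ncpus)
        a->next_rcv = (a->ncpus > 1) ? 1 : 0;
    return cpu;
}
```

With local CPUs {0, 2, 4}, non-receive interrupts all get CPU 0 and successive receive interrupts get 2, 4, 2, 4 and so on, which keeps the first CPU free of receive traffic.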