Hi Don,

Thanks for your test!

On Thu, Mar 01, 2018 at 04:18:17PM +0000, Don Brace wrote:
> > -----Original Message-----
> > From: Ming Lei [mailto:[email protected]]
> > Sent: Tuesday, February 27, 2018 4:08 AM
> > To: Jens Axboe <[email protected]>; [email protected]; Christoph
> > Hellwig <[email protected]>; Mike Snitzer <[email protected]>
> > Cc: [email protected]; Hannes Reinecke <[email protected]>; Arun Easi
> > <[email protected]>; Omar Sandoval <[email protected]>; Martin K .
> > Petersen <[email protected]>; James Bottomley
> > <[email protected]>; Christoph Hellwig <[email protected]>;
> > Don Brace <[email protected]>; Kashyap Desai
> > <[email protected]>; Peter Rivera <[email protected]>;
> > Laurence Oberman <[email protected]>; Ming Lei
> > <[email protected]>; Meelis Roos <[email protected]>
> > Subject: [PATCH V3 1/8] scsi: hpsa: fix selection of reply queue
> >
> > EXTERNAL EMAIL
> >
> >
> > From 84676c1f21 (genirq/affinity: assign vectors to all possible CPUs),
> > one msix vector can be created without any online CPU mapped, then one
> > command's completion may not be notified.
> >
> > This patch setups mapping between cpu and reply queue according to irq
> > affinity info retrived by pci_irq_get_affinity(), and uses this mapping
> > table to choose reply queue for queuing one command.
> >
> > Then the chosen reply queue has to be active, and fixes IO hang caused
> > by using inactive reply queue which doesn't have any online CPU mapped.
> >
> > Cc: Hannes Reinecke <[email protected]>
> > Cc: Arun Easi <[email protected]>
> > Cc: "Martin K. Petersen" <[email protected]>,
> > Cc: James Bottomley <[email protected]>,
> > Cc: Christoph Hellwig <[email protected]>,
> > Cc: Don Brace <[email protected]>
> > Cc: Kashyap Desai <[email protected]>
> > Cc: Peter Rivera <[email protected]>
> > Cc: Laurence Oberman <[email protected]>
> > Cc: Meelis Roos <[email protected]>
> > Fixes: 84676c1f21e8 ("genirq/affinity: assign vectors to all possible CPUs")
> > Signed-off-by: Ming Lei <[email protected]>
>
> I am getting some issues that need to be tracked down:

I checked the patch once more and did not find anything odd. The only
thing I noticed is that inside hpsa_do_reset(),
wait_for_device_to_become_ready() always sends 'test unit ready' via
reply queue 0. Do you know whether something bad may happen if a
non-zero reply queue is used instead?

Could you share how you reproduce this issue?

It looks like you can boot successfully, so could you please provide the
following output?

1) What is your server type? We may find one in our lab, so that I can
try to reproduce it.

2) lscpu

3) IRQ affinity info. You need to pass the 1st column of the 'lspci'
output for your hpsa PCI device to this script:

	#!/bin/sh
	# Dump the IRQ affinity of every MSI/MSI-X vector of one PCI device.
	# Pass the PCI ID (1st column of 'lspci') as $1; without an argument
	# the script falls back to the first "Non-Volatile memory" device.
	if [ $# -ge 1 ]; then
		PCID=$1
	else
		PCID=`lspci | grep "Non-Volatile memory" | cut -c1-7`
	fi
	# Quote the pattern so the shell does not expand the glob itself.
	PCIP=`find /sys/devices -name "*$PCID" | grep pci`
	IRQS=`ls $PCIP/msi_irqs`

	echo "kernel version: "
	uname -a

	echo "PCI name is $PCID, dump its irq affinity:"
	for IRQ in $IRQS; do
		CPUS=`cat /proc/irq/$IRQ/smp_affinity_list`
		# printf handles "\t" the same way in every /bin/sh.
		printf "\tirq %s, cpu list %s\n" "$IRQ" "$CPUS"
	done

Thanks,
Ming
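
P.S. In case it helps when reading the output of the script above, here
is a minimal sketch (not part of the patch) for spotting vectors whose
affinity list contains no online CPU, which is the situation the patch
description says can leave a completion unnotified. The helper names
expand_cpu_list, lists_overlap and report_inactive_irqs are made up for
illustration, and the sketch assumes that 'seq' and
/sys/devices/system/cpu/online are available on the test machine.

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only.

# Expand a kernel CPU list such as "0-3,8" into one CPU number per line.
expand_cpu_list() {
    echo "$1" | tr ',' '\n' | while IFS=- read -r first last; do
        [ -n "$first" ] || continue
        # A single CPU ("8") has no range end, so reuse the start.
        seq "$first" "${last:-$first}"
    done
}

# Succeed (return 0) if the two CPU lists share at least one CPU.
lists_overlap() {
    for cpu in $(expand_cpu_list "$1"); do
        for other in $(expand_cpu_list "$2"); do
            if [ "$cpu" = "$other" ]; then
                return 0
            fi
        done
    done
    return 1
}

# Print every IRQ given as an argument whose affinity list has no online
# CPU. Reads /proc and /sys, so it only works on a live Linux system.
report_inactive_irqs() {
    online=$(cat /sys/devices/system/cpu/online)
    for irq in "$@"; do
        cpus=$(cat "/proc/irq/$irq/smp_affinity_list")
        lists_overlap "$cpus" "$online" ||
            echo "irq $irq: no online CPU in list $cpus"
    done
}
```

With the IRQS variable from the script above, the check could be run as
'report_inactive_irqs $IRQS'. Any IRQ it prints would be a candidate for
an inactive reply queue, though that is only a hint and not a diagnosis.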
