It has been observed that IOPS on null_blk improve significantly simply by using one hw queue per NUMA node, so this patch applies the newly introduced .host_tagset to hpsa to improve performance.

In practice, .can_queue is quite large and the number of NUMA nodes is usually small, so each hw queue's depth should be high enough to saturate the device.

Cc: Arun Easi <[email protected]>
Cc: Omar Sandoval <[email protected]>
Cc: "Martin K. Petersen" <[email protected]>
Cc: James Bottomley <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Don Brace <[email protected]>
Cc: Kashyap Desai <[email protected]>
Cc: Peter Rivera <[email protected]>
Cc: Laurence Oberman <[email protected]>
Cc: Hannes Reinecke <[email protected]>
Cc: Mike Snitzer <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
---
 drivers/scsi/hpsa.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index 3a9eca163db8..0747751b7e1c 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -978,6 +978,7 @@ static struct scsi_host_template hpsa_driver_template = {
 	.shost_attrs = hpsa_shost_attrs,
 	.max_sectors = 1024,
 	.no_write_same = 1,
+	.host_tagset = 1,
 };
 
 static inline u32 next_command(struct ctlr_info *h, u8 q)
@@ -5761,6 +5762,11 @@ static int hpsa_scsi_host_alloc(struct ctlr_info *h)
 static int hpsa_scsi_add_host(struct ctlr_info *h)
 {
 	int rv;
+	/* 256 tags should be high enough to saturate device */
+	int max_queues = DIV_ROUND_UP(h->scsi_host->can_queue, 256);
+
+	/* per NUMA node hw queue */
+	h->scsi_host->nr_hw_queues = min_t(int, nr_node_ids, max_queues);
 
 	rv = scsi_add_host(h->scsi_host, &h->pdev->dev);
 	if (rv) {
-- 
2.9.5
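
For reviewers who want to check the queue-count arithmetic outside the kernel, here is a minimal userspace sketch of the calculation done in hpsa_scsi_add_host(). The helper name hw_queue_count() and the local DIV_ROUND_UP definition are stand-ins written for this illustration, not kernel code, and the sample values in the note below are made up rather than taken from real hardware.

```c
#include <assert.h>

/* Userspace stand-in for the kernel's DIV_ROUND_UP() macro. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Tags per hw queue assumed sufficient by the patch. */
#define TAGS_PER_QUEUE 256

/*
 * Hypothetical helper mirroring the patch's arithmetic:
 * one hw queue per NUMA node, capped so that the queue count
 * does not exceed can_queue / 256 rounded up.
 */
static int hw_queue_count(int can_queue, int nr_nodes)
{
	int max_queues;

	/* Illustration only: reject nonsensical inputs. */
	assert(can_queue > 0 && nr_nodes > 0);

	max_queues = DIV_ROUND_UP(can_queue, TAGS_PER_QUEUE);

	/* Equivalent to min_t(int, nr_nodes, max_queues). */
	return nr_nodes < max_queues ? nr_nodes : max_queues;
}
```

With these definitions, hw_queue_count(1024, 2) gives 2 (the node count is the limit), hw_queue_count(1024, 8) gives 4 (the 256-tag cap is the limit), and hw_queue_count(100, 4) gives 1, so a host with a small .can_queue falls back to a single hw queue.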
