-static int hisi_sas_slot_index_alloc(struct hisi_hba *hisi_hba,
-                                    struct scsi_cmnd *scsi_cmnd)
+static int hisi_sas_slot_index_alloc(struct hisi_hba *hisi_hba)
 {
        int index;
        void *bitmap = hisi_hba->slot_index_tags;
        unsigned long flags;

-       if (scsi_cmnd)
-               return scsi_cmnd->request->tag;
-
        spin_lock_irqsave(&hisi_hba->lock, flags);
        index = find_next_zero_bit(bitmap, hisi_hba->slot_index_count,
                                   hisi_hba->last_slot_index + 1);

Then you switch to hisi_sas_slot_index_alloc() for allocating the unique
tag via spin_lock & find_next_zero_bit(). Do you think this way is more
efficient than blk-mq's sbitmap?

These are not fast path, as used only for TMF, internal IO, etc.

Having said that, hopefully we can move to scsi_{get,put}_reserved_cmd() when available, so that the LLDD has to stop managing them.


slot_index_alloc() is only used for commands which do _not_ have a tag
(eg internal commands), or for v2 hardware which has weird allocation rules.

But this patch has switched to this allocation unconditionally for all commands:


As Hannes said, v2 had a few bugs which meant that we had to make a specific version of this function for that hw revision, cf. slot_index_alloc_quirk_v2_hw(), and it cannot use request queue tag.

But, indeed, we could consider sbitmap for v2 hw. I'm not sure if it would help, considering the weird rules.

-       if (scsi_cmnd)
-               return scsi_cmnd->request->tag;
-

Otherwise duplicated slot can be used from different blk-mq hw queue.


The worsen thing is that V3's actual max queue depth is (4096 - 96), but
this patch claims that the device can support (4096 - 96) * 32 command
slots

To be clear about the hw, the hw supports max 4096 command tags and has 16 hw queues. The hw queue depth is configurable by software, and we configure it at 4096 per queue - same as max command tags - this is to support possibility of all commands using the same queue simultaneously.

, finally hisi_sas_slot_index_alloc() is used to respect the actual
max queue depth(4000).

Well, this patch is an RFC to demonstrate my idea. Of course the queue
depth should be identical before and after the conversion.

That is why I call it is hard to partition the hostwide tags to MQ.


Big contention is caused on hisi_sas_slot_index_alloc(), meantime huge> memory 
is wasted for request pool.

See above. That allocation is only used if no blk tag is available.

This patch switches the allocation for all commands.


Or, at least, that was the idea :-)

Agree, :-)


thanks,
Ming

.



Reply via email to