On 3/26/24 06:53, Bart Van Assche wrote:
> On 3/24/24 21:44, Damien Le Moal wrote:
>> +/*
>> + * Per-zone write plug.
>> + */
>> +struct blk_zone_wplug {
>> + struct hlist_node node;
>> + struct list_head err;
>> + atomic_t ref;
>> + spinlock_t lock;
>> + unsigned int flags;
>> + unsigned int zone_no;
>> + unsigned int wp_offset;
>> + struct bio_list bio_list;
>> + struct work_struct bio_work;
>> +};
>
> Please document what 'lock' protects. Please also document the unit of
> wp_offset.
>
> Since there is an atomic reference count in this data structure, why is
> the flag BLK_ZONE_WPLUG_FREEING required? Can that flag be replaced by
> checking whether or not 'ref' is zero?
Nope, we cannot. The reason is that BIO issuing and zone reset/finish can be
concurrently processed and we need to be ready for a user doing really stupid
things like resetting or finishing a zone while BIOs for that zone are being
issued. When zone reset/finish is processed, the plug is removed from the hash
table, but disk_get_zone_wplug_locked() may still get a reference to it because
we do not have the plug locked yet. Hence the flag, to prevent reusing the plug
for the reset/finished zone that was already removed from the hash table. This
is mentioned with a comment in disk_get_zone_wplug_locked():
/*
* Check that a BIO completion or a zone reset or finish
* operation has not already flagged the zone write plug for
* freeing and dropped its reference count. In such case, we
* need to get a new plug so start over from the beginning.
*/
The reference count dropping to 0 will then be the trigger for actually freeing
the plug, after all in-flight or plugged BIOs are completed (most likely
failed).
>> -void disk_free_zone_bitmaps(struct gendisk *disk)
>> +static bool disk_insert_zone_wplug(struct gendisk *disk,
>> + struct blk_zone_wplug *zwplug)
>> +{
>> + struct blk_zone_wplug *zwplg;
>> + unsigned long flags;
>> + unsigned int idx =
>> + hash_32(zwplug->zone_no, disk->zone_wplugs_hash_bits);
>> +
>> + /*
>> + * Add the new zone write plug to the hash table, but carefully as we
>> + * are racing with other submission context, so we may already have a
>> + * zone write plug for the same zone.
>> + */
>> + spin_lock_irqsave(&disk->zone_wplugs_lock, flags);
>> + hlist_for_each_entry_rcu(zwplg, &disk->zone_wplugs_hash[idx], node) {
>> + if (zwplg->zone_no == zwplug->zone_no) {
>> + spin_unlock_irqrestore(&disk->zone_wplugs_lock, flags);
>> + return false;
>> + }
>> + }
>> + hlist_add_head_rcu(&zwplug->node, &disk->zone_wplugs_hash[idx]);
>> + spin_unlock_irqrestore(&disk->zone_wplugs_lock, flags);
>> +
>> + return true;
>> +}
>
> Since this function inserts an element into disk->zone_wplugs_hash[],
> can it happen that another thread removes that element from the hash
> list before this function returns?
No, that cannot happen. Both insertion and deletion of plugs in the hash table
are serialized with disk->zone_wplugs_lock. See disk_remove_zone_wplug().
--
Damien Le Moal
Western Digital Research