On 3/26/24 06:53, Bart Van Assche wrote:
> On 3/24/24 21:44, Damien Le Moal wrote:
>> +/*
>> + * Per-zone write plug.
>> + */
>> +struct blk_zone_wplug {
>> +    struct hlist_node       node;
>> +    struct list_head        err;
>> +    atomic_t                ref;
>> +    spinlock_t              lock;
>> +    unsigned int            flags;
>> +    unsigned int            zone_no;
>> +    unsigned int            wp_offset;
>> +    struct bio_list         bio_list;
>> +    struct work_struct      bio_work;
>> +};
> 
> Please document what 'lock' protects. Please also document the unit of
> wp_offset.
> 
> Since there is an atomic reference count in this data structure, why is
> the flag BLK_ZONE_WPLUG_FREEING required? Can that flag be replaced by
> checking whether or not 'ref' is zero?

Nope, we cannot. The reason is that BIO issuing and zone reset/finish can be
concurrently processed and we need to be ready for a user doing really stupid
things like resetting or finishing a zone while BIOs for that zone are being
issued. When zone reset/finish is processed, the plug is removed from the hash
table, but disk_get_zone_wplug_locked() may still get a reference to it because
we do not have the plug locked yet. Hence the flag, to prevent reusing the plug
for the reset/finished zone that was already removed from the hash table. This
is mentioned with a comment in disk_get_zone_wplug_locked():

        /*
         * Check that a BIO completion or a zone reset or finish
         * operation has not already flagged the zone write plug for
         * freeing and dropped its reference count. In such case, we
         * need to get a new plug so start over from the beginning.
         */

The reference count dropping to 0 will then be the trigger for actually freeing
the plug, after all in-flight or plugged BIOs are completed (most likely 
failed).

>> -void disk_free_zone_bitmaps(struct gendisk *disk)
>> +static bool disk_insert_zone_wplug(struct gendisk *disk,
>> +                               struct blk_zone_wplug *zwplug)
>> +{
>> +    struct blk_zone_wplug *zwplg;
>> +    unsigned long flags;
>> +    unsigned int idx =
>> +            hash_32(zwplug->zone_no, disk->zone_wplugs_hash_bits);
>> +
>> +    /*
>> +     * Add the new zone write plug to the hash table, but carefully as we
>> +     * are racing with other submission context, so we may already have a
>> +     * zone write plug for the same zone.
>> +     */
>> +    spin_lock_irqsave(&disk->zone_wplugs_lock, flags);
>> +    hlist_for_each_entry_rcu(zwplg, &disk->zone_wplugs_hash[idx], node) {
>> +            if (zwplg->zone_no == zwplug->zone_no) {
>> +                    spin_unlock_irqrestore(&disk->zone_wplugs_lock, flags);
>> +                    return false;
>> +            }
>> +    }
>> +    hlist_add_head_rcu(&zwplug->node, &disk->zone_wplugs_hash[idx]);
>> +    spin_unlock_irqrestore(&disk->zone_wplugs_lock, flags);
>> +
>> +    return true;
>> +}
> 
> Since this function inserts an element into disk->zone_wplugs_hash[],
> can it happen that another thread removes that element from the hash
> list before this function returns?

No, that cannot happen. Both insertion and deletion of plugs in the hash table
are serialized with disk->zone_wplugs_lock. See disk_remove_zone_wplug().

-- 
Damien Le Moal
Western Digital Research


Reply via email to