On 3.4.2018 at 06:07, Dennis Yang wrote:
Recently we came across an issue where a dm-thin pool is
switched to READ_ONLY mode because dm_pool_alloc_data_block() returns
-ENOSPC. AFAIK, this should not happen, since alloc_data_block() checks
whether there is any free space (and commits metadata if it first
reports no free space) before it allocates a pool block. In addition,
the total virtual space of all thin volumes is smaller than the pool's
physical space in my testing environment, which makes it impossible for
the pool to run out of space.
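To make the expected behaviour concrete, here is a rough userspace sketch of the check, commit, re-check flow described above. It is a toy model, not the kernel code: struct toy_pool and all toy_* functions are hypothetical names, and the idea that released blocks only become allocatable after a commit is an assumption of the model.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical toy model of a pool's data-space accounting; not kernel code. */
struct toy_pool {
    uint64_t nr_blocks;       /* total data blocks in the pool */
    uint64_t nr_allocated;    /* blocks counted as in use since the last commit */
    uint64_t nr_pending_free; /* blocks released, reusable only after a commit */
};

/* Free blocks visible to the allocator right now. */
uint64_t toy_nr_free(const struct toy_pool *p)
{
    return p->nr_blocks - p->nr_allocated;
}

/* Assumption: a commit makes previously released blocks allocatable again. */
void toy_commit(struct toy_pool *p)
{
    p->nr_allocated -= p->nr_pending_free;
    p->nr_pending_free = 0;
}

/* Check free space, commit once if none is visible, then allocate or fail. */
int toy_alloc_data_block(struct toy_pool *p)
{
    if (toy_nr_free(p) == 0) {
        toy_commit(p);
        if (toy_nr_free(p) == 0)
            return -ENOSPC;
    }
    p->nr_allocated++;
    return 0;
}
```

With a pool of 4 blocks, all counted as allocated but one pending release, the first toy_alloc_data_block() call succeeds after the internal commit, and a second call returns -ENOSPC because nothing is left to reclaim.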
This issue can be reproduced easily with the following steps.
1) Create a thin pool and a slightly smaller thin volume
sudo dmsetup create meta --table "0 40000000 linear /dev/sdf 0"
sudo dmsetup create data --table "0 10240000 linear /dev/md125 0"
sudo dd if=/dev/zero of=/dev/mapper/meta bs=1M count=1
sudo dmsetup create pool --table "0 10240000 thin-pool /dev/mapper/meta /dev/mapper/data 1024 0 2 skip_block_zeroing error_if_no_space"
sudo dmsetup message pool 0 "create_thin 0"
sudo dmsetup create thin --table "0 10238976 thin /dev/mapper/pool 0"
2) Make a filesystem and mount it
sudo mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 /dev/mapper/thin
sudo mount /dev/mapper/thin /mnt
3) Write a file to the mount point until it takes all the space
sudo dd if=/dev/zero of=/mnt/zero.img bs=1M
4) Remove this file and trim the mount point
sudo rm /mnt/zero.img
sudo fstrim /mnt
Repeat steps 3 and 4 multiple times and the pool will be switched to
READ_ONLY mode with the needs_check flag set. The kernel log shows
the following messages.
[ 3952.723937] device-mapper: thin: 252:2: metadata operation 'dm_pool_alloc_data_block' failed: error = -28
[ 3952.723940] device-mapper: thin: 252:2: aborting current metadata transaction
[ 3952.725860] device-mapper: thin: 252:2: switching pool to read-only mode
The root cause of this issue is that dm-thin first removes the
mapping and increases the corresponding blocks' reference counts to
prevent them from being reused before the DISCARD bios get processed by
the underlying layers. However, increasing the blocks' reference counts
can also increase nr_allocated_this_transaction in struct sm_disk,
which makes smd->old_ll.nr_allocated +
smd->nr_allocated_this_transaction bigger than smd->old_ll.nr_blocks.
In this case, alloc_data_block() will never commit metadata to reset
the begin pointer of struct sm_disk, because sm_disk_get_nr_free()
always returns an underflowed value.
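The underflow can be illustrated with a small userspace sketch. This is only a model that borrows the field names from the description above. The toy_* names are hypothetical, and the subtraction in toy_get_nr_free() is an assumption inferred from that description, not a copy of the kernel source. The example numbers follow the reproduction setup: 10240000 sectors / 1024 sectors per block = 10000 pool blocks.

```c
#include <stdint.h>

/* Toy model using the field names from the message; not the kernel struct. */
struct toy_sm_disk {
    uint64_t old_nr_blocks;                 /* models smd->old_ll.nr_blocks */
    uint64_t old_nr_allocated;              /* models smd->old_ll.nr_allocated */
    uint64_t nr_allocated_this_transaction; /* models the sm_disk field */
};

/*
 * Assumed free-space formula in unsigned 64-bit arithmetic. If the two
 * allocation counters together exceed old_nr_blocks, the result wraps
 * around to a huge value instead of going negative.
 */
uint64_t toy_get_nr_free(const struct toy_sm_disk *smd)
{
    return smd->old_nr_blocks - smd->old_nr_allocated
           - smd->nr_allocated_this_transaction;
}

/* The allocator only commits when it sees exactly zero free blocks. */
int toy_would_commit(const struct toy_sm_disk *smd)
{
    return toy_get_nr_free(smd) == 0;
}
```

For example, with old_nr_blocks = 10000, old_nr_allocated = 9999 and nr_allocated_this_transaction = 2, toy_get_nr_free() wraps to UINT64_MAX, so toy_would_commit() is false and the commit that would reclaim the discarded blocks is never triggered in this model.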
If you need more information, please feel free to let me know.
I just forgot to mention: this is tracked through this BZ:
dm-devel mailing list