On 2019/4/26 下午2:32, Qu Wenruo wrote:
>
>
> On 2019/4/26 上午11:32, robbieko wrote:
> [snipped]
>>>
>>> But now, before creating new snapshot, we will flush all data.
>>>
>>> This means at this time, the data will be written back *NOCOW*, just as
>>> expected.
>>>
>>>> will_be_snapshotted will be set, and then flush
>>>
>>> will_be_snapshotted will also be gone.
>>>
>>> Thanks,
>>> Qu
>>>
>>
>> Why will will_be_snapshotted disappear?
>>
>> If will_be_snapshotted is removed, when volume is full,
>> how do we ensure that snapshot and write are both
>> simultaneous data not loss ?
>>
>> How do you avoid the write data between btrfs_start_delalloc_snapshot
>> and create_pending_snapshot during this time ?
>
> Oh, I forgot that buffered write doesn't rely on transaction, thus it
> has the race window between above 2 functions.
>
>
> So will_be_snapshotted is to ensure buffered write at above race window
> always reserve space for COW.
>
> So your patch changes the behavior from:
>                    /------------------------ inc(will_be_snapshotted)
>                    |
> -------------------|-------------------------------------
> reserve: NOCOW     | reserve: COW
> delalloc: NOCOW    | delalloc: COW
>
> That reserve NOCOW->delalloc COW is causing ENOSPC.
>
> To new the behavior:
>
>                    /------------------------ inc(will_be_snapshotted)
>                    |                     /-- inc(snapshot_force_cow)
> -------------------|---------------------|---------------
> reserve: NOCOW     | reserve: COW        | reserve: COW
> delalloc: NOCOW    | delalloc: NOCOW     | delalloc: COW
>
> And now reserve->delalloc is either NOCOW->NOCOW or COW->COW, avoiding
> the possible ENOSPC.
>
> Then it makes sense now. (although way more complex)
>
> Although I still have some questions about all these
> will_be_snapshotted/snapshot_force_cow code.
>
> 1) Does that btrfs_start_delalloc_snapshot() in create_snapshot() call
>    still make sense?
>    We now have btrfs_start_delalloc_snapshot() in
>    btrfs_start_delalloc_flush().
>    Can we remove that call in create_snapshot() and:
>
> 2) Can't we make the race window smaller? (If we can remove the call in
>    mentioned in above quiestion)
>    E.g. move the inc(will_be_snapshotted) before
>    btrfs_start_delalloc_snapshot() in btrfs_start_delalloc_flush(),
>    and move inc(snapshot_force_cow) after that
>    btrfs_start_delalloc_snapshot() call?
>
>    As you also mentioned, the real race window is between
>    btrfs_start_delalloc_snapshot() and create_pending_snapshot().

And another (unrelated) question:

3) Why we use such complex way to prevent race? Why not just go per-
   subvolume freeze like various volume manager?

   Current way keeps buffered write still going, but does it worth?

   If we freeze/block write when creating snapshot, it just falls back
   to traditional volume manager behavior, shouldn't cause much problem,
   and we can get rid of all these snapshot specific fixes.

Thanks,
Qu

>
> Thanks,
> Qu
>
>>
>> E.g. chattr +C /mnt
>> Fallocate -l 4096 /mnt/small_file
>> Fallocate -l $((64*1024*1024)) /mnt/large_file
>>
>> First process:
>> While true; do
>> dd if=/dev/urandom of=/mnt/small_file bs=4k count=1 conv=notrunc
>> Done
>>
>> Second process:
>> While true; do
>> dd if=/dev/urandom of=/mnt/large_file bs=64k count=1024 conv=notrunc
>> Done
>>
>> Third process: create snapshot.
>

Reply via email to