On 2018/5/8 11:31, Jaegeuk Kim wrote:
> On 05/08, Chao Yu wrote:
>> On 2018/5/8 4:46, Jaegeuk Kim wrote:
>>> On 04/27, Chao Yu wrote:
>>>> On 2018/4/27 0:36, Jaegeuk Kim wrote:
>>>>> On 04/26, Chao Yu wrote:
>>>>>> On 2018/4/26 23:48, Jaegeuk Kim wrote:
>>>>>>> On 04/26, Chao Yu wrote:
>>>>>>>> Thread A                               Thread B
>>>>>>>> - f2fs_ioc_commit_atomic_write
>>>>>>>>  - commit_inmem_pages
>>>>>>>>   - f2fs_submit_merged_write_cond
>>>>>>>>   : write data
>>>>>>>>                                        - write_checkpoint
>>>>>>>>                                         - do_checkpoint
>>>>>>>>                                         : commit all node within CP
>>>>>>>>                                         -> SPO
>>>>>>>>   - f2fs_do_sync_file
>>>>>>>>    - file_write_and_wait_range
>>>>>>>>    : wait data writeback
>>>>>>>>
>>>>>>>> In above race condition, data/node can be flushed in reversed order 
>>>>>>>> when
>>>>>>>> coming a checkpoint before f2fs_do_sync_file, after SPOR, it results in
>>>>>>>> atomic written data being corrupted.
>>>>>>>
>>>>>>> Wait, what is the problem here? Thread B could succeed checkpoint, 
>>>>>>> there is
>>>>>>> no problem. If it fails, there is no fsync mark where we can recover 
>>>>>>> it, so
>>>>>>
>>>>>> Node is flushed by checkpoint before data, with reversed order, that's 
>>>>>> the problem.
>>>>>
>>>>> What do you mean? Data should be in disk, in order to proceed checkpoint.
>>>>
>>>> 1. thread A: commit_inmem_pages submit data into block layer, but haven't 
>>>> waited
>>>> it writeback.
>>>> 2. thread A: commit_inmem_pages update related node.
>>>> 3. thread B: do checkpoint, flush all nodes to disk
>>>
>>> How about, in block_operations(),
>>>
>>>     down_read_trylock(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
>>>     if (fail)
>>>             wait_on_all_pages_writeback(F2FS_WB_DATA);
>>>     else
>>>             up_read(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
>>
>> I sent one patch for that, could you check it?
>>
>> Adding wait_on_all_pages_writeback in block_operations() can make 
>> checkpoint()
>> wait pages writeback one more time, which break IO flow, so what's your 
>> concern
>> here?
> 
> Performance. And I can see wait_on_all_pages_writeback() waits only for
> F2FS_WB_CP_DATA in checkpoint()?

Oh, you mean wait all F2FS_WB_DATA pages writeback, what about just treating
atomic write page as F2FS_WB_CP_DATA, and we can wait atomic pages in
wait_on_all_pages_writeback() in do_checkpoitn().

Thanks,

> 
> 
>>
>> Thanks,
>>
>>>
>>>
>>>> 4. SPOR
>>>>
>>>> Then, atomic file becomes corrupted since nodes is flushed before data.
>>>>
>>>> Thanks,
>>>>
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>>> we can just ignore the last written data as nothing.
>>>>>>>
>>>>>>>>
>>>>>>>> This patch adds f2fs_wait_on_page_writeback in __revoke_inmem_pages() 
>>>>>>>> to
>>>>>>>> keep data and node of atomic file being flushed orderly.
>>>>>>>>
>>>>>>>> Signed-off-by: Chao Yu <[email protected]>
>>>>>>>> ---
>>>>>>>>  fs/f2fs/file.c    | 4 ++++
>>>>>>>>  fs/f2fs/segment.c | 3 +++
>>>>>>>>  2 files changed, 7 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
>>>>>>>> index be7578774a47..a352804af244 100644
>>>>>>>> --- a/fs/f2fs/file.c
>>>>>>>> +++ b/fs/f2fs/file.c
>>>>>>>> @@ -217,6 +217,9 @@ static int f2fs_do_sync_file(struct file *file, 
>>>>>>>> loff_t start, loff_t end,
>>>>>>>>  
>>>>>>>>        trace_f2fs_sync_file_enter(inode);
>>>>>>>>  
>>>>>>>> +      if (atomic)
>>>>>>>> +              goto write_done;
>>>>>>>> +
>>>>>>>>        /* if fdatasync is triggered, let's do in-place-update */
>>>>>>>>        if (datasync || get_dirty_pages(inode) <= 
>>>>>>>> SM_I(sbi)->min_fsync_blocks)
>>>>>>>>                set_inode_flag(inode, FI_NEED_IPU);
>>>>>>>> @@ -228,6 +231,7 @@ static int f2fs_do_sync_file(struct file *file, 
>>>>>>>> loff_t start, loff_t end,
>>>>>>>>                return ret;
>>>>>>>>        }
>>>>>>>>  
>>>>>>>> +write_done:
>>>>>>>>        /* if the inode is dirty, let's recover all the time */
>>>>>>>>        if (!f2fs_skip_inode_update(inode, datasync)) {
>>>>>>>>                f2fs_write_inode(inode, NULL);
>>>>>>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>>>>>>>> index 584483426584..9ca3d0a43d93 100644
>>>>>>>> --- a/fs/f2fs/segment.c
>>>>>>>> +++ b/fs/f2fs/segment.c
>>>>>>>> @@ -230,6 +230,8 @@ static int __revoke_inmem_pages(struct inode 
>>>>>>>> *inode,
>>>>>>>>  
>>>>>>>>                lock_page(page);
>>>>>>>>  
>>>>>>>> +              f2fs_wait_on_page_writeback(page, DATA, true);
>>>>>>>> +
>>>>>>>>                if (recover) {
>>>>>>>>                        struct dnode_of_data dn;
>>>>>>>>                        struct node_info ni;
>>>>>>>> @@ -415,6 +417,7 @@ static int __commit_inmem_pages(struct inode 
>>>>>>>> *inode)
>>>>>>>>                /* drop all uncommitted pages */
>>>>>>>>                __revoke_inmem_pages(inode, &fi->inmem_pages, true, 
>>>>>>>> false);
>>>>>>>>        } else {
>>>>>>>> +              /* wait all committed IOs writeback and release them 
>>>>>>>> from list */
>>>>>>>>                __revoke_inmem_pages(inode, &revoke_list, false, false);
>>>>>>>>        }
>>>>>>>>  
>>>>>>>> -- 
>>>>>>>> 2.15.0.55.gc2ece9dc4de6
>>>>>
>>>>> .
>>>>>
>>>
>>> .
>>>
> 
> .
> 


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Linux-f2fs-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

Reply via email to