Thank you for your prompt and patient response!

> The challenge with that is that iomap does not support all the
> functionality that f2fs requires.  The iomap data structure could
> be duplicated inside f2fs, but then we hit the problem that f2fs
> currently stores other information in folio->private.  So we'd need
> to add a flags field to iomap_folio_state to store that information
> instead.
>
> See the part of f2fs.h from PAGE_PRIVATE_GET_FUNC to the end of
> clear_page_private_all().

Thank you for pointing out that specific piece of code.  It helped me
notice issues I had missed and gave me several insights.  From studying
how F2FS uses the private field of struct page, my understanding is
that when the field is not being interpreted as a pointer, F2FS uses it
to store additional flag bits.  A key observation is that these helpers
are applied to tail pages as well.  So, as you suggested, once a folio
manages multiple pages it seems reasonable to add a similar field to
iomap_folio_state, analogous to the way it already tracks per-block
uptodate and dirty state, so that the state F2FS keeps in the private
field can be tracked per block too.  After all, F2FS is essentially
using the private field to extend the set of per-page state flags held
in memory.

Perhaps it would be more appropriate to name the new structure
f2fs_folio_state directly?  I am not yet sure whether it would interact
with the existing iomap APIs or whether we would need to develop
F2FS-specific APIs for it.

> You're right that f2fs needs per-block dirty tracking if it is to
> support large folios.
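Agreed.  To make the idea a little more concrete, here is a very rough
sketch of the kind of structure I am imagining -- purely illustrative
and untested; every name below is a placeholder of mine, not existing
code:

/*
 * Modelled loosely on iomap_folio_state: per-block uptodate/dirty
 * bitmaps, plus a per-block copy of the flag bits f2fs currently
 * encodes in page->private (the PAGE_PRIVATE_* values).
 */
struct f2fs_folio_state {
	spinlock_t	state_lock;
	unsigned int	read_bytes_pending;
	atomic_t	write_bytes_pending;
	/* one word of PAGE_PRIVATE_*-style flags per block */
	unsigned long	*private_flags;
	/* uptodate bitmap followed by dirty bitmap, one bit per block */
	unsigned long	state[];
};

/* e.g. test a private flag for a single block of a large folio */
static inline bool f2fs_folio_test_private(struct folio *folio,
					   unsigned int block, int flag)
{
	struct f2fs_folio_state *ffs = folio->private;

	return ffs && test_bit(flag, &ffs->private_flags[block]);
}

Whether such a structure should keep going through the iomap helpers or
grow f2fs-specific accessors is exactly the part I am unsure about.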
I feel that we need to consider more than just this aspect.  In fact,
perhaps because we are still in the early stages of F2FS folio support,
my impression is that the current F2FS folio implementation essentially
just replaces struct page at the interface level: a folio effectively
acts like a single page, in other words an order-0 folio.

I just performed a simple test:

static int open_and_read(struct file **file_ptr_ref, char *file_path,
			 int flag, char **read_buffer_ref,
			 size_t read_size, loff_t read_pos)
{
	/* open the file from the test module */
	struct file *file_ptr = filp_open(file_path, flag, 0644);

	/* intentionally allow large folio orders for the test */
	mapping_set_large_folios(file_ptr->f_mapping);
	printk(KERN_EMERG "min order folio of file %s is %d", file_path,
	       mapping_min_folio_order(file_ptr->f_mapping));
	printk(KERN_EMERG "max order folio of file %s is %d", file_path,
	       mapping_max_folio_order(file_ptr->f_mapping));
	*file_ptr_ref = file_ptr;
	/* ... file_ptr error handling ... */

	char *read_buffer = kmalloc(read_size, GFP_KERNEL);
	/* ... read_buffer error handling ... */

	int bytes_read = 0;

	bytes_read = kernel_read(file_ptr, read_buffer, read_size, &read_pos);
	/* ... bytes_read error handling ... */
	*read_buffer_ref = read_buffer;
	return bytes_read;
}

In my custom module, which I use for experiments and testing, I used
this function to attempt a buffered read of 512 * PAGE_SIZE from F2FS.
The result was a kernel crash.  The reason is quite simple:

static int f2fs_mpage_readpages(struct inode *inode,
				struct readahead_control *rac,
				struct folio *folio)
{
	struct bio *bio = NULL;
	sector_t last_block_in_bio = 0;
	struct f2fs_map_blocks map;
#ifdef CONFIG_F2FS_FS_COMPRESSION
	/* ... compression context ... */
#endif
	unsigned nr_pages = rac ? readahead_count(rac) : 1;
	unsigned max_nr_pages = nr_pages;
	int ret = 0;

	/* ... init f2fs_map_blocks ... */

	for (; nr_pages; nr_pages--) {
		/* Error!  Only decremented by 1 even though a whole
		 * folio is submitted for read each iteration. */
		if (rac) {
			folio = readahead_folio(rac); /* advance to the next folio */
			prefetchw(&folio->flags);
		}

#ifdef CONFIG_F2FS_FS_COMPRESSION
		index = folio_index(folio); /* Error!!  NULL pointer dereference! */

		if (!f2fs_compressed_file(inode))
			goto read_single_page;

		/* ... compressed file read logic ... */
		goto next_page;
read_single_page:
#endif

		ret = f2fs_read_single_page(inode, folio, max_nr_pages, &map,
					    &bio, &last_block_in_bio, rac);
		/* Error!!  Only the first page of the current folio is
		 * submitted for read! */
		if (ret) {
#ifdef CONFIG_F2FS_FS_COMPRESSION
set_error_page:
#endif
			folio_zero_segment(folio, 0, folio_size(folio));
			folio_unlock(folio);
		}
#ifdef CONFIG_F2FS_FS_COMPRESSION
next_page:
#endif
#ifdef CONFIG_F2FS_FS_COMPRESSION
		/* ... last page handling ... */
#endif
	}
	if (bio)
		f2fs_submit_read_bio(F2FS_I_SB(inode), bio, DATA);
	return ret;
}

As you can see in f2fs_mpage_readpages, nr_pages is only decremented by
1 after each folio is processed in the loop.  So once all the folios
allocated in the page cache have been iterated through, nr_pages still
has a remaining value and the loop continues, which naturally leads to
the crash at index = folio_index(folio), due to dereferencing a NULL
pointer.  Furthermore, only the first page of each folio is submitted
for I/O; the remaining pages are never filled with data from disk.  A
rough sketch of what I would expect the loop accounting to look like
follows below.  And this isn't the only place.
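Purely as an illustration of the accounting I would expect -- this is an
untested sketch, not a patch, and f2fs_read_single_page() itself would
also have to be taught to submit every block of the folio rather than
only the first:

	for (; nr_pages; ) {
		if (rac) {
			folio = readahead_folio(rac);
			if (!folio)	/* readahead batch exhausted */
				break;
			prefetchw(&folio->flags);
		}

		/* ... submit reads for every block covered by this folio ... */

		/* account for all of the folio's pages, not just one */
		nr_pages -= min_t(unsigned int, nr_pages,
				  folio_nr_pages(folio));
	}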
Actually, when I was previously studying the implementation of
f2fs_dirty_folio, I also noticed:

static bool f2fs_dirty_data_folio(struct address_space *mapping,
				  struct folio *folio)
{
	/* ... other code ... */
	if (filemap_dirty_folio(mapping, folio)) {
		f2fs_update_dirty_folio(inode, folio);
		return true;
	}
	return false;
}

void f2fs_update_dirty_folio(struct inode *inode, struct folio *folio)
{
	/* ... other code ... */
	spin_lock(&sbi->inode_lock[type]);
	if (type != FILE_INODE || test_opt(sbi, DATA_FLUSH))
		__add_dirty_inode(inode, type);
	inode_inc_dirty_pages(inode);	/* only increments inode->dirty_pages by one */
	spin_unlock(&sbi->inode_lock[type]);

	set_page_private_reference(&folio->page);
}

In f2fs_update_dirty_folio, inode_inc_dirty_pages(inode) only increments
the inode's dirty page count by 1.  This is quite confusing: it is
unclear why it does not increment by the number of pages in the folio
(assuming we are not yet tracking dirty state at per-block granularity).
This observation further strengthens my suspicion that the current folio
implementation is effectively fixed at order 0.  I have not yet checked
whether other pieces of F2FS code share the same problem.

I am planning to prepare patches to address these issues and submit them
soon.  I noticed you recently posted a large batch of folio patches, and
I would like to debug and test on top of them.  Would it be possible for
you to share your modified F2FS code directly, or provide a link to your
git repository?  Manually copying and applying so many patches from the
mailing list would be quite cumbersome.

Best regards.

On Mon, Mar 31, 2025 at 11:43, Matthew Wilcox <wi...@infradead.org> wrote:
>
> On Sun, Mar 30, 2025 at 10:38:37AM +0800, Nanzhe Zhao wrote:
> > I have been considering potential solutions to address this. Two
> > approaches I've explored are:
> > Either modifying the f2fs dirty page writeback function to manually
> > mark individual sub-pages within a folio as dirty, rather than relying
> > on the folio-level dirty flag.
>
> Just so you know, the per-page dirty flag is not in fact per page.
> If you call SetPageDirty() on a tail page, it will set the dirty flag
> on the head page (ie the same bit that is used by folio_set_dirty()).
> This is intentional as we do not intend for there to be a per-page flags
> field in the future.
>
> > Or utilizing the per-block dirty state tracking feature introduced in
> > kernel 6.6 within the iomap framework. This would involve using the
> > iomap_folio_state structure to track the dirty status of each block
> > within a folio.
>
> The challenge with that is that iomap does not support all the
> functionality that f2fs requires.  The iomap data structure could
> be duplicated inside f2fs, but then we hit the problem that f2fs
> currently stores other information in folio->private.  So we'd need
> to add a flags field to iomap_folio_state to store that information
> instead.
>
> See the part of f2fs.h from PAGE_PRIVATE_GET_FUNC to the end of
> clear_page_private_all().
>
> You're right that f2fs needs per-block dirty tracking if it is to
> support large folios.

_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel