Thank you for your prompt and patient response!

> The challenge with that is that iomap does not support all the
> functionality that f2fs requires.  The iomap data structure could
> be duplicated inside f2fs, but then we hit the problem that f2fs
> currently stores other information in folio->private.  So we'd need
> to add a flags field to iomap_folio_state to store that information
> instead.
>
> See the part of f2fs.h from PAGE_PRIVATE_GET_FUNC to the end of
> clear_page_private_all().

Thank you for pointing out that specific piece of code.  It helped me
notice issues I had missed and gave me several insights.  From studying
how F2FS uses the private field of struct page, my understanding is
that when the field is not being interpreted as a pointer, F2FS uses it
to store additional flag bits.  A key observation is that these helpers
are applied to tail pages as well.  So, as you suggested, once a folio
manages multiple pages it seems reasonable to add a similar field to
iomap_folio_state, analogous to the way it already tracks per-block
uptodate and dirty state, so that the state F2FS keeps in the private
field can be tracked per block too.  After all, F2FS is essentially
using the private field to extend the set of per-page state flags held
in memory.

Perhaps it would be more appropriate to name the new structure
f2fs_folio_state directly?  I am not yet sure whether it would interact
with the existing iomap APIs or whether we would need to develop
F2FS-specific APIs for it.

> You're right that f2fs needs per-block dirty tracking if it is to
> support large folios.
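Agreed.  To make the idea a little more concrete, here is a very rough
sketch of the kind of structure I am imagining -- purely illustrative
and untested; every name below is a placeholder of mine, not existing
code:

/*
 * Modelled loosely on iomap_folio_state: per-block uptodate/dirty
 * bitmaps, plus a per-block copy of the flag bits f2fs currently
 * encodes in page->private (the PAGE_PRIVATE_* values).
 */
struct f2fs_folio_state {
	spinlock_t	state_lock;
	unsigned int	read_bytes_pending;
	atomic_t	write_bytes_pending;
	/* one word of PAGE_PRIVATE_*-style flags per block */
	unsigned long	*private_flags;
	/* uptodate bitmap followed by dirty bitmap, one bit per block */
	unsigned long	state[];
};

/* e.g. test a private flag for a single block of a large folio */
static inline bool f2fs_folio_test_private(struct folio *folio,
					   unsigned int block, int flag)
{
	struct f2fs_folio_state *ffs = folio->private;

	return ffs && test_bit(flag, &ffs->private_flags[block]);
}

Whether such a structure should keep going through the iomap helpers or
grow f2fs-specific accessors is exactly the part I am unsure about.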
I feel that we need to consider more than just this aspect.  In fact,
perhaps because we are still in the early stages of F2FS folio support,
my impression is that the current F2FS folio implementation essentially
just replaces struct page at the interface level: a folio effectively
acts like a single page, in other words an order-0 folio.

I just performed a simple test:

static int open_and_read(struct file **file_ptr_ref, char *file_path,
			 int flag, char **read_buffer_ref,
			 size_t read_size, loff_t read_pos)
{
	/* open the file from the test module */
	struct file *file_ptr = filp_open(file_path, flag, 0644);

	/* intentionally allow large folio orders for the test */
	mapping_set_large_folios(file_ptr->f_mapping);
	printk(KERN_EMERG "min order folio of file %s is %d", file_path,
	       mapping_min_folio_order(file_ptr->f_mapping));
	printk(KERN_EMERG "max order folio of file %s is %d", file_path,
	       mapping_max_folio_order(file_ptr->f_mapping));
	*file_ptr_ref = file_ptr;
	/* ... file_ptr error handling ... */

	char *read_buffer = kmalloc(read_size, GFP_KERNEL);
	/* ... read_buffer error handling ... */

	int bytes_read = 0;

	bytes_read = kernel_read(file_ptr, read_buffer, read_size, &read_pos);
	/* ... bytes_read error handling ... */
	*read_buffer_ref = read_buffer;
	return bytes_read;
}

In my custom module, which I use for experiments and testing, I used
this function to attempt a buffered read of 512 * PAGE_SIZE from F2FS.
The result was a kernel crash.  The reason is quite simple:

static int f2fs_mpage_readpages(struct inode *inode,
				struct readahead_control *rac,
				struct folio *folio)
{
	struct bio *bio = NULL;
	sector_t last_block_in_bio = 0;
	struct f2fs_map_blocks map;
#ifdef CONFIG_F2FS_FS_COMPRESSION
	/* ... compression context ... */
#endif
	unsigned nr_pages = rac ? readahead_count(rac) : 1;
	unsigned max_nr_pages = nr_pages;
	int ret = 0;

	/* ... init f2fs_map_blocks ... */

	for (; nr_pages; nr_pages--) {
		/* Error!  Only decremented by 1 even though a whole
		 * folio is submitted for read each iteration. */
		if (rac) {
			folio = readahead_folio(rac); /* advance to the next folio */
			prefetchw(&folio->flags);
		}

#ifdef CONFIG_F2FS_FS_COMPRESSION
		index = folio_index(folio); /* Error!!  NULL pointer dereference! */

		if (!f2fs_compressed_file(inode))
			goto read_single_page;

		/* ... compressed file read logic ... */
		goto next_page;
read_single_page:
#endif

		ret = f2fs_read_single_page(inode, folio, max_nr_pages, &map,
					    &bio, &last_block_in_bio, rac);
		/* Error!!  Only the first page of the current folio is
		 * submitted for read! */
		if (ret) {
#ifdef CONFIG_F2FS_FS_COMPRESSION
set_error_page:
#endif
			folio_zero_segment(folio, 0, folio_size(folio));
			folio_unlock(folio);
		}
#ifdef CONFIG_F2FS_FS_COMPRESSION
next_page:
#endif
#ifdef CONFIG_F2FS_FS_COMPRESSION
		/* ... last page handling ... */
#endif
	}
	if (bio)
		f2fs_submit_read_bio(F2FS_I_SB(inode), bio, DATA);
	return ret;
}

As you can see in f2fs_mpage_readpages, nr_pages is only decremented by
1 after each folio is processed in the loop.  So once all the folios
allocated in the page cache have been iterated through, nr_pages still
has a remaining value and the loop continues, which naturally leads to
the crash at index = folio_index(folio), due to dereferencing a NULL
pointer.  Furthermore, only the first page of each folio is submitted
for I/O; the remaining pages are never filled with data from disk.  A
rough sketch of what I would expect the loop accounting to look like
follows below.  And this isn't the only place.
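Purely as an illustration of the accounting I would expect -- this is an
untested sketch, not a patch, and f2fs_read_single_page() itself would
also have to be taught to submit every block of the folio rather than
only the first:

	for (; nr_pages; ) {
		if (rac) {
			folio = readahead_folio(rac);
			if (!folio)	/* readahead batch exhausted */
				break;
			prefetchw(&folio->flags);
		}

		/* ... submit reads for every block covered by this folio ... */

		/* account for all of the folio's pages, not just one */
		nr_pages -= min_t(unsigned int, nr_pages,
				  folio_nr_pages(folio));
	}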
Actually, when I was previously studying the implementation of
f2fs_dirty_folio, I also noticed:

static bool f2fs_dirty_data_folio(struct address_space *mapping,
				  struct folio *folio)
{
	/* ... other code ... */
	if (filemap_dirty_folio(mapping, folio)) {
		f2fs_update_dirty_folio(inode, folio);
		return true;
	}
	return false;
}

void f2fs_update_dirty_folio(struct inode *inode, struct folio *folio)
{
	/* ... other code ... */
	spin_lock(&sbi->inode_lock[type]);
	if (type != FILE_INODE || test_opt(sbi, DATA_FLUSH))
		__add_dirty_inode(inode, type);
	inode_inc_dirty_pages(inode);	/* only increments inode->dirty_pages by one */
	spin_unlock(&sbi->inode_lock[type]);

	set_page_private_reference(&folio->page);
}

In f2fs_update_dirty_folio, inode_inc_dirty_pages(inode) only increments
the inode's dirty page count by 1.  This is quite confusing: it is
unclear why it does not increment by the number of pages in the folio
(assuming we are not yet tracking dirty state at per-block granularity).
This observation further strengthens my suspicion that the current folio
implementation is effectively fixed at order 0.  I have not yet checked
whether other pieces of F2FS code share the same problem.

I am planning to prepare patches to address these issues and submit them
soon.  I noticed you recently posted a large batch of folio patches, and
I would like to debug and test on top of them.  Would it be possible for
you to share your modified F2FS code directly, or provide a link to your
git repository?  Manually copying and applying so many patches from the
mailing list would be quite cumbersome.

Best regards.

On Mon, Mar 31, 2025 at 11:43, Matthew Wilcox <wi...@infradead.org> wrote:
>
> On Sun, Mar 30, 2025 at 10:38:37AM +0800, Nanzhe Zhao wrote:
> > I have been considering potential solutions to address this. Two
> > approaches I've explored are:
> > Either modifying the f2fs dirty page writeback function to manually
> > mark individual sub-pages within a folio as dirty, rather than relying
> > on the folio-level dirty flag.
>
> Just so you know, the per-page dirty flag is not in fact per page.
> If you call SetPageDirty() on a tail page, it will set the dirty flag
> on the head page (ie the same bit that is used by folio_set_dirty()).
> This is intentional as we do not intend for there to be a per-page flags
> field in the future.
>
> > Or utilizing the per-block dirty state tracking feature introduced in
> > kernel 6.6 within the iomap framework. This would involve using the
> > iomap_folio_state structure to track the dirty status of each block
> > within a folio.
>
> The challenge with that is that iomap does not support all the
> functionality that f2fs requires.  The iomap data structure could
> be duplicated inside f2fs, but then we hit the problem that f2fs
> currently stores other information in folio->private.  So we'd need
> to add a flags field to iomap_folio_state to store that information
> instead.
>
> See the part of f2fs.h from PAGE_PRIVATE_GET_FUNC to the end of
> clear_page_private_all().
>
> You're right that f2fs needs per-block dirty tracking if it is to
> support large folios.

_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel