Dear Mr. Matthew,

I am writing to sincerely apologize for my significant
misunderstanding of iomap and f2fs in my previous email. I incorrectly
assumed that iomap was designed exclusively for extent-based file
systems. In fact, for indirect-pointer-based file systems, in the
worst case a single mapped data block can still be represented as an
iomap structure whose length is one file system block. Furthermore,
after re-examining the f2fs_map_blocks function, I realized that f2fs
actually attempts to merge contiguous file system blocks into a single
f2fs_map_blocks structure. Therefore, it is entirely feasible to
implement readahead for f2fs using iomap.
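
To make this concrete, below is a rough, untested sketch of how one
f2fs_map_blocks result could be translated into an iomap for buffered
readahead. It is loosely modeled on the iomap_begin that f2fs already
implements for direct I/O; the helper name and the exact
f2fs_map_blocks calling convention here are my assumptions.

static int f2fs_buffered_iomap_begin(struct inode *inode, loff_t pos,
				     loff_t length, unsigned int flags,
				     struct iomap *iomap,
				     struct iomap *srcmap)
{
	struct f2fs_map_blocks map = {};
	int err;

	map.m_lblk = F2FS_BYTES_TO_BLK(pos);
	map.m_len = F2FS_BYTES_TO_BLK(pos + length - 1) - map.m_lblk + 1;

	err = f2fs_map_blocks(inode, &map, F2FS_GET_BLOCK_DEFAULT);
	if (err)
		return err;

	iomap->offset = F2FS_BLK_TO_BYTES(map.m_lblk);
	/* Worst case m_len is one block; best case, a long merged run. */
	iomap->length = F2FS_BLK_TO_BYTES(map.m_len);
	iomap->bdev = inode->i_sb->s_bdev;

	if (map.m_flags & F2FS_MAP_MAPPED) {
		iomap->type = IOMAP_MAPPED;
		iomap->addr = F2FS_BLK_TO_BYTES(map.m_pblk);
	} else {
		iomap->type = IOMAP_HOLE;
		iomap->addr = IOMAP_NULL_ADDR;
	}
	return 0;
}

Whenever f2fs_map_blocks manages to merge a run of contiguous blocks,
iomap_readahead would then see one long mapping instead of many
single-block ones.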

> The concept of "extent" for files simply does not exist in f2fs. (And f2fs's 
> extent cache is also not the same concept as extent, which might be a point of
> confusion).

I also want to apologize for asserting in my previous email that
f2fs "completely lacks extents, and its extent cache is different from
extents." It is true that f2fs does not organize file blocks using
extents in the same direct manner as XFS and Btrfs, nor does it keep
dedicated on-disk extent metadata the way ext4 does. However, as I
mentioned above, the f2fs_map_blocks function does attempt to merge
consecutive file blocks when mapping read requests, and it caches this
mapping information in its extent cache. It acts quite like building
an extent tree "online". (I must admit that I am still not completely
certain about this last point, as I have not yet studied the f2fs
extent cache code in depth.)
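
For reference, what the f2fs read extent cache records is essentially
an (fofs, blk, len) triple, i.e. an extent. The sketch below is
trimmed down from struct extent_info in f2fs; please treat the exact
layout as my recollection rather than a verified definition.

/* Simplified view of one read extent cache entry. */
struct extent_info {
	unsigned int fofs;	/* start file offset, in blocks */
	unsigned int len;	/* length of the mapped run, in blocks */
	block_t blk;		/* start physical block address */
};

Such entries are kept per inode, which is why caching merged
f2fs_map_blocks results looks so much like building an extent tree
online.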

> Next, I would like to discuss the design of the extended iomap
> structure. Assuming we make some extensions to the fields in
> iomap_folio_state (for example, we might add f2fs's page_private
> flags; sorry, I haven't fully figured out the specific design yet), we
> would not be able to directly use iomap's various ifs APIs (such as
> ifs_alloc) with this extended structure. I am wondering if we could
> write some adaptation-layer APIs? For example, could we process this
> extended iomap_folio_state structure in an adapter function and then
> delegate the operations to iomap's APIs?

Regarding the challenge of extending iomap_folio_state to address the
page->private field issue in f2fs, I have been developing some initial
ideas, although they are not yet fully formed. I am going to open a
new thread on the f2fs mailing list in the near future to discuss this
matter further, and I would greatly appreciate it if you could follow
the discussion there.
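
To give a very rough idea of the direction I have in mind, here is a
sketch. Everything in it is an assumption on my part:
iomap_folio_state and ifs_alloc currently live privately in
fs/iomap/buffered-io.c, and f2fs_folio_state / f2fs_ifs_alloc are
names I made up for illustration.

/*
 * Hypothetical adapter: keep iomap's per-block state as-is and hang
 * the f2fs-specific per-block private flags next to it.
 */
struct f2fs_folio_state {
	struct iomap_folio_state *ifs;	/* per-block uptodate/dirty state */
	unsigned long *private_flags;	/* per-block PAGE_PRIVATE_* bits */
};

static struct f2fs_folio_state *f2fs_ifs_alloc(struct inode *inode,
					       struct folio *folio, gfp_t gfp)
{
	unsigned int nr_blocks = i_blocks_per_folio(inode, folio);
	struct f2fs_folio_state *ffs;

	ffs = kzalloc(sizeof(*ffs), gfp);
	if (!ffs)
		return NULL;

	ffs->private_flags = bitmap_zalloc(nr_blocks, gfp);
	if (!ffs->private_flags) {
		kfree(ffs);
		return NULL;
	}

	/*
	 * Here we would delegate to an exported ifs_alloc() (or an
	 * equivalent) so that iomap keeps owning uptodate/dirty
	 * tracking while the f2fs flags stay in private_flags.
	 */
	folio_attach_private(folio, ffs);
	return ffs;
}

The open question is how iomap itself would then reach its state,
since today it stores iomap_folio_state directly in folio->private;
that is exactly the kind of thing I hope to discuss in the new thread.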

Nanzhe Zhao <nzzhao.si...@gmail.com> wrote on Sat, Apr 5, 2025 at 11:10:
>
> Thank you for your prompt and patient response!
> >
> > At this point, f2fs has no concept of head/tail pages.  Because it
> > doesn't tell the VFS that it can handle large folios, it will only see
> > order-0 pages.  The page->private member will go away, so filesystems
> > cannot depend on being able to access it.  They only get folio->private,
> > and it's recommended (but not required) that they use that to point to
> > their own private per-folio struct.
> Yes, I understand that we should treat all pages represented by a
> folio as a whole. The folio structure itself acts as the head page.
> Operations and flags applied to the folio are effectively applied to
> all pages within it, except for those operations that need to track
> page-specific attributes, such as whether a page is dirty or uptodate.
> I was just previously a bit concerned about whether the special
> flags f2fs stores in private needed to be tracked on a per-page
> basis, otherwise information might be lost. Let me give a specific
> example: PAGE_PRIVATE_ONGOING_MIGRATION indicates that a page is
> undergoing block migration during garbage collection. Initially, I was
> a bit worried about what would happen if some pages in a folio were in
> garbage collection while others were not. However, after further
> consideration and looking at how this flag is used in the f2fs code,
> it seems that it's sufficient for the folio's private field to know
> that it is in the migration phase of garbage collection. For
> PAGE_PRIVATE_INLINE_INODE, just from the name of this enumeration, we
> can tell that it will only be used for metadata pages. Therefore, we
> can currently fix the folio order for metadata folios to 0.
>
> > I do think the best approach is to extend iomap and then have f2fs use
> > iomap, but I appreciate that is several large jobs.  It's worth it
> > because it completely insulates f2fs from having to deal with
> > pages/folios (except for metadata)
>
> Well, for iomap, I have several questions.
> First of all, how should we define "having f2fs use iomap"? Does it
> mean rewriting the address_space_operations using iomap-based APIs?
> Let me take buffered read as a specific example. The only difference
> between the traditional buffered read and the iomap-based buffered
> read is whether mpage_readpages or iomap_readahead is called in
> aops->readahead. If using iomap in f2fs implies using the
> iomap_readahead function, I am wondering whether iomap_readahead
> supports files based on indirect pointers?
>
> I personally believe this question is very important, because I
> recently looked at the iomap-related buffered read code in xfs
> and the buffered read in the ext4 large folios patch. My conclusion is
> that the current implementation of iomap_readahead in the mainline
> kernel is entirely based on the assumption that the file's data block
> allocation is extent-based. (The author of ext4 large folios patch
> explicitly restricted the iomap buffered read path to extent-based
> files).It seems to completely lack support for files with data blocks
> allocated using indirect pointers. I am not sure if I have missed
> something crucial or if my understanding of iomap's readahead logic is
> not deep enough. I would like to confirm this with you. This is
> because f2fs is a file system entirely based on indirect pointers
> (with even an additional layer of indirection through the NAT
> table). The concept of "extent"
> for files simply does not exist in f2fs. (And f2fs's extent cache is
> also not the same concept as extent, which might be a point of
> confusion). This is different from XFS, Btrfs, and even ext4. If there
> are currently no iomap APIs that support indirect pointers, then using
> iomap to support folios in f2fs in the short term is almost completely
> infeasible. I also sent you an email previously to discuss this
> matter. 
> https://lore.kernel.org/linux-f2fs-devel/CAMLCH1FThw2hH3pNm_dYxDPRbQ=mpxxadzsgsxhpa4obzk8...@mail.gmail.com/T/#t
> I have listened to the Linux Foundation talk "Challenges and Ideas in
> Transitioning EXT* and other FS to iomap". The talk mentioned that
> iomap is being optimized for mapping performance of files based on
> indirect pointers. I am curious to know if there are any patches in
> iomap currently that address the handling of indirect pointer
> mappings?
>
> Next, I would like to discuss the design of the extended iomap
> structure. Assuming we make some extensions to the fields in
> iomap_folio_state (for example, we might add f2fs's page_private
> flags; sorry, I haven't fully figured out the specific design yet), we
> would not be able to directly use iomap's various ifs APIs (such as
> ifs_alloc) with this extended structure. I am wondering if we could
> write some adaptation-layer APIs? For example, could we process this
> extended iomap_folio_state structure in an adapter function and then
> delegate the operations to iomap's APIs?
>
> If iomap indeed does not support indirect-pointer-based files, then regarding
> how to enable large folio support in f2fs at this stage, I believe
> that in the short term, making f2fs's traditional buffered read API
> support large folios would be a more appropriate and pragmatic interim
> solution. (I haven't yet deeply studied other
> address_space_operations, so let's put them aside for now.)
> Furthermore, I think we may need to embed calls to iomap APIs and
> iomap data structures within these functions. For example, directly
> using the extended iomap_folio_state structure and related APIs in
> f2fs_mpage_readpages. I understand that iomap was not designed for
> this kind of usage. But I feel it might be difficult to avoid doing
> so in the short term. To illustrate, besides buffered I/O, the garbage
> collection process in f2fs also generates a significant amount of I/O
> that interacts with the page cache. Moreover, garbage collection has
> its own APIs for interacting with the page cache. Completely
> refactoring them to directly follow the framework provided by iomap
> might also be challenging.
> If this approach might cause interface pollution for the future
> migration of f2fs to iomap, then I think our current focus could be
> prioritized on enabling large folio support for f2fs's traditional
> buffered read and buffered write, as well as garbage collection, using
> the solution I proposed. This should not interfere with iomap, as
> iomap uses a completely separate set of interfaces for buffered read
> and buffered write. If you have a better solution, I would be very
> grateful if you could share your insights.
>
> > Ah, you need a tool called b4.  Your distro may have it packaged,
> > or you can get it from:
> > https://git.kernel.org/pub/scm/utils/b4/b4.git
> Thanks for the recommendation. I think I've learned a lot from this
> tool. However, it seems that when using the combination of the b4 am
> and git am commands to apply patches, patches sometimes don't apply
> cleanly. It appears that each patch heavily relies on the patch
> author's own kernel tree and their previous patches. The ext4 large
> folio support patch seems to be such a case. So, sometimes it might
> still be necessary to resolve code conflicts manually?
> I apologize for the length of this reply. It also seems that this
> discussion has drifted somewhat from the original subject of this
> thread. If you think it would be better to start a new thread, please
> let me know.
> Best regards.
> Matthew Wilcox <wi...@infradead.org> wrote on Wed, Apr 2, 2025 at 11:10:
> >
> > On Tue, Apr 01, 2025 at 10:17:42PM +0800, Nanzhe Zhao wrote:
> > > Based on my understanding after studying the code related to F2FS's
> > > use of the private field of the page structure, it appears that F2FS
> > > employs this field in a specific way. If the private field is not
> > > interpreted as a pointer, it seems it could be used to store
> > > additional flag bits. A key observation is that these functions seem
> > > to apply to tail pages as well. Therefore, as you mentioned, if we are
> > > using folios to manage multiple pages, it seems reasonable to consider
> > > adding a similar field within the iomap_folio_state structure. This
> > > would be analogous to how it currently tracks the uptodate and dirty
> > > states for each subpage, allowing us to track the state of these
> > > private fields for each subpage as well. Because it looks just like
> > > F2FS is utilizing the private field as a way to extend the various
> > > state flags of a page in memory. Perhaps it would be more appropriate
> > > to directly name this new structure f2fs_folio_state? This is because
> > > I'm currently unsure whether it will interact with existing iomap APIs
> > > or if we will need to develop F2FS-specific APIs for it.
> >
> > At this point, f2fs has no concept of head/tail pages.  Because it
> > doesn't tell the VFS that it can handle large folios, it will only see
> > order-0 pages.  The page->private member will go away, so filesystems
> > cannot depend on being able to access it.  They only get folio->private,
> > and it's recommended (but not required) that they use that to point to
> > their own private per-folio struct.
> >
> > I do think the best approach is to extend iomap and then have f2fs use
> > iomap, but I appreciate that is several large jobs.  It's worth it
> > because it completely insulates f2fs from having to deal with
> > pages/folios (except for metadata)
> >
> > > > You're right that f2fs needs per-block dirty tracking if it is to
> > > > support large folios.
> > >
> > > I feel that we need to consider more than just this aspect. In fact,
> > > it might be because we are still in the early stages of F2FS folio
> > > support, so it leaves me the impression that the current F2FS folio
> > > implementation is essentially just replacing struct page at the
> > > interface level. It effectively acts just like a single page, or in
> > > other words, a folio of order 0.
> >
> > Right, that's the current approach.  We're taking it because the page
> > APIs are being removed.  The f2fs developers have chosen to work on other
> > projects instead of supporting large folios (which is their right),
> > but they can't hold up the conversion of the entire filesystem stack
> > from pages to folios, so they're getting the minimal conversion and can
> > work on large folios when they have time.
> >
> > > As you can see in f2fs_mpage_readpages, after each folio is processed
> > > in the loop, the nr_pages counter is only decremented by 1. Therefore,
> > > it's clear that when the allocated folios in the page cache are all
> > > iterated through, nr_pages still has remaining value, and the loop
> > > continues. This naturally leads to a segmentation fault at index =
> > > folio_index(folio); due to dereferencing a null pointer. Furthermore,
> > > only the first page of each folio is submitted for I/O; the remaining
> > > pages are not filled with data from disk.
> >
> > Yes, there are lots of places in f2fs that assume a folio only has a
> > single page.
> >
> > > I am planning to prepare patches to address these issues and submit
> > > them soon. I noticed you recently submitted a big bunch of patches on
> > > folio. I would like to debug and test based on your patch. Therefore, I
> > > was wondering if it would be possible for you to share your modified
> > > F2FS code directly, or perhaps provide a link to your Git repository?
> > > Manually copying and applying so many patches from the mailing list
> > > would be quite cumbersome.
> >
> > Ah, you need a tool called b4.  Your distro may have it packaged,
> > or you can get it from:
> >
> > https://git.kernel.org/pub/scm/utils/b4/b4.git

