Thank you for your prompt and patient response!

> At this point, f2fs has no concept of head/tail pages. Because it
> doesn't tell the VFS that it can handle large folios, it will only see
> order-0 pages. The page->private member will go away, so filesystems
> cannot depend on being able to access it. They only get folio->private,
> and it's recommended (but not required) that they use that to point to
> their own private per-folio struct.

Yes, I understand that we should treat all pages represented by a folio
as a whole. The folio structure itself acts as the head page, and
operations and flags applied to the folio are effectively applied to all
pages within it, except for operations that must track per-page
attributes such as dirty or uptodate state.

My earlier concern was whether the special flags f2fs keeps in ->private
need to be tracked per page; if they do, collapsing them into a single
per-folio value would lose information. Let me give a specific example.
PAGE_PRIVATE_ONGOING_MIGRATION indicates that a page is undergoing block
migration during garbage collection. Initially I was worried about what
would happen if some pages in a folio were under garbage collection
while others were not. After looking at how this flag is actually used
in the f2fs code, however, it seems sufficient for the folio's private
field to record that the folio as a whole is in the migration phase of
garbage collection.

As for PAGE_PRIVATE_INLINE_INODE, the name alone tells us it is only
ever applied to metadata pages, so for now we can simply fix the folio
order for metadata folios at 0.
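To make this concrete, here is a minimal sketch of how the per-page
helpers generated by PAGE_PRIVATE_SET_FUNC/PAGE_PRIVATE_GET_FUNC in
fs/f2fs/f2fs.h might translate to folios, with each flag describing the
folio as a whole (the folio_* names below are my own invention, not
existing APIs):

static inline void folio_set_f2fs_gcing(struct folio *folio)
{
	/* Attach a non-pointer ->private value if none is present yet. */
	if (!folio_test_private(folio))
		folio_attach_private(folio, NULL);
	set_bit(PAGE_PRIVATE_NOT_POINTER,
		(unsigned long *)&folio->private);
	set_bit(PAGE_PRIVATE_ONGOING_MIGRATION,
		(unsigned long *)&folio->private);
}

static inline bool folio_test_f2fs_gcing(struct folio *folio)
{
	unsigned long priv = (unsigned long)folio_get_private(folio);

	/* The flag bits only mean anything for a non-pointer ->private. */
	return folio_test_private(folio) &&
		test_bit(PAGE_PRIVATE_NOT_POINTER, &priv) &&
		test_bit(PAGE_PRIVATE_ONGOING_MIGRATION, &priv);
}

The only behavioural difference from today's page-based helpers is that
one set of bits now covers every page in the folio, which, as argued
above, appears to be acceptable for PAGE_PRIVATE_ONGOING_MIGRATION.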
> I do think the best approach is to extend iomap and then have f2fs use
> iomap, but I appreciate that is several large jobs. It's worth it
> because it completely insulates f2fs from having to deal with
> pages/folios (except for metadata)

Well, for iomap I have several questions.

First of all, how should we define "having f2fs use iomap"? Does it mean
rewriting the address_space_operations using iomap-based APIs? Let me
take buffered read as a specific example. The essential difference
between the traditional buffered read path and the iomap-based one is
whether aops->readahead uses mpage_readpages or iomap_readahead. If
using iomap in f2fs implies using iomap_readahead, I am wondering
whether iomap_readahead supports files based on indirect pointers.

I believe this question is very important. I recently studied the
iomap-related buffered read code in XFS and in the ext4 large folios
patch set, and my conclusion is that the current mainline implementation
of iomap_readahead is built entirely on the assumption that a file's
data blocks are allocated as extents. (The author of the ext4 large
folios patch explicitly restricted the iomap buffered read path to
extent-based files.) It seems to completely lack support for files whose
data blocks are reached through indirect pointers. I am not sure whether
I have missed something crucial or whether my understanding of iomap's
readahead logic is simply not deep enough, so I would like to confirm
this with you.

This matters because f2fs is a filesystem based entirely on indirect
pointers (with an additional NAT table layer on top). The on-disk
concept of an "extent" simply does not exist in f2fs. (f2fs's extent
cache is also not the same concept as an extent, which can be a point
of confusion.) This is different from XFS, Btrfs, and even ext4. If
there are currently no iomap APIs that support indirect pointers, then
using iomap to bring folio support to f2fs in the short term is almost
completely infeasible. I also sent you an email previously to discuss
this matter:
https://lore.kernel.org/linux-f2fs-devel/CAMLCH1FThw2hH3pNm_dYxDPRbQ=mpxxadzsgsxhpa4obzk8...@mail.gmail.com/T/#t

I have also listened to the Linux Foundation talk "Challenges and Ideas
in Transitioning EXT* and other FS to iomap", which mentioned that iomap
is being optimized for the mapping performance of files based on
indirect pointers. I am curious whether there are any iomap patches in
flight that address the handling of indirect pointer mappings.

Next, I would like to discuss the design of the extended iomap
structure. Assuming we extend the fields of iomap_folio_state (for
example by adding f2fs's page_private flags; sorry, I haven't fully
worked out the specific design yet), we would not be able to use iomap's
various ifs_* APIs (such as ifs_alloc) directly on the extended
structure. I am wondering if we could write some adaptation-layer APIs:
for example, adapter functions that handle the f2fs-specific part of the
extended iomap_folio_state and then delegate the rest of the work to
iomap's own APIs.

If iomap indeed does not support indirect-pointer-based files, then as
far as enabling large folio support in f2fs at this stage goes, I
believe that in the short term, making f2fs's traditional buffered read
path support large folios is the more appropriate and pragmatic interim
solution. (I haven't yet studied the other address_space_operations in
depth, so let's put them aside for now.)
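That said, one thing I would like you to confirm: as far as I can tell,
the iomap iteration model itself does not strictly require extents.
iomap_iter() keeps re-invoking ->iomap_begin() until the requested range
is covered, so an indirect-pointer filesystem could legally report a
single block per call and pay only the extra callback overhead. If that
is correct, something like the following rough sketch would at least be
functional, if slow. (f2fs_lookup_data_blkaddr() is a hypothetical
helper standing in for the dnode walk via f2fs_get_dnode_of_data();
NEW_ADDR and the other special block addresses are glossed over.)

static int f2fs_buffered_iomap_begin(struct inode *inode, loff_t pos,
		loff_t length, unsigned int flags,
		struct iomap *iomap, struct iomap *srcmap)
{
	unsigned int blkbits = inode->i_blkbits;
	block_t blkaddr;
	int err;

	/* Resolve exactly one block; iomap_iter() will call us again. */
	err = f2fs_lookup_data_blkaddr(inode, pos >> blkbits, &blkaddr);
	if (err)
		return err;

	iomap->offset = round_down(pos, 1 << blkbits);
	iomap->length = 1 << blkbits;
	iomap->bdev = inode->i_sb->s_bdev;

	if (blkaddr == NULL_ADDR) {
		iomap->type = IOMAP_HOLE;
		iomap->addr = IOMAP_NULL_ADDR;
	} else {
		iomap->type = IOMAP_MAPPED;
		iomap->addr = (u64)blkaddr << blkbits;
	}
	return 0;
}

As for the adaptation layer, the shape I have in mind is roughly the
following (again purely hypothetical; struct iomap_folio_state is
currently private to fs/iomap/buffered-io.c, and ifs_free() kfree()s
folio->private directly, so iomap would first need to export the
structure or grow allocation/free hooks before anything like this could
work):

struct f2fs_folio_state {
	/* f2fs per-block PAGE_PRIVATE_* bits, tracked like ifs->state */
	unsigned long *private_flags;
	/*
	 * iomap's state must come last because it ends in a flexible
	 * array; folio->private would point at this embedded member so
	 * that generic iomap code keeps working unmodified.
	 */
	struct iomap_folio_state ifs;
};

static inline struct f2fs_folio_state *F2FS_IFS(struct folio *folio)
{
	return container_of((struct iomap_folio_state *)folio->private,
			    struct f2fs_folio_state, ifs);
}

Access to the f2fs flags would then go through F2FS_IFS() while the
ifs_* calls keep operating on &fps->ifs; the allocation and freeing
paths are exactly where the adapter functions would be needed.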
Furthermore, I think we may need to embed calls to iomap APIs and iomap
data structures within these f2fs functions, for example using the
extended iomap_folio_state structure and the related APIs directly in
f2fs_mpage_readpages. I understand that iomap was not designed for this
kind of usage, but I feel it may be difficult to avoid in the short
term. To illustrate: besides buffered I/O, the garbage collection
process in f2fs also generates a significant amount of I/O that goes
through the page cache, and garbage collection has its own APIs for
interacting with the page cache. Completely refactoring them to follow
the framework provided by iomap would also be challenging.

If this approach would pollute the interfaces for a future migration of
f2fs to iomap, then I think our current focus could instead be on
enabling large folio support for f2fs's traditional buffered read and
buffered write paths, as well as garbage collection, using the solution
I proposed. This should not interfere with iomap, since iomap uses a
completely separate set of interfaces for buffered read and buffered
write. If you have a better solution, I would be very grateful if you
could share your insights.

> Ah, you need a tool called b4. Your distro may have it packaged,
> or you can get it from:
>
> https://git.kernel.org/pub/scm/utils/b4/b4.git

Thanks for the recommendation; I've learned a lot from this tool. That
said, when applying patches with the combination of b4 am and git am,
they sometimes do not apply cleanly. Each series seems to depend heavily
on the author's own kernel tree and on their earlier patches; the ext4
large folio support series appears to be such a case. So sometimes it is
still necessary to resolve conflicts manually?

I apologize for the length of this reply. It also seems that this
discussion has drifted somewhat from the original subject of the thread;
if you think it would be better to start a new one, please let me know.

Best regards.

On Wed, Apr 2, 2025 at 11:10 Matthew Wilcox <wi...@infradead.org> wrote:
>
> On Tue, Apr 01, 2025 at 10:17:42PM +0800, Nanzhe Zhao wrote:
> > Based on my understanding after studying the code related to F2FS's
> > use of the private field of the page structure, it appears that F2FS
> > employs this field in a specific way. If the private field is not
> > interpreted as a pointer, it seems it could be used to store
> > additional flag bits. A key observation is that these functions seem
> > to apply to tail pages as well. Therefore, as you mentioned, if we
> > are using folios to manage multiple pages, it seems reasonable to
> > consider adding a similar field within the iomap_folio_state
> > structure. This would be analogous to how it currently tracks the
> > uptodate and dirty states for each subpage, allowing us to track the
> > state of these private fields for each subpage as well. Because it
> > looks just like F2FS is utilizing the private field as a way to
> > extend the various state flags of a page in memory. Perhaps it would
> > be more appropriate to directly name this new structure
> > f2fs_folio_state? This is because I'm currently unsure whether it
> > will interact with existing iomap APIs or if we will need to develop
> > F2FS-specific APIs for it.
>
> At this point, f2fs has no concept of head/tail pages. Because it
> doesn't tell the VFS that it can handle large folios, it will only see
> order-0 pages.
> The page->private member will go away, so filesystems
> cannot depend on being able to access it. They only get folio->private,
> and it's recommended (but not required) that they use that to point to
> their own private per-folio struct.
>
> I do think the best approach is to extend iomap and then have f2fs use
> iomap, but I appreciate that is several large jobs. It's worth it
> because it completely insulates f2fs from having to deal with
> pages/folios (except for metadata)
>
> > > You're right that f2fs needs per-block dirty tracking if it is to
> > > support large folios.
> >
> > I feel that we need to consider more than just this aspect. In fact,
> > it might be because we are still in the early stages of F2FS folio
> > support, so it leaves me the impression that the current F2FS folio
> > implementation is essentially just replacing struct page at the
> > interface level. It effectively acts just like a single page, or in
> > other words, a folio of order 0.
>
> Right, that's the current approach. We're taking it because the page
> APIs are being removed. The f2fs developers have chosen to work on
> other projects instead of supporting large folios (which is their
> right), but they can't hold up the conversion of the entire filesystem
> stack from pages to folios, so they're getting the minimal conversion
> and can work on large folios when they have time.
>
> > As you can see in f2fs_mpage_readpages, after each folio is processed
> > in the loop, the nr_pages counter is only decremented by 1.
> > Therefore, it's clear that when the allocated folios in the page
> > cache are all iterated through, nr_pages still has remaining value,
> > and the loop continues. This naturally leads to a segmentation fault
> > at index = folio_index(folio); due to dereferencing a null pointer.
> > Furthermore, only the first page of each folio is submitted for I/O;
> > the remaining pages are not filled with data from disk.
>
> Yes, there are lots of places in f2fs that assume a folio only has a
> single page.
>
> > I am planning to prepare patches to address these issues and submit
> > them soon. I noticed you recently submitted a big bunch of patches on
> > folio. I would like to debug and test based on your patch. Therefore,
> > I was wondering if it would be possible for you to share your
> > modified F2FS code directly, or perhaps provide a link to your Git
> > repository? Manually copying and applying so many patches from the
> > mailing list would be quite cumbersome.
>
> Ah, you need a tool called b4. Your distro may have it packaged,
> or you can get it from:
>
> https://git.kernel.org/pub/scm/utils/b4/b4.git

_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel