Dear f2fs developers,

My name is [Your Name], and I am an undergraduate student from Nanjing
University of Technology in China. I am a newcomer to kernel
development, and I am currently learning about f2fs and working on
enabling large folio support.

After studying the concept of folios, especially after listening to
Mr. Matthew Wilcox's talk on folios, if I understand correctly, the
most crucial requirement for folios is that the constituent pages must
be contiguous in both physical and virtual memory.  Furthermore, the
indices of these constituent pages within the page cache must also be
consecutive. For example, an order-2 folio, which comprises four
pages, must have page cache indices of 0, 1, 2, and 3.

Currently, the f2fs garbage collection code, primarily the
`gc_data_segment` and `gc_node_segment` functions, still interacts
with the page cache using the `page` structure directly. However,
refactoring them to interact with the page cache using folios
encounters a significant challenge.  Taking `gc_node_segment` as an
example, in phase 1, it pre-reads all node pages from a victim segment
into the page cache:

/* f2fs/gc.c */
/* line: 1024 */
static int gc_node_segment(struct f2fs_sb_info *sbi,
struct f2fs_summary *sum, unsigned int segno, int gc_type)
{
struct f2fs_summary *entry;
block_t start_addr;
int off;
int phase = 0;
bool fggc = (gc_type == FG_GC);
int submitted = 0;
unsigned int usable_blks_in_seg = f2fs_usable_blks_in_seg(sbi, segno);

start_addr = START_BLOCK(sbi, segno);

next_step:
entry = sum;

/* .... other code .... */

if (phase == 0) {
/* read ahead NAT block */
}

if (phase == 1) {
/* line: 1063 */
f2fs_ra_node_page(sbi, nid);
continue;
}

/* phase == 2: get node page, perform move node page, let page cache
* write them back */

}

if (++phase < 3)
goto next_step;

if (fggc)
atomic_dec(&sbi->wb_sync_req[NODE]);
return submitted;
}

The issue is that `f2fs_ra_node_page` directly inserts a key-value
pair into the page cache, using the nid as the index (key) and a page
as the value. However, within a segment, the nids recorded in the
`f2fs_summary_block` are not guaranteed to be contiguous. This
directly violates the folio requirement for consecutive page cache
indices. Although folios can represent a single page (if order is 0),
which means it's technically feasible to transition the interface from
`page` to `folio`. I have noticed Mr. Wilcox's recent patches working
on supporting folios in some GC-related functions.  However, this
non-contiguous index issue prevents us from allocating higher-order
folios during the GC process, thus limiting the potential benefits of
large folios for LRU reclaim and other aspects.

Now, I would like to propose some potential solutions that I have been
considering to address this issue.  I have two approaches in mind: one
is a more conservative approach, and the other is a more aggressive
one. Let's continue to use the pre-reading of node pages in phase 1 of
`gc_node_segment` as an example.

The more conservative approach would be to, before pre-reading node
pages, first retrieve all nids from the current summary block. Then,
we could de-duplicate these nids, sort them, and divide them into
clusters of as contiguous nids as possible.  For each cluster, we
could then allocate a folio with an order that best fits the cluster
size and place it into the page cache.

To illustrate the conservative approach with a more concrete example,
let's say a summary block contains the following nids: `[5, 3, 6, 1,
4, 9, 12, 9]`. We can process this list as follows:
 1. **De-duplication and Sorting:** First, we de-duplicate and sort
the nids, resulting in: `[1, 3, 4, 5, 6, 9, 12]`.
 2. **Clustering and Folio Allocation:** We can then identify
contiguous clusters. In this example, `[3, 4, 5, 6]` forms a
contiguous sequence. Therefore, we could allocate an order-2 folio to
accommodate this cluster of four nids. For the remaining nids `[1, 9,
12]`, we would allocate order-0 folios.

The more aggressive approach would be to remap the nids in the summary
block during the GC process, mapping them to contiguous integer
indices. This could potentially allow for better allocation of
higher-order folios. However, this approach would introduce
significant additional workload and overhead in multiple dimensions.
It would require maintaining an index remapping table in memory. Given
the potentially large number of nids, this might lead to considerable
memory consumption.

Furthermore, the additional lookup overhead would also be substantial,
especially considering that f2fs is often used in mobile devices,
which have stringent real-time performance requirements.  Moreover,
this approach would involve extensive code modifications.  Using
[https://elixir.bootlin.com/linux](https://elixir.bootlin.com/linux),
I found that `f2fs_get_node_page` is called in over 20 locations.
Implementing this aggressive approach would mean that all these calls
to `f2fs_get_node_page` would need to perform an extra lookup to
translate from the nid to the remapped index.

Therefore, I personally question whether introducing such significant
changes and incurring such substantial overhead is truly worthwhile,
especially just for the sake of higher-order folios.  Another unique
issue with this aggressive approach is its lack of generality compared
to the more conservative one.  The index discontinuity problem we
discussed actually exists for pre-reading NAT blocks, node blocks, and
data blocks in the GC process. However, NAT blocks, node blocks, and
data blocks have different numbering schemes and are managed by
different address_space page caches.  I have been using node page
pre-reading as an example for analysis, which implies that this index
remapping scheme I've conceived might not be suitable for data blocks
or NAT blocks.

In terms of next steps, I plan to focus on implementing the
conservative approach and submitting a patch for your review. As a
newcomer to kernel development, writing robust test cases is still a
challenge for me. Any pointers or best practices on testing in this
area would be immensely helpful. My main purpose in writing to the
list is to seek your expert opinions on two key questions: Firstly,
from your perspective, is the current inability of GC to allocate
higher-order folios a significant concern for f2fs? Secondly, I would
be very grateful for any feedback, especially critical feedback, on my
proposed solutions. I understand they are just initial ideas and
likely have room for improvement. Thank you for your time, patience,
and expertise. I am keen to learn from your feedback and contribute to
the f2fs community.
Best regards.


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

Reply via email to