Dear f2fs developers, My name is [Your Name], and I am an undergraduate student from Nanjing University of Technology in China. I am a newcomer to kernel development, and I am currently learning about f2fs and working on enabling large folio support.
After studying the concept of folios, especially after listening to Mr. Matthew Wilcox's talk on folios, if I understand correctly, the most crucial requirement for folios is that the constituent pages must be contiguous in both physical and virtual memory. Furthermore, the indices of these constituent pages within the page cache must also be consecutive. For example, an order-2 folio, which comprises four pages, must have page cache indices of 0, 1, 2, and 3. Currently, the f2fs garbage collection code, primarily the `gc_data_segment` and `gc_node_segment` functions, still interacts with the page cache using the `page` structure directly. However, refactoring them to interact with the page cache using folios encounters a significant challenge. Taking `gc_node_segment` as an example, in phase 1, it pre-reads all node pages from a victim segment into the page cache: /* f2fs/gc.c */ /* line: 1024 */ static int gc_node_segment(struct f2fs_sb_info *sbi, struct f2fs_summary *sum, unsigned int segno, int gc_type) { struct f2fs_summary *entry; block_t start_addr; int off; int phase = 0; bool fggc = (gc_type == FG_GC); int submitted = 0; unsigned int usable_blks_in_seg = f2fs_usable_blks_in_seg(sbi, segno); start_addr = START_BLOCK(sbi, segno); next_step: entry = sum; /* .... other code .... */ if (phase == 0) { /* read ahead NAT block */ } if (phase == 1) { /* line: 1063 */ f2fs_ra_node_page(sbi, nid); continue; } /* phase == 2: get node page, perform move node page, let page cache * write them back */ } if (++phase < 3) goto next_step; if (fggc) atomic_dec(&sbi->wb_sync_req[NODE]); return submitted; } The issue is that `f2fs_ra_node_page` directly inserts a key-value pair into the page cache, using the nid as the index (key) and a page as the value. However, within a segment, the nids recorded in the `f2fs_summary_block` are not guaranteed to be contiguous. This directly violates the folio requirement for consecutive page cache indices. Although folios can represent a single page (if order is 0), which means it's technically feasible to transition the interface from `page` to `folio`. I have noticed Mr. Wilcox's recent patches working on supporting folios in some GC-related functions. However, this non-contiguous index issue prevents us from allocating higher-order folios during the GC process, thus limiting the potential benefits of large folios for LRU reclaim and other aspects. Now, I would like to propose some potential solutions that I have been considering to address this issue. I have two approaches in mind: one is a more conservative approach, and the other is a more aggressive one. Let's continue to use the pre-reading of node pages in phase 1 of `gc_node_segment` as an example. The more conservative approach would be to, before pre-reading node pages, first retrieve all nids from the current summary block. Then, we could de-duplicate these nids, sort them, and divide them into clusters of as contiguous nids as possible. For each cluster, we could then allocate a folio with an order that best fits the cluster size and place it into the page cache. To illustrate the conservative approach with a more concrete example, let's say a summary block contains the following nids: `[5, 3, 6, 1, 4, 9, 12, 9]`. We can process this list as follows: 1. **De-duplication and Sorting:** First, we de-duplicate and sort the nids, resulting in: `[1, 3, 4, 5, 6, 9, 12]`. 2. **Clustering and Folio Allocation:** We can then identify contiguous clusters. In this example, `[3, 4, 5, 6]` forms a contiguous sequence. Therefore, we could allocate an order-2 folio to accommodate this cluster of four nids. For the remaining nids `[1, 9, 12]`, we would allocate order-0 folios. The more aggressive approach would be to remap the nids in the summary block during the GC process, mapping them to contiguous integer indices. This could potentially allow for better allocation of higher-order folios. However, this approach would introduce significant additional workload and overhead in multiple dimensions. It would require maintaining an index remapping table in memory. Given the potentially large number of nids, this might lead to considerable memory consumption. Furthermore, the additional lookup overhead would also be substantial, especially considering that f2fs is often used in mobile devices, which have stringent real-time performance requirements. Moreover, this approach would involve extensive code modifications. Using [https://elixir.bootlin.com/linux](https://elixir.bootlin.com/linux), I found that `f2fs_get_node_page` is called in over 20 locations. Implementing this aggressive approach would mean that all these calls to `f2fs_get_node_page` would need to perform an extra lookup to translate from the nid to the remapped index. Therefore, I personally question whether introducing such significant changes and incurring such substantial overhead is truly worthwhile, especially just for the sake of higher-order folios. Another unique issue with this aggressive approach is its lack of generality compared to the more conservative one. The index discontinuity problem we discussed actually exists for pre-reading NAT blocks, node blocks, and data blocks in the GC process. However, NAT blocks, node blocks, and data blocks have different numbering schemes and are managed by different address_space page caches. I have been using node page pre-reading as an example for analysis, which implies that this index remapping scheme I've conceived might not be suitable for data blocks or NAT blocks. In terms of next steps, I plan to focus on implementing the conservative approach and submitting a patch for your review. As a newcomer to kernel development, writing robust test cases is still a challenge for me. Any pointers or best practices on testing in this area would be immensely helpful. My main purpose in writing to the list is to seek your expert opinions on two key questions: Firstly, from your perspective, is the current inability of GC to allocate higher-order folios a significant concern for f2fs? Secondly, I would be very grateful for any feedback, especially critical feedback, on my proposed solutions. I understand they are just initial ideas and likely have room for improvement. Thank you for your time, patience, and expertise. I am keen to learn from your feedback and contribute to the f2fs community. Best regards. _______________________________________________ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel