On Thu, Jan 8, 2026 at 4:10 AM Gao Xiang <[email protected]> wrote: > > > > On 2026/1/8 10:32, Gao Xiang wrote: > > Hi Sheng, > > > > On 2026/1/8 10:26, Sheng Yong wrote: > >> On 1/7/26 01:05, Gao Xiang wrote: > >>> Previously, commit d53cd891f0e4 ("erofs: limit the level of fs stacking > >>> for file-backed mounts") bumped `s_stack_depth` by one to avoid kernel > >>> stack overflow when stacking an unlimited number of EROFS on top of > >>> each other. > >>> > >>> This fix breaks composefs mounts, which need EROFS+ovl^2 sometimes > >>> (and such setups are already used in production for quite a long time). > >>> > >>> One way to fix this regression is to bump FILESYSTEM_MAX_STACK_DEPTH > >>> from 2 to 3, but proving that this is safe in general is a high bar. > >>> > >>> After a long discussion on GitHub issues [1] about possible solutions, > >>> one conclusion is that there is no need to support nesting file-backed > >>> EROFS mounts on stacked filesystems, because there is always the option > >>> to use loopback devices as a fallback. > >>> > >>> As a quick fix for the composefs regression for this cycle, instead of > >>> bumping `s_stack_depth` for file backed EROFS mounts, we disallow > >>> nesting file-backed EROFS over EROFS and over filesystems with > >>> `s_stack_depth` > 0. > >>> > >>> This works for all known file-backed mount use cases (composefs, > >>> containerd, and Android APEX for some Android vendors), and the fix is > >>> self-contained. > >>> > >>> Essentially, we are allowing one extra unaccounted fs stacking level of > >>> EROFS below stacking filesystems, but EROFS can only be used in the read > >>> path (i.e. overlayfs lower layers), which typically has much lower stack > >>> usage than the write path. > >>> > >>> We can consider increasing FILESYSTEM_MAX_STACK_DEPTH later, after more > >>> stack usage analysis or using alternative approaches, such as splitting > >>> the `s_stack_depth` limitation according to different combinations of > >>> stacking. > >>> > >>> Fixes: d53cd891f0e4 ("erofs: limit the level of fs stacking for > >>> file-backed mounts") > >>> Reported-by: Dusty Mabe <[email protected]> > >>> Reported-by: Timothée Ravier <[email protected]> > >>> Closes: https://github.com/coreos/fedora-coreos-tracker/issues/2087 [1] > >>> Reported-by: "Alekséi Naidénov" <[email protected]> > >>> Closes: > >>> https://lore.kernel.org/r/CAFHtUiYv4+=+JP_-JjARWjo6OwcvBj1wtYN=z0qxwcpec9s...@mail.gmail.com > >>> Acked-by: Amir Goldstein <[email protected]> > >>> Cc: Alexander Larsson <[email protected]> > >>> Cc: Christian Brauner <[email protected]> > >>> Cc: Miklos Szeredi <[email protected]> > >>> Cc: Sheng Yong <[email protected]> > >>> Cc: Zhiguo Niu <[email protected]> > >>> Signed-off-by: Gao Xiang <[email protected]> > >>> --- > >>> v2: > >>> - Update commit message (suggested by Amir in 1-on-1 talk); > >>> - Add proper `Reported-by:`. > >>> > >>> fs/erofs/super.c | 18 ++++++++++++------ > >>> 1 file changed, 12 insertions(+), 6 deletions(-) > >>> > >>> diff --git a/fs/erofs/super.c b/fs/erofs/super.c > >>> index 937a215f626c..0cf41ed7ced8 100644 > >>> --- a/fs/erofs/super.c > >>> +++ b/fs/erofs/super.c > >>> @@ -644,14 +644,20 @@ static int erofs_fc_fill_super(struct super_block > >>> *sb, struct fs_context *fc) > >>> * fs contexts (including its own) due to self-controlled RO > >>> * accesses/contexts and no side-effect changes that need to > >>> * context save & restore so it can reuse the current thread > >>> - * context. However, it still needs to bump `s_stack_depth` to > >>> - * avoid kernel stack overflow from nested filesystems. > >>> + * context. > >>> + * However, we still need to prevent kernel stack overflow due > >>> + * to filesystem nesting: just ensure that s_stack_depth is 0 > >>> + * to disallow mounting EROFS on stacked filesystems. > >>> + * Note: s_stack_depth is not incremented here for now, since > >>> + * EROFS is the only fs supporting file-backed mounts for now. > >>> + * It MUST change if another fs plans to support them, which > >>> + * may also require adjusting FILESYSTEM_MAX_STACK_DEPTH. > >>> */ > >>> if (erofs_is_fileio_mode(sbi)) { > >>> - sb->s_stack_depth = > >>> - file_inode(sbi->dif0.file)->i_sb->s_stack_depth + 1; > >>> - if (sb->s_stack_depth > FILESYSTEM_MAX_STACK_DEPTH) { > >>> - erofs_err(sb, "maximum fs stacking depth exceeded"); > >>> + inode = file_inode(sbi->dif0.file); > >>> + if (inode->i_sb->s_op == &erofs_sops || > >> > >> Hi, Xiang > >> > >> In Android APEX scenario, apex images formatted as EROFS are packed in > >> system.img which is also EROFS format. As a result, it will always fail > >> to do APEX-file-backed mount since `inode->i_sb->s_op == &erofs_sops' > >> is true. > >> Any thoughts to handle such scenario? > > > > Sorry, I forgot this popular case, I think it can be simply resolved > > by the following diff: > > > > diff --git a/fs/erofs/super.c b/fs/erofs/super.c > > index 0cf41ed7ced8..e93264034b5d 100644 > > --- a/fs/erofs/super.c > > +++ b/fs/erofs/super.c > > @@ -655,7 +655,7 @@ static int erofs_fc_fill_super(struct super_block *sb, > > struct fs_context *fc) > > */ > > if (erofs_is_fileio_mode(sbi)) { > > inode = file_inode(sbi->dif0.file); > > - if (inode->i_sb->s_op == &erofs_sops || > > + if ((inode->i_sb->s_op == &erofs_sops && > > !sb->s_bdev) || > > Sorry it should be `!inode->i_sb->s_bdev`, I've > fixed it in v3 RESEND:
A RESEND implies no changes since v3, so this is bad practice. > https://lore.kernel.org/r/[email protected] > Ouch! If the erofs maintainer got this condition wrong... twice... Maybe better using the helper instead of open coding this non trivial check? if ((inode->i_sb->s_op == &erofs_sops && erofs_is_fileio_mode(EROFS_I_SB(inode))) Thanks, Amir.
