https://bugzilla.kernel.org/show_bug.cgi?id=219586

            Bug ID: 219586
           Summary: Unable to find file after unicode change
           Product: File System
           Version: 2.5
          Hardware: All
                OS: Linux
            Status: NEW
          Severity: blocking
          Priority: P3
         Component: f2fs
          Assignee: filesystem_f...@kernel-bugs.kernel.org
          Reporter: ha...@vivo.com
        Regression: No

Hi everybody,
The f2fs filesystem is unable to read some files with special characters,
such as ❤️, after the kernel was updated with the following patch:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=18b5f47e7da46d3a0d7331e48befcaf151ed2ddf

We can reproduce this in the following steps:
1. First, revert the unicode-related change above and create a file or
folder whose name contains the special character:
./tools/mkfs.f2fs -f -O casefold -C utf8 f2fs.img
mount f2fs.img f2fs_dir/
mkdir f2fs_dir/Picture
./f2fs_io setflags casefold f2fs_dir/Picture
cd f2fs_dir/Picture
touch ❤️

2. Then apply the unicode patch above. After mounting the filesystem again,
the file with the special character can no longer be found:
mount f2fs.img f2fs_dir/
cd f2fs_dir/Picture
ls -alh
ls: cannot access '❤️': No such file or directory
total 8
drwxr-xr-x 2 root root 3488 Dec 10 06:11 .
drwxr-xr-x 3 root root 4096 Dec  9 10:21 ..
-????????? ? ?    ?       ?            ? ❤️

Here are the conclusions of my preliminary analysis.
On casefold-enabled f2fs filesystems, the utf8_casefold function case-folds
the file name when a file is looked up. The hash is then calculated from the
case-folded name, and that hash is what is stored on disk. The call path is:
f2fs_lookup
    f2fs_prepare_lookup
        __f2fs_setup_filename
            f2fs_init_casefolded_name
                utf8_casefold
            f2fs_hash_filename
    __f2fs_find_entry
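
To make the dependency between folding and hashing concrete, here is a
minimal toy model of the lookup path above. It is only a sketch: fold() uses
Python's str.casefold() as a stand-in for utf8_casefold, name_hash() uses
CRC32 as a stand-in for f2fs_hash_filename, and ToyDir is a hypothetical
class, not an f2fs structure.

```python
import zlib


def fold(name: str) -> bytes:
    # Stand-in for utf8_casefold (assumption: NOT the kernel's Unicode tables).
    return name.casefold().encode("utf-8")


def name_hash(folded: bytes) -> int:
    # Stand-in for f2fs_hash_filename (assumption: CRC32 is only a placeholder).
    return zlib.crc32(folded)


class ToyDir:
    """Hypothetical directory that indexes entries by hash(fold(name))."""

    def __init__(self):
        self.entries = {}  # hash -> list of names as originally created

    def create(self, name: str) -> None:
        # The hash stored at creation time depends on the folded form.
        self.entries.setdefault(name_hash(fold(name)), []).append(name)

    def lookup(self, name: str):
        # Lookup folds the query, hashes it, and only searches that bucket.
        folded = fold(name)
        for stored in self.entries.get(name_hash(folded), []):
            if fold(stored) == folded:
                return stored
        return None


d = ToyDir()
d.create("Holiday.JPG")
print(d.lookup("holiday.jpg"))  # found via the shared folded form
print(d.lookup("other.jpg"))    # no entry with this folded form
```

In this model, any change to fold() after an entry was created makes lookup()
search the wrong bucket, even though the entry is still present.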

For some file names that contain special characters, such as ❤️, we found
that the length of the string produced by utf8_casefold differs before and
after the patch, which in turn changes the calculated hash. As a result,
files created before the patch cannot be looked up once the patch is applied.
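
The length difference can be illustrated with a short sketch. It assumes the
file name is the two code points U+2764 (HEAVY BLACK HEART) followed by
U+FE0F (VARIATION SELECTOR-16), and that the two kernel behaviors differ in
whether the selector survives folding. Both points are assumptions and are
not confirmed by the analysis above. The two fold functions are hypothetical
and CRC32 is only a placeholder for the f2fs hash.

```python
import zlib

# Assumed file name: U+2764 followed by U+FE0F (not confirmed by the report).
NAME = "\u2764\ufe0f"


def fold_dropping_selector(name: str) -> bytes:
    # Hypothetical variant A: the variation selector is removed while folding.
    return name.replace("\ufe0f", "").casefold().encode("utf-8")


def fold_keeping_selector(name: str) -> bytes:
    # Hypothetical variant B: the variation selector folds to itself.
    return name.casefold().encode("utf-8")


a = fold_dropping_selector(NAME)
b = fold_keeping_selector(NAME)

print(len(a), a.hex())  # 3 e29da4
print(len(b), b.hex())  # 6 e29da4efb88f

# Placeholder hashes of the two folded forms (CRC32, not the f2fs hash).
print(hex(zlib.crc32(a)), hex(zlib.crc32(b)))
```

If an entry was created under one variant and is looked up under the other,
the hash is computed over a different byte string, which matches the symptom
described above.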

I think f2fs needs to be modified so that it stays compatible with the
unicode-related change.

_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
