Bug#1057843: linux: ext4 data corruption in 6.1.64-1

2023-12-12 Thread BW
Online check of ext4:

If your filesystem is located on a logical volume (LVM) then I assume you
can make a snapshot and do a check of that.

Make snapshot (SS):
lvcreate --snapshot --size 1G --name lv_root_SS --chunksize 4k /dev/VG1/lv_root

EXT4 check:
e2fsck -f /dev/VG1/lv_root_SS

Remove SS:
lvremove --yes VG1/lv_root_SS
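
The three steps above can be wrapped into one function so that the
snapshot is removed even when the check reports errors. This is only a
sketch: the function name check_lv_snapshot and the DRY_RUN switch are
made up for illustration, the snapshot size and chunk size are simply
the values from the commands above, and the snapshot is addressed by its
LVM path /dev/<VG>/<snapshot name>. Running it for real needs root.

```shell
# check_lv_snapshot VG LV
# Snapshot the logical volume, run e2fsck on the snapshot, remove the snapshot.
# With DRY_RUN=1 the commands are only printed, not executed.
# (Function name and DRY_RUN are illustrative, not part of LVM or e2fsprogs.)
check_lv_snapshot() (
    vg=$1
    lv=$2
    snap="${lv}_SS"

    run() {
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "$*"
        else
            "$@"
        fi
    }

    # Give up if the snapshot cannot be created (e.g. no free extents).
    run lvcreate --snapshot --size 1G --name "$snap" --chunksize 4k "/dev/$vg/$lv" || exit 1

    # Keep the e2fsck exit status, but always go on to remove the snapshot.
    status=0
    run e2fsck -f "/dev/$vg/$snap" || status=$?

    run lvremove --yes "$vg/$snap"
    exit "$status"
)
```

For example, "DRY_RUN=1 check_lv_snapshot VG1 lv_root" prints the three
commands from this mail without touching any volume.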


Bug#1057843: linux: ext4 data corruption in 6.1.64-1

2023-12-12 Thread helios . solaris

Will a file system check detect the corruptions?
Can it be done online?

Thank you.



Bug#1057843: linux: ext4 data corruption in 6.1.64-1

2023-12-11 Thread Salvatore Bonaccorso
As there were some questions in this thread, let me summarize some
points:

The issue affects fs/ext4 code, so no other filesystems are affected
(e.g. btrfs).

The issue affects all kernels that contain commit 91562895f803
("ext4: properly sync file size update after O_SYNC direct IO") from
6.7-rc1 (also present in 6.6.3, 6.5.13 and 6.1.64) but lack commit
936e114a245b ("iomap: update ki_pos a little later in
iomap_dio_complete") from 6.5-rc1 (which was additionally backported
to 5.15.142 and 6.1.66).

The only upstream releases with that combination (the first commit
present, the second one missing) were 6.1.64 and 6.1.65.

Debian is affected as of the 6.1.64-1 upload, which was the kernel
intended for the 12.3 point release.
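
As a rough aid, the version information above can be turned into a
small check. This is a sketch, not an official tool: the function names
are made up, and the list of affected versions is only what is stated
in this mail (upstream 6.1.64 and 6.1.65, Debian package 6.1.64-1). On
Debian the package version shows up in the output of "uname -v", not in
"uname -r".

```shell
# Extract the Debian package version from a `uname -v` string such as
# "#1 SMP PREEMPT_DYNAMIC Debian 6.1.66-1 (2023-12-06)".
# Prints nothing if the string has no "Debian <version>" part.
debian_kernel_version() {
    printf '%s\n' "$1" | sed -n 's/.*Debian \([^ ]*\).*/\1/p'
}

# Succeed (exit status 0) if the given upstream or Debian version is one
# of the versions named as affected in this thread.
is_affected_version() {
    case "$1" in
        6.1.64|6.1.64-*|6.1.65|6.1.65-*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Usage on a running system could look like this:

    if is_affected_version "$(debian_kernel_version "$(uname -v)")"; then
        echo "running kernel is one of the affected versions"
    fi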

The issue causes file corruption when direct IO writes are involved.
O_DIRECT writes did not properly update the current file position
after the write, so file data was getting mangled.
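
To illustrate what "file position after the write" means here, the
following sketch writes two blocks with O_DIRECT through one shared
open file and checks that the second block ended up behind the first
one. It is not the LTP test and it is untested on an affected kernel,
so a passing result is no proof that a system is fine. The function
name is made up, a block size of 4096 bytes is assumed to satisfy the
alignment requirements, and the GNU dd flags iflag=fullblock and
oflag=direct are required.

```shell
# check_direct_io_position [DIR]
# Exit status: 0 = both blocks are where they should be,
#              1 = file content differs from the expected layout,
#              2 = check could not be run (no O_DIRECT support, no GNU dd, ...).
check_direct_io_position() (
    dir=${1:-.}
    bs=4096
    f=$(mktemp "$dir/dio-check.XXXXXX") || exit 2

    # Both dd processes inherit the same open file from the { ... } group,
    # so the second write starts at the position left by the first one.
    {
        head -c "$bs" /dev/zero | tr '\000' 'a' |
            dd bs="$bs" count=1 iflag=fullblock oflag=direct 2>/dev/null &&
        head -c "$bs" /dev/zero | tr '\000' 'b' |
            dd bs="$bs" count=1 iflag=fullblock oflag=direct 2>/dev/null
    } > "$f" || { rm -f "$f"; exit 2; }

    # Expected layout, written with ordinary buffered IO.
    {
        head -c "$bs" /dev/zero | tr '\000' 'a'
        head -c "$bs" /dev/zero | tr '\000' 'b'
    } > "$f.expected"

    if cmp -s "$f" "$f.expected"; then rc=0; else rc=1; fi
    rm -f "$f" "$f.expected"
    exit "$rc"
)
```

The directory argument should point to the ext4 filesystem in question,
since the behaviour depends on the filesystem holding the test file.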

While this does not affect every write that ever happened on an ext4
filesystem under a broken kernel, O_DIRECT writes might be quite
common in programs trying to get high performance. It might be argued
that they are not that common, but they are not nonexistent.

To my knowledge, such file corruptions cannot be easily detected.
Candidates to check are all files written or modified since the system
was booted with the broken kernel 6.1.64-1.
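
A list of such candidates can be produced with find and a reference
file whose timestamp is set to the boot time. This is only a sketch:
the function name is made up, and a newer modification time only means
the file was written to, not that it was written with O_DIRECT or that
it is damaged.

```shell
# list_files_newer_than REFERENCE_FILE PATH...
# Print all regular files below the given paths (without crossing into
# other filesystems) whose modification time is newer than REFERENCE_FILE.
list_files_newer_than() (
    ref=$1
    shift
    find "$@" -xdev -type f -newer "$ref" -print
)

# Example, assuming the system is still running the boot in question:
#   touch -d "$(uptime -s)" /tmp/boot-reference   # GNU touch, procps uptime
#   list_files_newer_than /tmp/boot-reference /
```

If the machine has been rebooted since, "uptime -s" no longer gives the
right time; the reference file then has to be set by hand to the time
of the first boot into 6.1.64-1, for example as taken from the logs.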

People who have not yet booted into 6.1.66-1 are urged to do so.

Regards,
Salvatore



Bug#1057843: linux: ext4 data corruption in 6.1.64-1

2023-12-11 Thread Dennis Grevenstein
On Mon, 11 Dec 2023 10:38:40 +0100 helios.sola...@gmx.ch wrote:
> I have been running debian 12.3 with kernel 6.1.64-1 for a few hours,
> how can I find out whether the file system has been corrupted?

Yes, I would also appreciate an explanation of who could be affected,
how to diagnose the problem, and what needs to be done.
Please note that not all the users of Debian stable are kernel hackers
who will be able to look at the filesystem code and understand the
full extent of the problem.

thanks,
Dennis



Bug#1057843: linux: ext4 data corruption in 6.1.64-1

2023-12-11 Thread helios . solaris

I have been running debian 12.3 with kernel 6.1.64-1 for a few hours,
how can I find out whether the file system has been corrupted?



Bug#1057843: linux: ext4 data corruption in 6.1.64-1

2023-12-09 Thread Salvatore Bonaccorso
Hi,

On Sat, Dec 09, 2023 at 03:07:37PM +0100, Salvatore Bonaccorso wrote:
> Source: linux
> Version: 6.1.64-1
> Severity: grave
> Tags: upstream
> Justification: causes non-serious data loss
> X-Debbugs-Cc: debian-rele...@lists.debian.org, car...@debian.org, a...@debian.org
> 
> Hi
> 
> I'm filing this for visibility.
> 
> There might be an ext4 data corruption issue with the kernel released
> in the 12.3 bookworm point release (which is addressed in 6.1.66
> upstream already).
> 
> The report about the regression and some details:
> 
> https://lore.kernel.org/stable/20231205122122.dfhhoaswsfscuhc3@quack3/

6.1.66 upstream fixes the issue:

# uname -a
Linux bookworm-amd64 6.1.0-15-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.66-1 (2023-12-06) x86_64 GNU/Linux
# LTP_SINGLE_FS_TYPE=ext4 LTP_DEV_FS_TYPE=ext4 ./preadv03_64
tst_device.c:96: TINFO: Found free device 0 '/dev/loop0'
tst_test.c:1690: TINFO: LTP version: 20230929-194-g5c096b2cf
tst_test.c:1574: TINFO: Timeout per run is 0h 00m 30s
tst_supported_fs_types.c:149: TINFO: WARNING: testing only ext4
tst_supported_fs_types.c:90: TINFO: Kernel supports ext4
tst_supported_fs_types.c:55: TINFO: mkfs.ext4 does exist
tst_test.c:1650: TINFO: === Testing on ext4 ===
tst_test.c:1105: TINFO: Formatting /dev/loop0 with ext4 opts='' extra opts=''
mke2fs 1.47.0 (5-Feb-2023)
tst_test.c:1119: TINFO: Mounting /dev/loop0 to /tmp/LTP_preGGYjTj/mntpoint fstyp=ext4 flags=0
preadv03.c:102: TINFO: Using block size 512
preadv03.c:87: TPASS: preadv(O_DIRECT) read 512 bytes successfully with content 'a' expectedly
preadv03.c:87: TPASS: preadv(O_DIRECT) read 512 bytes successfully with content 'a' expectedly
preadv03.c:87: TPASS: preadv(O_DIRECT) read 512 bytes successfully with content 'b' expectedly

Summary:
passed   3
failed   0
broken   0
skipped  0
warnings 0

Regards,
Salvatore



Bug#1057843: linux: ext4 data corruption in 6.1.64-1

2023-12-09 Thread Salvatore Bonaccorso
Running the single test with ext4:

# LTP_SINGLE_FS_TYPE=ext4 LTP_DEV_FS_TYPE=ext4 ./preadv03_64
tst_device.c:96: TINFO: Found free device 0 '/dev/loop0'
tst_test.c:1690: TINFO: LTP version: 20230929-194-g5c096b2cf
tst_test.c:1574: TINFO: Timeout per run is 0h 00m 30s
tst_supported_fs_types.c:149: TINFO: WARNING: testing only ext4
tst_supported_fs_types.c:90: TINFO: Kernel supports ext4
tst_supported_fs_types.c:55: TINFO: mkfs.ext4 does exist
tst_test.c:1650: TINFO: === Testing on ext4 ===
tst_test.c:1105: TINFO: Formatting /dev/loop0 with ext4 opts='' extra opts=''
mke2fs 1.47.0 (5-Feb-2023)
tst_test.c:1119: TINFO: Mounting /dev/loop0 to /tmp/LTP_preWBHd7l/mntpoint fstyp=ext4 flags=0
preadv03.c:102: TINFO: Using block size 512
preadv03.c:77: TFAIL: Buffer wrong at 0 have 62 expected 61
preadv03.c:77: TFAIL: Buffer wrong at 0 have 62 expected 61
preadv03.c:66: TFAIL: preadv(O_DIRECT) read 0 bytes, expected 512

Summary:
passed   0
failed   3
broken   0
skipped  0
warnings 0



Bug#1057843: linux: ext4 data corruption in 6.1.64-1

2023-12-09 Thread Salvatore Bonaccorso
Source: linux
Version: 6.1.64-1
Severity: grave
Tags: upstream
Justification: causes non-serious data loss
X-Debbugs-Cc: debian-rele...@lists.debian.org, car...@debian.org, a...@debian.org

Hi

I'm filing this for visibility.

There might be an ext4 data corruption issue with the kernel released
in the 12.3 bookworm point release (which is addressed in 6.1.66
upstream already).

The report about the regression and some details:

https://lore.kernel.org/stable/20231205122122.dfhhoaswsfscuhc3@quack3/

Regards,
Salvatore