#2944: FAT data corruption during unmount()
Reporter: Sebastian Huber | Owner: chrisj@…
Type: defect | Status: new
Priority: normal | Milestone: 4.12
Component: filesystem | Version: 4.11
Severity: normal | Keywords:
In msdos_shut_down() (msdos_fsunmount.c) there is a call to
fat_file_close(..), which attempts to close a file descriptor and write
a range of metadata to that file's directory entry, located in another
cluster.
The problem is that this is the root node, and of course doesn't have a
corresponding parent directory entry.
In addition, the "parent directory entry" cluster number is initialised to
1, which is invalid according to the FAT specification (cluster numbering
starts at 2).
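The numbering rule can be illustrated with a small check. This is a minimal sketch and not code from the RTEMS driver: the helper name is_valid_data_cluster and the macro FIRST_DATA_CLUSTER are hypothetical, and the cluster count in the usage note is the one fsck reports for this volume.

```c
#include <stdbool.h>
#include <stdint.h>

/* FAT entries 0 and 1 are reserved; data clusters are numbered from 2. */
#define FIRST_DATA_CLUSTER 2u

/*
 * Hypothetical helper (not part of the RTEMS FAT driver): returns true
 * if cln refers to a data cluster on a volume with cluster_count data
 * clusters, i.e. lies in the range 2 .. cluster_count + 1.
 */
static bool is_valid_data_cluster(uint32_t cln, uint32_t cluster_count)
{
    return cln >= FIRST_DATA_CLUSTER &&
           cln < cluster_count + FIRST_DATA_CLUSTER;
}
```

With the 1044483 clusters reported by fsck, cluster 1 fails this check while cluster 2 (the root directory in this report) passes.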
This creates a critical bug that overwrites random sectors. To calculate
the sector number of the cluster, 2 is subtracted from the cluster number
1, which, through a series of function calls, leads to a sector number at
the end of FAT2 (just below the start of the cluster region). The driver
believes this is a FAT region (in fat_buf_release) and writes the sector
to what it "thinks" is FAT1. It then proceeds to copy the changes to FAT2
by adding FAT_LENGTH (8161) to the sector number, leading to a write well
into the cluster region, randomly overwriting data.
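The arithmetic can be followed with a small worked example. This is a sketch under stated assumptions, not the driver's code: the struct and function names are hypothetical, and only the FAT length of 8161 sectors comes from this report. The reserved sector count (32), the number of FATs (2) and the sectors per cluster (8) are assumed values chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical volume layout; only fat_length = 8161 is from the ticket. */
typedef struct {
    uint32_t reserved_sectors;    /* assumed: 32 */
    uint32_t fat_length;          /* sectors per FAT: 8161 */
    uint32_t fat_count;           /* assumed: 2 */
    uint32_t sectors_per_cluster; /* assumed: 8 */
} volume_layout;

/* First sector of the cluster (data) region, right after the last FAT. */
static uint32_t data_first_sector(const volume_layout *v)
{
    return v->reserved_sectors + v->fat_count * v->fat_length;
}

/*
 * Usual FAT formula: (cln - 2) * sectors_per_cluster + first data sector.
 * The arithmetic is unsigned, so cln = 1 wraps around and yields a sector
 * one cluster *below* the data region, i.e. at the end of the last FAT.
 */
static uint32_t cluster_to_sector(const volume_layout *v, uint32_t cln)
{
    return (cln - 2u) * v->sectors_per_cluster + data_first_sector(v);
}

/* True if the sector lies inside the FAT copies. */
static bool sector_in_fat_region(const volume_layout *v, uint32_t sec)
{
    return sec >= v->reserved_sectors && sec < data_first_sector(v);
}

/* Sector touched when a FAT write is mirrored to the next FAT copy. */
static uint32_t mirrored_sector(const volume_layout *v, uint32_t sec)
{
    return sec + v->fat_length;
}
```

With the assumed layout { 32, 8161, 2, 8 }, the data region starts at sector 16354. Cluster 1 maps to sector 16346, which lies inside FAT2, and mirroring that write by adding 8161 gives sector 24507, which is inside the cluster region.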
The three function calls above lead to fsck complaining about disk
fsck from util-linux 2.27.1
fsck.fat 3.0.28 (2015-05-16)
0x41: Dirty bit is set. Fs was not properly unmounted and some data may be
1) Remove dirty bit
2) No action
There are differences between boot sector and its backup.
This is mostly harmless. Differences: (offset:original/backup)
1) Copy original to backup
2) Copy backup to original
3) No action
Truncating second to 0 bytes because first is FAT32 root dir.
File size is 4096 bytes, cluster chain length is 0 bytes.
Truncating file to 0 bytes.
Perform changes ? (y/n) n
/dev/sdm1: 14 files, 1600/1044483 clusters
In particular, the "shared cluster" problem is caused by
fat_file_write_first_cluster_num, which adds a directory entry to the
root directory cluster pointing at itself; i.e. there is a directory
entry in cluster 2 pointing to a file in cluster 2. (Note: this occurs
because we have fixed the "point to cluster #1" issue by reading the
relative location of the root cluster node from the FAT volume info
structure.)
Removing the call in msdos_shut_down(..) that closes the root file
descriptor solves the problem (clean fsck). However, we're a bit unsure
about the intent behind closing the root directory.
Ticket URL: <http://devel.rtems.org/ticket/2944>
RTEMS Project <http://www.rtems.org/>