#2944: FAT data corruption during unmount()
-----------------------------+-----------------------
 Reporter:  Sebastian Huber  |       Owner:  chrisj@…
     Type:  defect           |      Status:  new
 Priority:  normal           |   Milestone:  4.12
Component:  filesystem       |     Version:  4.11
 Severity:  normal           |  Resolution:
 Keywords:                   |
-----------------------------+-----------------------

Comment (by slemstick):

 Replying to [comment:1 Chris Johns]:
 > Replying to [ticket:2944 Sebastian Huber]:
 > > Removing the function call in msdos_shut_down ( .. ) to close the root
 > > file descriptor solves the problem perfectly (clean fsck).
 >
 > I assume you mean fat_file_close?

 Yes.

 > > However, we're a bit unsure about the intent behind closing the root
 > > directory.
 >
 > There is memory allocated in fat_file_open which you would leak.

 We fixed this issue by creating a special "root file close" function that
 omits the call to fat_file_update() made in fat_file_close() (which caused
 the corruption).

 > I see the fat_file_close calls fat_buf_release and if the fs_info cache
 > is not empty it is holding a bdbuf buffer so this would cause a leak of
 > buffers.
 >
 > What about the fat_file_close calls in the msdos init call on error?
 > Would they also cause the same problem?

 Yes, these will cause the same issues.

 To update / summarise this ticket a bit here:

 We originally attempted to fix this problem by setting the hard-coded root
 directory cluster number to 2, in addition to the change above (removing
 the corruption caused by fat_file_update() in fat_file_close() on
 unmount).

 However, our attempt to fix the broken root cluster numbering breaks a
 hashing mechanism in fat_file_open(..). This mechanism indexes open file
 descriptors based on 1) the parent directory cluster number and 2) the
 offset into that directory structure. The issue is that the root
 directory, and the file pointed to by the first directory entry in the
 root directory, may construct their hashes from the same indexes:

 > Root directory: cluster number 2, offset 0
 > First file in root directory: cluster number 2, offset 0

 Before, this was of course not a problem, as the root directory had the
 hard-coded cluster number of 1, and the keys were therefore always unique.
 With the root directory at cluster 2, however, the keys can collide, which
 can cause a number of new issues.
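
 To illustrate the key collision described above, here is a minimal sketch.
 It does not reproduce the actual key construction used by fat_file_open();
 the type sketch_pos and the functions sketch_key() and
 sketch_keys_collide() are hypothetical names, and the key layout (cluster
 number in the upper bits, directory entry index in the lower bits) is an
 assumption made purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the position that indexes an open file:
 * the cluster of the parent directory and the byte offset of the
 * directory entry inside it. This is not the real RTEMS type. */
typedef struct {
    uint32_t cln; /* cluster number */
    uint32_t ofs; /* byte offset of the 32-byte directory entry */
} sketch_pos;

/* Simplified key: one slot per 32-byte directory entry.
 * Assumes fewer than 2^16 entries per cluster (illustration only). */
static uint32_t sketch_key(sketch_pos pos)
{
    assert(pos.ofs % 32 == 0); /* entries are 32 bytes apart */
    return (pos.cln << 16) | (pos.ofs / 32);
}

/* Two positions collide when they produce the same key. */
static bool sketch_keys_collide(sketch_pos a, sketch_pos b)
{
    return sketch_key(a) == sketch_key(b);
}
```

 With the root directory keyed as { 1, 0 } and the first file in it keyed
 as { 2, 0 }, the two keys differ. Once the root directory is keyed as
 { 2, 0 }, both positions are identical, so any key derived only from
 these two values must be the same for both.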

 The fix to this problem is to set the hard-coded root directory cluster
 number back to 1, rather than drastically changing the key hashing
 function calls and data structures, and to trust that removing the calls
 to fat_file_update() on the root node is sufficient to avoid the data
 corruption issue described above.

 However, there are two other places in msdos_misc.c where the hard-coded
 root directory cluster number - FAT_ROOTDIR_CLUSTER_NUM - is used:

 > msdos_get_name_node()
 > msdos_get_dotdot_dir_info_cluster_num()

 Like this:

     if ( (MSDOS_EXTRACT_CLUSTER_NUM(dotdot_node)) == 0)
     {
         /*
          * we handle root dir for all FAT types in the same way with the
          * ordinary directories ( through fat_file_* calls )
          */
         fat_dir_pos_init(dir_pos);
         dir_pos->sname.cln = FAT_ROOTDIR_CLUSTER_NUM;
     }

 Which, to my understanding, will never occur, as you should never have a
 cluster number below 2 in a compliant (msdos) FAT file system.

 Does anyone know the intent behind this?

--
Ticket URL: <http://devel.rtems.org/ticket/2944#comment:2>
RTEMS Project <http://www.rtems.org/>
RTEMS Project
_______________________________________________
bugs mailing list
bugs@rtems.org
http://lists.rtems.org/mailman/listinfo/bugs