Hi Joe and Bernd,
thank you for your analysis!
If I understand you correctly, DOSLFN somehow writes directly
to directory data on the disk, bypassing DOS kernel BUFFERS,
but it has to explicitly invalidate those buffers through
an invocation of a "drive reset" DOS call each time. One
such invocation had been forgotten, creating the risk of
DOSLFN causing drive corruption, which got fixed by adding
the extra reset call, right?
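To make that reading concrete, here is a minimal Python sketch of the suspected failure mode. It is a toy model, not DOSLFN or kernel code: the names `Disk`, `BufferCache`, `drive_reset` and `scenario` are invented for illustration, and it assumes (unconfirmed) that the kernel holds a dirty directory sector in BUFFERS while the LFN driver reads and writes the same sector directly.

```python
class Disk:
    """Toy disk: maps a sector number to its contents."""
    def __init__(self):
        self.sectors = {}

    def read(self, n):
        return self.sectors.get(n, b"")

    def write(self, n, data):
        self.sectors[n] = data


class BufferCache:
    """Hypothetical write-back stand-in for the kernel BUFFERS."""
    def __init__(self, disk):
        self.disk = disk
        self.buffers = {}   # sector number -> cached contents
        self.dirty = set()  # sectors not yet written back

    def write(self, n, data):
        self.buffers[n] = data
        self.dirty.add(n)

    def flush(self):
        for n in sorted(self.dirty):
            self.disk.write(n, self.buffers[n])
        self.dirty.clear()

    def drive_reset(self):
        # Modelled after the "drive reset" idea: write back, then forget.
        self.flush()
        self.buffers.clear()


def scenario(reset_before_direct_write):
    disk = Disk()
    disk.write(5, b"OLD")
    cache = BufferCache(disk)
    # Kernel creates the short directory entry; it stays dirty in the cache.
    cache.write(5, b"SHORT")
    if reset_before_direct_write:
        cache.drive_reset()
    # LFN driver bypasses the cache: direct read, modify, direct write.
    disk.write(5, disk.read(5) + b"+LFN")
    # Later kernel activity writes back whatever it still considers dirty.
    cache.flush()
    return disk.read(5)
```

Under these assumptions, `scenario(False)` ends with only `b"SHORT"` on disk (the directly written LFN data is overwritten by the stale buffer), while `scenario(True)` ends with `b"SHORT+LFN"`.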
On the other hand, this all sounds horribly inefficient
because it flushes the BUFFERS far too often, which
impairs their ability to improve I/O speed by caching.
Note that DOS disk caches may flush on "drive reset" as well,
depending on whether they explicitly monitor that call and,
for BIOS-oriented caches like LBACACHE and Jack's caching
disk drivers, on whether the kernel itself issues BIOS disk
reset calls in reaction to the "drive reset" calls from
DOSLFN.
Of course our CD/DVD caches are a different topic, but
CD/DVD do not use FAT LFN and DOSLFN does not write them.
So... What could be done? DOSLFN could somehow make sure
that disk writes cannot happen without updating cached
copies of the affected data in BUFFERS. In other words,
it has to use DOS disk driver calls instead of BIOS calls.
Maybe it already does that; I have not checked.
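As a rough illustration of that idea, the following Python sketch shows a write path that updates the cached copy and the disk together, so no reset is needed afterwards. It is only a toy model: `CoherentCache`, `write_through` and `append_lfn` are hypothetical names, and nothing here reflects how DOSLFN or the kernel is actually implemented.

```python
class CoherentCache:
    """Toy cache in which every write updates buffer and disk together."""
    def __init__(self):
        self.disk = {}      # sector number -> contents on disk
        self.buffers = {}   # sector number -> cached contents

    def read(self, n):
        # Serve from the cache, loading from disk on a miss.
        if n not in self.buffers:
            self.buffers[n] = self.disk.get(n, b"")
        return self.buffers[n]

    def write_through(self, n, data):
        # The cached copy can never go stale: both are updated at once.
        self.buffers[n] = data
        self.disk[n] = data


def append_lfn(cache, n, extra):
    """Hypothetical LFN update that goes through the shared cache layer."""
    cache.write_through(n, cache.read(n) + extra)
```

With this design a later kernel read of the same sector sees the LFN data immediately, at the cost of routing every directory write through the shared layer.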
In addition, DOSLFN may have to flush some other DOS
kernel state data, such as file tables or file handles.
One of those, or maybe something completely different,
should be the reason why DOSLFN has to flood the kernel
with drive reset calls around LFN write activities.
I hope there is a performance-friendly way to do this,
which ALSO ensures consistent, non-corrupted metadata.
Regards, Eric
Hi guys,
the 0.42-WIP version of DOSLFN provided by Jason Hood will hopefully
resolve the filesystem corruption bug that was discussed earlier on the
mailing list.
How does it do that? I had assumed that there was a kernel problem,
but it also is plausible that DOSLFN manipulates directory cluster
and sector data directly, because the kernel would not do LFN itself,
so the corruption was some sort of caching or buffers conflict? So
is the fix cache-related, kernel-related, or DOSLFN-itself-related?
Apparently, the fix was to have DOSLFN perform a "drive reset". While
the corruption problem no longer seems to appear, I do not know whether
this actually “fixed” the problem itself or merely keeps an
underlying problem (maybe in the kernel) from occurring.
While I think it is very important that this fix prevents the filesystem
from getting corrupted, I agree with you that further investigation
should be made to ensure that there is not a serious bug elsewhere
that this fix is covering up.
I also think it's a kernel bug.
In doslfn.asm, a call to "install_long_filename" or
"install_long_filename_noflagtest" was preceded by a call to "ResetDrv"
in "lfn_mkdir" and "lfn_move". The patch added this call in "lfn_creat",
too. There's still no such call in "Tunnel_Save2", though. In
"lfn_mkdir", there's a comment above the call to "ResetDrv" which, in
English, says "should be appended to cache (here not yet)" so, yes, it's
definitely cache-related.
The "ResetDrv" procedure can be disabled (patched with an early RET)
with the "/i-" option. The default is enabled.
That's what I could find out by comparing "doslfn.asm" of the DOSLFN
0.41f and 0.42-wip sources. But what cache is the comment about: A)
DOSLFN's own caches, B) DOS's internal cache or C) a disk cache
(SmartDrv etc.)?
A) is impossible as DOSLFN's sector caches are currently disabled, see
the comment "FastOpen is now disabled by default, since it doesn't
recognise disc changes and doesn't seem to impact performance (at least
with modern DOS)" and parts (structures "TDI", "TSC" and "TFO") disabled
with "IF [USEFASTOPEN|0] [...] ENDIF".
C) is unlikely because such disk caches should work transparently; also,
it's easy to test: if you don't load a disk cache, the problem should go
away; and there's the comment "sector data is held in cache because
MS-DOS doesn't cache anything, even when SmartDrv is loaded, when direct
disc access is detected" (which was written when the sector caches were
enabled by default).
What remains is B), which suggests that the FreeDOS kernel's own sector
cache is doing something wrong. Resetting the drive (INT 21h, AH=0Dh)
flushes it out, which makes the problem disappear.
Joe
Bernd wrote:
Hi,
On 01.08.2025 at 11:54, Eric Auer via Freedos-devel
<freedos-devel@lists.sourceforge.net> wrote:
How does it do that? I had assumed that there was a kernel problem,
but it also is plausible that DOSLFN manipulates directory cluster
and sector data directly, because the kernel would not do LFN itself,
so the corruption was some sort of caching or buffers conflict? So
is the fix cache-related, kernel-related, or DOSLFN-itself-related?
I wrote to Jason about this. The specific change he made is at [1].
Not being in the mood to study the DOSLFN source code in detail, I asked
Jason whether he is sure that the additional reset he performs fixes a
bug in DOSLFN itself, or whether it instead works around a potential
kernel bug. He is not sure about this. So with the current knowledge we
CANNOT rule out a kernel bug.
[1]:
https://github.com/adoxa/doslfn/commit/58ecded0bb15c67331b5ca2cd44e3856bfd9b5a4
Bernd