Hi Tom!
> > Hi, I think it is now clear that Turbo C 2 cannot optimize for 386.
> > HOWEVER, the NASM parts can still be optimized for 386.
> ... but then save a few byte by using a different longDIVMODMUL
> implementation doesn't make much sense.
This was not only about space, it is also about speed. 8086 code is
slow with mul, div, shift of 32 bit numbers, because - at least in
TC - it is implemented in bit-by-bit loops. Not sure how often 32 bit
numbers are used in speed-critical way, though. Somebody (Johnson? Lucho?)
reported long ago that 386 optimization of the WHOLE kernel mostly reduces
size (I think Bart gave a nice overview of FAT32 cost / 386 savings size-
wise) but does not change speed considerably anyway.
So okay, if you prefer, make 386 compile attempts fail completely on TC2.
Remember that TCC calls 286 optimizations "-1", not "-2", though ;-).
> > By the way, I recently saw something about 32bit and 64bit multiply / divide
> there's no 64 code anywhere.
You are right. This was misleading /* comments */ in the old source...
We have LDIVMODU and LMULU, and they do:
dx:ax = dx:ax / cx:bx, cx:bx = dx:ax / cx:bx and dx:ax = dx:ax * cx:bx
I see that those are placed as __U4M and __U4D in Watcom...
By the way, what is this? -->
> MATH_EXTRACT=*H_LDIV *H_LLSH *H_LURSH *F_LXMUL
> MATH_INSERT=+H_LDIV +H_LLSH +H_LURSH +F_LXMUL
MATH_EXTRACT=*aflmul *aFlshl *aFNauldi *aFulrem *aFulshr *aFuldiv *aFlrem *aFlldiv
MATH_INSERT= +aflmul +aFlshl +aFNauldi +aFulrem +aFulshr +aFuldiv +aFlrem +aFldiv
MATH_EXTRACT=*LDIV *LXMUL *LURSH *LLSH *LRSH
MATH_INSERT=+LDIV +LXMUL +LURSH +LLSH +LRSH
MATH_EXTRACT=*H_LDIV *H_LLSH *H_LURSH *F_LXMUL
MATH_INSERT=+H_LDIV +H_LLSH +H_LURSH +F_LXMUL
...?
It might sound stupid, but I cannot find any place in the sources
where LDIVMODU is actually used?
Neither for LMULU... probably it means that our Asm code replaces something
which would be normally present in the compiler as library function?
The __U4M / __U4D (and U4M / U4D etc.) are nowhere visible in source either.
One of the few things where 32 bit division happens is a 32:16 = 16 modulo:
init_LBA_to_CHS ... -> unsigned hsrem = (unsigned)(LBA_address % hs);
where LBA_address is ULONG, and then: LBA_address /= hs; where hs is unsigned
(sectors * head, so the result of the division can be potentially above
16 bit range, but:
chs->Cylinder = LBA_address >= 0x10000ul ? 0xffffu : (unsigned)LBA_address;
So we explicitly disallowed "LBA_address / hs" being > 64k! You could rewrite
that as:
chs->Cylinder = (LBA_address>>16 >= hs) ? 0xffffu : (unsigned)(LBA_address/hs);
... and hope that the compiler correctly implements the division as 32:16=16bit
division!
Similar code exists in LBA_to_CHS - this has different arguments and range
checks, but the core "problem" is the same: An unneeded 32:16=32 division
which can be replaced by a 32:16=16 one (actually the code even complains
already for results > 1023, one could arguably add support for special
"high cylinder count" extensions like:
- int 13.ee / 13.ef ("add N to cylinder value of next int 13.2/13.3")
- putting 2 or 4 extra cylinder number bits in the "head" register when
calling int 13.2/13.3
but that would require detection of compatible BIOSes in the first place.
While looking for other 32bit variables, I found a problem in the "dmatch"
structure: dm_dircluster should be of type "CLUSTER", not of type "if FAT32
then ULONG else ...". Same for dpb_xrootclst and dpb_xcluster!
Not sure if you can change this:
> struct lfn_inode
> UNICODE name[261];
> struct dirent l_dir; /* this file's dir entry image */
> ULONG l_diroff; /* offset of the dir entry */
^^^^^ if you use the slot number instead, 16 bit will be enough.
As far as I know, FAT32 directories cannot have > 64k entries each.
Another dmatch issue: dm_size is LONG but should be ULONG at least for FAT32.
Potential bug cause for processing of files > 2 GB.
In bpb, bpb_xrootclst should be CLUSTER, not ULONG.
Actually, dirent already uses ULONG for dir_size, so at least that is okay.
Even fcb_size and fcb_rndm in FCBs are ULONG - dunno if FCBs are for > 2 GB...
The f_node f_offset is ULONG, too. Nice. Would it be okay to change
f_count from UWORD to UBYTE? Or can files be used more than 255 times at a
time? How about f_mode and f_flags, both 16 bit...? An f_node also contains
a dirent, the slot number of the dir entry (16 bit), the dir starting
cluster (unless it is root of FAT1x - hopefully FAT32 root is special-cased)
and both the f_cluster (current) and f_cluster_offset (relative cluster
number within file), both of type CLUSTER. Finally there is f_offset, which
is ULONG, as told. Note that f_cluster_offset cannot be calculated from
f_cluster - at most you could try to relate it to f_offset, not sure...
f_cluster_offset will never have more than 23 bits used, because FAT32 is
not specified for files above 4 GB size, even though cluster numbers CAN
be up to 28 bits.
If you should have wondered: f_count is incremented only in get_f_node.
(other places set it to 0 or 1 or decrement it)
Flags which can be in f_mode are: RDWR, O_ACCMODE, the dos_open flags
(can be 0..2) are masked by that (it has the value 3), the sharing mode
(in 0x00f0 bits), O_NOINHERIT (0x80), O_OPEN, O_TRUNC, O_LEGACY, O_LARGEFILE,
O_NOCRIT, O_SYNC, O_FCB - in other words, all 16 bits are in use!
Not sure, is our O_LARGEFILE support complete??
The O_OPEN/O_TRUNC/O_CREAT/O_LEGACY bits are all FreeDOS specific!
Comments say that O_LARGEFILE is "not implemented yet" and that the dos_open
flag 3 (O_EXECCASE, preserve case for redir. exec) value is not implemented
yet either, as is "flags & 7 = 4 means: readonly without modifying file access
time", not implemented either.
O_SYNC (commit all writes at once) and O_NOCRIT (do not invoke "int23" (** is
this a typo? int24 I assume? **) ...) are also described as "not implemented
yet".
Big question: Why do we have f_mode at all???
It can *only* contain the two O_ACCMODE bits in the current kernel at all,
and those bits are already present in SFT anyway.
The dm_flags structure of dmatch, which seemingly can be used for f_node
f_flags as well, contains only 4 valid bits: f_dmod, f_droot, f_dnew and
f_ddir. This is *mirrored* in the defines F_DMOD 1, F_DDIR 2, F_DDATE 4
to *some* extend. As f_droot is the same bit as F_DDIR and f_dnew is the
same as F_DDATE, but nothing relates to f_ddir! The F_DDIR define therefore
should probably be 8, *not* 2 !?!?
Luckily this POTENTIAL bug is triggered nowhere: only the words f_dmod,
F_DMOD, F_DDATE and F_DDIR are used to describe access to dm_flags and
f_flags. Conclusion: dm_flags should be defined as a BYTE with the bits
f_dmod, f_ddir and f_ddate (!), and the F_DMOD, F_DDIR and F_DDATE defines
should be kept in sync with the actual bit assignment. The names f_dnew,
f_droot and f_ddir are never used in 2035, although F_DDIR is a name for
what is now erroneously called f_droot, and F_DDATE is a name for what is
also called f_dnew for some reason.
Confusing! But can definitely be optimized to use only 3 bits of f_flags.
The f_mode field can probably be removed completely from f_node - or move
it to 2 of the remaining bits of the suggested f_flags BYTE.
Did I already mention that dm_dircluster should be CLUSTER typed? Probably.
(for some reason, dmatch always reserves 32 bits, probably it is supposed
to be of constant size? But it must be a FreeDOS-internal thing, why else
would it be allowed to contain the dm_flags bits which are only for f_nodes?
Strange!)
(Possible answer: the DTA can contain dmatch structures, e.g. for findfirst,
therefore dmatch *MUST* be of FindFirst data block (RBIL 61, table 01626)
structure: the dm_dircluster is - for MS DOS 5 - a WORD followed by 4
reserved bytes. The high 16 bits of the cluster (or an reserved word),
plus dm_flags inflated to a WORD as well, ensure that dm_attr_fnd is at offset
0x15 (dm_dircluster is at 0x0f). The RBIL tells that dm_size should be ULONG!
(actually DWORD but for some obscure reason FreeDOS has DWORD defined as signed
data type...) - In other words: dmatch is a FindFirst data block, *but* that
is alas not mentioned in the source code :-(. It should be. I recommend to
use 1 byte for dm_flags, followed by 1 reserved byte, *instead* of using the
bit field syntax. That, and using the F_DMOD / ... #define lines, ensures
*consistency* between dm_flags and f_flags, which should both be bytes...
And do not forget the "remove f_mode or merge to f_flags" idea...!)
Summary: Possible to reduce f_node size from 16 (general) + 32 (dirent) +
6_or_12 (cluster type values) = 54_or_60 bytes each by at least 4 bytes
(UBYTE f_count, merge f_mode and f_flags to single byte or remove f_mode
and use single byte for f_flags). You can save a 5th byte in FAT32 case
by evil tricks with f_cluster_offset being used as 3 byte value and a
6th byte in FAT32 case by even more evil tricks to squeeze mode and flags
into the top 4 bits of the cluster number, remember that they are reserved.
The reduction 54_or_60 -> 50_or_56 is straightforward and safe, while the
reduction to 50_or_54 is very evil.
Having tighter f_node and dmatch structures can also make the code easier
to maintain and points in the right direction, towards removing f_nodes
completely eventually.
In the SFT (typedef struct { ... } sft;) we have LONG for sft_size and
sft_posit. This will not work for file sizes above 2 GB, should be ULONG,
but take care to do the correct case distinction for "normal 2 GB" and
"LARGEFILE" file open modes - at least SftSeek and the functions called
by it, remote_lseek and dos_lseek, should be reviewed.
Especially regarding range checks and things, please check RBIL. I think
in non-LARGEFILE mode it was possible to seek to negative offsets for
some purpose. Relative seeks are in theory either signed or unsigned
depending on mode but I think it makes no difference in practice???
dos_lseek would have to change in return type to ULONG, and foffset argument
should be ULONG, too. Problem is that DE_INVLDFUNC and DE_INVLDHNDL can be
returned as well, so files of that "size" would cause troubles. Suggested
solution: Use the return value only as error flag, and change the foffset
argument to type ULONG * ...!
There are quite a few other places where a data type "FOFFS" would help
people to read code compared to "ULONG" (and sometimes "LONG", but I suggest
to get rid of the "LONG" cases!).
The whole xstructs.h file should only be used for FAT32 kernels, and use
CLUSTER instead of ULONG (and sometimes even DWORD, for nfreeclst and
cluster, because of "-1" handling there, same reason and symptom for
newrootclst and oldrootclst there...) ... Imagine the rest.
This will also help to check for FAT32-freeness of FAT16 kernels.
Hm... just noticing that GetBiosKey could HLT to save energy, but
then you would have to check for HLT being functional... And we
already had that discussion for "kernel FDAPM-APMDOS" earlier.
Only used in config processing anyway.
The DosGetFree function (remember my bug report...) calls
flush_buffers if called from FatGetDrvData (int 21.1c?), maybe this is
where it crashed? Otherwise it also is int 21.36 ...
Plus, how do we know that this does not get a NULL pointer? -->
if (*nc != 0xffff)
*navc = (COUNT) dos_free(dpbp);
That fake for 64k cluster size looks evil, too.
while (ntotal > FAT_MAGIC16 && cluster_size < 0x8000)
{ SHL cluster size and spc, SHR ntotal and nfree }
... seems to be a cluster size determination loop which should probably
NOT be used if the real cluster size is known! Especially for FAT32, I
am not sure if there should be a "simulation of bigger clusters" just to
keep programs which do not understand FAT32 happy for longer. I think for
FAT32, this call should use REAL cluster sizes, IF AVAILABLE, because:
*nc = ntotal > FAT_MAGIC16 ? FAT_MAGIC16 : (UCOUNT) ntotal;
... already clips the returned value anyway (although in a different way
than described in RBIL, I think). *navc is a clipped nfree, sounds right.
dmp->dm_size = (LONG) SearchDir.dir_size;
would have to be changed to ULONG or even better to no typecast but just ULONG.
COUNT DosLockUnlock(COUNT hndl, LONG pos, LONG len, COUNT unlock)
---- ----
... same for remote_lock_unlock and share_lock_unlock ... but how about
if (s->sft_shroff < 0) ...?
/* dos_getfsize for the file time */
wrong comment
/* dos_setfsize for the file time */
wrong comment
BOOL dos_setfsize(COUNT fd, LONG size)
should be ULONG, as it already is for dos_getfsize!
COUNT lfn_setup_inode(COUNT handle, ULONG dirstart, UWORD diroff)
should be CLUSTER dirstart
VOID dir_init_fnode(f_node_ptr fnp, CLUSTER dirstart);
already does use CLUSTER
Int 21.7403 should also typecast to CLUSTER, not to ULONG.
void far *DynAlloc(char *what, unsigned num, unsigned size)
...
if ((ULONG) total + Dynp->Allocated > 0xffff)
... should be possible to do that without having to use ULONG.
Only DosRWSft updates current_filepos in ... uhm ... list of lists. It is
not clearly stated that this is LoL or LoL-related, it only tells:
/* Globally referenced variables - WARNING: ORDER IS DEFINED IN */
/* KERNAL.ASM AND MUST NOT BE CHANGED. ...
... */
STATIC int LBA_Get_Drive_Parameters(int drive, struct DriveParamS *driveParam)
--> bug:
if (lba_bios_parameters.heads > 0xffff ||
lba_bios_parameters.sectors > 0xffff ||
lba_bios_parameters.totalSectHigh != 0)
{
printf("Drive is too large to handle, using only 1st 8 GB\n"
...
Explanation: totalSectHigh nonzero means either buggy BIOS (happens!) or
disk size above 2 TeraBytes! This does definitely NOT mean that you should:
...
goto StandardBios;
}
... not at all! Just use LBA and clip the useable disk size to 0xFFFFFFFFUL
sectors. The int 13.48 lba_bios_parameters.heads and lba_bios_parameters.sectors
values are NOT used by FreeDOS at all, so if they are wrong it should not
be a reason for FreeDOS to revert to CHS mode.
A warning about having > 64k heads, > 64k sectors per ... or > 4G total
sectors is, of course, appropriate, but reverting to CHS seems to be a
wrong and dangerous reaction to suspicious LBA parameters.
Ooops, I just see that CalculateFATData already uses the same logics
as Win9x, so my suggestion mail the other day was probably obsolete. Sorry.
Can somebody explain - in rwblock - what this does and why it uses UCOUNT?:
if (!(fnp->f_flags & F_DDIR) && /* don't experiment with directories yet */
boff == 0) /* complete sectors only */
{
static ULONG startoffset;
UCOUNT sectors_to_xfer, sectors_wanted;
...
Probably it reads up to 2k sectors (1 MB) at most, so it should be okay...
... it even seems to read at most 64k BYTES, even better ...
sectors_wanted = to_xfer;
...
sectors_wanted /= secsize;
... can this be optimized to use a shift instead? 16bit>>value -> 16bit.
secsize is taken from dpb not too long before it is used at that place.
secsize = fnp->f_dpb->dpb_secsize;
And, same function, same idea:
sector = (UBYTE)(fnp->f_offset / secsize) & fnp->f_dpb->dpb_clsmask;
boff = (UWORD)(fnp->f_offset % secsize);
...
fnp->f_offset += sectors_to_xfer * secsize;
...
fnp->f_offset = startoffset + sectors_to_xfer * secsize;
...
xfr_cnt = sectors_to_xfer * secsize;
... (those are all 16bit*16bit -> 32bit multiplications, could be shifts) ...
I saw the compile time option WRITEZEROS, could that be extended to
be supported by shrink_file (or maybe in rwblock itself?) ...?
[?1;2c[?1;2c[?1;2c
I already asked earlier... how is this seleted by the compiler: --> ?
#if defined(WATCOM) && 0
ULONG ASMCFUNC FAR MULULUS(ULONG mul1, UWORD mul2); /* MULtiply ULong by USh
ort */
ULONG ASMCFUNC FAR MULULUL(ULONG mul1, ULONG mul2); /* MULtiply ULong by ULo
ng */
ULONG ASMCFUNC FAR DIVULUS(ULONG mul1, UWORD mul2); /* DIVide ULong by UShor
t */
ULONG ASMCFUNC FAR DIVMODULUS(ULONG mul1, UWORD mul2, UWORD * rem); /* DIVid
e ULong by UShort */
#endif
Probably not at all, there is an && 0 ...
Reading LBA_Transfer now, very interesting... play_dj (TODO: should call
the int 2f interface which allows to suppress the message!!), int 1e stuff
and disk reset, track wrap avoidance for CHS, DMA wrap avoidance -->
** You should be allowed to disable DMA_max_transfer check for non-floppy!?
** If buffer is 64k aligned, limit should be NOT 63.9k but whole 64k, I think?
Or are DMA transfers limited to 0xffff byte size?
Nice to see that if UMB or HMA is the buffer location, access is pipelined
through the single low disk transfer buffer. I hope people know that they
should not loadhigh things which allocate larger buffers in the hope to gain
speed.
** You should be allowed to disable track wrap protection for non-floppy!?
Suggestion: Have a SYS CONFIG option "smart harddisk BIOS", defaulting to
disabled, which makes harddisk access free from DMA checks and track wrap
checks.
If FreeDOS eventually moves DMA and track wrap checks to an int 13 handler,
the LBA_Transfer function tests should be moved there, otherwise you try to
workaround the same things twice (at least the workaround will trigger only
once per call, so at least there is no danger of deadlocks or double-use of
the disk transfer buffer??? Would also have the ADVANTAGES that disk change
detection can be made more reliable and int 2f.13 can be supported that way.
Wow. That was huge. I hope it also helped to find some potential
bugs and optimize some things! Looks like most but not all 32bit
divisions can be of the simpler 32:16=16 type or replaced by shifts,
but of course such stuff is hard to find by reading C source codes...
One thing would be replacing DWORD/UDWORD by LONG/ULONG respectively to
improve overview, but more important seems the abovementioned consistent
use of ULONG for file sizes and offsets (watch out for exception in lseek)
and CLUSTER for clusters. Even introducing new types for "file size or
offset value" and "sector number value" would be an idea, but I think this
would be a bit pointless because if you were to change that (for support
of bigger disks or files), DOS compatibility would get lost at many places.
So the type can just as well stay hardcoded: ULONG for both file sizes and
offsets and sector numbers. You could conceivably tell "all file sizes and
offset values are ULONGs and all sector numbers are UDWORDS", though ;-).
I hope your compiler has that "unsigned var can never be < 0 in comparison"
warning. Otherwise you can easily mess up things without noticing. But
eventually we *need* the ULONGs, for large file (2..4 GB) support!
If you reached the end of this mail, pat your own shoulders 8-).
PS: DosGetFree/FatGetDrvData: get_cds is NULL if > lastdrive, dos_free
is hopefully only called for formatted disks (and surfs the whole FAT
unless there is buffered knowledge about disk usage), leaving media_check.
This does rqblockio/C_MEDIACHK, if not changed: success, if unknown: yet
another hazardous flush_buffers call, and then or if changed: assume invalid,
do rqblockio/C_BLDBPB and finally bpb_to_dpb.
flush_buffers uses dskxfer for dirty buffers (hey, that could be used for
delayed writes ;-)). And dskxfer FAILS with error 0x201 if there is no
dpb for that disk. AND dskxfer, it seems, ALWAYS can trigger block_error
critical error handling (where abort and fail are "the same" and continue /
ignore is always allowed to PRETEND successful disk access...)
Maybe the DosGetFree/FatGetDrvData handler should manipulate the
pddt->ddt_descflags DF_NOACCESS bit??
STATIC WORD getbpb(ddt * pddt)
...
/* pddt->ddt_descflags |= DF_NOACCESS;
* disabled for now - problems with FORMAT ?? */
/* set drive to not accessible and changed */
if (diskchange(pddt) != M_NOT_CHANGED)
pddt->ddt_descflags |= DF_DISKCHANGE;
ret = RWzero(pddt, LBA_READ);
if (ret != 0)
return (dskerr(ret));
...???
(RWzero just calls LBA_Transfer with DiskTransferBuffer etc.)
STATIC int rqblockio(unsigned char command, struct dpb FAR * dpbp)
...
if (command == C_BLDBPB) /* help USBASPI.SYS & DI1000DD.SYS (TE) */
MediaReqHdr.r_bpfat = (boot FAR *)DiskTransferBuffer;
...!?
rqblockio -> ALWAYS uses critical error handler for execrh problems
(block_error...). Something in there, rdblockio or getbpb or LBA_Transfer
or DosGetFree/FatGetDrvData or ??? should 1. avoid shooting itself in
the foot, by temporarily modifying the DF_NOACCESS value and 2. return
error 7, unknown media type, WITHOUT allowing user interaction, if the
boot sector is found to have non-DOS contents.
Phew. I hope somebody can use all those suggestions / comments / ...!
Thanks :-).
Eric
-------------------------------------------------------
This SF.Net email is sponsored by BEA Weblogic Workshop
FREE Java Enterprise J2EE developer tools!
Get your free copy of BEA WebLogic Workshop 8.1 today.
http://ads.osdn.com/?ad_id=4721&alloc_id=10040&op=click
_______________________________________________
Freedos-kernel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/freedos-kernel