On Fri, May 3, 2013 at 1:48 AM, Eric Auer <e.a...@jpberlin.de> wrote:
> please explain the hack / patch: Is the only thing that
> you changed that the kernel is compiled for those CPUs?
> Are there actually any differences between them? I can
> imagine that OpenWatcom makes 186 and 286 the same and
> everything above 386 the same.

The person to really ask would be Bart, but (IIRC) he's busy these days.

IIRC, wcc.exe is the 16-bit compiler and wcc386.exe is the 32-bit one.
I'm honestly not sure if wcc.exe supports any 386 stuff at all (at
least not 32-bit segments). IIRC, wcc386.exe only "tunes" code for
higher processors, so everything "should" still work for 386, even if
using -6.

A very quick check on a very small file shows no extended (E[ABCD]X)
registers used at all with "wcc -3 -za", only a couple of instances of
minor stuff, e.g. "shl bx,2" (an 80186 instruction). Hmmm, I do see a
"movzx ax,byte ptr _blah" in there, too, and "sete al" (both 80386).
And some (extended) "imul" stuff (186? 386?).
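For the curious, that quick check was nothing fancy; something along
these lines works (the file name is a placeholder, and wcc/wdis are
assumed to be on the PATH):

```shell
# Compile for 386 (-3) with strict ANSI (-za), then disassemble the
# resulting object file and search for 32-bit register names.
# "blah.c" is a made-up example file, not anything from the kernel tree.
wcc -3 -za blah.c
wdis blah.obj > blah.lst
grep -i "e[abcd]x" blah.lst
```

(On plain DOS you'd use FIND instead of grep, same idea.)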

I'm no cpu expert, but very little (if anything) is to be gained with
486 or 586 opcodes vs. plain vanilla 386. The only optimizations worth
doing, IMO, are simple register reordering and avoiding pipeline
stalls. In other words, going from 8086 to 186 would show some minor
improvements. The 286 mostly just introduces protected mode, so pure
calculations probably won't show any difference unless dependent upon
RAM. 386, 486, and 586 instructions are basically all the same (aside
from instruction scheduling). Most compilers don't use BSWAP, XADD,
CPUID, RDTSC, etc. 686 adds a fair bit of stuff but doesn't always
help speed (e.g. CMOV). Actually, even some 186 stuff (ENTER, LEAVE)
or string instructions (STOSB, SCASB) can be slower than simpler
alternatives after the 486, so compilers (e.g. GCC) often, but not
always, avoid them. 32-bit registers "usually" do 32-bit math faster
than the same calculations done with 16-bit registers. But a lot of
other stuff can affect speed too (calling convention, libraries, OS
calls, malloc).
(Besides, too many variants of x86 these days, hard to target any one
effectively. Probably best to not worry about it unless direly
important or prepared to profile heavily.)

BTW, I'm not sure if I consider it wise or worth the effort to compile
a 386 FreeDOS kernel. Actually, IIRC, the RUFUS USB installer uses a
386 build, which, while not out of the question (anything with USB
support is most likely 386+ anyway), is probably useless, and actively
bad if copied to older machines (which, again, is probably rare, but
still).

> Unless the kernel would
> contain heavy mathematical processing for which it is
> obvious that above-386 optimizes better ;-)

I would highly doubt it, but I've not studied the kernel in depth. You
know 1000x more than I do, Eric. I think most kernels try to avoid
FPU, but here I assume you just meant general integer stuff. Are we
targeting real 8086s or just 8086-compatibles? In other words, code
tuned for a real 8086 might run (relatively) slower on a 486 than a
different way of writing the same thing.

> You could tell the compiler to produce Assembly output (instead
> of binary) and compare the text.

wcc.exe only outputs .OBJ directly (for speed?), so you have to use
wdis.exe on the resulting .OBJ for (dis)assembly.
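So the "compare the text" idea still works, just with one extra step
through wdis. A sketch (hypothetical file names; -fo= is the standard
OpenWatcom option for naming the object file):

```shell
# Build the same source at two CPU targets, disassemble each .obj,
# and diff the listings to see what actually changed.
wcc -0 -za -fo=foo086.obj foo.c    # 8086 code generation
wcc -3 -za -fo=foo386.obj foo.c    # 386 code generation
wdis foo086.obj > foo086.lst
wdis foo386.obj > foo386.lst
diff foo086.lst foo386.lst
```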

> Or you could use some
> debug, disassembler (ndisasm?) or hex editor to compare
> before you UPX things, but of course that is more work.

BTW, just because it UPXes smaller doesn't mean much. For one, you
can't really predict what will compress better, and it always changes.
Second, the other main thing to be concerned with is cluster size
(and waste). So a 33 kb UPX'd kernel using 64 kb of "actual" storage
because of 32 kb clusters would be inefficient, but a 45 kb one not so
much (or at least harder to fix).
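The cluster arithmetic is just rounding the file size up to the next
cluster boundary; a quick sketch with sizes picked to match the
33 kb / 45 kb example above:

```shell
# On-disk usage = file size rounded up to a multiple of the cluster size.
cluster=32768                      # 32 kb clusters
for size in 33792 46080; do        # a 33 kb and a 45 kb kernel
  used=$(( (size + cluster - 1) / cluster * cluster ))
  echo "$size bytes -> $used bytes on disk ($((used - size)) wasted)"
done
```

Both end up occupying two clusters (65536 bytes), but the 45 kb build
wastes only about 19 kb instead of 31 kb.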

> Thanks for comparing :-) Maybe this is more a topic for
> the kernel list. Note that if "a few" bytes are really
> only 10 or so, all this is probably more an "academic"
> exercise. Things get more exciting once you can save at
> least a cluster of disk space or a paragraph of RAM :-)

BTW, I did an F5 the other day to see how much RAM a clean boot would
use. It still gave me approx. 500 kb free (whereas normally, with only
XMS, I get approx. 600-620 kb, depending on TSRs loaded). That's hardly
what I'd call terribly bloated (despite one guy's complaint recently).
I'm sure it could be shrunk more, esp. if you don't need the full .BAT
language (GCOM, anyone?).

>> I hacked the 2041 kernel batch and make files included on the FD 1.1 iso to
>> allow the kernel to be built by OpenWatcom as 8086, 186, 286, 386, 486,
>> 586, or 686.  The resulting 686 kernel boots fine in VirtualBox 4.2.12 in
>> OSX 10.8.3 on my 2012 Mac Book Air 13" 4GB.  The resulting kernel is a few
>> bytes smaller compressed by upx than the kernel installed by the FD 1.1 iso.
>>  I'm going to continue testing.  No source changes were made.  Not sure how
>> the changes affect the nasm built files.

There aren't very many NASM files, IIRC, only like two. And since
OpenWatcom doesn't call NASM (nor WASM) for its inline assembly or
"pragma aux", it won't affect those. NASM doesn't really have a
command-line option to change the cpu targeted by the output, but if
you find an option (e.g. "-O9v", or "-Ox" [default in latest]) you
want to use globally, either "set NASM=-O9v" or "set NASMENV=-O9v".
(Older NASM versions read %NASM%; newer ones read %NASMENV%.)

There's nothing preventing anyone from hardcoding higher (e.g. 686)
instructions in the kernel (preferably with CPUID testing so as not to
break on older cpus). But it's not likely to gain much (without heavy
profiling first to find the bottlenecks). Though FPU / SIMD is almost
certainly out of the question. (What else is there? There are too many
instructions these days, e.g. BMI, RDRAND, FMA4. Blech.)
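As an illustration of the CPUID-testing idea: after first verifying
the CPU even has CPUID (the usual EFLAGS ID-bit toggle dance), you
read leaf 1 and check feature bits in EDX. The bit positions below are
from Intel's documented leaf-1 EDX layout; the sample EDX value itself
is made up:

```shell
# Made-up EDX feature word, as CPUID leaf 1 might return it.
edx=$(( 0x0183FBFF ))

# Bit positions per Intel's CPUID leaf-1 EDX layout:
#   bit 0 = FPU, bit 4 = TSC (RDTSC), bit 15 = CMOV, bit 23 = MMX
fpu=$((  (edx >> 0)  & 1 ))
tsc=$((  (edx >> 4)  & 1 ))
cmov=$(( (edx >> 15) & 1 ))
mmx=$((  (edx >> 23) & 1 ))
echo "FPU=$fpu TSC=$tsc CMOV=$cmov MMX=$mmx"
```

In the kernel itself this would obviously be done in assembly or via
"pragma aux", not shell; the point is just which bits to test before
emitting, say, CMOV.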

Freedos-user mailing list
