Hi,
On Mon, Jul 23, 2018 at 1:15 PM, Bret Johnson <bretj...@juno.com> wrote:
>
> Multiple issues. First, if you're going to do it in "pure C", you can't
> depend on anything like MMX.
MMX is deprecated in favor of "better" SSE2, though. Of course, that demands
P4/AMD64, but most people have those by now. (I hate to be that guy, but
the war is lost, nobody cares about anything less these days, and most
aren't sympathetic to IA-32 anymore either, preferring AMD64.)
You can of course have fallbacks for both MMX and generic, whether separate
files (modules) or #ifdef. So you can support both if you do the extra
work. But it's only (usually) worth it for a (very) small number of
routines.
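As a rough sketch of the #ifdef approach (not taken from any real project):
add_bytes is a made-up routine, and I'm assuming a compiler that defines
__SSE2__ when SSE2 is enabled, as GCC and Clang do. Anything else just gets
the generic loop.

```c
#include <stddef.h>
#include <stdint.h>

#ifdef __SSE2__
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Hypothetical routine: dst[i] += src[i] for n bytes, wrapping modulo 256. */
void add_bytes(uint8_t *dst, const uint8_t *src, size_t n)
{
    size_t i = 0;
#ifdef __SSE2__
    /* SSE2 path: 16 bytes per iteration, using unaligned loads and stores. */
    for (; i + 16 <= n; i += 16) {
        __m128i a = _mm_loadu_si128((const __m128i *)(dst + i));
        __m128i b = _mm_loadu_si128((const __m128i *)(src + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_add_epi8(a, b));
    }
#endif
    /* Generic path: the whole job without SSE2, otherwise just the tail. */
    for (; i < n; i++)
        dst[i] = (uint8_t)(dst[i] + src[i]);
}
```

The generic loop doubles as the tail handler, so the fallback costs no
extra source code.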
PAQ8 became literally twice as fast after two functions were rewritten for
MMX, and those barely took up a few hundred bytes once assembled. Even
including all i386, i686, i786 versions of those two functions, with the cpu
detected at runtime, the .EXE didn't get noticeably bigger. For me, that was
an obvious "win", where everyone was supported instead of excluding others
for vain speed reasons. (But it was fairly slow overall, I'll give you that.
7-Zip is a better compromise in speed and efficiency.)
Also, there are "intrinsics" (functions declared in headers such as
<emmintrin.h> that the compiler maps more or less directly to instructions)
that many compilers (e.g. Intel, GNU, MS) provide for access to such
instructions. Or you could just use some third-party library that abstracts
it all away for you (GMP?).
> You're going to need to virtualize the
> multiple-byte-functions-at-the-same-time manually,
> taking advantage of CPU and data storage characteristics (little-endian,
> two's complement, etc.).
> That pretty much defeats the purpose of sticking with "pure C".
Pure/portable/strictly conformant C is only good for programs/utilities
that people actually could benefit from using in different environments,
e.g. 7-Zip. If you're targeting DOS or VGA specifically, then being
portable is only good for maybe supporting other DOS compilers (for
whatever minor benefits).
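For what it's worth, the "multiple bytes at the same time" trick from the
quoted text can still be written in fairly portable C. Here is a minimal
sketch (add_bytes_swar is a name I made up) that adds four packed bytes
inside one 32-bit word without letting a carry spill into the next byte. It
only assumes a 32-bit unsigned type.

```c
#include <stdint.h>

/* Add the four bytes packed in a to the four bytes packed in b,
   each lane wrapping modulo 256 independently of its neighbours. */
uint32_t add_bytes_swar(uint32_t a, uint32_t b)
{
    /* Add the low 7 bits of every lane; the carry out of bit 6 lands in
       bit 7 of the same lane, so nothing crosses a lane boundary. */
    uint32_t low7 = (a & 0x7F7F7F7Fu) + (b & 0x7F7F7F7Fu);
    /* Add the top bit of every lane with XOR, i.e. discarding its carry. */
    uint32_t top = (a ^ b) & 0x80808080u;
    return low7 ^ top;
}
```

For example, add_bytes_swar(0x01FF7F80, 0x01018080) gives 0x0200FF00: the
0xFF + 0x01 lane wraps to 0x00 without disturbing the lane above it.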
> What you're trying to avoid is (conditional) JMPing and
> multiplication/division, since they are costly
> in terms of speed, even though they will work just fine.
Division by a constant can sometimes be avoided by multiplying by its
(scaled) inverse instead. I don't know the math behind it, but many articles
and people have talked about it before, so I assume you recognize what I
mean here.
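A small sketch of the idea for one fixed divisor, 10 (div10 is just an
illustrative name): x / 10 is roughly x * (2^35 / 10) / 2^35, and with the
scaled inverse rounded up to 0xCCCCCCCD the result is exact for every 32-bit
unsigned x. This assumes a 64-bit integer type is available for the
intermediate product, which an old 16-bit DOS compiler may not have.

```c
#include <stdint.h>

/* x / 10 without a DIV instruction.
   0xCCCCCCCD is ceil(2^35 / 10); multiplying by it and shifting right by
   35 bits yields the exact quotient for all 32-bit unsigned inputs. */
uint32_t div10(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}
```

Optimizing compilers such as GCC typically do this rewrite themselves when
the divisor is a compile-time constant, so it mostly matters for hand-written
assembly or weak compilers.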
> You are probably also going to want to minimize the number of loops,
> since loops are also a type of JMP.
> But, in modern CPU's with caches and branch prediction and pipelining and
> similar enhancements,
> loops generally aren't that bad in terms of overall speed.
486s were usually the first ones that had a very small internal cache on the
cpu itself. The 486 was pipelined, unlike the 386, so yes it was faster
(but it was very sensitive to code and data alignment). But the Pentium was
the superscalar one (U and V pipes), which could be much faster with
correct scheduling of instructions to pair properly (e.g. GCC 2.8.1).
Out-of-order execution didn't come until 686, I believe (4-1-1 micro-ops?),
and even that had to be tuned a certain way. The term "blended" means your
generic code works well enough on all target cpus (no obvious penalties).
If that still isn't good enough, you have to determine the appropriate cpu
at runtime and run specific code (for a very few select routines that
matter, after profiling) via function pointers. Even GCC itself has
supported "-march=native" since 4.2 or so.
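A bare-bones sketch of the function-pointer approach. All names here
(sum_fn, sum_generic, sum_tuned, select_sum, cpu_has_sse2) are invented for
illustration, and sum_tuned is only an unrolled stand-in for a real
cpu-specific routine. For the detection I'm assuming GCC or Clang on x86 and
their __builtin_cpu_supports; any other compiler falls back to the generic
version.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*sum_fn)(const uint8_t *, size_t);

/* Blended version: plain C that should be acceptable on any target. */
uint32_t sum_generic(const uint8_t *p, size_t n)
{
    uint32_t s = 0;
    size_t i;
    for (i = 0; i < n; i++)
        s += p[i];
    return s;
}

/* Stand-in for a tuned version (four independent accumulators). */
uint32_t sum_tuned(const uint8_t *p, size_t n)
{
    uint32_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += p[i];
        s1 += p[i + 1];
        s2 += p[i + 2];
        s3 += p[i + 3];
    }
    for (; i < n; i++)
        s0 += p[i];
    return s0 + s1 + s2 + s3;
}

/* Returns nonzero if the running cpu reports SSE2 (GCC/Clang on x86 only). */
int cpu_has_sse2(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse2") != 0;
#else
    return 0;  /* unknown compiler or cpu: stay on the generic path */
#endif
}

/* Pick the implementation once; callers keep the returned pointer. */
sum_fn select_sum(void)
{
    return cpu_has_sse2() ? sum_tuned : sum_generic;
}
```

The caller does "sum_fn f = select_sum();" once at startup and then calls
f(buf, len) in the hot path, so the detection cost is paid a single time.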
> Any kind of speed or size optimization you do in C (whether it's the
> compiler doing the optimization or
> you doing it manually) again depends on specific CPU characteristics and
> features, and again defeats
> the purpose of using "pure C".
Just assume the compiler sucks (because it probably does). It doesn't mean
they're all bad or that it doesn't have some virtues. But overall compilers
don't know much (or assume wrongly). If you want speed, you have to do it
yourself. It won't be handed to you on a silver platter.
P.S. Avoid (186) ENTER/LEAVE; ENTER in particular is much slower on newer
machines than the equivalent 8086 code.
_______________________________________________
Freedos-devel mailing list
Freedos-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/freedos-devel