[Freedos-devel] Re: Re: fast memcopy and other optimizations / ideas

Eric Auer Fri, 06 May 2005 06:30:30 -0700

Hi Michael,

> <massive snippage>
> Is there a question in all that?


No. This is why I concluded my mail with:

> maybe we better move the EMM386 optimization stuff to off-list?
(directed to Arkady)

> You have missed the point of proper optimization.  Algorithms optimized 
> first, only where they matter.

Yes. Sure. In particular the allocator might be affected by that. Looping
through 1000+ table entries and checking the 48-byte bit string of a fair
number of them can take quite a while, in particular if you have to do it
for each of 1000+ "get VCPI 4k page" calls from a DOS extender which, for
one reason or another, prefers VCPI over XMS for allocation.

But it is hard to tell how many table entries are scanned per alloc call
on average if you call VCPI alloc page 1000 times in a row. Then again,
that can make the difference between a linear and a quadratic relation
between "number of VCPI pages allocated" and "time consumed". I believe
that my relatively slow 500 MHz K6-2 does show some noticeable CPU load
for alloc (e.g. small pause / fans going faster), but I do not understand
the alloc algorithm well enough to really tell. This is why I am tempting
Arkady the optimization expert to have a look ;-).
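
To make the linear versus quadratic concern concrete, here is a minimal
Python sketch. It is NOT the EMM386 allocator: the table size, the
one-free-page-per-entry layout, the function names and both scan
strategies are assumptions, chosen only to compare "restart the scan at
entry 0 on every call" with "resume where the last free page was found".

```python
ENTRIES = 1000  # assumed table size (the mail only says "1000+ entries")


def make_table(free_pages_per_entry=1):
    # Hypothetical layout: one free-page counter per entry. It stands in
    # for checking an entry's 48-byte (384-bit) bit string.
    return [free_pages_per_entry] * ENTRIES


def alloc_from_start(table):
    """Scan from entry 0 on every call; return the entries inspected."""
    for i, free in enumerate(table):
        if free > 0:
            table[i] -= 1
            return i + 1
    raise MemoryError("no free page")


def alloc_with_hint(table, hint):
    """Resume scanning at hint; return (entries inspected, new hint)."""
    for step in range(len(table)):
        i = (hint + step) % len(table)
        if table[i] > 0:
            table[i] -= 1
            return step + 1, i
    raise MemoryError("no free page")


def total_inspected(n_allocs, use_hint):
    """Total entries inspected over n_allocs single-page allocations."""
    table = make_table()
    total, hint = 0, 0
    for _ in range(n_allocs):
        if use_hint:
            seen, hint = alloc_with_hint(table, hint)
        else:
            seen = alloc_from_start(table)
        total += seen
    return total


if __name__ == "__main__":
    print("scan from start:", total_inspected(1000, use_hint=False))
    print("scan with hint: ", total_inspected(1000, use_hint=True))
```

Under these assumptions the restart-at-0 variant inspects on the order
of n*n/2 entries for n allocations, while the hinted variant stays near
2*n. Whether the real allocator behaves like either one would have to be
checked in the source.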

> The optimization is the one that tries to align EDI to an 
> eight-byte boundary before the main REP MOVSD.  And that optimization only 
> makes sense because once you align EDI, you commonly align ESI along with 
> it, at least in the three areas to be optimized.

Very interesting. I think you cannot optimize for that (even though it
would allow fast access bursts) because that would require the move
distance to be a multiple of 8 bytes. However, if the distance is FOUND
to already be a multiple of 8 bytes, extra code (in the EMM386 int1587
handler and in the HIMEM memory copy function) could take care to do up
to 7 MOVSB before doing the main REP MOVSD. Reasonable overhead and even
quite good chances that it will often be used! All allocations are at
multiples of 1kB (XMS, EMS, VCPI), and many programs like DOS extenders
and RAM disks grab and move contents in chunks which are both aligned
to N*16 byte boundaries as well as having a size of M*16 bytes...
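
The "up to 7 MOVSB, then the main REP MOVSD" idea can be sketched as
plain arithmetic. This is only an illustration in Python, not code from
EMM386 or HIMEM: the function name plan_copy, its parameters and the
example linear addresses are all made up for the sketch.

```python
def plan_copy(src, dst, length, boundary=8):
    """Return (head_bytes, dwords, tail_bytes) for a forward copy.

    head_bytes: single-byte moves (MOVSB) until dst reaches the boundary;
                only used when src and dst can be aligned together.
    dwords:     4-byte moves for the main REP MOVSD.
    tail_bytes: leftover single-byte moves after the dwords.
    """
    head = 0
    if (dst - src) % boundary == 0:
        # Move distance is a multiple of 8: aligning dst also aligns src.
        head = min((-dst) % boundary, length)  # 0..7 bytes
    rest = length - head
    return head, rest // 4, rest % 4


if __name__ == "__main__":
    # Hypothetical linear addresses, 4 KB copies.
    print(plan_copy(0x200003, 0x100003, 4096))  # distance multiple of 8
    print(plan_copy(0x200001, 0x100003, 4096))  # distance not a multiple
    print(plan_copy(0x200000, 0x100000, 4096))  # already aligned
```

When source and destination are already aligned, the head count is zero
and the plan is nothing but the plain REP MOVSD.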

But then, that case already DOES use the fastest REP MOVSD without any
extra work from EMM386/HIMEM! So you are right. Optimization would be
much ado about nothing here: it cannot really boost performance if the
caller wants a movement that is not aligned in all ways, and performance
is already optimal if the caller wants a perfectly aligned movement
anyway :-|.

QUESTION related to that: Does the FreeDOS kernel make sure that BUFFERs
and the deblocking buffer (1 sector buffer in low DOS RAM to avoid having
to transfer to/from UMB or HMA) are nicely aligned?

> Note that carrying around multiple memory copy functions in EMM386 and 
> testing CPUs with dynamic configuration to the appropriate version of the 
> memory copy isn't worth the hassle and extra EMM386 code.

Almost agreed - but aligning the rep movsd instruction itself to a
multiple (IP wise) of 4 or 8 would still be a good idea, to make sure
that it does not straddle a cache line boundary. This affects the VDS
TRANSFER_BUFF, CHK_CHANNEL, the EMM386 SIMULATE_INT1587, the EMS 4.0
(less important) ems4_memory_region, and HIMEM xms_move_xms. It would be
easy to implement and test (for whether it improves speed) for all the
TASM owners out there ;-).

I know that chances are relatively low (rep movsd is a 3 byte instruction
if you are in a 16bit CS, but cache lines are N*16 bytes big), but you
know Murphy's Law - what can go wrong will go wrong. 5 explicit ALIGN 4
or ALIGN 8 directives in the source code give nice and straightforward
protection here.
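
As a quick check of how low those chances are, here is a small Python
sketch. It only does the offset arithmetic for a 3 byte instruction and a
given cache line size; the function names are made up, and it says
nothing about how large the speed penalty of a straddling instruction
actually is on any particular CPU.

```python
INSN_LEN = 3  # rep movsd in a 16-bit code segment, as stated above


def straddles(offset, line_size=16, insn_len=INSN_LEN):
    """True if an instruction starting at offset crosses a line boundary."""
    first_line = offset // line_size
    last_line = (offset + insn_len - 1) // line_size
    return first_line != last_line


def bad_offsets(line_size=16, align=1):
    """Start offsets within one line (in steps of align) that straddle."""
    return [o for o in range(0, line_size, align) if straddles(o, line_size)]


if __name__ == "__main__":
    print("unaligned, 16 byte lines:", bad_offsets(16, 1))
    print("ALIGN 4,   16 byte lines:", bad_offsets(16, 4))
    print("ALIGN 8,   16 byte lines:", bad_offsets(16, 8))
    print("unaligned, 32 byte lines:", bad_offsets(32, 1))
```

With no alignment, only the last two start offsets of a line are
affected (2 out of 16 for 16 byte lines). With ALIGN 4 or ALIGN 8 no
start offset can make a 3 byte instruction cross a boundary.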

Eric


