>The statement was that L4 was getting this right. Neither the L4 kernel nor
>the Coyotos kernel use the XMM registers at all aside from saving and
>restoring them in context switch. They are compiled to avoid floating point
>as well.

 

I didn't specifically say L4 was right, just that the L4 kernels were
originally mostly assembly and have since removed most of the assembly that
was there for speed (e.g. nearly all the IPC code), which allows more
portability, and yet the libs are going the other way. By "wrong" I meant
that they are going down opposite paths, yet in both cases performance and
portability are important.

"It seems wrong that all the guys working on the L4 series OS have got it to
the point where they removed (most of?) the asm and made it more portable
while the libs are going the other way and we are seeing more assembly."

>Contrary to what was said earlier in the thread, the block zero
>instructions on the scalar unit are quite good on x86/x64.

 

The only zero instruction I'm aware of is on the SSE registers (though you
can obviously use stosd with 0). For set I go by:

0 - 2 KB                     mov with 4x loop unroll
2 - 32 KB                    MMX/SSE with 8x unroll
32 KB - 2 or 3 MB (cache)    rep stosd
above 2 or 3 MB              MMX/SSE with non-temporal stores

 

The above are for the fastest memory writes, but the non-temporal stores
also have the advantage of not filling the cache.
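
As a rough illustration of the size-based choice above, here is a minimal
sketch in C. The names (pick_fill_strategy, zero_nontemporal) and the exact
cut-offs are assumptions for illustration only; in particular the 2 MB limit
stands in for "cache size" and would need tuning per CPU. The non-temporal
path uses the SSE2 intrinsic _mm_stream_si128 and falls back to memset when
SSE2 is not available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Hypothetical thresholds taken from the list above; tune per CPU. */
#define FILL_SMALL_MAX   (2u * 1024u)            /* 2 KB  */
#define FILL_MEDIUM_MAX  (32u * 1024u)           /* 32 KB */
#define FILL_CACHE_MAX   (2u * 1024u * 1024u)    /* assumed 2 MB cache */

typedef enum {
    FILL_MOV,          /* plain mov, 4x unrolled   */
    FILL_SSE,          /* MMX/SSE, 8x unrolled     */
    FILL_REP_STOS,     /* rep stosd                */
    FILL_NONTEMPORAL   /* SSE non-temporal stores  */
} fill_strategy;

/* Pick a fill strategy purely from the size of the region. */
static fill_strategy pick_fill_strategy(size_t bytes)
{
    if (bytes <= FILL_SMALL_MAX)  return FILL_MOV;
    if (bytes <= FILL_MEDIUM_MAX) return FILL_SSE;
    if (bytes <= FILL_CACHE_MAX)  return FILL_REP_STOS;
    return FILL_NONTEMPORAL;
}

/* Zero a 16-byte aligned region whose size is a multiple of 16,
   using stores that bypass the cache where SSE2 is available. */
static void zero_nontemporal(void *dst, size_t bytes)
{
    assert(((uintptr_t)dst & 15u) == 0 && (bytes & 15u) == 0);
#if defined(__SSE2__)
    __m128i zero = _mm_setzero_si128();
    __m128i *p = (__m128i *)dst;
    for (size_t i = 0; i < bytes / 16; i++)
        _mm_stream_si128(p + i, zero);
    _mm_sfence();   /* make the streamed stores visible before returning */
#else
    memset(dst, 0, bytes);   /* portable fallback, goes through the cache */
#endif
}
```

A caller would switch on pick_fill_strategy(n) and only take the
zero_nontemporal path for regions larger than the cache.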

 

This comparable speed for non-SSE code only applies to stores (which I'm not
sure we discussed; we talked more about copy). For copy (read and store) or
store and read (buffers), SSE is nearly always better, especially on Core 2
and i7. And with the new 256-bit ymm registers this will continue to be the
case.

 

E.g. for a circular buffer I have (16-byte aligned, a store and a read), SSE
is already better at 52 bytes (a 4-byte header and 12 ints; memcpy uses SSE
above a certain size) and significantly better (a 45% reduction) at 820
bytes.
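
To make the 52-byte case concrete, here is a minimal sketch in C of copying
one such record with four 16-byte SSE2 load/store pairs. This is not my
actual buffer code: the record layout, the padding to 64 bytes and the name
copy_record are assumptions for illustration, and the code falls back to
memcpy when SSE2 is not available.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Hypothetical record: 4-byte header + 12 ints = 52 bytes of payload,
   padded to 64 bytes so every slot in the ring stays 16-byte aligned. */
typedef struct {
    uint32_t header;
    int32_t  values[12];
    uint8_t  pad[12];
} record;

_Static_assert(sizeof(record) == 64, "record must fill four 16-byte lanes");

/* Copy one record between two 16-byte aligned slots. */
static void copy_record(record *dst, const record *src)
{
    assert(((uintptr_t)dst & 15u) == 0 && ((uintptr_t)src & 15u) == 0);
#if defined(__SSE2__)
    const __m128i *s = (const __m128i *)src;
    __m128i *d = (__m128i *)dst;
    for (int i = 0; i < 4; i++)                 /* 4 x 16 bytes = 64 bytes */
        _mm_store_si128(d + i, _mm_load_si128(s + i));
#else
    memcpy(dst, src, sizeof *dst);              /* portable fallback */
#endif
}
```

Whether this beats a scalar copy at this size depends on the CPU and the
surrounding code, so it needs measuring rather than assuming.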

 

>Though nowadays the SIMD and Float registers are so universally used that
>the case for avoiding them in the kernel is pretty weak.

 

Agree. Though rather than SIMD and float registers, for x86_64 (not ARM and
PPC) I think it's better to consider them as 128/256-bit GP registers (there
are only a few things they can't do, and these are rare). Whether you need a
128-bit GP register for anything besides load and store is a valid question.
For building 16-byte messages in registers, 16-byte flags, bit ops and
memory scanning (e.g. GC), 128 bits are useful, and they can do things
standard registers can't, e.g. store byte 11 or short 7, store via a mask,
etc. Since the C language doesn't support a _128 type, there is not much
opportunity to try it without asm or intrinsics (i.e. a high barrier to
use). Can/will BitC support this down the track (and can the compiler
understand it)?
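
For a flavour of what "128-bit GP register" use looks like through
intrinsics today, here is a minimal sketch in C. The function names are made
up for illustration. It scans memory 16 bytes at a time for the first
non-zero byte and sets "short 7" of a 16-byte flag block, using only SSE2
intrinsics (_mm_cmpeq_epi8, _mm_movemask_epi8, _mm_insert_epi16), with a
scalar fallback when SSE2 is not available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Index of the first non-zero byte in a 16-byte aligned buffer whose
   length is a multiple of 16, or len if every byte is zero. */
static size_t first_nonzero(const uint8_t *buf, size_t len)
{
    assert(((uintptr_t)buf & 15u) == 0 && (len & 15u) == 0);
#if defined(__SSE2__)
    const __m128i zero = _mm_setzero_si128();
    for (size_t off = 0; off < len; off += 16) {
        __m128i v = _mm_load_si128((const __m128i *)(buf + off));
        /* bit i of mask is set when byte i of v equals zero */
        unsigned mask = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(v, zero));
        if (mask != 0xFFFFu) {
            unsigned nz = ~mask & 0xFFFFu;   /* bits of the non-zero bytes */
            size_t i = 0;
            while (!(nz & 1u)) { nz >>= 1; i++; }
            return off + i;
        }
    }
    return len;
#else
    for (size_t i = 0; i < len; i++)
        if (buf[i]) return i;
    return len;
#endif
}

/* Set 16-bit lane 7 (bytes 14..15) of a 16-byte flag block. */
static void set_short7(uint8_t block[16], uint16_t value)
{
#if defined(__SSE2__)
    __m128i v = _mm_loadu_si128((const __m128i *)block);
    v = _mm_insert_epi16(v, value, 7);       /* lane index must be a constant */
    _mm_storeu_si128((__m128i *)block, v);
#else
    memcpy(block + 14, &value, sizeof value);
#endif
}
```

Storing a single byte such as byte 11, or a masked store, needs other
instructions; the point here is only how much ceremony the intrinsic route
involves compared with a first-class 128-bit type.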

 

Ben

_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
