Somehow the CC of the following never made it to the list. Here it is again.
On Tue, 2005-08-23 at 11:30 +0900, Carsten Haitzler wrote:
> On Mon, 22 Aug 2005 19:43:58 +0000 Tiago Victor Gehring
> <[EMAIL PROTECTED]> babbled:

Lots of people said..... and then raster said:

> actually do tests - you may find the unaligned copies not that much slower as
> traditionally x86 hw has always done the fixups for unaligned read/writes in
> hardware and thus the overhead is fairly small.

Tests are needed and so is some discussion. In relation to the above topics:

Mornin' all,

Okay, here's the deal. I'm going to talk about some stupid shit that everyone already knows and then you guys can call me an idiot, jerky, or whatever. Raster et al., please correct me if I am wrong, and my apologies to everyone for the review.

To review: the problem is that the hardware needs to be accounted for. It is impossible to load a single byte from RAM into cache, just like you can't read a single byte from the hard disk. We all know that you can read( ?, ?, 1 ), but we also know that when the kernel gets the call it reads a block, returns the character asked for and holds the rest in a buffer. The same thing happens in hardware.

The chips have all the wires coming into them, and their controller chip watches them and responds upon the proper signal, ALE. The Address Latch Enable (ALE) means that the CPU is asking for the memory at this address. It is an actual wire on the cpu/mobo: the voltage hits 5V (in my day, now ~2.5), and that signal hits the memory controller chip and causes it to start its cycle (ANDed with the timer wire, so it starts at the next clock tick). That memory controller interconnects RAM and L[123]_Cache, and they all have to deal with RAM in chunks/pages/lines/etc. The number of bytes in a chunk is always a function of the number of wires internally/externally (i.e. 386sx = 32bit chip on 16bit bus).
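To put the chunk idea in concrete terms, here is a minimal sketch in C of how many lines a single access touches. The 64-byte line size and the name lines_touched are assumptions made for illustration only; real line sizes depend on the CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed line size; real hardware varies, 64 bytes is only an illustration. */
#define LINE_SIZE 64u

/* Hypothetical helper: number of lines touched by an access of `size`
 * bytes starting at address `addr`. `size` must be greater than zero. */
static size_t lines_touched(uintptr_t addr, size_t size)
{
    assert(size > 0);
    uintptr_t first = addr / LINE_SIZE;              /* line holding the first byte */
    uintptr_t last  = (addr + size - 1) / LINE_SIZE; /* line holding the last byte  */
    return (size_t)(last - first + 1);
}
```

With these assumptions, a 4-byte pixel starting at offset 60 sits inside one line, while the same pixel starting at offset 62 straddles two, which is the case where the hardware has to do extra work.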
Add all this shit up, and basically what you have is memory that is byte addressable on the software level but much more complex on the hardware level. When the CPU asks for a byte of memory, the controller will return a line that contains the byte in question. When the CPU asks for a value that is multiple bytes, it will be returned in one or more cache lines of memory.

If the CPU can be assured that the data in question has the same alignment as the storage location, then it can manipulate the wires once to move the data. If it is not so aligned, then it must load the data in 2+ chunks. The plus is mooted by the instruction set (it doesn't handle data types larger than the register size to/from the SIMD core).

An aligned data move can be done in a single tick (to/from registers and L1 cache), so it is a waste of a cycle to check for alignment on every move, because you could have moved the unaligned data by then. The two cycles for the unaligned case are contingent upon the fix-ups that raster mentioned above being efficient, and I'm not sure exactly how the hardware does them. It could roll the address of the source wire to achieve alignment, or simply take two cycles and move the data in pieces. It is therefore appropriate for us to check once at the beginning of the image.

The preferable solution is to guarantee alignment upon entry; otherwise we are going to have to use unaligned memory moves or have two pieces of code. In order to achieve an alignment guarantee, we need to control how *image is created and ensure that it is a pointer for which the test "if ( image % alignment ) then do unaligned_stuff" never fires. This is a function of the compiler and other things, and can be accomplished with the aligned attribute (__attribute__((aligned(n)))) in C and the .align directive in asm with the GNU tools. I haven't investigated all of the possibilities here, so I know that the facilities exist but am not entirely positive of the calling convention nor the implementation.
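As a rough sketch of the "check once, guarantee alignment at creation" idea, something like the following C could serve as a starting point. The names is_aligned, alloc_pixels and PIXEL_ALIGN are hypothetical and are not taken from the imlib2 source; the 16-byte figure assumes SSE-style aligned moves, and posix_memalign is used here only as one way of getting an aligned heap buffer on a POSIX system.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L /* needed for posix_memalign under strict -std */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed requirement: 16 bytes, as for aligned SSE loads and stores. */
#define PIXEL_ALIGN 16u

/* Hypothetical helper: nonzero if p is a multiple of `alignment`.
 * `alignment` must be a power of two. */
static int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* Hypothetical allocator for pixel data: returns a PIXEL_ALIGN-aligned
 * buffer of w*h 32-bit pixels, or NULL on failure. Release with free(). */
static uint32_t *alloc_pixels(size_t w, size_t h)
{
    void *p = NULL;

    /* Reject empty images and sizes that would overflow size_t. */
    if (w == 0 || h == 0 || w > SIZE_MAX / sizeof(uint32_t) / h)
        return NULL;
    if (posix_memalign(&p, PIXEL_ALIGN, w * h * sizeof(uint32_t)) != 0)
        return NULL;
    return p;
}
```

A caller would then test is_aligned(pixels, PIXEL_ALIGN) once per image and choose the aligned or unaligned routine, rather than testing on every move. Buffers that come from alloc_pixels always pass; the test only matters for pixel pointers handed in from outside.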
Anyway, all of the image_load type of functions (or the one image_create, or whatever it's named, that is called by all others) need to be rewritten to ensure alignment, and we would still need to check for the possibility of a user-created image that gives an unaligned pointer to the pixels. In order to avoid the SIGSEGV, we would have to check for alignment and then either call a function with unaligned moves, re-align the data, or error out in that case.

I'm not real familiar with the imlib2 code and, more importantly, how it is used, so that is why I'm mentioning things like this. For those of you that know the internals, what do you propose? The "works for all" solution is to just use unaligned memory accesses. The "faster than all others" solution is going to need fully aligned memory, pre-fetched caches (already in there), and most of all predictability.

Comments, please...

Cheers,
The River Rat

_______________________________________________
enlightenment-devel mailing list
enlightenment-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/enlightenment-devel