--- Dennis Gorelik <[EMAIL PROTECTED]> wrote:
> Matt,
>
> > Using pointers saves memory but sacrifices speed. Random memory access is
> > slow due to cache misses. By using a matrix, you can perform vector
> > operations very fast in parallel using SSE2 instructions on modern
> > processors, or a GPU.
>
> I doubt it.
> http://en.wikipedia.org/wiki/SSE2 - doesn't even mention "parallel" or
> "matrix".
It also doesn't mention that one instruction performs 8 16-bit signed
multiply-accumulates in parallel, or various other operations: 16 x 8 bits,
8 x 16 bits, 4 x 32 bits (int or float), or 2 x 64 bits (double) in 128-bit
registers. To implement the neural network code in the PAQ compressor, I
wrote vector dot product code in MMX (4 x 16 bits, for older processors)
that is 6 times faster than optimized C/C++. There is an SSE2 version as
well.

> Actual difference in size would be 10 times, since your matrix is only
> 10% filled.

For a 64K by 64K matrix, each pointer is 16 bits. At 10% fill, that is
6554 pointers x 16 bits per row of 65536 elements, or about 1.6 bits per
element. I think for neural networks of that size you could use 1-bit
weights.

-- Matt Mahoney, [EMAIL PROTECTED]

-----
This list is sponsored by AGIRI: http://www.agiri.org/email
