--- Dennis Gorelik <[EMAIL PROTECTED]> wrote:

> Matt,
> 
> > Using pointers saves memory but sacrifices speed.  Random memory access is
> > slow due to cache misses.  By using a matrix, you can perform vector
> > operations very fast in parallel using SSE2 instructions on modern
> processors,
> > or a GPU.
> 
> I doubt it.
> http://en.wikipedia.org/wiki/SSE2 - doesn't even mention "parallel" or
> "matrix".

It also doesn't mention that a single instruction performs 8 signed 16-bit
multiply-accumulates in parallel, or various other operations on 128-bit
registers: 16 x 8 bits, 8 x 16 bits, 4 x 32 bits (int or float), or 2 x 64
bits (double).  To implement the neural network code in the PAQ compressor I
wrote vector dot-product code in MMX (4 x 16 bits, for older processors) that
is 6 times faster than optimized C/C++.  There is an SSE2 version too.

> Actual difference in size would be 10 times, since your matrix is only
> 10% filled.

For a 64K by 64K matrix that is 10% filled, each pointer (a column index into
64K columns) is 16 bits, which at 10% density works out to 0.1 x 16 = 1.6
bits per matrix element.  I think for neural networks of that size you could
use 1-bit weights.


-- Matt Mahoney, [EMAIL PROTECTED]

-----
This list is sponsored by AGIRI: http://www.agiri.org/email
To unsubscribe or change your options, please go to:
http://v2.listbox.com/member/?member_id=8660244&id_secret=71210692-be60c4
