On Mon, Apr 19, 2021, 1:08 PM Jim Bromer <[email protected]> wrote:

> Because I have been studying a little ML and DL in a TinyML course on
> using DL for microcontrollers (simple sensors and actuators for IoT kinds
> of things), I am starting to read more about DL. I have studied a lot of
> mathematics, but I do not remember most of it, and there are a lot of
> things that I never studied. So now that I at least have the beginning of
> an intuitive clue as to what the ANN and DL guys are talking about, I am
> starting to be able to pick up on their writing. For one example, I had no
> idea why they kept mentioning matrices, since NNs are not doing matrix
> multiplication as far as I could tell.
>

Yes they are. Most of the computation has the form y = Wx, where x and y
are the input and output vectors and W is the weight matrix. The rest is
adjusting W to reduce the output error, which is mostly an outer product of
the error vector with the input vector, scaled and added to W.
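A minimal NumPy sketch of those two steps (the layer sizes, learning rate, and target values here are made up for illustration):

```python
import numpy as np

# Toy single linear layer: 3 inputs -> 2 outputs (sizes chosen arbitrarily).
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(2, 3))  # weight matrix W
x = np.array([1.0, 0.5, -0.5])          # input vector x

# Forward pass: y = Wx, a matrix-vector multiplication.
y = W @ x

# Error-correction step: outer product of the output error with the
# input, scaled by a learning rate and added to W.
target = np.array([1.0, 0.0])           # assumed target output
error = target - y                      # output error vector
lr = 0.1                                # assumed learning rate
W += lr * np.outer(error, x)

# After the update, the same input maps closer to the target.
y_new = W @ x
```

Running the forward pass again after the update shows the output error shrink, which is all "training" means at this level.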

Whether or not you understand transformer networks, the top-ranked program
on the large text benchmark uses that algorithm, running on a GPU.
http://mattmahoney.net/dc/text.html

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T68be2fedf1f53ef2-M91c382a78f6802f44a2ce216
Delivery options: https://agi.topicbox.com/groups/agi/subscription
