enwik9 has a vocabulary of about 1.4 million words, so a bigram matrix would
have about 2 trillion elements, but most of them would be 0. The vocabulary
follows a Zipf (power law) distribution: half of the words occur only once,
90% occur fewer than 10 times, and 99% fewer than 100 times.
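A minimal sketch of the point, on a toy corpus rather than enwik9 itself: only the bigrams that actually occur need storage, which is a tiny fraction of the full V x V matrix.

```python
from collections import Counter

# Toy corpus standing in for enwik9 (the real file is ~1 GB of Wikipedia text).
text = "the cat sat on the mat and the cat ate the rat".split()

vocab = sorted(set(text))
V = len(vocab)

# Sparse bigram counts: only pairs that actually occur are stored.
bigrams = Counter(zip(text, text[1:]))

dense_elements = V * V            # what a full bigram matrix would need
nonzero_elements = len(bigrams)   # what actually occurs

print(V, dense_elements, nonzero_elements)
```

Even on 12 tokens the dense matrix is mostly zeros; at enwik9 scale (V ~ 1.4 million) the gap is many orders of magnitude.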

Unfortunately, GPUs aren't very good at handling sparse matrices. Following
pointers isn't parallel, and it requires random memory access, which is 50-100
times slower than sequential access. Biological neural networks handle this
well by using axons and synapses to represent the nonzero elements (sort of),
and they use about 10^-5 as much energy per operation. AGI is going to require
a kind of hardware optimization that will be hard to achieve with transistors.
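To make the pointer-chasing concrete, here is a minimal sketch of CSR (compressed sparse row) storage, the standard software representation for mostly-zero matrices; the values and indices are made up for illustration. A row's nonzeros sit contiguously (sequential access), but mapping each entry's column index back into a dense vector is exactly the random access pattern GPUs handle poorly.

```python
# CSR sketch: a 3x5 matrix with 4 nonzeros, stored in three flat arrays.
data    = [3.0, 1.0, 2.0, 5.0]   # nonzero values
indices = [0, 4, 1, 3]           # column index of each value
indptr  = [0, 2, 2, 4]           # row i's values are data[indptr[i]:indptr[i+1]]

def row_nonzeros(i):
    """Return (column, value) pairs for row i. Reading data/indices for one
    row is sequential; chasing the column indices elsewhere is random."""
    return list(zip(indices[indptr[i]:indptr[i + 1]],
                    data[indptr[i]:indptr[i + 1]]))

print(row_nonzeros(0))   # entries of row 0
print(row_nonzeros(1))   # row 1 is empty
```

Libraries like scipy.sparse use this same layout; the storage cost scales with the nonzero count rather than V x V.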

On Fri, Oct 22, 2021, 9:16 PM <[email protected]> wrote:

> It's actually possible that the "word2vec" I made is as efficient and
> accurate IF I only store the top 5,000 relations for each word, instead of
> all 50K<>50K. Perhaps word2vec gives every word an embedding, but each
> embedding is not as dimensionally long as it could be, so each suffers from
> not being able to have, e.g., 10,000 dimensions.
>
> The main thing about attempting my code is that it may be simpler to work
> with. If GPT cannot be made in ~400 lines of Python, then it might be an
> overly complex algorithm.

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/Tc124b3d00b83e897-Ma5812558c0ca9237bf2e71f9
Delivery options: https://agi.topicbox.com/groups/agi/subscription
