enwik9 has a vocabulary of about 1.4 million distinct words. A dense bigram matrix over that vocabulary would have roughly 2 trillion elements, but nearly all of them would be zero. Word frequencies follow a Zipf (power-law) distribution: about half of the words occur only once, 90% occur fewer than 10 times, and 99% fewer than 100 times.
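A quick sketch of why sparse counting wins here: store only observed bigrams in a hash map rather than a dense V x V array. This is a toy corpus, not enwik9, and the variable names are mine, but the ratio of nonzero entries to dense cells makes the point.

```python
# Sketch: count bigrams sparsely with a dict instead of a dense V x V
# matrix. Toy corpus for illustration only, not enwik9.
from collections import Counter

text = "the cat sat on the mat the cat ran".split()
vocab = sorted(set(text))
V = len(vocab)

# Only bigrams that actually occur get an entry.
bigrams = Counter(zip(text, text[1:]))

dense_cells = V * V          # what a dense matrix would allocate
nonzero = len(bigrams)       # what we actually need to store
print(nonzero, dense_cells)  # prints 7 36
```

On enwik9 the gap is far more extreme: a few hundred million observed bigrams versus ~2 trillion dense cells.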
Unfortunately, GPUs aren't very good at handling sparse matrices. Following pointers doesn't parallelize and requires random memory access, which is 50-100 times slower than sequential access. Biological neural networks handle this well, using axons and synapses to represent the nonzero elements (roughly speaking), at about 10^-5 the energy per operation. AGI is going to require a kind of hardware optimization that will be hard to achieve with transistors.

On Fri, Oct 22, 2021, 9:16 PM <[email protected]> wrote:

> It's actually possible the "word2vec" I made is as efficient and accurate
> IF I only store the top 5,000 relations for each word, instead of all
> 50k<>50k. Perhaps word2vec gives every word an embedding, but each
> embedding is shorter than it could be, so each suffers from not having,
> e.g., 10,000 dimensions.
>
> The main thing about attempting my code is that it may be simpler to work
> with. If GPT cannot be written in ~400 lines of Python, then it might be
> an overly complex algorithm.
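The sparse-storage tradeoff described above can be sketched with a CSR (compressed sparse row) layout: memory drops with sparsity, but every row lookup now goes through an index array, i.e. the pointer-chasing and random access that GPUs handle poorly. The 4-byte sizes and the tiny matrix are my own illustrative assumptions.

```python
# Sketch: CSR storage for a mostly-zero count matrix.
# Assumes 4-byte counts and indices; sizes are illustrative only.
rows = [
    [0, 0, 3, 0],
    [2, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 1, 0, 5],
]

data, indices, indptr = [], [], [0]
for row in rows:
    for j, v in enumerate(row):
        if v:                    # keep only nonzero entries
            data.append(v)
            indices.append(j)
    indptr.append(len(data))     # where each row's entries end

# Reading row i means the indirect slice data[indptr[i]:indptr[i+1]] --
# the random-access pattern that is slow on GPUs.
dense_bytes = 4 * 4 * 4                                  # 16 cells x 4 bytes
csr_bytes = 4 * (len(data) + len(indices) + len(indptr))
print(data, indices, indptr)
print(dense_bytes, csr_bytes)
```

On a 4x4 toy the savings are modest; at enwik9's ~99.99% sparsity, CSR-style storage is smaller by orders of magnitude, while dense storage is simply infeasible.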
