Enwik9 (the Large Text Benchmark / Hutter Prize corpus) has a vocabulary of 1.4M words. Your word-pair matrix would have about 2 trillion parameters.
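A quick back-of-envelope check of that figure, assuming a dense matrix with one parameter per ordered word pair (the byte size shown is my own illustrative assumption of 4 bytes per parameter):

```python
# Dense word-pair matrix over a 1.4M-word vocabulary:
# one parameter per ordered (word, word) pair.
vocab = 1_400_000
params = vocab * vocab
print(params)  # 1960000000000, i.e. about 2 trillion
# At a hypothetical 4 bytes per parameter:
print(f"{params * 4 / 1e12:.2f} TB")  # 7.84 TB
```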
The usual way to implement a word-pair semantic model is LSA (latent semantic analysis) over the most common 20K words. It is a 20K-to-about-200-to-20K neural network where you keep adding neurons to the hidden layer while training. I didn't implement it in PAQ because I couldn't figure out how to make it fast enough.

On Sun, Aug 22, 2021, 5:39 PM <[email protected]> wrote:
> Keep in mind my program is combining thousands of contexts, and it does so by
> combining percentages!
>
> This means every relation in its brain, like dog = cat, dog = man,
> book = store, pump = lock, pump = push, push = throw, push = twist...
>
> needs percentages! Dog doesn't = man, it is only "similar", e.g. dog =
> man 78%.
>
> I cannot do this faithfully; only my brain knows how similar pony is to
> woman...

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/T192296c5c5a27230-Mc95e416c20e9bfd91039f6d7
Delivery options: https://agi.topicbox.com/groups/agi/subscription
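The LSA idea above (compressing a huge word-pair matrix through a small hidden layer) can be sketched with a truncated SVD, which is the classical batch formulation of LSA. This is only an illustrative sketch, not the incremental network described in the reply or anything from PAQ; the words, context features, and counts are made up:

```python
# LSA-style sketch: factor a tiny word/context co-occurrence matrix with
# a truncated SVD, then read graded similarity ("dog = man 78%") off the
# cosine between low-rank word vectors. All data here is invented.
import numpy as np

words = ["dog", "cat", "man", "book", "store"]
# Rows = words, columns = 4 hypothetical context features (counts).
cooc = np.array([
    [8.0, 2.0, 1.0, 0.0],   # dog
    [7.0, 3.0, 1.0, 0.0],   # cat
    [3.0, 6.0, 2.0, 1.0],   # man
    [0.0, 1.0, 7.0, 5.0],   # book
    [0.0, 1.0, 6.0, 6.0],   # store
])

# Keep k latent dimensions -- the analogue of the small hidden layer
# (about 200 units in the scheme described above; 2 here for the toy data).
k = 2
U, s, Vt = np.linalg.svd(cooc, full_matrices=False)
vecs = U[:, :k] * s[:k]          # low-rank word embeddings

def similarity(a, b):
    """Cosine similarity between two words, as a percentage."""
    va, vb = vecs[words.index(a)], vecs[words.index(b)]
    return 100.0 * float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))

print(f"dog ~ cat:  {similarity('dog', 'cat'):.0f}%")
print(f"dog ~ book: {similarity('dog', 'book'):.0f}%")
```

The point of the factorization is exactly the percentage-style grading asked for in the quoted message: relations come out as continuous similarities rather than hard equalities, and a 20K-by-20K pair matrix shrinks to 20K-by-200.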
