I've been playing with the notion that syntax and semantics are simply opposing directions in a Granger-esque higher-order pushdown automaton (HOPDA) grammar. HOPDAs have the virtue that their quasi-context-sensitivity lends itself to what I like to call the UTM fictions we all indulge in as we program computers.
On Fri, Jan 23, 2026 at 12:36 AM <[email protected]> wrote:
>
> https://agi.topicbox.com/groups/agi/T0518db1e3a0c25c5/preprocessor-for-hutter-prize
> Good messages here. I just saw them now. Still need to read them.
> Yes (lol), "water water" is rare; I also have mine lowering there right after
> it sees a word, to prime it stronger.
>
> Matt, here is how my semantic model works, below. Is mine better or worse
> than yours?
>
> My current implementation just builds a 2-byte-depth tree. The first breadth
> (the root bytes) are the words on the right side of word pairs (e.g. "dog
> sleeps"); that is, I took every 2 words in the dataset and switched their
> order just so I could store them as "sleeps dog". This makes building the
> word relations faster because the root starts with the shared proofs first.
> Now, after the tree is built, I have a list of 256 lists, each holding 256
> zeros. This is the relation score (0.0-1.0) from every byte to every byte.
> Then I check every root byte. Say we have "sleeps": if dog, cat, horse, and
> human come after this byte in the mini tree, I go to the other structure we
> made, the 256 lists, and then to dog, cat, horse, and human, since those are
> what we have at hand after this root byte. I give each pair a score,
> calculated this way: how many counts does dog have? 550? How many counts
> does cat have? 1000? If dog is lower, I normalize it; in this example dog is
> about 2x rarer in the dataset. I also check how many counts they share here
> in the mini tree: we have "sleeps dog" x2 and "sleeps cat" x10. So, since
> dog is lower in total counts, I raise "sleeps dog" artificially from 2 to
> 2*2=4. Now dog and cat share 4. I then store this in dog's list, stored as
> 4 / total counts.
> Why a fraction? Because if dog and cat share 4 but dog has 1 billion counts
> in the dataset (and so does cat, after normalization), then while they share
> 4 they really share almost nothing; if they shared 100% they would share
> 1 billion as well. So the score is stored as a PART of the final amount. We
> come back to dog's list on the NEXT proof: if dog and cat share more proofs,
> we add more score on top of the score we just saved. We have only done one
> part so far, I'm saying. Lastly, I also downrank proofs that are too common,
> which improved the score. Rare words help prove two words are related, while
> common words have much less effect.
>
> I just realized something (?): I have read a lot about Transformers, but no
> one seems to have explained something I only understood tonight. The token
> embedding step (after Byte Pair Encoding assigns token IDs) does NOT only
> produce related-word dimensional vectors; the vectors also store the word's
> syntax. So "cat" can sit in a "space" near animal-related (semantic) words,
> but the same vector can simultaneously hold "info" placing "cat" near
> syntactically similar words (not related words). Then, as these embeddings
> flow up the Transformer, information is added, making the last vector of the
> user's prompt (apparently only that vector is used to get the next word
> after all computation is done) more and more specialized and clear, until at
> the end it is unembedded: that vector is checked against every word vector
> in the vocab. So if the prompt had 100 animal names to the left of it, it
> will predict those (not a next word but a related word), while if it had few
> animal words nearby and more syntactic sentence-flow words, it will predict
> one of that type, correspondingly based on the sentence, of course.
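The pairwise scoring described in the quote can be sketched roughly as follows. This is a word-level sketch under my own assumptions (the poster actually works at byte level with a 256x256 table, and all function and variable names here are hypothetical): the rarer word's shared count is scaled up by the ratio of total counts, the shared amount is taken as a fraction of the total, common "proofs" are downweighted, and scores accumulate across proofs.

```python
from collections import Counter, defaultdict

def build_context_index(word_pairs):
    """Map each right-hand word (the 'root') to a Counter of left-hand
    words, mirroring the reversed storage ('dog sleeps' kept as 'sleeps dog')."""
    index = defaultdict(Counter)
    for left, right in word_pairs:
        index[right][left] += 1
    return index

def relatedness(word_a, word_b, index, word_totals, context_totals):
    """Accumulate a relatedness score for word_a vs word_b across shared
    contexts ('proofs').

    Per proof: the rarer word's shared count is scaled up by the ratio of
    the words' total counts (550 vs 1000 -> roughly 2x), the shared amount
    is divided by the total count (sharing 4 out of a billion is almost
    nothing), and very common contexts are downweighted, since rare proofs
    are stronger evidence of relatedness.
    """
    score = 0.0
    for ctx, followers in index.items():
        a = followers.get(word_a, 0)
        b = followers.get(word_b, 0)
        if a == 0 or b == 0:
            continue  # not a shared proof
        # normalize the rarer word's shared count upward
        if word_totals[word_a] < word_totals[word_b]:
            a *= word_totals[word_b] / word_totals[word_a]
        else:
            b *= word_totals[word_a] / word_totals[word_b]
        shared = min(a, b)
        weight = 1.0 / (1.0 + context_totals[ctx])  # downrank common proofs
        score += weight * shared / max(word_totals[word_a], word_totals[word_b])
    return score
```

For example, with the pairs from "dog sleeps", "cat sleeps", "dog eats", "cat eats", and "car drives", dog/cat share two proofs and score above zero, while dog/car share none and score exactly zero.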
> So it's not doing priming "and" syntax conditionalism "and" related-word
> "mechanisms" separately; it's just using dimensional vectors that already
> hold both types of info.
>
> Hmm, let's compare.
>
> My way:
> Build: 1. build the syntax tree; 2. build the semantic tree.
> Predict: 1. translate the last word(s) to get next words (searching the
> tree already works without translation); 2. translate the last words to
> vote on next words.
>
> Their way:
> Build: 1. build dimensional syntax vocab vectors; 2. build the semantics
> into those same dimensional vectors, still using a separate method.
> Predict: 1&2 (and more?): it does seem to do a few more things here. It
> uses self-attention (each word looks at each word), which lets the last
> token's vector "become" either a semantic thing (if all the words to its
> left are related words, this makes sense and is OK) or a syntactic word (if
> the words to its left are not related words). Yes, it's doing priming and
> next-word prediction in one go, but after this self-attention they also
> send the vectors up into an FFN and so on, which is another step. I don't
> know exactly why, yet.
>
> *Artificial General Intelligence List <https://agi.topicbox.com/latest>*
> / AGI / see discussions <https://agi.topicbox.com/groups/agi> +
> participants <https://agi.topicbox.com/groups/agi/members> +
> delivery options <https://agi.topicbox.com/groups/agi/subscription>
> Permalink
> <https://agi.topicbox.com/groups/agi/T2d9ee7e1ee2cd20c-Mbc6d3e00e545d91e44c7e253>

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/T2d9ee7e1ee2cd20c-M81c5b458835058372484bf81
Delivery options: https://agi.topicbox.com/groups/agi/subscription
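The unembedding step described in the quoted message (the final prompt vector checked against every word vector in the vocab) can be sketched with toy numbers. The vocabulary, vectors, and dimension labels below are invented purely for illustration, not taken from any real model; the point is that one vector can carry both semantic and syntactic information, and a dot product against the whole embedding table ranks the candidates.

```python
import numpy as np

# Toy 4-word vocab; dims 0-1 act "semantic" (animal-ness, vehicle-ness),
# dims 2-3 act "syntactic" (noun-ness, verb-ness) -- labels are invented.
vocab = ["cat", "dog", "car", "runs"]
E = np.array([
    [1.0, 0.0, 1.0, 0.0],  # cat:  animal + noun
    [0.9, 0.1, 1.0, 0.0],  # dog:  animal + noun
    [0.0, 1.0, 1.0, 0.0],  # car:  vehicle + noun
    [0.1, 0.0, 0.0, 1.0],  # runs: verb
])

# Pretend the transformer's final prompt vector ended up "animal noun"-flavoured,
# as if many animal names stood to the left of it in the prompt.
h = np.array([1.0, 0.0, 0.8, 0.0])

logits = E @ h                              # unembedding: dot h with every vocab vector
ranked = [vocab[i] for i in np.argsort(-logits)]  # best candidates first
```

With these numbers the animal nouns "cat" and "dog" rank first and the verb "runs" last, matching the quote's intuition that an animal-flavoured final vector predicts a related word rather than a generic next word.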
