thinking about the language model challenge a little: how do you shrink a language model without doing any of the usual things that would make it shrink?
other things we could avoid might include parsing sentences or building knowledge bases. so how about this idea of predicting future tokens more accurately? like this: we could have the model output a bunch of tokens at once rather than just one, then feed them back as input so it can update them! kind of like my approach to a self-modifying model in the other thread. it sounds fun !! (rough sketch below)
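
here's a minimal toy sketch of that loop, just to make the idea concrete. everything in it is hypothetical: `ToyLM` is a stand-in for whatever model you'd actually use, `refine_block` is just one possible way to wire up the feed-back-and-update loop, and nothing is trained, so the output is gibberish. it only shows the control flow: draft a whole block of tokens, feed the draft back as input, let the model overwrite it, repeat.

```python
# Hypothetical sketch of "emit a block, feed it back, refine it".
# ToyLM and refine_block are made-up names, not a real API.
import torch
import torch.nn as nn

VOCAB, DIM, BLOCK = 256, 64, 8  # vocab size, hidden width, tokens per block

class ToyLM(nn.Module):
    """Stand-in model: reads context plus a draft block, re-predicts the block."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.mix = nn.GRU(DIM, DIM, batch_first=True)  # any sequence mixer works here
        self.head = nn.Linear(DIM, VOCAB)

    def forward(self, tokens):                 # tokens: (batch, seq)
        h, _ = self.mix(self.embed(tokens))    # (batch, seq, DIM)
        return self.head(h[:, -BLOCK:, :])     # logits for the last BLOCK positions

def refine_block(model, context, n_iters=4):
    """Draft BLOCK tokens at once, then repeatedly feed the draft back as
    input and let the model replace it with (hopefully) better tokens."""
    draft = torch.zeros(context.size(0), BLOCK, dtype=torch.long)  # blank draft
    for _ in range(n_iters):
        logits = model(torch.cat([context, draft], dim=1))  # context + current draft
        draft = logits.argmax(dim=-1)          # update: overwrite the draft wholesale
    return draft

model = ToyLM()
context = torch.randint(0, VOCAB, (1, 16))     # 16 tokens of fake context
print(refine_block(model, context))            # one refined BLOCK of token ids
```

the fun design question is the update rule: here the draft gets overwritten wholesale each iteration, but you could instead only replace tokens the model is unsure about, which feels closer to the self-modifying idea.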
