this version has all the outputs fixed to a single value: https://bafkreic6ulo7ahnblyxdrklw7lcshmaob4yfawznnprrdis4nop7lw6nxu.ipfs.dweb.link/
it's quite clear that the model isn't differentiating between output positions. maybe this has to do with the decoder, which includes output position encodings. the problem still seemed to occur when i used the [huggingface language model class rather than my own], which could mean i'm using it incorrectly. it could make sense to compare my decoder with deepmind's original language model decoder and see how they use the output position encodings: https://github.com/deepmind/deepmind-research/tree/master/perceiver
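as a sanity check on the hypothesis above: if the decoder's output queries don't carry distinct position information, cross-attention produces the same result for every output position. a minimal numpy sketch (hypothetical shapes and names, not the actual model code) reproducing that symptom:

```python
import numpy as np

rng = np.random.default_rng(0)

def cross_attention_decode(queries, latents):
    # single-head cross-attention: output queries attend to the latent array
    # queries: [n_out, d], latents: [n_latent, d] (illustrative shapes only)
    scores = queries @ latents.T / np.sqrt(queries.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ latents

d, n_out, n_latent = 8, 4, 16
latents = rng.normal(size=(n_latent, d))

# broken case: every output position uses the same query -> identical outputs,
# i.e. all outputs collapse to a single value
same_query = np.tile(rng.normal(size=(1, d)), (n_out, 1))
out_same = cross_attention_decode(same_query, latents)

# fixed case: distinct (learned) output position encodings as queries
# -> outputs differ per position
pos_enc = rng.normal(size=(n_out, d))
out_pos = cross_attention_decode(pos_enc, latents)

print(np.allclose(out_same[0], out_same[1]))  # True: positions not differentiated
print(np.allclose(out_pos[0], out_pos[1]))    # False: positions differentiated
```

so if the position encodings are missing, untrained, or accidentally broadcast to one value before reaching the decoder queries, this exact "single value at every output position" behaviour falls out.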
