You need to semantically ground it or it'll get lost in woo-woo land; that's the issue. Look at what happened with the Dota 2 model (OpenAI Five) early on: it got to the point where it beat humans, but then humans figured out how to counter its strategies, and it had to play another billion games against itself before it could beat them again.
The point is that the alignment problem has less to do with paperclip-maximizing scenarios and more with the fact that the space of functions is so absurdly large that merely playing against yourself over and over does not necessarily mean you keep getting better, unless you occasionally touch base and ground it in human reinforcement. We will be there learning along the way, because it is a question of what we even want. As Ingo said: garbage in, garbage out; polish a turd as much as you like.

This isn't really a theory or an argument so much as a limitation on what it means, categorically, for syntax and semantics to converge in the functorial limit. The stable reasoning required to learn from less data, the way humans do, is exactly what's missing when models hallucinate: humans definitely conflate things, but AI conflates far more absurdly, even arbitrarily. (Functional programmers sometimes get lost out there too; it's always your hardware, not your clever syntax, that defines good abstraction. This is why I came to Nim from Clojure/Haskell.)
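To make the "touch base and ground it" point concrete, here's a toy sketch in Python. It has nothing to do with the actual OpenAI Five training code; every function and constant is made up for illustration. The idea is just that a policy improved purely by self-play keeps climbing its own proxy objective and drifts without bound, unless you periodically mix in a noisy human-preference signal that pulls it back toward what people actually want.

    import random

    random.seed(0)

    # The "policy" is a single scalar parameter, purely for illustration.
    policy = 0.0

    # What humans actually want; the self-play objective knows nothing about it.
    human_target = 1.0

    def self_play_gradient(p):
        # The self-play proxy reward keeps increasing as the policy grows more
        # extreme, so following it alone drifts forever (the "woo-woo land" drift).
        return 1.0

    def human_feedback_gradient(p):
        # A noisy preference signal pulling the policy back toward human_target.
        return (human_target - p) + random.gauss(0, 0.1)

    LR = 0.05
    GROUND_EVERY = 10  # how often we touch base with human reinforcement

    for step in range(1, 201):
        policy += LR * self_play_gradient(policy)
        if step % GROUND_EVERY == 0:
            policy += LR * 20 * human_feedback_gradient(policy)

    print(f"final policy: {policy:.2f} (human target: {human_target})")
    # With the grounding step, the policy stays within noise of the target (~1.0).
    # Delete it and 200 self-play steps walk the policy out to 10.0 and beyond.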