Digging a bit more into the arXiv paper: https://arxiv.org/pdf/2301.05217
I am a bit surprised that the network bothers to 'discover' discrete Fourier transforms rather than discovering convolutions in the original domain. It is also surprising that the 'phase transition' to generalization is relatively smooth with respect to continued performance on the task. The network appears to use memorization as scaffolding toward further amplification of structured mechanisms; this is then followed by garbage collection of the scaffolding. Is this kind of thing specific to these architectures? Is there evidence for something similar in us?
-. --- - / ...- .- .-.. .. -.. / -- --- .-. ... . / -.-. --- -.. .

FRIAM Applied Complexity Group listserv
Fridays 9a-12p Friday St. Johns Cafe / Thursdays 9a-12p Zoom https://bit.ly/virtualfriam
to (un)subscribe http://redfish.com/mailman/listinfo/friam_redfish.com
FRIAM-COMIC http://friam-comic.blogspot.com/
archives: 5/2017 thru present https://redfish.com/pipermail/friam_redfish.com/
1/2003 thru 6/2021 http://friam.383.s1.nabble.com/
