https://youtube.com/clip/UgkxKIfQA8UpbuzBp2ahwAxXkIPxqJtGGRRn?si=hXSAi7XfM9lbg0nf

One way of viewing this is just that by introducing noise into gradient
descent one can avoid local minima.
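
As a concrete sketch of that first point (my own toy example, not from the
clip): plain gradient descent on a one-dimensional function with a shallow
minimum near x = +1.13 and a deeper one near x = -1.30 gets stuck in the
shallow basin, while the same descent with annealed Gaussian noise usually
hops the barrier.  The function, step size, noise scale, and decay rate are
all illustrative choices, not tuned values.

    import random

    def grad(x):
        # f(x) = x**4 - 3*x**2 + x has a shallow minimum near x = +1.13
        # and a deeper one near x = -1.30; this is f'(x).
        return 4*x**3 - 6*x + 1

    def descend(x, steps=20000, lr=0.01, noise=0.0, decay=0.9998, rng=random):
        for _ in range(steps):
            x = x - lr * grad(x) + noise * rng.gauss(0.0, 1.0)
            noise *= decay          # slowly anneal the noise so the run settles
        return x

    start = 1.5                     # basin of the shallow minimum
    print("plain descent :", round(descend(start), 2))   # stays near +1.13
    noisy = [descend(start, noise=0.3, rng=random.Random(s)) for s in range(20)]
    print("noisy descents ending in the deeper basin: %d/20"
          % sum(x < 0 for x in noisy))                    # typically most of the 20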

Another way of viewing it is that so-called "misinformation" can teach
critical thinking so long as there is enough countervailing information to
overcome the misinformation.

This latter view is associated with the *specious* view that "bias" about
what IS the case in language models arises from the "bias" of the
population that generated the *randomly* curated training data, since that
population's "bias" is itself *not* random.

This view is specious for the same reason that science is not democratic --
science engages in critical thinking that takes into account the broadest
range of data practical in order to arrive at a consistent, canonical body
of knowledge -- a world model of maximum parsimony.  The fact that a
population may, in a sense, be "voting" for misinformation is merely
another phenomenon to be modeled as data.  A model of that population
models the bias itself as knowledge.

This is why lossless compression of Wikipedia will result not only in
distilled knowledge about the topics of the articles, but also in knowledge
about the biases of the editors, so as to represent that corpus more
parsimoniously.
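
A toy numerical illustration of that compression point (again mine, not
from the post): the ideal code length of a corpus under a model is the sum
of -log2 p(symbol), so a model that captures the corpus's skew -- here a
stand-in for "editorial bias" -- spends fewer bits than one that treats the
word choice as unbiased.  The cost of describing the model itself is
ignored for brevity.

    from collections import Counter
    from math import log2

    corpus = ("regrettably " * 9 + "fortunately ").split()   # a skewed word choice

    def code_length_bits(words, prob):
        # Ideal (Shannon) code length of the corpus under a word model.
        return sum(-log2(prob[w]) for w in words)

    uniform = {w: 1 / len(set(corpus)) for w in corpus}       # ignores the skew
    fitted  = {w: c / len(corpus) for w, c in Counter(corpus).items()}  # models it

    print("uniform model:", round(code_length_bits(corpus, uniform), 1), "bits")
    print("fitted model :", round(code_length_bits(corpus, fitted), 1), "bits")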
