Magic is always the explanation of those who can't understand.
Brent
On 12/12/2024 1:39 PM, 'Cosmin Visan' via Everything List wrote:
Magic!
On Thursday, 12 December 2024 at 20:00:58 UTC+2 John Clark wrote:
*The number of "tokens" (words or parts of words) used to train
LLMs is 100 times larger than it was in 2020, the largest are now
using tens of trillions. if you only consider text then the
entire Internet only contains about 3,100 trillion tokens. The
amount of text LLMs train on is doubling every year but the amount
of human generated text on the Internet is only growing at about
10% a year, if that trend continues AIs will run out of text
somewhere around 2028. Does that mean AI progress is about to hit
a wall? I don't think so for the following reasons:*
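As a rough back-of-the-envelope sketch of that crossover in Python:
the 3,100 trillion figure counts all text, the usable high-quality
stock is plausibly an order of magnitude smaller, and the starting
values below are illustrative assumptions rather than measured figures.

# Back-of-the-envelope sketch of the text-exhaustion argument above.
# Starting stocks are illustrative assumptions, not measured figures.
tokens_used = 30e12    # tokens used to train today's largest LLMs (assumed)
usable_text = 300e12   # assumed usable high-quality text stock

year = 2024
while tokens_used < usable_text:
    tokens_used *= 2.0     # training-data demand doubles every year
    usable_text *= 1.10    # human-generated text grows ~10% per year
    year += 1
print(year)                # 2028 under these assumed starting values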
For one thing, because of improvements in algorithms, the
computing power needed for a Large Language Model to achieve the
same performance has halved about every 8 months.
Algorithmic progress in language models
<https://arxiv.org/pdf/2403.05812>
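To see what an 8-month halving time means in practice, here is a
minimal sketch (the function and numbers are illustrative, not from
the paper):

# Compute required for a fixed level of performance after `months`,
# assuming algorithmic progress halves it every 8 months.
def compute_needed(c0, months, halving_period_months=8.0):
    return c0 * 2 ** (-months / halving_period_months)

print(compute_needed(1.0, 24))   # after two years: 0.125, i.e. 1/8 the compute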
And computer chips specialized for AI rather than general
computing, like those made by Nvidia and other companies, are
improving even faster than Moore's Law would predict. Also,
specialized data sets, such as astronomical and biological data,
are growing much more quickly than text is; that's how AIs got so
good at predicting how proteins fold up.
And there is vastly more information available if AIs are trained
on other types of data besides text, and some AIs are already being
trained on unlabeled images and videos. Yann LeCun, chief AI
scientist at Meta, said that "although the 10^13 tokens used to
train an LLM sounds like a lot (it would take a human 170,000
years to read that much), a 4-year-old child has absorbed a
volume of data 50 times greater than that just by looking at
objects during his waking hours. We’re never going to get to
human-level AI by just training on language, that’s just not
happening".
And then there's synthetic data. AlphaGeometry was trained to
solve geometry problems using 100 million computer-generated
synthetic examples with no human demonstrations, and it ended up
being as good at solving difficult geometry problems as the very
best high school students in the entire nation.
Solving olympiad geometry without human demonstrations
<https://www.nature.com/articles/s41586-023-06747-5>
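The general recipe is easy to sketch, even though AlphaGeometry's
actual pipeline is far more sophisticated: sample random problems,
solve them mechanically with a symbolic engine, and train on the
resulting (problem, solution) pairs. Here is a toy version with
arithmetic standing in for geometry; everything below is schematic,
not the paper's code:

import random

# Toy synthetic-data recipe: random problems, mechanically derived
# solutions, no human demonstrations anywhere in the loop.
def generate_example():
    a, b = random.randint(1, 100), random.randint(1, 100)
    problem = f"{a} + {b}"
    solution = str(a + b)    # the "symbolic engine": exact computation
    return problem, solution

synthetic_dataset = [generate_example() for _ in range(100)]
print(synthetic_dataset[0])  # e.g. ('42 + 17', '59')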
AI researchers are starting to change their strategy and have
their AIs reread their training set many times; because AIs
operate in a statistical way, rereading improves performance.
Scaling Data-Constrained Language Models
<https://arxiv.org/pdf/2305.16264>
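In code, "rereading" just means training for several epochs over the
same fixed corpus instead of making a single pass; this toy loop (the
model and update rule are trivial placeholders) shows the shape of it:

# Toy multi-epoch training loop: the same fixed corpus is reread 4 times.
# The "model" and update rule are placeholders, not a real training setup.
corpus = [1.0, 2.0, 3.0, 4.0]   # stand-in for a fixed text dataset
weight = 0.0                    # stand-in for model parameters

for epoch in range(4):          # 4 epochs = rereading the data 4 times
    for example in corpus:
        weight += 0.1 * (example - weight)   # toy gradient-style update
print(weight)                   # the estimate keeps improving with each pass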
Andy Zou at Carnegie Mellon University says: "Once an AI has got
a foundational knowledge base that’s probably greater than any
single person could have, it no longer needs more data to get
smarter. It just needs to sit and think. I think we’re probably
pretty close to that point."
John K Clark    See what's on my new list at Extropolis
<https://groups.google.com/g/extropolis>