At Secret Math Meeting, Researchers Struggle to Outsmart AI

John Clark Sun, 08 Jun 2025 05:50:49 -0700

Two days ago the following article went online:

At Secret Math Meeting, Researchers Struggle to Outsmart AI
<https://www.scientificamerican.com/article/inside-the-secret-meeting-where-mathematicians-struggled-to-outsmart-ai/>

I think it's behind a paywall, but it's super important, so I've reproduced
it below. It looks like professional human mathematicians will soon be
obsolete, if they're not already.
==
*The world's leading mathematicians were stunned by how adept artificial
intelligence is at doing their jobs*
By Lyndie Chiou

*On a weekend in mid-May, a clandestine mathematical conclave convened.
Thirty of the world’s most renowned mathematicians traveled to Berkeley,
Calif., with some coming from as far away as the U.K. The group’s members
faced off in a showdown with a “reasoning” chatbot that was tasked with
solving problems they had devised to test its mathematical mettle. After
throwing professor-level questions at the bot for two days, the researchers
were stunned to discover it was capable of answering some of the world’s
hardest solvable problems. “I have colleagues who literally said these
models are approaching mathematical genius,” says Ken Ono, a mathematician
at the University of Virginia and a leader and judge at the meeting.*

*The chatbot in question is powered by o4-mini, a so-called reasoning large
language model (LLM). It was trained by OpenAI to be capable of making
highly intricate deductions. Google’s equivalent, Gemini 2.5 Flash, has
similar abilities. Like the LLMs that powered earlier versions of ChatGPT,
o4-mini learns to predict the next word in a sequence. Compared with those
earlier LLMs, however, o4-mini and its equivalents are lighter-weight, more
nimble models that train on specialized datasets with stronger
reinforcement from humans. The approach leads to a chatbot capable of
diving much deeper into complex problems in math than traditional LLMs.*

*To track the progress of o4-mini, OpenAI previously tasked Epoch AI, a
nonprofit that benchmarks LLMs, to come up with 300 math questions whose
solutions had not yet been published. Even traditional LLMs can correctly
answer many complicated math questions. Yet when Epoch AI asked several
such models these questions, which were dissimilar to those they had been
trained on, the most successful were able to solve less than 2 percent,
showing these LLMs lacked the ability to reason. But o4-mini would prove to
be very different.*

*Epoch AI hired Elliot Glazer, who had recently finished his math Ph.D., to
join the new collaboration for the benchmark, dubbed FrontierMath, in
September 2024. The project collected novel questions over varying tiers of
difficulty, with the first three tiers covering undergraduate-, graduate-
and research-level challenges. By April 2025, Glazer found that o4-mini
could solve around 20 percent of the questions. He then moved on to a
fourth tier: a set of questions that would be challenging even for an
academic mathematician. Only a small group of people in the world would be
capable of developing such questions, let alone answering them. The
mathematicians who participated had to sign a nondisclosure agreement
requiring them to communicate solely via the messaging app Signal. Other
forms of contact, such as traditional e-mail, could potentially be scanned
by an LLM and inadvertently train it, thereby contaminating the dataset.*

*Each problem the o4-mini couldn’t solve would garner the mathematician who
came up with it a $7,500 reward. The group made slow, steady progress in
finding questions. But Glazer wanted to speed things up, so Epoch AI hosted
the in-person meeting on Saturday, May 17, and Sunday, May 18. There, the
participants would finalize the last batch of challenge questions. The 30
attendees were split into groups of six. For two days, the academics
competed against themselves to devise problems that they could solve but
would trip up the AI reasoning bot.*

*By the end of that Saturday night, Ono was frustrated with the bot, whose
unexpected mathematical prowess was foiling the group’s progress. “I came
up with a problem which experts in my field would recognize as an open
question in number theory—a good Ph.D.-level problem,” he says. He asked
o4-mini to solve the question. Over the next 10 minutes, Ono watched in
stunned silence as the bot unfurled a solution in real time, showing its
reasoning process along the way. The bot spent the first two minutes
finding and mastering the related literature in the field. Then it wrote on
the screen that it wanted to try solving a simpler “toy” version of the
question first in order to learn. A few minutes later, it wrote that it was
finally prepared to solve the more difficult problem. Five minutes after
that, o4-mini presented a correct but sassy solution. “It was starting to
get really cheeky,” says Ono, who is also a freelance mathematical
consultant for Epoch AI. “And at the end, it says, ‘No citation necessary
because the mystery number was computed by me!’”*

*Defeated, Ono jumped onto
Signal early that Sunday morning and alerted the rest of the participants.
“I was not prepared to be contending with an LLM like this,” he says. “I’ve
never seen that kind of reasoning before in models. That’s what a scientist
does. That’s frightening.”*

*Although the group did eventually succeed in
finding 10 questions that stymied the bot, the researchers were astonished
by how far AI had progressed in the span of one year. Ono likened it to
working with a “strong collaborator.” Yang Hui He, a mathematician at the
London Institute for Mathematical Sciences and an early pioneer of using AI
in math, says, “This is what a very, very good graduate student would be
doing—in fact, more.”*

*The bot was also much faster than a professional
mathematician, taking mere minutes to do what it would take such a human
expert weeks or months to complete.*

*While sparring with o4-mini was
thrilling, its progress was also alarming. Ono and He express concern that
the o4-mini’s results might be trusted too much. “There’s proof by
induction, proof by contradiction, and then proof by intimidation,” He
says. “If you say something with enough authority, people just get scared.
I think o4-mini has mastered proof by intimidation; it says everything with
so much confidence.”*

*By the end of the meeting, the group started to
consider what the future might look like for mathematicians. Discussions
turned to the inevitable “tier five”—questions that even the best
mathematicians couldn't solve. If AI reaches that level, the role of
mathematicians would undergo a sharp change. For instance, mathematicians
may shift to simply posing questions and interacting with reasoning-bots to
help them discover new mathematical truths, much the same as a professor
does with graduate students. As such, Ono predicts that nurturing
creativity in higher education will be a key in keeping mathematics going
for future generations.*

*“I’ve been telling my colleagues that it’s a grave
mistake to say that generalized artificial intelligence will never come,
[that] it’s just a computer,” Ono says. “I don’t want to add to the
hysteria, but in some ways these large language models are already
outperforming most of our best graduate students in the world.”*

