Send Link mailing list submissions to
[email protected]
To subscribe or unsubscribe via the World Wide Web, visit
https://mailman.anu.edu.au/mailman/listinfo/link
or, via email, send a message with subject or body 'help' to
[email protected]
You can reach the person managing the list at
[email protected]
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Link digest..."
Today's Topics:
1. Re: DeepSeek .. does more with less (Stephen Loosley)
2. Berkeley researchers replicate DeepSeek R1 for $30
(Stephen Loosley)
----------------------------------------------------------------------
Message: 1
Date: Sat, 1 Feb 2025 02:52:18 +0000
From: Stephen Loosley <[email protected]>
To: LINK List <[email protected]>
Subject: Re: [LINK] DeepSeek .. does more with less
Message-ID:
<sy5p282mb44090cbc019795648e44d396c2...@sy5p282mb4409.ausp282.prod.outlook.com>
Content-Type: text/plain; charset="iso-8859-1"
Tom and Tony write,
>> Do you have an example of amazing results from it?
>>
>> I find the specifications unbelievable, in the literal sense, rather than
>> amazing.
>>
>> If nothing else it is good to see US tech & political leaders "Hoist with
>> his own petard". ;-)
>>
>
> I don't find the results more amazing than I find ALL the models. What I did
> find amazing
> is the almost stream-of-consciousness musings it goes through as it works on
> a problem?
>
> Antony Barry [email protected]
Good question, Tom, and I agree, Tony.
I have joined quite a few, and put the exact question (to start with) to each.
As an ex-practising psychologist, among other things, I wanted a simple but
stretching question ..
"Synchronicity please?" (Carl Jung, on "coordinated chance")
I then examined and explored the various AI results further.
DeepSeek was streets ahead in terms of quality and depth on this question, and
on every follow-up question. Like talking with Jung. I found all of the others
almost trite by comparison.
DeepSeek gets me interested in follow-up psych research and further reading
again, like no other experience in many years.
Thanks DeepSeek ..
------------------------------
Message: 2
Date: Sun, 02 Feb 2025 01:00:43 +1030
From: Stephen Loosley <[email protected]>
To: "link" <[email protected]>
Subject: [LINK] Berkeley researchers replicate DeepSeek R1 for $30
Message-ID: <[email protected]>
Content-Type: text/plain; charset="UTF-8"
DeepSeek R1 reproduced for $30:
Berkeley researchers replicate DeepSeek R1 for $30, casting doubt on H100
claims and controversy
Nickie Louise Posted On January 31, 2025
https://techstartups.com/2025/01/31/deepseek-r1-reproduced-for-30-berkeley-researchers-replicate-deepseek-r1-for-30-casting-doubt-on-h100-claims-and-controversy/
The rise of Chinese AI startup DeepSeek has been nothing short of remarkable.
After surpassing ChatGPT on the App Store, DeepSeek sent shockwaves through
the tech world, triggering a frenzy in the market.
But the attention hasn't all been positive. DeepSeek's website faced an attack
that forced the company to suspend registrations, and some skeptics questioned
whether the startup had relied on export-restricted Nvidia H100 chips rather
than the H800 chips it claimed to use, raising concerns about compliance and
cost efficiency.
Now, a breakthrough from researchers at the University of California, Berkeley,
is challenging some of these assumptions.
A team led by Ph.D. candidate Jiayi Pan has managed to replicate DeepSeek
R1-Zero's core capabilities for less than $30, cheaper than a night out. Their
research could spark a revolution in small-model reinforcement learning.
Their findings suggest that sophisticated AI reasoning doesn't have to come
with a massive price tag, potentially shifting the balance between AI research
and accessibility.
Berkeley Researchers Recreate DeepSeek R1 for Just $30: A Challenge to the
H100 Narrative
The Berkeley team says they worked with a 3-billion-parameter language model
from DeepSeek, training it through reinforcement learning to develop
self-verification and search abilities. The goal was to solve arithmetic-based
challenges by reaching a target number, an experiment they managed to complete
for just $30.
By comparison, OpenAI's o1 API costs $15 per million input tokens, more than 27
times the price of DeepSeek-R1, which runs at just $0.55 per million tokens.
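The price gap quoted above is easy to verify with the article's own figures; the arithmetic below is just a sanity check of the "more than 27 times" claim.

```python
# Per-million-input-token prices quoted in the article (USD).
o1_price = 15.00   # OpenAI o1 API
r1_price = 0.55    # DeepSeek-R1

ratio = o1_price / r1_price
print(f"o1 costs about {ratio:.1f}x DeepSeek-R1 per million input tokens")
# ratio is roughly 27.3, consistent with "more than 27 times"
```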
Pan sees this project as a step toward lowering the barrier to reinforcement
learning scaling research, especially given its minimal cost.
But not everyone is on board. Machine learning expert Nathan Lambert questions
DeepSeek's claim that training its 671-billion-parameter model cost only $5
million.
He argues that the figure likely excludes key expenses such as research
personnel, infrastructure, and electricity. His estimates put DeepSeek AI's
annual operating costs somewhere between $500 million and over $1 billion.
Even so, the achievement stands out, especially considering that top U.S. AI
firms are pouring $10 billion a year into their AI efforts.
Breaking Down the Experiment: Small Models, Big Impact
According to Jiayi Pan's post on Nitter, the team successfully reproduced
DeepSeek R1-Zero using a small language model with 3 billion parameters.
Running reinforcement learning on the Countdown game, the model developed
self-verification and search strategies, key abilities in advanced AI systems.
Key takeaways from their work:
They successfully reproduced DeepSeek R1-Zero?s methods for under $30.
Their 1.5-billion-parameter model demonstrated advanced reasoning skills.
Performance was on par with larger AI systems.
"We reproduced DeepSeek R1-Zero in the CountDown game, and it just works.
Through RL, the 3B base LM develops self-verification and search abilities all
on its own. You can experience the Ahah moment yourself for < $30," Pan said on
X (@jiayi_pirate, January 24, 2025).
Reinforcement Learning Breakthrough
The researchers began with a base language model, a structured prompt, and a
ground-truth reward. They then introduced reinforcement learning through
Countdown, a logic-based game adapted from a British TV show. In this
challenge, players must reach a target number using arithmetic operations, a
setup that encourages AI models to refine their reasoning skills.
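A Countdown-style setup pairs naturally with a rule-based, ground-truth reward: the model emits an arithmetic expression, and the reward checks that it uses only the given numbers and hits the target. The sketch below is an illustration of that idea, not the Berkeley team's actual code; the function names and the binary 0/1 reward scheme are my assumptions.

```python
import ast
import operator

# Allowed binary operations for a Countdown expression.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def _eval(node):
    """Safely evaluate a parsed arithmetic expression (numbers and + - * / only)."""
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        return OPS[type(node.op)](_eval(node.left), _eval(node.right))
    raise ValueError("disallowed expression")

def countdown_reward(expr: str, numbers: list, target: int) -> float:
    """Return 1.0 if expr reaches the target using each given number at most once."""
    try:
        tree = ast.parse(expr, mode="eval")
        used = [n.value for n in ast.walk(tree) if isinstance(n, ast.Constant)]
        pool = list(numbers)
        for u in used:             # every literal must come from the pool
            if u in pool:
                pool.remove(u)
            else:
                return 0.0
        return 1.0 if abs(_eval(tree.body) - target) < 1e-9 else 0.0
    except (ValueError, SyntaxError, ZeroDivisionError):
        return 0.0

print(countdown_reward("(25 + 5) * 2", [25, 5, 2, 7], 60))  # 1.0
```

With a verifiable reward like this, no human labels are needed: the RL loop simply scores each sampled expression against the target.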
Initially, the AI produced random answers. Through trial and error, it began
verifying its own responses, adjusting its approach with each
iteration, mirroring how humans solve problems. Even the smallest
0.5-billion-parameter model could only make simple guesses, but once scaled to
1.5 billion and beyond, the AI started exhibiting more advanced reasoning.
The team's code is available at https://github.com/Jiayi-Pan/TinyZero
Surprising Discoveries
One of the most interesting findings was how different tasks led the model to
develop distinct problem-solving techniques. In Countdown, it refined its
search and verification strategies, learning to iterate and improve its
answers. When tackling multiplication problems, it applied the distributive
law, breaking numbers down much like humans do when solving complex calculations
mentally.
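The distributive-law behavior described here is the same trick taught for mental arithmetic: a product like 23 × 47 is split into 23 × 40 + 23 × 7. The toy function below is my illustration of that decomposition, not a reconstruction of the model's internals.

```python
def distributive_product(a: int, b: int) -> int:
    """Multiply a * b by splitting b into decimal place values,
    e.g. 23 * 47 = 23*40 + 23*7 = 920 + 161 = 1081."""
    total, place = 0, 1
    while b > 0:
        digit = b % 10
        total += a * digit * place   # one distributive-law term
        place *= 10
        b //= 10
    return total

print(distributive_product(23, 47))  # 1081
```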
Another notable finding was that the choice of reinforcement learning
algorithm, whether PPO, GRPO, or PRIME, had little impact on overall performance.
The results were consistent across different methods, suggesting that
structured learning and model size play a greater role in shaping AI
capabilities than the specific algorithm used.
This challenges the notion that sophisticated AI requires vast computational
resources, demonstrating that complex reasoning can emerge from efficient
training techniques and well-structured models.
A key takeaway from the research was how the model adapted its problem-solving
techniques based on the task at hand.
Smarter AI Through Task-Specific Learning
One of the most interesting takeaways is how the AI adapted to different
challenges. For the Countdown game, the model learned search and
self-verification techniques. When tested with multiplication problems, it
approached them differently, using the distributive law to break down
calculations before solving them step by step.
Instead of blindly guessing, the AI refined its approach over multiple
iterations, verifying and revising its own answers until it landed on the
correct solution. This suggests that models can evolve specialized skills
depending on the task, rather than relying on a one-size-fits-all reasoning
method.
A Shift in AI Accessibility
With the full project costing less than $30 and the code publicly available on
GitHub, this research makes advanced AI more accessible to a wider range of
developers and researchers. It challenges the notion that groundbreaking
progress requires billion-dollar budgets, reinforcing the idea that smart
engineering can often outpace brute-force spending.
This work reflects a vision long championed by Richard Sutton, a leading figure
in reinforcement learning, who argued that simple learning frameworks can yield
powerful results. The Berkeley team's findings suggest he was right: complex AI
capabilities don't necessarily require massive-scale computing, just the right
training environment.
Conclusion
As AI development accelerates, breakthroughs like this could reshape how
researchers think about efficiency, cost, and accessibility. What started as an
effort to understand DeepSeek's methods may end up setting new standards for
the field.
--
------------------------------
Subject: Digest Footer
_______________________________________________
Link mailing list
[email protected]
https://mailman.anu.edu.au/mailman/listinfo/link
------------------------------
End of Link Digest, Vol 387, Issue 2
************************************