Send Link mailing list submissions to
        [email protected]

To subscribe or unsubscribe via the World Wide Web, visit
        https://mailman.anu.edu.au/mailman/listinfo/link
or, via email, send a message with subject or body 'help' to
        [email protected]

You can reach the person managing the list at
        [email protected]

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Link digest..."


Today's Topics:

   1. Re: DeepSeek .. does more with less (Stephen Loosley)
   2. Berkeley researchers replicate DeepSeek R1 for $30
      (Stephen Loosley)


----------------------------------------------------------------------

Message: 1
Date: Sat, 1 Feb 2025 02:52:18 +0000
From: Stephen Loosley <[email protected]>
To: LINK List <[email protected]>
Subject: Re: [LINK] DeepSeek .. does more with less
Message-ID:
        <sy5p282mb44090cbc019795648e44d396c2...@sy5p282mb4409.ausp282.prod.outlook.com>
Content-Type: text/plain; charset="iso-8859-1"

Tom and Tony write,

>> Do you have an example of amazing results from it?
>>
>> I find the specifications unbelievable, in the literal sense, rather than 
>> amazing.
>>
>> If nothing else it is good to see US tech & political leaders "Hoist with 
>> his own petard". ;-)
>>
>
> I don't find the results more amazing than I find ALL the models. What I did 
> find amazing
> is the almost stream-of-consciousness musings it goes through as it works on 
> a problem?
>
> Antony Barry [email protected]


Good question, Tom, and agreed, Tony.

I have joined quite a few of these services and put the same question (to start
with) to each.

As an ex-practising psychologist, among other things, I wanted a simple but
stretching question ..

"Synchronicity, please?"  (Carl Jung, re "co-ordinated chance")

I then examined and explored the various AI results further.

DeepSeek was streets ahead in quality and depth on this question, and on every
follow-up question. It was like talking with Jung. I found all of the others
almost trite by comparison.

DeepSeek has me interested in follow-up psych research and further reading
again, like no other experience in many years.

Thanks DeepSeek .. 


 


------------------------------

Message: 2
Date: Sun, 02 Feb 2025 01:00:43 +1030
From: Stephen Loosley <[email protected]>
To: "link" <[email protected]>
Subject: [LINK] Berkeley researchers replicate DeepSeek R1 for $30
Message-ID: <[email protected]>
Content-Type: text/plain; charset="UTF-8"


DeepSeek R1 reproduced for $30: 

Berkeley researchers replicate DeepSeek R1 for $30, casting doubt on H100 
claims and controversy


Nickie Louise Posted On January 31, 2025  
https://techstartups.com/2025/01/31/deepseek-r1-reproduced-for-30-berkeley-researchers-replicate-deepseek-r1-for-30-casting-doubt-on-h100-claims-and-controversy/



The rise of Chinese AI startup DeepSeek has been nothing short of remarkable. 
After surpassing ChatGPT on the App Store, DeepSeek sent shockwaves through the 
tech world, triggering a frenzy in the market. 

But the attention hasn't all been positive. DeepSeek's website faced an attack 
that forced the company to suspend registrations, and some skeptics questioned 
whether the startup had relied on export-restricted Nvidia H100 chips rather 
than the H800 chips it claimed to use, raising concerns about compliance and 
cost efficiency.

Now, a breakthrough from researchers at the University of California, Berkeley, 
is challenging some of these assumptions. 

A team led by Ph.D. candidate Jiayi Pan has managed to replicate DeepSeek 
R1-Zero's core capabilities for less than $30, less than the cost of a night 
out. Their research could spark a new era of small-model RL research.

Their findings suggest that sophisticated AI reasoning doesn't have to come 
with a massive price tag, potentially shifting the balance between AI research 
and accessibility.


Berkeley Researchers Recreate DeepSeek R1 for Just $30: A Challenge to the H100 
Narrative

The Berkeley team says they worked with a 3-billion-parameter language model 
from DeepSeek, training it through reinforcement learning to develop 
self-verification and search abilities. The goal was to solve arithmetic-based 
challenges by reaching a target number, an experiment they managed to complete 
for just $30. 
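(For readers unfamiliar with the task: the Python sketch below is only an 
illustration of this kind of "reach a target number" objective. It is not the 
Berkeley team's code; the example numbers, the target, and the simplified 
left-to-right evaluation are assumptions made for illustration.)

    # Illustrative only: brute-force search for a Countdown-style puzzle,
    # i.e. combine the given numbers with +, -, *, / to hit a target value.
    # Simplification: uses every number once and evaluates left to right.
    from itertools import permutations, product
    import operator

    OPS = {"+": operator.add, "-": operator.sub,
           "*": operator.mul, "/": operator.truediv}

    def solve_countdown(numbers, target):
        """Return one expression string that reaches the target, or None."""
        for perm in permutations(numbers):
            for ops in product(OPS, repeat=len(numbers) - 1):
                value, expr = perm[0], str(perm[0])
                try:
                    for num, op in zip(perm[1:], ops):
                        value = OPS[op](value, num)
                        expr = f"({expr} {op} {num})"
                except ZeroDivisionError:
                    continue
                if abs(value - target) < 1e-9:
                    return expr
        return None

    print(solve_countdown([3, 7, 2, 5], 25))   # e.g. (((3 + 7) * 2) + 5)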

By comparison, OpenAI's o1 API costs $15 per million input tokens, more than 27 
times the price of DeepSeek-R1, which runs at just $0.55 per million tokens. 
Pan sees this project as a step toward lowering the barrier to reinforcement 
learning scaling research, especially given its minimal cost.
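(For what it is worth, the "more than 27 times" figure follows directly from 
the prices quoted above; the snippet below simply checks that arithmetic, 
assuming those quoted prices.)

    # Check of the quoted price ratio (prices as stated in the article).
    o1_per_million_tokens = 15.00    # USD, OpenAI o1 input tokens, as quoted
    r1_per_million_tokens = 0.55     # USD, DeepSeek-R1, as quoted
    print(o1_per_million_tokens / r1_per_million_tokens)   # ~27.3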

But not everyone is on board. Machine learning expert Nathan Lambert questions 
DeepSeek's claim that training its 671-billion-parameter model cost only $5 
million. 

He argues that the figure likely excludes key expenses such as research 
personnel, infrastructure, and electricity. His estimates put DeepSeek AI's 
annual operating costs somewhere between $500 million and over $1 billion. 

Even so, the achievement stands out, especially considering that top U.S. AI 
firms are pouring $10 billion a year into their AI efforts.


Breaking Down the Experiment: Small Models, Big Impact

According to Jiayi Pan's post on Nitter, the team successfully reproduced 
DeepSeek R1-Zero using a small language model with 3 billion parameters. 

Running reinforcement learning on the Countdown game, the model developed 
self-verification and search strategies, key abilities in advanced AI systems.


Key takeaways from their work:


    They successfully reproduced DeepSeek R1-Zero's methods for under $30.
    Their 1.5-billion-parameter model demonstrated advanced reasoning skills.
    Performance was on par with larger AI systems.


    "We reproduced DeepSeek R1-Zero in the CountDown game, and it just works. 
Through RL, the 3B base LM develops self-verification and search abilities all 
on its own. You can experience the Ahah moment yourself for < $30," Pan said on 
X.


Reinforcement Learning Breakthrough

The researchers began with a base language model, a structured prompt, and a 
ground-truth reward. They then introduced reinforcement learning through 
Countdown, a logic-based game adapted from a British TV show. In this 
challenge, players must reach a target number using arithmetic operations, a 
setup that encourages AI models to refine their reasoning skills.
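(To make that setup concrete: a rule-based, ground-truth reward for this kind 
of task might look roughly like the sketch below. This is a hedged 
illustration, not the TinyZero code; the <answer> tag format, the 
partial-credit values, and the function name are assumptions.)

    # Illustrative sketch of a ground-truth reward for the Countdown setup:
    # the completion is expected to contain an arithmetic expression, and the
    # reward checks that it uses the given numbers and hits the target.
    import ast
    import re

    def countdown_reward(completion, numbers, target):
        match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
        if not match:
            return 0.0                      # no parseable answer at all
        expr = match.group(1).strip()
        if not re.fullmatch(r"[0-9\s+\-*/().]+", expr):
            return 0.0                      # reject anything but arithmetic
        used = sorted(int(n) for n in re.findall(r"\d+", expr))
        if used != sorted(numbers):
            return 0.1                      # well-formed, but wrong numbers
        try:
            value = eval(compile(ast.parse(expr, mode="eval"), "<expr>", "eval"))
        except (SyntaxError, ZeroDivisionError):
            return 0.1                      # parses as text but does not evaluate
        return 1.0 if abs(value - target) < 1e-9 else 0.1

    print(countdown_reward("<answer>(3 + 7) * 2 + 5</answer>", [3, 7, 2, 5], 25))  # 1.0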

Initially, the AI produced random answers. Through trial and error, it began 
verifying its own responses, adjusting its approach with each iteration, 
mirroring how humans solve problems. Even the smallest 0.5-billion-parameter 
model could only make simple guesses, but once scaled to 1.5 billion and 
beyond, the AI started exhibiting more advanced reasoning.

    "We reproduced DeepSeek R1-Zero in the CountDown game, and it just works. 
Through RL, the 3B base LM develops self-verification and search abilities all 
on its own. You can experience the Ahah moment yourself for < $30.

    https://github.com/Jiayi-Pan/TinyZero

    Here's what we learned," Pan said in a post on Nitter.

Surprising Discoveries

One of the most interesting findings was how different tasks led the model to 
develop distinct problem-solving techniques. In Countdown, it refined its 
search and verification strategies, learning to iterate and improve its 
answers. When tackling multiplication problems, it applied the distributive 
law, breaking numbers down much like humans do when solving complex 
calculations mentally.

Another notable finding was that the choice of reinforcement learning 
algorithm, whether PPO, GRPO, or PRIME, had little impact on overall 
performance. The results were consistent across different methods, suggesting 
that structured learning and model size play a greater role in shaping AI 
capabilities than the specific algorithm used. 

This challenges the notion that sophisticated AI requires vast computational 
resources, demonstrating that complex reasoning can emerge from efficient 
training techniques and well-structured models.
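(As a point of reference on the algorithms named above: GRPO, for instance, 
scores each sampled completion against the other completions drawn for the 
same prompt. The sketch below follows the group-relative advantage idea as 
described in the DeepSeekMath paper; the example rewards are made up.)

    # Minimal sketch of GRPO's group-relative advantage: sample several
    # completions for one prompt, score each, and normalise each reward
    # against the group's mean and standard deviation.
    from statistics import mean, stdev

    def group_relative_advantages(rewards, eps=1e-6):
        """rewards: scores for a group of completions from the same prompt."""
        mu = mean(rewards)
        sigma = stdev(rewards) if len(rewards) > 1 else 0.0
        return [(r - mu) / (sigma + eps) for r in rewards]

    # A group where one completion solved the task and three did not:
    print(group_relative_advantages([1.0, 0.1, 0.1, 0.0]))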


A key takeaway from the research was how the model adapted its problem-solving 
techniques based on the task at hand.

Smarter AI Through Task-Specific Learning

One of the most interesting takeaways is how the AI adapted to different 
challenges. For the Countdown game, the model learned search and 
self-verification techniques. When tested with multiplication problems, it 
approached them differently, using the distributive law to break down 
calculations before solving them step by step.
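(The distributive-law strategy amounts to splitting one factor into easier 
pieces, e.g. 37 * 24 = 37 * 20 + 37 * 4. A small, purely illustrative Python 
version:)

    # Illustration of the distributive-law strategy: split one factor into
    # its decimal digits and sum the partial products.
    def multiply_by_parts(a, b):
        total, place = 0, 1
        for digit in str(b)[::-1]:
            total += a * int(digit) * place   # e.g. 37*24 = 37*4 + 37*20
            place *= 10
        return total

    assert multiply_by_parts(37, 24) == 37 * 24 == 888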

Instead of blindly guessing, the AI refined its approach over multiple 
iterations, verifying and revising its own answers until it landed on the 
correct solution. This suggests that models can evolve specialized skills 
depending on the task, rather than relying on a one-size-fits-all reasoning 
method.

A Shift in AI Accessibility

With the full project costing less than $30 and the code publicly available on 
GitHub, this research makes advanced AI more accessible to a wider range of 
developers and researchers. It challenges the notion that groundbreaking 
progress requires billion-dollar budgets, reinforcing the idea that smart 
engineering can often outpace brute-force spending.

This work reflects a vision long championed by Richard Sutton, a leading figure 
in reinforcement learning, who argued that simple learning frameworks can yield 
powerful results. The Berkeley team's findings suggest he was right: complex AI 
capabilities don't necessarily require massive-scale computing, just the right 
training environment.


Conclusion

As AI development accelerates, breakthroughs like this could reshape how 
researchers think about efficiency, cost, and accessibility. What started as an 
effort to understand DeepSeek's methods may end up setting new standards for 
the field.

 --





------------------------------

Subject: Digest Footer

_______________________________________________
Link mailing list
[email protected]
https://mailman.anu.edu.au/mailman/listinfo/link


------------------------------

End of Link Digest, Vol 387, Issue 2
************************************
