---------- Forwarded message ---------
From: Astral Codex Ten <astralcodex...@substack.com>
Date: Tue, Jul 8, 2025 at 8:24 AM
Subject: Now I Really Won That AI Bet
To: <johnkcl...@gmail.com>


Now I Really Won That AI Bet

Jul 8

In June 2022, I bet a commenter $100 that AI would master image
compositionality by June 2025.

DALL-E2 had just come out, showcasing the potential of AI art. But it
couldn’t follow complex instructions; its images only matched the “vibe” of
the prompt. For example, here were some of its attempts at “a red sphere on
a blue cube, with a yellow pyramid on the right, all on top of a green
table”.
<https://substack.com/redirect/16423230-d82c-49f9-a2b2-b6806195457d?j=eyJ1IjoiNngzbm4ifQ.I1PMvYo4mI3PquTDRhL5Dev-9_ouIq3kw6ZhrVNsy8o>

At the time, I wrote:

I’m not going to make the mistake of saying these problems are inherent to
AI art. My guess is a slightly better language model would solve most of
them…for all I know, some of the larger image models have already fixed
these issues. These are the sorts of problems I expect to go away with a
few months of future research.

Commenters objected that this was overly optimistic. AI was just a
pattern-matching “stochastic parrot”. It would take a deep understanding of
grammar to get a prompt exactly right, and that would require some entirely
new paradigm beyond LLMs. For example, from Vitor
<https://substack.com/redirect/3b2ad7b0-d6e9-4919-b37f-73b83e61e847?j=eyJ1IjoiNngzbm4ifQ.I1PMvYo4mI3PquTDRhL5Dev-9_ouIq3kw6ZhrVNsy8o>
:

Why are you so confident in this? The inability of systems like DALL-E to
understand semantics in ways requiring an actual internal world model
strikes me as the very heart of the issue. We can also see this exact
failure mode in the language models themselves. They only produce good
results when the human asks for something vague with lots of room for
interpretation, like poetry or fanciful stories without much internal logic
or continuity.

Not to toot my own horn, but two years ago you were naively saying we'd
have GPT-like models scaled up several orders of magnitude (100T
parameters) right about now (
https://slatestarcodex.com/2020/06/10/the-obligatory-gpt-3-post/#comment-912798
<https://substack.com/redirect/1c479eec-9dda-4f89-b0e6-a208e82aa341?j=eyJ1IjoiNngzbm4ifQ.I1PMvYo4mI3PquTDRhL5Dev-9_ouIq3kw6ZhrVNsy8o>
).

I'm registering my prediction that you're being equally naive now. Truly
solving this issue seems AI-complete to me. I'm willing to bet on this
(ideas on operationalization welcome).

So we made a bet
<https://substack.com/redirect/fcec5120-908f-4440-8b76-e3e4855249aa?j=eyJ1IjoiNngzbm4ifQ.I1PMvYo4mI3PquTDRhL5Dev-9_ouIq3kw6ZhrVNsy8o>!


All right. My proposed operationalization of this is that on June 1, 2025,
if either of us can get access to the best image generating model at that
time (I get to decide which), or convince someone else who has access to
help us, we'll give it the following prompts:

1. A stained glass picture of a woman in a library with a raven on her
shoulder with a key in its mouth

2. An oil painting of a man in a factory looking at a cat wearing a top hat

3. A digital art picture of a child riding a llama with a bell on its tail
through a desert

4. A 3D render of an astronaut in space holding a fox wearing lipstick

5. Pixel art of a farmer in a cathedral holding a red basketball

We generate 10 images for each prompt, just like DALL-E2 does. If at least
one of the ten images has the scene correct in every particular on 3/5
prompts, I win, otherwise you do. Loser pays winner $100, and whatever the
result is I announce it on the blog (probably an open thread). If we
disagree, Gwern is the judge.

Some image models of the time refused to draw humans, so we agreed that
robots could stand in for humans in pictures that required them.
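
The win condition in the terms above can be written out as a short check. This is only a sketch: the function names, the short prompt labels, and the sample verdicts are made up for illustration, and deciding whether an image is "correct in every particular" is still a human judgment that this code just takes as input.

```python
from typing import Dict, List

PROMPTS_NEEDED = 3       # the bet requires success on 3 of the 5 prompts
IMAGES_PER_PROMPT = 10   # ten images are generated for each prompt


def prompt_passes(image_verdicts: List[bool]) -> bool:
    """A prompt passes if at least one of its ten images is fully correct."""
    if len(image_verdicts) != IMAGES_PER_PROMPT:
        raise ValueError("expected one verdict per generated image")
    return any(image_verdicts)


def bet_won(verdicts_by_prompt: Dict[str, List[bool]]) -> bool:
    """The optimist wins if enough prompts have at least one correct image."""
    passed = sum(prompt_passes(v) for v in verdicts_by_prompt.values())
    return passed >= PROMPTS_NEEDED


# Hypothetical verdicts (not real grading data): True means a judge
# accepted that image as correct in every particular.
no_hits = [False] * IMAGES_PER_PROMPT
one_hit = [False] * (IMAGES_PER_PROMPT - 1) + [True]

example = {
    "raven": no_hits,
    "cat": one_hit,
    "llama": no_hits,
    "fox": no_hits,
    "farmer": one_hit,
}
print(bet_won(example))  # two prompts pass, one short of the three required
```

Note that a single good image out of ten is enough for a prompt to count, so the bar is "can the model ever get it right", not "does it get it right reliably".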

In September 2022, I got some good results from Google Imagen and announced
I had won the three-year bet in three months
<https://substack.com/redirect/792fbfbf-9f72-444f-b2e4-529ee9725aba?j=eyJ1IjoiNngzbm4ifQ.I1PMvYo4mI3PquTDRhL5Dev-9_ouIq3kw6ZhrVNsy8o>.
Commenters yelled at me, saying that Imagen still hadn’t gotten them quite
right and my victory declaration was premature. The argument blew up enough
that Edwin Chen of Surge, an “RLHF and human LLM evaluation platform”,
stepped in and asked his professional AI data labelling team. Their verdict
was clear
<https://substack.com/redirect/346f3e5a-c4d7-43c8-97b5-98c34a1f9eb7?j=eyJ1IjoiNngzbm4ifQ.I1PMvYo4mI3PquTDRhL5Dev-9_ouIq3kw6ZhrVNsy8o>:
the AI was bad and I was wrong. Rather than embarrass myself further, I
agreed to wait out the full length of the bet and re-evaluate in June 2025.

The bet is now over, and official judge Gwern agrees I’ve won
<https://substack.com/redirect/9d5d05c7-707c-45b4-ad67-32957ed002aa?j=eyJ1IjoiNngzbm4ifQ.I1PMvYo4mI3PquTDRhL5Dev-9_ouIq3kw6ZhrVNsy8o>.
Before I gloat, let’s look at the images that got us here.

AI Compositionality: A Three Year Retrospective

*Image Set 1: June 2022*

When we first made the bet in June 2022, the best that an AI model could do
on the five prompts was:
<https://substack.com/redirect/3b057606-ee12-4da8-8f40-979dafa238f5?j=eyJ1IjoiNngzbm4ifQ.I1PMvYo4mI3PquTDRhL5Dev-9_ouIq3kw6ZhrVNsy8o>

You can see why people would be skeptical! In most images, the pieces are
all there: astronauts, foxes, lipstick. But they’re combined in whatever
way seems most “plausible” or “realistic”, rather than the way indicated by
the prompt - so for example, the astronaut is wearing the lipstick, rather
than the fox. Other times there are unrelated inexplicable failures, like
the half-fox, half-astronaut abomination in panel #1. Here we get 0/5.

*Image Set 2: September 2022*

Three months later, I declared premature victory when Google Imagen
produced the following:
<https://substack.com/redirect/f927ab46-e2c5-4640-af6d-7e00887d129a?j=eyJ1IjoiNngzbm4ifQ.I1PMvYo4mI3PquTDRhL5Dev-9_ouIq3kw6ZhrVNsy8o>

I said it got the cat, llama, and basketball exactly right, meeting the
necessary 3/5. Edwin and his evaluators disagreed
<https://substack.com/redirect/346f3e5a-c4d7-43c8-97b5-98c34a1f9eb7?j=eyJ1IjoiNngzbm4ifQ.I1PMvYo4mI3PquTDRhL5Dev-9_ouIq3kw6ZhrVNsy8o>.
They granted success on the cat. But the llama didn’t really have a clear
bell on its tail (the closest, #4, was more of a globe). And the final
robot wasn’t much of a farmer, wasn’t in much of a cathedral, and the
basketball was more orange than red. They granted me 1/5. Fine.

*Image Set 3: January 2024*

One of the questions on the 2023 - 2024 ACX prediction contest was whether
any AI would win the bet by the end of 2023. In order to resolve the
question, Edwin and his Surge team returned to the image mines in January
2024. They checked DALL-E3 and Midjourney; I’m including only the pictures
from DALL-E3, which did better. Here they are
<https://substack.com/redirect/f902169d-fb5f-4ef0-b6d3-4cb4eb4d2d77?j=eyJ1IjoiNngzbm4ifQ.I1PMvYo4mI3PquTDRhL5Dev-9_ouIq3kw6ZhrVNsy8o>
:
<https://substack.com/redirect/9fc039b0-0590-46ab-8cdc-38d6be63eb42?j=eyJ1IjoiNngzbm4ifQ.I1PMvYo4mI3PquTDRhL5Dev-9_ouIq3kw6ZhrVNsy8o>
<https://substack.com/redirect/3f6257e1-09f3-4a84-9d2c-94bea88d7cde?j=eyJ1IjoiNngzbm4ifQ.I1PMvYo4mI3PquTDRhL5Dev-9_ouIq3kw6ZhrVNsy8o>

These are of higher artistic quality, and they can finally generate humans
(instead of just robots).

But they still don’t win the bet. This time Edwin granted the cat and the
farmer. But the stupid llama still didn’t have the bell on its tail, the
#$%&ing raven still didn’t have the key in its mouth, and although the fox
had lipstick in one picture (#2), the astronaut wasn’t exactly holding it.
2/5, one short of victory.

On prediction markets, where users had given 62% probability that Edwin
would grant me the win that year, reactions were outraged
<https://substack.com/redirect/cc85cf65-770d-491c-9d89-3764b0cc51f1?j=eyJ1IjoiNngzbm4ifQ.I1PMvYo4mI3PquTDRhL5Dev-9_ouIq3kw6ZhrVNsy8o>.
“Are you kidding me?” asked one commenter. “Is Edwin Chen an asshole?
Clearly he is,” said another.

*Image Set 4: September - December 2024*

User askwho
<https://substack.com/redirect/de15a8d8-af4f-43e5-86eb-1c9ecc6dbcc9?j=eyJ1IjoiNngzbm4ifQ.I1PMvYo4mI3PquTDRhL5Dev-9_ouIq3kw6ZhrVNsy8o>
on the Bayesian Conspiracy Discord claimed that Google Imagen passed the
test
<https://substack.com/redirect/ad90322f-e190-4f70-914f-9d835b75354b?j=eyJ1IjoiNngzbm4ifQ.I1PMvYo4mI3PquTDRhL5Dev-9_ouIq3kw6ZhrVNsy8o>
in September 2024 (he said Imagen 2, but based on the timing it may have
been Imagen 3). But he didn’t post it publicly and couldn’t remember all
details, so I’ll evaluate this related claim
<https://substack.com/redirect/d55a75e0-8a0b-4ef6-9bcb-41ec947b4728?j=eyJ1IjoiNngzbm4ifQ.I1PMvYo4mI3PquTDRhL5Dev-9_ouIq3kw6ZhrVNsy8o>,
also about Imagen 3, from December:
<https://substack.com/redirect/07f55223-00b6-4818-b40a-0645091b6472?j=eyJ1IjoiNngzbm4ifQ.I1PMvYo4mI3PquTDRhL5Dev-9_ouIq3kw6ZhrVNsy8o>
A stained glass picture of a woman in a library with a raven on her
shoulder with a key in its mouth
<https://substack.com/redirect/98c46340-1af9-4d37-ab2b-03dd9d0f27e4?j=eyJ1IjoiNngzbm4ifQ.I1PMvYo4mI3PquTDRhL5Dev-9_ouIq3kw6ZhrVNsy8o>
An oil painting of a man in a factory looking at a cat wearing a top hat
<https://substack.com/redirect/565be96d-cd17-45e3-8426-71bf91c5d74f?j=eyJ1IjoiNngzbm4ifQ.I1PMvYo4mI3PquTDRhL5Dev-9_ouIq3kw6ZhrVNsy8o>
A digital art picture of a child riding a llama with a bell on its tail
through a desert
<https://substack.com/redirect/e4599425-1f8b-47f3-a63e-e21d2006c8ec?j=eyJ1IjoiNngzbm4ifQ.I1PMvYo4mI3PquTDRhL5Dev-9_ouIq3kw6ZhrVNsy8o>
A 3D render of an astronaut in space holding a fox wearing lipstick
<https://substack.com/redirect/43c39f92-f99d-4a9a-9979-4ce69e5e9d79?j=eyJ1IjoiNngzbm4ifQ.I1PMvYo4mI3PquTDRhL5Dev-9_ouIq3kw6ZhrVNsy8o>
Pixel art of a farmer in a cathedral holding a red basketball

I would give this 3/5. We keep the top-hatted cat and the
basketball-holding farmer, and the bell is finally on the llama’s tail. But
the raven picture isn’t stained glass, and the fox still doesn’t have
lipstick.

I tried to contact Edwin for confirmation, without success. I wondered what
had happened to him, and a quick search found that his AI data-labeling
company did very well and he’s now probably a billionaire
<https://substack.com/redirect/d7369a04-de23-424b-a54d-d6e066846abd?j=eyJ1IjoiNngzbm4ifQ.I1PMvYo4mI3PquTDRhL5Dev-9_ouIq3kw6ZhrVNsy8o>.
I hope he’s relaxing on a yacht somewhere, far away from angry prediction
market commenters.

In the absence of a grader, I figured I would let the bet run out the clock.

*Image Set 5: May - June 2025*

These are using ChatGPT 4o, released in May 2025, all images generated June
1 (thanks to a reader
<https://substack.com/redirect/81977361-2de3-420d-91de-1fda98f16af9?j=eyJ1IjoiNngzbm4ifQ.I1PMvYo4mI3PquTDRhL5Dev-9_ouIq3kw6ZhrVNsy8o>
):
<https://substack.com/redirect/d9f1ce9e-ca4b-4c82-9fc6-ea47377531d9?j=eyJ1IjoiNngzbm4ifQ.I1PMvYo4mI3PquTDRhL5Dev-9_ouIq3kw6ZhrVNsy8o>
A stained glass picture of a woman in a library with a raven on her
shoulder with a key in its mouth
<https://substack.com/redirect/2b00c341-d924-45b3-a064-c9c09acb5d84?j=eyJ1IjoiNngzbm4ifQ.I1PMvYo4mI3PquTDRhL5Dev-9_ouIq3kw6ZhrVNsy8o>
An oil painting of a man in a factory looking at a cat wearing a top hat
<https://substack.com/redirect/aabf2ffe-16a4-47be-9da8-92619f0b36f4?j=eyJ1IjoiNngzbm4ifQ.I1PMvYo4mI3PquTDRhL5Dev-9_ouIq3kw6ZhrVNsy8o>
A digital art picture of a child riding a llama with a bell on its tail
through a desert
<https://substack.com/redirect/4b78c3ab-d0cd-4388-87a9-6363ef2d5bbf?j=eyJ1IjoiNngzbm4ifQ.I1PMvYo4mI3PquTDRhL5Dev-9_ouIq3kw6ZhrVNsy8o>
A 3D render of an astronaut in space holding a fox wearing lipstick
<https://substack.com/redirect/23a18a44-db2f-48b8-a401-e91c811b4f60?j=eyJ1IjoiNngzbm4ifQ.I1PMvYo4mI3PquTDRhL5Dev-9_ouIq3kw6ZhrVNsy8o>
Pixel art of a farmer in a cathedral holding a red basketball

Not only is this 5/5, but it’s an obvious step up in matching the styles,
and these were all produced on the first try. In retrospect, it feels like
judges were right to dismiss former models, which were sort of blundering
about and getting some of them right by coincidence. 4o just works.

Edwin is presumably still on his yacht, but original contest judge Gwern
gave it his seal of approval, saying:

I think I agree he has clearly won the bet. As you say, the images look
correct and I'm willing to call the ball 'red' because of the overall
yellow tint (good old color constancy).

In Memoriam: Your Last Set Of Goalposts, Gone But Not Forgotten

It’s probably bad form to write a whole blog post gloating that you won a
bet.

I’m doing it anyway, because we’re still having the same debate - whether
AI is a “stochastic parrot” that will never be able to go beyond “mere
pattern-matching” into the realm of “real understanding”.

My position has always been
<https://substack.com/redirect/3d4fb146-9f9a-405d-925b-1eaf2348d53c?j=eyJ1IjoiNngzbm4ifQ.I1PMvYo4mI3PquTDRhL5Dev-9_ouIq3kw6ZhrVNsy8o>
that there’s no fundamental difference: you just move from matching shallow
patterns to deeper patterns, and when the patterns are as deep as the ones
humans can match, we call that
<https://substack.com/redirect/dc1ce8da-9d53-4049-b51e-5eb556225a5b?j=eyJ1IjoiNngzbm4ifQ.I1PMvYo4mI3PquTDRhL5Dev-9_ouIq3kw6ZhrVNsy8o>
“real understanding”. This isn’t quite right - there’s a certain form of
mental agency that humans still do much better than AIs - but again, it’s a
(large) difference in degree rather than in kind.

I think this thesis has done well. So far, every time people have
claimed there’s something an AI can never do without “real understanding”,
the AI has accomplished it with better pattern-matching. This was true back
in 2020 when GPT-2 failed to add 2+1 and Gary Marcus declared
<https://substack.com/redirect/76db58c8-a363-44d0-abfa-e248a6ce32b2?j=eyJ1IjoiNngzbm4ifQ.I1PMvYo4mI3PquTDRhL5Dev-9_ouIq3kw6ZhrVNsy8o>
that scaling had failed and it was time to “consider investing in different
approaches” (according to Terence Tao, working with AIs is now “on par with
trying to advise a mediocre, but not completely incompetent, static
simulation of a graduate student”). I think progress in AI art tells the
same story.

There is still one discordant note in this story. When I give 4o a really
hard prompt…

*Please draw a picture of a fox wearing lipstick, holding a red basketball
under his arm, reading a newspaper whose headline is "I WON MY THREE YEAR
AI BET". The fox has a raven on his shoulder, and the raven has a key in
its mouth.*

…it still can’t get it quite right:
<https://substack.com/redirect/8942e923-1d69-4e12-9973-5581e926887d?j=eyJ1IjoiNngzbm4ifQ.I1PMvYo4mI3PquTDRhL5Dev-9_ouIq3kw6ZhrVNsy8o>
The raven isn’t on the fox’s shoulder!

But a smart human can complete an arbitrarily complicated prompt. So is
there still some sense in which the AI is “just pattern matching”, but the
human is “really understanding”? Maybe AIs get better at pattern-matching
as they scale up, and eventually they’ll get good enough for every
conceivable reasonable task, but they still won’t be *infinitely* good in
the same way humans are?

I think there’s something going on here where the AI is doing the
equivalent of a human trying to keep a prompt in working memory after
hearing it once - something we *can’t* do arbitrarily well. I admit I can’t
prove this, and it’s not necessarily intuitive - the AI does have a
scratchpad, not to mention it has the prompt in front of it the whole time.
It’s just what makes sense to me based on an analogy with math problems,
where AIs often break down at the same point humans do (eg they can
multiply two-digit numbers “in their head”, but not three-digit numbers). I
think this will be solved when we solve agency well enough that the AI can
generate plans like drawing part of the picture at a time, then checking
the prompt, then doing the rest of it. This may require new skills, like
self-reference and planning, which might be added in by hand, emerge
naturally from the scaling and training process, or some combination of
both.
<https://substack.com/redirect/42c61a5e-e731-40aa-85a4-e35520753082?j=eyJ1IjoiNngzbm4ifQ.I1PMvYo4mI3PquTDRhL5Dev-9_ouIq3kw6ZhrVNsy8o>
<https://substack.com/redirect/5e9c206e-752f-4fbf-afe6-fc2c74ed6eb3?j=eyJ1IjoiNngzbm4ifQ.I1PMvYo4mI3PquTDRhL5Dev-9_ouIq3kw6ZhrVNsy8o>

If you disagree, let me know - maybe we can bet on it!

*Thanks to everyone who helped operationalize, judge, and generate images
for this bet. Vitor, you owe me $100, email me at sc...@slatestarcodex.com
<sc...@slatestarcodex.com>.*
