[...]
LLMs Can Be Useful, but Not for Every Task
The reinforcement learning neural network models driving the
functionality of current LLMs constitute major technological
developments (McClelland, 2009; McClelland et al., 2003). The
capabilities of these models have been gathering pace over the past
decades mainly through advances in computing power and the amount of
data available for training - i.e. for successively adjusting the
weights of the network nodes used for stochastic predictions (Perfors,
2026). But the basic model functionality has remained constant: based on
high-dimensional correlation matrices describing the frequency of
co-occurrence of data units over time (language) or space (graphics),
the models can take user input as the start of a pattern and use it to
compute the most plausible continuation of that pattern.
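As a toy illustration of the pattern-continuation idea described above, the following sketch counts word co-occurrences in a tiny corpus and extends a prompt with the most frequent successor at each step. It is a deliberately simplified bigram model, not a description of how any production LLM is implemented; the corpus and the names `build_counts` and `continue_pattern` are assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def build_counts(corpus):
    """Count how often each word is directly followed by each other word."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts


def continue_pattern(counts, prompt, max_words=5):
    """Extend the prompt with the most frequent successor at each step."""
    words = prompt.lower().split()
    for _ in range(max_words):
        successors = counts.get(words[-1])
        if not successors:
            break  # no observed continuation for the last word
        words.append(successors.most_common(1)[0][0])
    return " ".join(words)


# Hypothetical three-sentence corpus, invented for this example.
corpus = [
    "a cat sat on the mat",
    "a cat sat on the mat",
    "a dog sat on the sofa",
]
counts = build_counts(corpus)
print(continue_pattern(counts, "a dog"))
```

The continuation produced here, "a dog sat on the mat", does not occur in the corpus. It is assembled from co-occurrence frequencies alone, which is a small-scale analogue of generalising beyond individual training instances.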
The capacity for high-dimensional pattern matching and extension (also
referred to as "autocomplete", Bergstrom, 2025) can be useful in a
variety of domains, not least because the distilled patterns allow for
generalisation beyond the individual instances on which they are based
(Lake & Baroni, 2023; Peters & Chin-Yee, 2025; but see Becker et al.,
2025). For example, when trained on the respective content domains, LLMs
can help identify patterns in chemical structures (Jumper et al., 2021),
in clinical samples (Epping et al., 2025), and between words of
different languages (Gao et al., 2024; but see Maiberg, 2026).
The mechanism of matching and generalising probabilistic patterns based
on information from a given database is less useful for tasks that
require other types of mechanisms for their solution. Examples include
tasks requiring contextual sensitivity, and hence a solution to the frame
problem in artificial intelligence (Oaksford & Chater, 2009; Pylyshyn,
1987); tasks requiring high accuracy and precision (Hsu, 2025; Kalai et
al., 2025); and tasks requiring novel, creative solutions for which no
pattern or template has yet been built (Habib et al., 2023; Meincke et
al., 2025).
The mechanism-based limitations in the scope of applicability of LLMs
are often masked in current discourse about them, a problem compounded
by the optimisation of LLMs for the production of generic, plausible and
confident-sounding output regardless of how the output relates to what
is in fact the case (Kalai et al., 2025). This risks creating the
illusion that LLMs can do things that they cannot, and that they have a
connection to truth and understanding that they do not.
LLMs Cannot Think
The companies marketing their LLMs often describe them with
anthropomorphising terms like "thinking" and "reasoning", which might
create the impression that they can think (Mirzadeh et al., 2025;
Shojaee et al., 2026). But for that impression to be accurate we would
have to stretch the meaning of the term to refer trivially to whatever
the LLMs produce as output - much like the meaning of intelligence has
historically been watered down to whatever the tests used to
operationalise the construct measured (Loru et al., 2025; Mitchell,
2023; Quattrociocchi & Capraro, 2025; van der Maas et al., 2021). The
task of developing systems with non-trivial capability for human-like
cognition is computationally intractable (van Rooij et al., 2024).
Focussing on the foundation rather than on the endpoint, to me there is
a simple and inescapable basis to any thinking and reasoning: logical
consistency. Just as we cannot see both interpretations of an ambiguous
image like the rabbit-duck illusion or the Necker Cube at the same time
(Gopnik & Rosati, 2001), we are incapable of assigning meaning to the
conjunction of two contradictory statements. We can focus our attention
on the meaning of one statement and then move over to the meaning of the
other, but we cannot integrate them into a single meaningful
representation. Thinking and understanding break down when we encounter
an inconsistency, like an alarm signal that prompts us to stop and
reevaluate the situation (Johnson-Laird et al., 2004); and even thinking
that is not outright contradictory but moves fast and loose from one
representation to another one incompatible with it is classified as a
formal thought disorder (Holyoak & Morrison, 2005). This does not imply
that people are good at detecting inconsistencies regardless of problem
complexity (Oberauer et al., 2016), but merely that consistency is a
foundation, however local and fragile, on which thinking and
understanding depend (Oaksford & Chater, 2020; Wheeler, 2026).
Now, one of the more notorious features of LLMs is their logical
inconsistency. They routinely produce contradictory output or output
that changes the topic mid-argument, and construct so-called
"hallucinations" or "bullshit" responses (Frankfurt, 2005; Hicks et al.,
2024; Kalai et al., 2025) in unforeseeable ways (Hägele et al., 2026).
Further, LLMs seem incapable of detecting when such inconsistencies
occur and just keep producing further output unabated - hence their
functionality breaks down in ways different from how human thinking
breaks down. This makes sense as their inconsistency is not a bug but a
natural consequence of the stochastic mechanisms underlying them,
together with their disconnection from any ground truth about which
relatively stable conceptual representations could be formed (Kalai et
al., 2025; Spencer-Brown, 1969; Wittgenstein, 1991). LLM developers have
themselves stated that the problem of inconsistent, nonsensical output
is impossible in principle to overcome, regardless of the amount of
computing power and training data the models are based on (Shojaee et
al., 2026; Song & Han, 2026).
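The point about stochastic output can be illustrated with a minimal sketch, assuming a hypothetical next-word distribution in which two mutually contradictory continuations both have nonzero probability. Sampling from such a distribution, which is a strong simplification of how these models generate text, returns both answers across repeated queries, and nothing in the procedure checks one answer against another. The prompt, the probabilities and the function name `sample_answer` are invented for illustration.

```python
import random

# Hypothetical probabilities for completing "The treatment is ...".
# These numbers are made up for illustration only.
NEXT_WORD_PROBS = {"effective": 0.6, "ineffective": 0.4}


def sample_answer(rng):
    """Draw one continuation in proportion to its assumed probability."""
    words = list(NEXT_WORD_PROBS)
    weights = list(NEXT_WORD_PROBS.values())
    return rng.choices(words, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so the run is repeatable
answers = [sample_answer(rng) for _ in range(200)]
distinct = set(answers)
print(distinct)
```

In this sketch the same prompt yields statements that contradict each other across runs, and no step flags the conflict.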
The path from LLMs to thinking machines thus seems impossible from the
outset due to the absence by design of the requirement for consistency.
Many older computational models exist that fulfil the consistency
requirement. But the capacity for both consistency and scalability
remains an open, potentially unsolvable problem (Gödel, 1931; Kwisthout
et al., 2011; Pylyshyn, 1987).
LLMs Can Undermine Thinking and Understanding
Thinking and reasoning, and with them knowledge and understanding, can
improve with practice, and they can deteriorate without practice. LLMs
are sometimes compared to electronic calculators (Geuter, 2024; Voinea
et al., 2026), which have greatly increased the speed and accuracy of
everyday calculations. The concomitant reduction in the need for simple
mental arithmetic may have led to a decrease in our average mental
arithmetic skills - but it freed up time to engage in potentially more
complex and creative tasks. At the same time, our collective
understanding of simple arithmetic has arguably not declined because the
arithmetic rules by which calculators operate are transparent, precise
and can be looked up in reliable sources anytime we need them (Sloman &
Fernbach, 2017).
The situation is different in several ways for LLMs. They are being used
to replace complex and creative tasks that draw on our capacity for
critical thinking (Reuters, 2026). They have the feature of producing
seemingly plausible but imprecise and sometimes wildly inaccurate
output, and they are not transparent about their sources - although their
training data tends to include any information from the internet,
however unreliable and regardless of legal requirements for source
acknowledgment (Blau et al., 2024; Gewirtz, 2025; Meyer, 2025). For
example, if asked for a solution to Lord’s paradox (Lord, 1967), an LLM
might produce different output each time it is asked, and every time the
output may sound plausible but may be justified in part by false or
nonexistent evidence that is difficult to detect by nonexperts in the
field (Fisher, 2021; Walters & Wilder, 2023).
The literature on the impact of LLMs on thinking and understanding is
still very new and preliminary. But some studies have pointed to reduced
task engagement and learning when relying on LLMs (Melumad, 2025; Shen,
2026; Stadler et al., 2024); and based on the existing literature on
cognition we can expect the principle "use it or lose it" to apply here
too (Bainbridge, 1983; Furman, 2025; Mızrak, 2020). In contrast to the
calculator example, what we risk undermining in this case is our
capacity for critical thinking, and the source reliability and
transparency on which our collective understanding depends. This comes
in addition to the LLM-enabled mass production of slop, mis- and
disinformation (Clark & Lewandowsky, 2026; Furman, 2025; Köbis &
Doležalová, 2021; Perfors, 2025; Thorp, 2026).
Technology is arguably not value neutral, and the ways in which current
LLMs have been built and deployed risk undermining not only our thinking
and understanding as individuals but also our participation as active,
diverse citizens in democratic decision-making processes (Kant, 1784;
Lewandowsky & Hertwig, 2025; Lewandowsky & Garcia, 2026). Huxley’s
dystopian novel Brave New World (Huxley, 1932) might reflect a Luddite
position, which might sound pejorative at first. But it
illustrates that technology can take us in different directions towards
different societal goals, which are worth thinking about.
There Are no Shortcuts to Understanding
Understanding doesn’t work without thinking, which is often hard,
cumbersome and full of errors. It will also keep trapping us in
illusions, as Shiffrin et al. point out. But there is no free lunch to
understanding. If we keep working on it we have reason to expect to keep
escaping some of the illusions and increase our understanding over time
- following the positive side of the "use it or lose it" principle. Some
uses of LLMs may not undermine understanding, and in some cases we can
avoid illusions by making an active decision about which parts of our
thought processes, if any, to replace with their output.
https://doi.org/10.1007/s42113-026-00288-6