[nexa] Illusions of Understanding from Outsourcing Thinking to LLMs

Daniela Tafani via nexa Fri, 15 May 2026 14:39:05 -0700

[...]
LLMs Can Be Useful, but not for any Task

The reinforcement learning neural network models driving the functionality of current LLMs constitute major technological developments (McClelland, 2009; McClelland et al., 2003). The capabilities of these models have been gathering pace over the past decades mainly through advances in computing power and the amount of data available for training - i.e. for successively adjusting the weights of the network nodes used for stochastic predictions (Perfors, 2026). But the basic model functionality has remained constant: based on high-dimensional correlation matrices describing the frequency of co-occurrence of data units over time (language) or space (graphics), the models can take user input as the start of a pattern and use it to compute the most plausible continuation of that pattern.
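As a rough illustration of the "continue the pattern" idea described above, the following toy sketch counts which word follows which in a tiny corpus and then extends a prompt with the most frequent successor at each step. It is a deliberately simplified stand-in, not how any actual LLM is implemented: the corpus, the function names `build_cooccurrence` and `continue_pattern`, and the use of plain bigram counts instead of a trained network are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def build_cooccurrence(tokens):
    """Count how often each token is immediately followed by each other token."""
    counts = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1
    return counts


def continue_pattern(counts, prompt, steps):
    """Extend the prompt by repeatedly appending the most frequent successor."""
    output = list(prompt)
    for _ in range(steps):
        successors = counts.get(output[-1])
        if not successors:
            # No observed continuation for this token: the pattern ends here.
            break
        output.append(successors.most_common(1)[0][0])
    return output


# Hypothetical toy corpus, chosen only for illustration.
corpus = "the cat sat on the mat and the cat sat on the sofa".split()
counts = build_cooccurrence(corpus)
print(continue_pattern(counts, ["the"], 3))  # ['the', 'cat', 'sat', 'on']
```

The continuation reflects only what was frequent in the corpus; nothing in the procedure checks whether the resulting sentence is true.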

The capacity for high-dimensional pattern matching and extension (also referred to as "autocomplete", Bergstrom, 2025) can be useful in a variety of domains, not least because the distilled patterns allow for generalisation beyond the individual instances on which they are based (Lake & Baroni, 2023; Peters & Chin-Yee, 2025; but see Becker et al., 2025). For example, when trained on the respective content domains, LLMs can help identify patterns in chemical structures (Jumper et al., 2021), in clinical samples (Epping et al., 2025), and between words of different languages (Gao et al., 2024; but see Maiberg, 2026).

The mechanism of matching and generalising probabilistic patterns based on information from a given database is less useful for tasks that require other types of mechanisms for their solution. Examples include tasks requiring contextual sensitivity and hence a solution to the frame problem in artificial intelligence (Oaksford & Chater, 2009; Pylyshyn, 1987); high accuracy and precision (Hsu, 2025; Kalai et al., 2025); or novel, creative solutions for which no pattern or template has yet been built (Habib et al., 2023; Meincke et al., 2025).

The mechanism-based limitations in the scope of applicability of LLMs are often masked in current discourse about them, a problem complicated by the optimisation of LLMs for the production of generic, plausible and confident-appearing output regardless of how the output relates to what is in fact the case (Kalai et al., 2025). This risks creating the illusion that LLMs can do things that they cannot, and that they have a connection to truth and understanding that they do not.


LLMs Cannot Think

The companies marketing their LLMs often describe them with anthropomorphising terms like "thinking" and "reasoning", which might create the impression that they can think (Mirzadeh et al., 2025; Shojaee et al., 2026). But for that impression to be accurate we would have to stretch the meaning of the term to refer trivially to whatever the LLMs produce as output - much like the meaning of intelligence has historically been watered down to whatever the tests used to operationalise the construct measured (Loru et al., 2025; Mitchell, 2023; Quattrociocchi & Capraro, 2025; van der Maas et al., 2021). The task of developing systems with non-trivial capability for human-like cognition is computationally intractable (van Rooij et al., 2024).

Focussing on the foundation rather than on the endpoint, to me there is a simple and inescapable basis to any thinking and reasoning: logical consistency. Just as we cannot see both interpretations of an ambiguous image like the rabbit-duck illusion or the Necker Cube at the same time (Gopnik & Rosati, 2001), we are incapable of assigning meaning to the conjunction of two contradictory statements. We can focus our attention on the meaning of one statement and then move over to the meaning of the other, but we cannot integrate them into a single meaningful representation. Thinking and understanding break down when we encounter an inconsistency, like an alarm signal that prompts us to stop and reevaluate the situation (Johnson-Laird et al., 2004); and even thinking that is not outright contradictory but moves fast and loose from one representation to another one incompatible with it is classified as a formal thought disorder (Holyoak & Morrison, 2005). This does not imply people are good at detecting inconsistencies regardless of problem complexity (Oberauer et al., 2016); but merely that it is a foundation, however local and fragile, on which thinking and understanding depend (Oaksford & Chater, 2020; Wheeler, 2026).

Now, one of the more notorious features of LLMs is their logical inconsistency. They routinely produce contradictory output or output that changes the topic mid-argument, and construct so-called "hallucinations" or "bullshit" responses (Frankfurt, 2005; Hicks et al., 2024; Kalai et al., 2025) in unforeseeable ways (Hägele et al., 2026). Further, LLMs seem incapable of detecting when such inconsistencies occur and just keep producing further output unabated - hence their functionality breaks down in ways different from how human thinking breaks down. This makes sense as their inconsistency is not a bug but a natural consequence of the stochastic mechanisms underlying them, together with their disconnection from any ground truth about which relatively stable conceptual representations could be formed (Kalai et al., 2025; Spencer-Brown, 1969; Wittgenstein, 1991). LLM developers have themselves stated that the problem of inconsistent, nonsensical output is impossible in principle to overcome, regardless of the amount of computing power and training data the models are based on (Shojaee et al., 2026; Song & Han, 2026).
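To make the point about stochastic mechanisms concrete, here is a minimal sketch of sampling a continuation from a probability distribution. The distribution `next_token`, its two contradictory options and their probabilities are invented for illustration, and the helper `sample_continuation` is hypothetical; real models work over far larger vocabularies, but the sketch shows how the same prompt can yield opposite answers when nothing in the procedure compares one draw with another.

```python
import random


def sample_continuation(distribution, rng):
    """Draw one continuation at random, weighted by its assumed probability."""
    options = list(distribution)
    weights = [distribution[option] for option in options]
    return rng.choices(options, weights=weights, k=1)[0]


# Assumed toy probabilities for completing "The claim ___ supported by the data".
next_token = {"is": 0.6, "is not": 0.4}

rng = random.Random(0)  # fixed seed so the run is repeatable
answers = [sample_continuation(next_token, rng) for _ in range(200)]

# Both contradictory completions show up across repeated queries.
print(set(answers))
```

Each draw is made independently, so no step exists at which a contradiction with an earlier answer could be noticed.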

The path from LLMs to thinking machines thus seems impossible from the outset due to the absence by design of the requirement for consistency. Many older computational models exist that fulfil the consistency requirement. But the capacity for both consistency and scalability remains an open, potentially unsolvable problem (Gödel, 1931; Kwisthout et al., 2011; Pylyshyn, 1987).


LLMs Can Undermine Thinking and Understanding

Thinking and reasoning, and with them knowledge and understanding, can improve with practice, and they can deteriorate without practice. LLMs are sometimes compared to electronic calculators (Geuter, 2024; Voinea et al., 2026), which have greatly increased the speed and accuracy of everyday calculations. The concomitant reduction in the need for simple mental arithmetic may have led to a decrease in our average mental arithmetic skills - but it freed up time to engage in potentially more complex and creative tasks. At the same time, our collective understanding of simple arithmetic has arguably not declined because the arithmetic rules by which calculators operate are transparent, precise and can be looked up in reliable sources anytime we need them (Sloman & Fernbach, 2017).

The situation is different in several ways for LLMs. They are being used to replace complex and creative tasks that draw on our capacity for critical thinking (Reuters, 2026). They produce seemingly plausible but imprecise and sometimes wildly inaccurate output, and they are not transparent about their sources - although their training data tends to include any information from the internet, however unreliable and regardless of legal requirements for source acknowledgment (Blau et al., 2024; Gewirtz, 2025; Meyer, 2025). For example, if asked for a solution to Lord’s paradox (Lord, 1967), an LLM might produce different output each time it is asked, and every time the output may sound plausible but may be justified in part by false or nonexistent evidence that is difficult for nonexperts in the field to detect (Fisher, 2021; Walters & Wilder, 2023).

The literature on the impact of LLMs on thinking and understanding is still very new and preliminary. But some studies have pointed to reduced task engagement and learning when relying on LLMs (Melumad, 2025; Shen, 2026; Stadler et al., 2024); and based on the existing literature on cognition we can expect the principle "use it or lose it" to apply here too (Bainbridge, 1983; Furman, 2025; Mızrak, 2020). In contrast to the calculator example, what we risk undermining in this case is our capacity for critical thinking, and the source reliability and transparency on which our collective understanding depends. This comes in addition to LLM-enabled mass production of slop, mis- and disinformation (Clark & Lewandowsky, 2026; Furman, 2025; Köbis & Doležalová, 2021; Perfors, 2025; Thorp, 2026).

Technology is arguably not value neutral, and the ways in which current LLMs have been built and deployed risk undermining not only our thinking and understanding as individuals but also our participation as active, diverse citizens in democratic decision-making processes (Kant, 1784; Lewandowsky & Hertwig, 2025; Lewandowsky & Garcia, 2026). Huxley’s dystopian novel Brave New World (Huxley, 1932) might reflect a Luddite position, which might sound pejorative at first. But it illustrates that technology can take us in different directions towards different societal goals, which are worth thinking about.


There Are no Shortcuts to Understanding

Understanding doesn’t work without thinking, which is often hard, cumbersome and full of errors. It will also keep trapping us in illusions, as Shiffrin et al. point out. But there is no free lunch to understanding. If we keep working on it we have reason to expect to keep escaping some of the illusions and increase our understanding over time - following the positive side of the "use it or lose it" principle. Some uses of LLMs may not undermine understanding, and in some cases we can avoid illusions by making an active decision about which parts of our thought processes, if any, to replace with their output.


https://doi.org/10.1007/s42113-026-00288-6
