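The argument that Kolmogorov complexity becomes language independent as the data gets larger is that you can always change the language by adding a fixed-size translator. The translator's size does not depend on the data, so its relative cost shrinks as the data grows, but its absolute cost does not. For English, that's about 10^9 bits, which is a factor of 2^(10^9) ≈ 10^300,000,000 difference in probability.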
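To make that arithmetic concrete, here is a minimal Python sketch. The helper names and the sample description lengths are illustrative assumptions of mine; the only number taken from the text is the rough 10^9 bit translator size.

```python
import math

def penalty_log10(translator_bits: int) -> float:
    """Return log10 of 2**translator_bits, the worst-case ratio between
    the priors that two languages can assign to the same data."""
    return translator_bits * math.log10(2)

def relative_overhead(translator_bits: int, description_bits: int) -> float:
    """Translator size as a fraction of the description length.
    Hypothetical helper: shows why the constant matters less as data grows."""
    return translator_bits / description_bits

TRANSLATOR_BITS = 10**9  # rough estimate for English, taken from the text

exponent = penalty_log10(TRANSLATOR_BITS)
print(f"2^(10^9) is about 10^{exponent:,.0f}")

# Assumed description lengths, chosen only for illustration.
for description_bits in (10**9, 10**12, 10**15):
    share = relative_overhead(TRANSLATOR_BITS, description_bits)
    print(f"{description_bits:.0e} bits of description: overhead {share:.6f}")
```

The exponent comes out slightly above 300,000,000, which matches the rounded figure. The overhead fraction falls toward zero as the description length grows, but the absolute 10^9 bit gap never does.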
We can reduce this problem by using the simplest language possible, like your idea of a state machine made of the fewest 2-input NOR gates that sequentially outputs enwik9. But such languages are really hard to program. Wolfram held a contest to prove that a 2-state non-halting Turing machine with a base-3 tape is universal. But nobody knows how to even write a "hello world" program on it.

Which raises the bigger problem that Kolmogorov complexity is not computable. The winning theory then becomes the one that people put the most effort into solving.

Which raises the even bigger problem that, as you mentioned, motivation, ego, and money drive science. Scientists who should know better still want to prove themselves right. If the experiment doesn't give the right answer, then fix the experiment. This happens even in physics, but is especially bad in medicine and the social sciences, where you can cherry-pick the data that supports your theories.

Suppose you want to answer the question of whether covid-19 vaccines are safe and effective. The data set is huge. Just on Worldometer you have case, hospitalization, and death rates by week and county, with vaccination rates and test coverage. There are thousands of studies, millions of genome sequences of different strains, billions of raw data points for individual cases, and tracking data for billions of people in Asian countries where people had to run apps or wear a device that continually reported their location to the government.

Do you compress all of it? What about data you think is irrelevant? What about data you think is unreliable? What about studies that were not peer reviewed? What about studies funded by vaccine makers? Do you trust the US CDC? Do you trust the Chinese CDC? Do you trust Turkmenistan, the only country to report zero cases throughout the pandemic? Who gets to decide which data to include? How do you convince people who believe that the moon landing was fake?
How do you convince people when anything on the Internet could be fake? When any text or image or video could be created by AI?

-- Matt Mahoney, [email protected]

On Sun, Nov 23, 2025, 10:30 AM James Bowery <[email protected]> wrote:

> There are, of course, an infinite number of "arguments" one can come up with to expand what Nick Szabo calls the "Argument Surface" and that is where the real "problem for statistics about people" arises -- not in the choice of language ambiguity. People who are not motivated to get rid of motivated reasoning will not be motivated to solve problems like the choice of language ambiguity -- as just one example of many. I will grant, however, that particular redoubt is only for the elect who, like you and I, have been involved with judging the Hutter Prize. IIRC, even Shane Legg sets forth that argument as a reason to avoid the Algorithmic Information Criterion -- and you can't get much more authoritative than that unless you go to Hutter himself or, in the hypothetical case, Solomonoff. I did express concern to Marcus at one time, when Solomonoff was still living and shortly after the Hutter Prize had been announced, that Solomonoff might "torpedo" the Hutter Prize with that argument (if I recall the exact wording). Marcus reassured me that Solomonoff would do no such thing. IIRC shortly thereafter Solomonoff posted something like that argument to his blog. IIRC Marcus objected to using the ALIC for global warming despite the Biden administration setting the value of addressing that issue at around $10T/year -- and I can see merit in that objection given the scale of the data.
>
> But it all comes down to "incentives" when we are addressing the "motivated reasoning" problem and that's why I posted my Congressional testimony about the "incentives" regarding rocket technology -- which you commented on but did not seem to get the point I was trying to make about incentives.
>
> Once we're in the realm of macrosocial psychological dynamical models, the incentives are so great as to beggar the imagination. This is far greater even than Biden's rNPV of $10T/year and the macrosocial psychology data is many orders of magnitude smaller than climate data. That said, there is room for your concern about choice of language in conjunction with the identification of "noise" regarding which, as I've often pointed out: "one man's noise is another man's cyphertext".
>
> So we have two "argument surfaces" here:
>
> How much of the macrosocial dataset is "*noise*" as opposed to inadequately motivated forensic epistemology "decyphering" that noise?
>
> How much of the wiggle room for *choice of language* can be squeezed out by forensic epistemology motivated by an rNPV of $10T/year, ie: well in excess of $100T, with let's say only 1% of that amount going to ALIC research: >$1T?
>
> First of all, recognize that the exploit you regard as decisive is minuscule compared to the argument surface presently not only tolerated but exploited by the academy, think tanks and punditry. At present there is virtually nothing BUT macrosocial psychological "argument surface", e.g. arguments such as the one to which you appealed for normative alignment of young men to be optimistic lest their pessimism be a self fulfilling prophecy.
>
> Secondly, forensic epistemology is precisely about *presuming* criminal behavior such as that to which you appeal as a reason for despair.
> With >$1T at stake there will be enormous motivation to suss out issues regarding "language choice" and I can easily demonstrate that none of the existing authorities have been sufficiently motivated to reduce that aspect of the argument surface:
>
> As I've pointed out before, not only is there an entirely different theoretical basis for addressing that reason (really excuse) to support avoidance of scientific accountability by our policy makers (ie: NiNOR Complexity), but there are obvious, at-hand, techniques to reduce that argument surface. For example, a GPU provides an "instruction set", ie "language", that is radically different from a CPU. So are we to now throw up our hands in despair and let those in power get away with "Well gee who could have KNOWN???" when things don't go "according to projections"? Really? Why am I the ONLY person to have addressed the *obvious* fact that a GPU's "instruction set" is describable as a relatively tiny procedure in a canonical instruction set and that procedure's algorithmic length should be used?
>
> Could it be that, perhaps, I'm the only sufficiently MOTIVATED person among those who have been taking information criteria remotely seriously?
>
> On Thu, Nov 20, 2025 at 5:27 PM Matt Mahoney <[email protected]> wrote:
>
>> On Thu, Nov 20, 2025, 10:11 AM James Bowery <[email protected]> wrote:
>>
>>> On Wed, Nov 19, 2025 at 11:19 AM Matt Mahoney <[email protected]> wrote:
>>>
>>>> Algorithmic information or compression is great for evaluating language models but not for everything....
>>>>
>>>> I could try compressing world population data by fitting it to a polynomial,
>>>
>>> Do you understand the difference between statistics and dynamics?
>>
>> No, it's the difference between compressing text and compressing video. You can't accurately measure the compression of a tiny signal in a sea of noise.
>>
>> This becomes a problem for statistics about people. It only takes a few bits of Kolmogorov complexity for social scientists to construct models that favor one group over another, and those bits can be hidden in the choice of language ambiguity.
>>
>> I think it would be great if we could answer political questions objectively. So how would you solve the problem?
>>
>>> <https://agi.topicbox.com/groups/agi/T504adacb23f3c455-Md49fd5f054dbc9f5d8062388>
>>
>> -- Matt Mahoney, [email protected]
>
> *Artificial General Intelligence List <https://agi.topicbox.com/latest>* / AGI / see discussions <https://agi.topicbox.com/groups/agi> + participants <https://agi.topicbox.com/groups/agi/members> + delivery options <https://agi.topicbox.com/groups/agi/subscription>
> Permalink <https://agi.topicbox.com/groups/agi/T504adacb23f3c455-M7931c7e8f4f1083a38690dce>

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/T504adacb23f3c455-M99fe6983586de890e5b7d816
Delivery options: https://agi.topicbox.com/groups/agi/subscription
