Hi Linas! It looks like you are still subscribed. I miss the days before AI
was solved when we still had a lot of people working on the problem.

Anyway, I'm working on a Hutter prize entry. I'm on the committee like
James, so I'm not eligible for prize money, but it's a fun project and I'm
learning something. Kind of like Edison when he was trying to invent the
light bulb and learned 1000 things that didn't work. Data compression
research is like that. I was critical of the hardware limits, but now I
think it is still possible to develop a human level small language model
(SLM) that will run on a PC. I think 10 GB should be sufficient because it
is 80 times the size of human long term memory. It should be possible to
train a 10^9 parameter model with a 50K token vocabulary in about a day, or
10,000 x real time. An SLM will take AI control out of the hands of
billionaires and give it to the people. My code is GPL v3 licensed.

So far my program compresses enwik9 (1 GB Hutter corpus) to 145 MB in 10
minutes, which is near the Pareto frontier on the LTCB benchmark that I
started in 2006. It uses 5 stages:

A - article sorting by topic. I tried clustering by token matching but
wasn't able to improve on the order used in the latest Hutter entry, so I
just used that.

B - basic XML decoding, separating the text from the metadata like
contributor ID, timestamp, etc., and decoding markup like "&lt;" to "<".
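For illustration (this is not my actual code), the entity decoding part of
step B is equivalent to a standard HTML unescape, e.g. in Python:

```python
import html

# Decode XML/HTML character entities back to literal characters,
# e.g. "&lt;" -> "<", "&gt;" -> ">", "&amp;" -> "&".
decoded = html.unescape("&lt;page&gt;")
print(decoded)  # <page>
```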

C - Capitalization and space encoding, so "This is a test" becomes
"^ThisIsATest".
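A toy version of this transform, assuming the simplest possible rule: "^"
flags a letter that was capitalized in the original, and a deleted space is
signaled by uppercasing the following letter (the real code handles more
cases than this sketch):

```python
def cap_encode(text):
    # '^' marks an original capital; a space before a lowercase
    # letter is dropped and that letter is uppercased instead.
    out, i = [], 0
    while i < len(text):
        c = text[i]
        if c == " " and i + 1 < len(text) and text[i + 1].islower():
            out.append(text[i + 1].upper())
            i += 2
        elif c.isupper():
            out.append("^" + c)
            i += 1
        else:
            out.append(c)
            i += 1
    return "".join(out)

def cap_decode(code):
    # Invert the transform: '^X' is a literal capital X;
    # a bare capital decodes to space + lowercase.
    out, i = [], 0
    while i < len(code):
        c = code[i]
        if c == "^":
            out.append(code[i + 1])
            i += 2
        elif c.isupper():
            out.append(" " + c.lower())
            i += 1
        else:
            out.append(c)
            i += 1
    return "".join(out)

print(cap_encode("This is a test"))  # ^ThisIsATest
```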

D - dictionary single byte encoding. I use byte pair encoding to code 255
common letter groupings like "ing" or "The" as single bytes. These 4 steps
reduce enwik9 to 580 MB in about a minute, which speeds up the slow step of
entropy coding.
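The pair-merging idea behind step D, as a toy sketch over strings rather
than bytes (the real code assigns each merged group one of the 255 spare
single-byte codes):

```python
from collections import Counter

def bpe(seq, n_merges):
    # Repeatedly replace the most frequent adjacent symbol pair
    # with a single merged symbol (byte pair encoding).
    seq = list(seq)
    for _ in range(n_merges):
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < 2:
            break  # no pair worth merging
        merged, out, i = a + b, [], 0
        while i < len(seq):
            if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
                out.append(merged)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq

print(bpe("ingingking", 2))  # ['ing', 'ing', 'k', 'ing']
```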

E - entropy coding. Right now it is just a contiguous context model. I plan to
add a language model with a token dictionary, short term memory, and a next
token matrix represented by a neural network using hidden neurons to
represent the sparse parts of the matrix.

Those predictions will be mixed with the context model. The context model
is a byte level order 0-1-2-3-4-6 ICM-ISSE chain, a match model, and a mixer
from ZPAQ. An ICM maps a context to a bit history (an 8 bit state) and then
to a next bit prediction. An ISSE maps a context (including the past bits
of the current byte) to a bit history, which is used to select a pair of
weights to average the stretched bit probability from the previous
component with a constant 1. The match model searches for matching context
and predicts whatever bit came next, weighted by the length of the match.
The mixer takes the stretched predictions of all the other components,
combines them by weighted averaging (using weights selected by the order 0
context), and squashes the sum to a probability, which encodes the next bit
using arithmetic coding. A probability p is stretched by x = ln(p/(1-p))
and squashed by the inverse p = 1/(1 + e^-x). The model is updated by
adjusting all the weights to reduce the prediction error by about 0.1%.
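In code, the stretch/squash/mix/update cycle looks roughly like this (a
simplified floating point sketch; ZPAQ quantizes everything to fixed point,
and the learning rate here is illustrative):

```python
import math

def stretch(p):
    # logit: ln(p / (1 - p))
    return math.log(p / (1 - p))

def squash(x):
    # inverse of stretch: 1 / (1 + e^-x)
    return 1.0 / (1.0 + math.exp(-x))

def mix(preds, weights):
    # Weighted average of the component predictions in the
    # stretched domain, squashed back to a probability.
    return squash(sum(w * stretch(p) for w, p in zip(weights, preds)))

def update(preds, weights, bit, rate=0.001):
    # After coding a bit, nudge each weight in the direction
    # that reduces the prediction error.
    err = bit - mix(preds, weights)
    return [w + rate * err * stretch(p) for w, p in zip(weights, preds)]
```

Note that stretch and squash are exact inverses, so a component that
predicts with weight 1 while all others have weight 0 passes through
unchanged.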

Stuff I discovered in the last month that doesn't work: replacing the bit
history with a byte history to predict directly by counting next bits (this
was supposed to save space, because the current implementation needs 32
bytes to encode 8 bit histories for a single context match); adding a symbol
to step C to encode ALL UPPERCASE; combining steps C and D into one step;
and lots of other minor experiments.

-- Matt Mahoney, [email protected]

On Tue, Mar 3, 2026, 4:31 AM Linas Vepstas <[email protected]> wrote:

> Hi James, Hi Matt,
>
> Last time I posted on this mailing list was 25 years ago. I'm posting
> now, because my internet connection failed, my email bounced, and
> emails to AGI are in my inbox, instead of gathering dust on my hard
> drive. I have no clue if this mailing list will allow me to post, or
> if this email will bounce. I shall find out.
>
> James,
> I want to mention something that might feel very off-topic; trust me,
> it will come around.
>
> When I was a freshman in college, second week, I get my first serious
> homework assignment: write a two page paper, double-spaced, wide
> margins, No problem, Easy. I put it off till the night before. Topic:
> Manuscripts of 1844, Karl Marx. Wait, hang on, wasn't he some
> communist or something like that?  But it's only like 10-12 pages, so
> whatever. I get through the first paragraph. The second paragraph, I'm
> lost. I read it five times, and wtf ...???
>
> OK, so here's the important part of this story, that you don't know
> about. I'm smart, like, I'm actually smart. So I know how to deal with
> this. Find the subject, find the verb, and see, everything else in
> there is going to be modifiers -- adjectives, adverbs, noun phrases.
> Identify those, and I shall master this Marx guy. I shall be king and
> rule this intellectual roost!
>
> Shit. I can't find the subject, I can't find the verb. I try and I try
> ... fuck me. Where's the period at the end of the sentence? Uhhh
> .......turn the page, ..... uhhh, wait .. there it is .... there's the
> period at the end of the sentence.  The sentence is effing one whole
> effing page long. Suitably armed with this new knowledge, I was able
> to find the subject, the verb, the modifiers and clauses, and figure
> out wtf Marx was writing about. Except it's now midnight. So I bang out
> my two-page paper. Go sleep. Hand it in the next morning. (I actually
> got like an A+. Can't complain.)
>
> Turns out no one else understood the Manuscripts of 1844.  This total
> and abject failure caused Marx to re-evaluate his approach, and a few
> years later, he wrote the Communist Manifesto in response. Using plain
> words, simple language, short sentences. Catchy phrases that you could
> remember and chant while you attended a street protest. "History is
> Not Written by Kings and Queens, but by the Workers Relation to his
> Means of Production" -- easy stuff, just rolls off your tongue.
>
> Why is no one reading your socio-political theory? It's too dense.
> What should you do? Turn it into slogans. Better yet: memes, maybe a
> cat saying "can haz socio-political theory"
>
> Why is no one funding you? Same reason political philosophers weren't
> funded, ever, anywhere over millennia of history. Dangerous
> trouble-makers, might incite a riot. You want insurance companies to
> throw money your way? Well, can you take your ideas and turn them into
> some stochastic differential equations that can arbitrage insurance
> risk? No? Errm, well....
>
> Realistically? The #1 most important thing you can do these
> days is to catch the eye of some youtubers who will take your
> discussion (this discussion) and turn it into comprehensible
> entertainment for those of us with short attention spans and ADHD
> looking for a quick dopamine hit before the next elections. Matt has
> been on this mailing list maybe 25-30 years ... why aren't the youtube
> people here, begging for an interview with Matt Mahoney? You got a
> message? This is the way to get it out. You want to theorize with
> intellectual peers? Well you can do that too, but hard intellectual
> labor doesn't sit well with engaging in the difficult martial battle
> of turning the is-world into the ought-world. Karl Marx got lucky. And
> in his name, great crimes were committed. So it goes.
>
> -- linas
>
> On Wed, Feb 25, 2026 at 5:57 PM James Bowery <[email protected]> wrote:
> >
> >
> >
> > On Wed, Feb 25, 2026 at 3:59 PM Matt Mahoney <[email protected]>
> wrote:
> >>
> >>
> >> The reason that we can use text compression to test language models but
> we can't use video compression to test vision models is signal to noise
> ratio.
> >
> >
> > We can't really use compression to test language models—not because of
> the noise floor, but because of political economy and industrial path
> dependency (a la The Hardware Lottery).  It's absurd that neither the Hutter
> Prize nor your LTCB is funded by the insurance industry with billions of
> dollars in underwriting in order to discount the risk that capital
> investments in current algorithms and hardware will need to be written
> off.  That's all political economy nonsense arising from taxing activity
> rather than the liquidation value of net assets at something like the
> 30-year Treasury rate.  And, yeah, that's _my_ theory of macrosocial
> dynamics showing from my experience getting the government to stop
> protecting public and private monopoly power with the resources of
> entrepreneurs like Musk.  And that's why I started looking for ways of
> reforming the social pseudosciences because no one would believe me until
> now it's almost too late.
> >
> > Given what is at stake in macrosocial dynamical theory, the juice is
> worth the squeeze whereas it may not be in video compression, nor in many
> other areas where less is at stake and therefore there is less of a
> conflict of interest bedeviling the science.
> >
> >
>
>
>
> --
> Patrick: Are they laughing at us?
> Sponge Bob: No, Patrick, they are laughing next to us.
>

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/Tb9c1aaff01c2b823-Mfa897d4342637d31e0bb5c47
Delivery options: https://agi.topicbox.com/groups/agi/subscription
