Re: [agi] “Low-Resource” Text Classification: A Parameter-Free Classification Method with Compressors

James Bowery Mon, 24 Jul 2023 10:45:57 -0700

On Mon, Jul 24, 2023 at 11:46 AM Matt Mahoney wrote:


> On Sun, Jul 23, 2023, 8:16 PM James Bowery wrote:
>
>> https://aclanthology.org/2023.findings-acl.426.pdf
>>
>
> Yes, and I think the only reason gzip didn't outperform the other text
> classifiers on the largest data sets is that it only finds matching strings
> over a 32 KB window.
>
> Of course, text classification is mostly adversarial, used for spam
> filtering and censorship. String matching can be easily defeated by
> deliberately misspelling words like "v1agra". But smarter algorithms that
> understand text and images have solved the problem. Your inbox is no longer
> full of spam and viruses. Your political posts just get downranked instead
> of banned.
>
> For some reason this reminds me of Matt's distributed competitive routing
>> AGI proposal <https://www.mattmahoney.net/agi2.html>:
>>
>
> Ah, yes, in 2008, before smartphones, social media, blockchain, and the
> Arab Spring ushering in the demand for internet censorship. A time when we
> were young and idealistic and thought that our ideas about AGI would change
> the world. Now we are older and watching big tech solve the problems we
> failed to solve, and maybe not liking those changes. Instead of the
> internet being a tool for the people to control the government, it is
> becoming the other way around.
>
> Well I did warn you that AGI would be expensive....
>
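
For reference, the linked paper classifies text by pairing gzip with a normalized compression distance (NCD) and a k-nearest-neighbour vote. Below is a minimal Python sketch of that idea, not the paper's code: it assumes only the standard-library `gzip` module and joins the two texts with a single space, and the names `clen`, `ncd`, and `classify` are hypothetical.

```python
import gzip
from collections import Counter


def clen(text: str) -> int:
    """C(x): size in bytes of the gzip-compressed UTF-8 encoding of text."""
    return len(gzip.compress(text.encode("utf-8")))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance with gzip as the compressor:

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))
    """
    cx, cy = clen(x), clen(y)
    cxy = clen(x + " " + y)  # assumption: join the two texts with one space
    return (cxy - min(cx, cy)) / max(cx, cy)


def classify(query: str, train: list[tuple[str, str]], k: int = 3) -> str:
    """Label query by majority vote of its k nearest (text, label) pairs under NCD."""
    neighbours = sorted(train, key=lambda pair: ncd(query, pair[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

Usage: `classify(text, [(doc, label), ...], k=1)` returns the label of the single closest training text. Each query costs one compression per training text. Because DEFLATE can only refer back 32 KB, C(xy) stops rewarding content that y shares with the early part of a long x, which fits Matt's guess about why gzip fell behind on the largest data sets.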

Not to harp on the IS vs. OUGHT distinction, but if one factors the decision
half out to individuals, we're left with approximating the Algorithmic
Information of internet data, with expansions cached via mechanisms like
Information-Centric Networking to speed query response.
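
Compressed size gives a computable upper bound on algorithmic information: K(x) is at most the compressed length plus the constant size of the decompressor, so a better compressor gives a tighter approximation. Here is a minimal sketch, assuming only Python's standard-library `zlib` (DEFLATE, 32 KiB window) and `lzma` (xz, 8 MiB dictionary at the default preset); `upper_bounds` and `best_bound` are hypothetical helpers. It also illustrates Matt's point about gzip's window: a repeat 64 KiB back is invisible to DEFLATE but not to LZMA.

```python
import lzma
import random
import zlib


def upper_bounds(data: bytes) -> dict[str, int]:
    """Compressed size in bytes per compressor.

    Each value, plus the constant size of its decompressor, is an upper bound
    on the algorithmic information K(data).
    """
    return {
        "zlib": len(zlib.compress(data, 9)),  # DEFLATE: back-references reach at most 32 KiB
        "lzma": len(lzma.compress(data)),     # xz preset 6: 8 MiB dictionary
    }


def best_bound(data: bytes) -> int:
    """Tightest bound available: the smallest compressed size (constants ignored)."""
    return min(upper_bounds(data).values())


# Deterministic pseudo-random 64 KiB block, repeated once. The copy begins
# 64 KiB after the original, beyond DEFLATE's 32 KiB window but well inside
# LZMA's dictionary, so only LZMA can encode it as a back-reference.
block = random.Random(0).randbytes(64 * 1024)
doubled = block + block
print(upper_bounds(doubled))
```

On this input zlib's output is roughly the full 128 KiB, while LZMA's is close to a single copy. Shrink the block below 32 KiB and zlib catches the repeat too.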


> Maybe I am being pessimistic about P2P networks. Freenet and Tor are
> mostly unusable because they lack search engines. Napster was killed by
> shutting down its centralized search service. USENET was O(n^2) and
> disappeared. Mastodon lacks a funding model. Bitcoin uses 1% of the world's
> electricity. Ethereum with proof of stake and support for arbitrary
> messages is still O(n^2) making transactions unaffordable for widespread
> use. I'm afraid that censorship is here to stay, in part because we want it
> as long as it's applied to other voices we want silenced.
>

We're already seeing search engines threatened by queries to models and
we're already seeing a huge, although technically misguided, movement
toward "personal language models".

Think about it like this:

If the 1-billion-token context window guys at MS Research are right, and the
approximation of the algorithmic information can be divided among a hundred
million otherwise idle screensavers/smartphones...

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T4dbad1e5c8d7f685-M39650a693c543e0890a2feca
Delivery options: https://agi.topicbox.com/groups/agi/subscription
