On Sun, Aug 6, 2023 at 11:28 AM James Bowery <[email protected]> wrote:
> On Sun, Aug 6, 2023 at 9:53 AM Matt Mahoney <[email protected]> wrote:
>>
>> ... In the US, racial discrimination has been illegal since the 1960's and 
>> television has been portraying a colorblind world since the 1970's with no 
>> effect....
>
>
> This is the kind of thing that would be verified or debunked by Hume's 
> Guillotine:
>
> https://github.com/jabowery/HumesGuillotine
>
> The endless yammering at each other (particularly in the guise of "social 
> science") is getting us nowhere.  People are in hysterics and getting more 
> hysterical.

I was going to ask exactly what data you would compress to prove your
social theories. But you already answered my question. Here is some
baseline data.

27,077,896 LaboratoryOfTheCountiesUncompressed.csv-8.paq8o
28,322,300 LaboratoryOfTheCountiesUncompressed-m57.zpaq
28,449,825 LaboratoryOfTheCountiesUncompressed-m5.zpaq
28,741,625 LaboratoryOfTheCountiesUncompressed.pmm
29,521,520 LaboratoryOfTheCountiesUncompressed-b100m.bcm
30,380,751 LaboratoryOfTheCountiesUncompressed-m4.zpaq
30,380,751 LaboratoryOfTheCountiesUncompressed-m3.zpaq
33,305,581 LaboratoryOfTheCountiesUncompressed-m256-o16-r1.pmd
33,311,991 LaboratoryOfTheCountiesUncompressed.csv.7z
34,559,264 LaboratoryOfTheCountiesUncompressed.csv.bz2
36,253,433 LaboratoryOfTheCountiesUncompressed-m5.rar
38,504,064 LaboratoryOfTheCountiesUncompressed-9.zip
40,647,091 LaboratoryOfTheCountiesUncompressed-m2.zpaq
43,370,210 LaboratoryOfTheCountiesUncompressed-m1.zpaq
91,360,518 LaboratoryOfTheCountiesUncompressed.csv
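
For easier comparison, the sizes above can be turned into compression
ratios and bits per input byte. This is only a convenience sketch: the
dictionary keys are the file name suffixes from the listing, not
command lines, and the numbers are copied from the listing unchanged.

```python
# Size in bytes of the uncompressed CSV, from the last line of the listing.
ORIGINAL = 91_360_518

# Compressed sizes in bytes, keyed by the file name suffix in the listing.
sizes = {
    ".csv-8.paq8o": 27_077_896,
    "-m57.zpaq": 28_322_300,
    "-m5.zpaq": 28_449_825,
    ".pmm": 28_741_625,
    "-b100m.bcm": 29_521_520,
    "-m4.zpaq": 30_380_751,
    "-m3.zpaq": 30_380_751,
    "-m256-o16-r1.pmd": 33_305_581,
    ".csv.7z": 33_311_991,
    ".csv.bz2": 34_559_264,
    "-m5.rar": 36_253_433,
    "-9.zip": 38_504_064,
    "-m2.zpaq": 40_647_091,
    "-m1.zpaq": 43_370_210,
}


def ratio(compressed, original=ORIGINAL):
    """Compressed size as a fraction of the original size."""
    return compressed / original


def bits_per_byte(compressed, original=ORIGINAL):
    """Average number of output bits spent per input byte."""
    return 8 * compressed / original


# Print the table sorted from smallest to largest output.
for suffix, size in sorted(sizes.items(), key=lambda kv: kv[1]):
    print(f"{suffix:<18} {size:>11,} {ratio(size):6.1%} {bits_per_byte(size):5.2f}")
```

These figures leave out the decompressor size, so they are not directly
comparable to contest entries.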

These are not in the contest format of a 32- or 64-bit Linux
self-extracting archive, and they don't include the decompressor size.
But they all easily fit within the contest CPU time and memory limits.
The slowest and top-ranked program was paq8o -8, taking 77 minutes and
1.6 GB of memory on a Core i7-1165G7, 2.80 GHz, 16 GB, Win11. For all
programs I selected options for maximum compression.

The input is a giant CSV file, a text file with 3199 rows each
representing a US county and 6624 columns representing economic,
demographic, and crime data. The lines are separated by linefeed
characters and the columns by tabs. The data is in decimal numeric
format, either integers without commas or with one decimal point. The
row and column headers are quoted. The county names are replaced by numbers.
The meanings of the columns are described in a set of auxiliary files
that are not part of the contest.
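
A minimal sketch of reading a file laid out this way, using Python's
standard csv module with a tab delimiter. Several things here are
assumptions and not taken from the data set: that the first line is the
column header line, that the first field of every line is the row
header, that an empty field means a missing value, and the column names
in the sample, which are made up.

```python
import csv
import io


def to_number(field):
    """Convert a decimal field to float; return None if it does not parse
    (assumption: empty or non-numeric fields mean 'missing')."""
    try:
        return float(field)
    except ValueError:
        return None


def parse_table(stream):
    """Read a tab-separated table with quoted row and column headers.

    Returns (columns, rows), where columns is the list of column names
    and rows maps each row header to its list of values.
    """
    reader = csv.reader(stream, delimiter="\t", quotechar='"')
    header = next(reader)
    columns = header[1:]  # assumption: first header cell labels the row header
    rows = {}
    for record in reader:
        if not record:  # skip blank lines
            continue
        rows[record[0]] = [to_number(field) for field in record[1:]]
    return columns, rows


# Hypothetical three-line sample in the described layout.
sample = (
    '"county"\t"population"\t"median_income"\n'
    '"1"\t52000\t41250.5\n'
    '"2"\t8100\t\n'
)
columns, rows = parse_table(io.StringIO(sample, newline=""))
print(columns)
print(rows)
```

For the real file, the same function could be called on
open(path, newline=""), where path is wherever the CSV was saved.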

I am pretty sure that a program that found correlations in the data,
such as between population, race, age, income, and crime, would
achieve better compression. How would we use this information to set
policy?
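
To illustrate why modeling correlations should help, here is a toy
sketch with synthetic data, not the county data. It uses zlib as a
stand-in for a general-purpose compressor (it is not paq8o or zpaq) and
a made-up relationship where the second column is about three times
the first. Storing the second column as a residual from that
prediction compresses better than storing it raw.

```python
import random
import zlib


def compressed_size(text):
    """Bytes produced by zlib at its highest compression level."""
    return len(zlib.compress(text.encode("ascii"), 9))


def make_columns(n=3000, seed=1):
    """Synthetic data: column b is 3 * a plus small noise (an assumption
    chosen only to make the two columns strongly correlated)."""
    rng = random.Random(seed)
    a = [rng.randint(1000, 99999) for _ in range(n)]
    b = [3 * x + rng.randint(-2, 2) for x in a]
    return a, b


a, b = make_columns()

# Raw encoding: both columns written as decimal text, tab separated.
raw = "\n".join(f"{x}\t{y}" for x, y in zip(a, b))

# Model-based encoding: keep a, replace b by its prediction error.
residuals = [y - 3 * x for x, y in zip(a, b)]
modeled = "\n".join(f"{x}\t{r}" for x, r in zip(a, residuals))

# The transform is lossless: b can be rebuilt from a and the residuals.
restored = [3 * x + r for x, r in zip(a, residuals)]

raw_size = compressed_size(raw)
modeled_size = compressed_size(modeled)
print(raw_size, modeled_size)
```

The decoder has to know the model (here, the factor of 3), which is why
a contest that counts the decompressor size charges for the complexity
of the model as well as for the residuals.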

-- 
-- Matt Mahoney, [email protected]

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T772759d8ceb4b92c-M778ed3794853b3e4601f0756