On Sun, Aug 6, 2023 at 11:28 AM James Bowery <[email protected]> wrote:
> On Sun, Aug 6, 2023 at 9:53 AM Matt Mahoney <[email protected]> wrote:
>>
>> ... In the US, racial discrimination has been illegal since the 1960's and
>> television has been portraying a colorblind world since the 1970's with no
>> effect....
>
>
> This is the kind of thing that would be verified or debunked by Hume's
> Guillotine:
>
> https://github.com/jabowery/HumesGuillotine
>
> The endless yammering at each other (particularly in the guise of "social
> science") is getting us nowhere. People are in hysterics and getting more
> hysterical.
I was going to ask exactly what data you would compress to prove your social theories, but you already answered my question. Here is some baseline data (compressed size in bytes, then file name):

    27,077,896  LaboratoryOfTheCountiesUncompressed.csv-8.paq8o
    28,322,300  LaboratoryOfTheCountiesUncompressed-m57.zpaq
    28,449,825  LaboratoryOfTheCountiesUncompressed-m5.zpaq
    28,741,625  LaboratoryOfTheCountiesUncompressed.pmm
    29,521,520  LaboratoryOfTheCountiesUncompressed-b100m.bcm
    30,380,751  LaboratoryOfTheCountiesUncompressed-m4.zpaq
    30,380,751  LaboratoryOfTheCountiesUncompressed-m3.zpaq
    33,305,581  LaboratoryOfTheCountiesUncompressed-m256-o16-r1.pmd
    33,311,991  LaboratoryOfTheCountiesUncompressed.csv.7z
    34,559,264  LaboratoryOfTheCountiesUncompressed.csv.bz2
    36,253,433  LaboratoryOfTheCountiesUncompressed-m5.rar
    38,504,064  LaboratoryOfTheCountiesUncompressed-9.zip
    40,647,091  LaboratoryOfTheCountiesUncompressed-m2.zpaq
    43,370,210  LaboratoryOfTheCountiesUncompressed-m1.zpaq
    91,360,518  LaboratoryOfTheCountiesUncompressed.csv

These are not in the contest format of a 32- or 64-bit Linux self-extracting archive, and the sizes don't include the decompressor. But they all easily fit within the contest CPU time and memory limits. The slowest program, which also compressed best, was paq8o -8, taking 77 minutes and 1.6 GB of memory on a Core i7-1165G7 (2.80 GHz, 16 GB RAM, Win11). For all programs I selected the options for maximum compression.

The input is a giant CSV file: a text file with 3199 rows, each representing a US county, and 6624 columns of economic, demographic, and crime data. The lines are separated by linefeed characters and the columns by tabs. The data is in decimal numeric format, either integers without commas or numbers with one decimal point. Row and column headers are quoted. The county names are replaced by numbers. The meanings of the columns are described in a set of auxiliary files that are not part of the contest.
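For anyone who wants to experiment with the file, the layout described above can be loaded with Python's standard csv module. This is a minimal sketch under assumptions the message does not state: the first header cell labels the county-ID column, an empty cell means a missing value, and the function names are hypothetical. It also includes the arithmetic for expressing the sizes in the table as bits per byte of the original file.

```python
import csv
import io


def parse_cell(cell: str):
    """Convert one data cell: '' -> None (assumed missing), '12.5' -> float, '12' -> int."""
    if cell == "":
        return None
    return float(cell) if "." in cell else int(cell)


def parse_counties(text: str):
    """Parse a tab-separated table with quoted row and column headers.

    Returns (column_names, {county_id: [values, ...]}).
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t", quotechar='"')
    header = next(reader)
    columns = header[1:]  # assumption: header[0] labels the county-ID column
    rows = {}
    for record in reader:
        if not record:
            continue  # tolerate blank lines
        county_id, *cells = record
        rows[county_id] = [parse_cell(c) for c in cells]
    return columns, rows


def bits_per_byte(compressed_size: int, original_size: int) -> float:
    """Compressed size expressed as bits per byte of the original file."""
    return 8 * compressed_size / original_size
```

For example, bits_per_byte(27077896, 91360518) gives about 2.37 bits per byte for paq8o -8, versus about 3.37 for zip -9.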
I am pretty sure that a program that found correlations in the data, such as between population, race, age, income, and crime, would achieve better compression. How would we use this information to set policy?

-- 
-- Matt Mahoney, [email protected]

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/T772759d8ceb4b92c-M778ed3794853b3e4601f0756
Delivery options: https://agi.topicbox.com/groups/agi/subscription
