To demonstrate that data set size will be the least of our problems, take a random paper published in support of a controversial microsocial theory, such as "Firearm availability and homicide: A review of the literature". I pick it because it is the first scholarly result I get from googling "gun control studies" (via the Harvard Injury Control Research Center <https://www.hsph.harvard.edu/hicrc/firearms-research/guns-and-death/>, where it is the first of several cites).
This paper has 174 cites according to Google Scholar, and it claims to survey 4+7+10+9 = 30 studies. It concludes "the available evidence is consistent with the hypothesis that increased gun prevalence increases the homicide rate".

We can start by adding the data sets from those 30 studies. Not having read this survey or how it arrives at its conclusion, I can state with some confidence that there will be those who object to the conclusion based on criticisms of one or more of the surveyed studies and the way they choose and/or interpret their data sets. Such criticisms will no doubt offer alternative data sets that are claimed to be less biased or that demonstrate bias in the cited data sets. Indeed, it is likely some of the 174 cites will do just that. So add those data sets to the corpus. Other cites may rely on this survey to set forth other microsocial theories, which will produce similar critiques offering other data sets, and so on and so forth. This will quickly get into a wide range of social measures, and those will each have their longitudinal (time series) dimensions.

Note, we are still in the microsocial regime -- what I refer to in my wording of the Metaculus claim as "a vast grab-bag of microsocial models". By incorporating all those data sets into a single corpus, all those microsocial models will be challenged to integrate into a consistent macrosocial model so as to achieve a smaller self-extracting archive.

It will probably be possible to achieve a compelling macrosocial model using only public domain data, but even that is not a necessary precondition. For example, Raj Chetty has managed to wheedle his way into getting access to the IRS records on you and me to do sociology studies for Harvard. Of course, Dr. Chetty is violating a principle of science when he doesn't share his data with the public, since how are we to "replicate his results" without replicating his data?
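(An aside, to make "a smaller self-extracting archive" concrete. What follows is only a toy sketch in Python, not the rules of any actual prize: the time series is synthetic, the "archive" is simply a model description plus zlib-compressed residuals, and names such as `archive_size` are hypothetical. It compares a "no theory" encoding against one that encodes a linear trend and stores only what the trend fails to predict.)

```python
import random
import struct
import zlib
from typing import Sequence


def archive_size(model_bytes: bytes, residuals: Sequence[int]) -> int:
    """Toy stand-in for a self-extracting archive size:
    bytes needed to describe the model plus the compressed residuals."""
    payload = struct.pack(f"<{len(residuals)}i", *residuals)
    return len(model_bytes) + len(zlib.compress(payload, 9))


# Synthetic (made-up) yearly measure: a linear trend plus small noise.
rng = random.Random(0)
series = [1000 + 37 * t + rng.randint(-3, 3) for t in range(500)]

# Model A: no theory at all, just compress the raw numbers.
size_null = archive_size(b"", series)

# Model B: a two-parameter linear trend; only the residuals are stored.
intercept, slope = 1000, 37
model_b = struct.pack("<ii", intercept, slope)
residuals = [y - (intercept + slope * t) for t, y in enumerate(series)]
size_trend = archive_size(model_b, residuals)

# Lossless check: the model plus its residuals must reproduce the data exactly.
reconstructed = [intercept + slope * t + r for t, r in enumerate(residuals)]
assert reconstructed == series

print("no-model archive size:", size_null)
print("trend-model archive size:", size_trend)
print("preferred:", "trend" if size_trend < size_null else "no model")
```

The point of the sketch is that the model's own description counts against it: a model wins only if what it saves on the residuals exceeds what it costs to state.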
Ah, but with the lossless compression prize approach, all we need to do is submit our compression programs to the government and let the government put just one number per program into the public domain: the size of the resulting self-extracting archive!

Of course, there remains a residual probability that the government, through corruption or incompetence, will fail "on accident" to execute the compression program properly and/or will report an erroneous size. Incentives are powerful things. But at least we will have raised the level of discourse and lowered the likelihood of bias and fraud.

Since gun control is on the verge of sending the State of Virginia into a violent confrontation between its State government and over 90% of the county governments, it would be the height of irresponsibility for any technically literate organization not to pursue an option like this, given the relatively small amount of money it would cost compared to the underwritten risk-adjusted liability.

On Wed, Dec 25, 2019 at 8:51 PM Matt Mahoney <mattmahone...@gmail.com> wrote:

> On Wed, Dec 25, 2019, 12:09 PM James Bowery <jabow...@gmail.com> wrote:
>
>> The Metaculus question is now open:
>>
>> Will lossless compression fail to be accepted as a macrosociology model
>> selection criterion?
>> <https://www.metaculus.com/questions/3215/will-lossless-compression-fail-to-be-accepted-as-a-macrosociology-model-selection-criterion/>
>
> For this to work you need huge data sets and a compression competition.
> Google and Facebook can collect it but are unlikely to make it public.
> Census data is a possibility.
------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/Tf67f6c4584fac2f7-M4dab31883ee3a325b8625b90
Delivery options: https://agi.topicbox.com/groups/agi/subscription