See the rules for Matt Mahoney's Large Text Compression Benchmark: you *always* add the length of the decompression program to the length of the compressed data in order to approximate the Kolmogorov complexity of the data.
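As a toy illustration of that scoring rule (a sketch only: zlib stands in for a real enwik-class compressor, and the corpus and decompressor strings here are made up):

```python
import zlib

# Stand-in corpus; the real benchmark uses 100 MB of Wikipedia text (enwik8).
data = b"Kolmogorov complexity is approximated from above by compressed size. " * 200

# A minimal self-contained decompressor program. Real entries are full
# executables, but the scoring principle is the same.
decompressor = b"import sys,zlib;sys.stdout.buffer.write(zlib.decompress(sys.stdin.buffer.read()))"

archive = zlib.compress(data, 9)

# Matt's rule: the score is the compressed data PLUS the decompression program.
score = len(archive) + len(decompressor)
print(score)
```

Counting the decompressor is exactly what closes the "hide the model in the program" loophole.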
Only by bringing the errors of the model into the same units (bits) as the model itself, in a rigorous manner, can you avoid "cheating" the way the social pseudosciences do in their quasi-theological sophistry that threatens to result in a mass kill-off of people in the West. It's really a crying shame the One Billion Word Benchmark doesn't use Matt's rules. It might have raised awareness to the point that the leadership at Google, opiated as they are on network-effect capture, could avert mass bloodshed by promoting a genuinely fair and objective means of resolving disputes over social theories. That is to say, by promoting, for the first time in history, genuine social science.

On Sun, Oct 6, 2019 at 6:02 PM <[email protected]> wrote:

> I was working on the compression prize a month or so ago, for many
> reasons. I failed every time, but I learned a lot more about trees, bin
> files, etc. Every time I'd come up with a unique solution that was
> innovative, but it was never verified to do the job well enough.
>
> One idea was that 9^9^9^9^9 creates a massive number in binary using
> just a few bits. It's most of the file. You can generate a huge movie
> (at least a few thousand, for sure, out of all possible movies). Better
> yet, you can manipulate it with high precision: e.g. 9^9^9+1 changes the
> binary data by a single-bit increase. You could attempt to recreate
> small parts of the binary stream like this, instead of one 9^9 for the
> WHOLE bloody file.
>
> One idea was a binary tree: stick parts of the 100 MB binary stream into
> a binary tree and remember the order the branches lie in. Both
> "10100001" and "101110101" share the first 3 bits. Binary is already
> storing massive numbers, so this idea took the cake: it stores binary
> code in a yet better code.
>
> One idea was to store the binary bit length (or word length) and the
> dict of all words, then add word by word based on a range working
> through all possible files.
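The shared-prefix idea in the quoted message can be sketched as a binary trie. This is a hypothetical minimal version over just the two example bit strings; the helper names are made up:

```python
# Minimal binary trie: bit strings that share a prefix share trie nodes,
# so the common prefix "101" of the two example strings is stored once.

def trie_insert(trie, bits):
    node = trie
    for b in bits:
        node = node.setdefault(b, {})
    node["$"] = True  # marks the end of a stored string

def trie_edges(node):
    # Count edges (one per stored bit) in the whole trie.
    return sum(1 + trie_edges(child)
               for key, child in node.items() if key in ("0", "1"))

trie = {}
trie_insert(trie, "10100001")    # 8 bits
trie_insert(trie, "101110101")   # 9 bits

# Stored separately the strings take 8 + 9 = 17 bits; in the trie the
# shared 3-bit prefix is stored once, leaving 14 edges.
print(trie_edges(trie))  # → 14
```

Whether this actually compresses anything depends on how the tree's shape is itself encoded: the trie removes prefix redundancy, but its structure must also be written to the archive.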
> Last I remember, this would have worked if there wasn't SO much
> unstructured crap in the 100 MB :-)
>
> Another idea I was just pondering is taking the smallest text file and
> the smallest compressor and finding the maximal compression, then
> slightly larger files each time. You could possibly learn what max
> compression looks like.
>
> Also, the decompressor and compressed text file are smallest in total
> when both are balanced in size: the decompressor could have ALL the
> data in it and cheat while the compressed file has 0 bits, therefore if
> both are evenly sized, both together might be smallest in size! Like
> this:
>
> A = compressed file size
> B = decompressor file size
>
> Permalink:
> https://agi.topicbox.com/groups/agi/T2d0576044f01b0b1-Mf3772d7f6160b16a4d6ecb65

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/T2d0576044f01b0b1-Mca43428d512098ab49760e5f
Delivery options: https://agi.topicbox.com/groups/agi/subscription
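The A-versus-B balance at the end of the quoted message can be sanity-checked with a toy comparison. This is a sketch under loose assumptions (zlib on highly redundant stand-in data; real enwik8 compresses far less kindly), and the program strings are made up:

```python
import zlib

data = b"the quick brown fox jumps over the lazy dog " * 500  # stand-in corpus

# Degenerate split: ALL the data hidden inside the "decompressor", so the
# compressed file is empty (A = 0) and the program carries everything (B).
cheat_program = b"import sys;sys.stdout.buffer.write(" + repr(data).encode() + b")"
cheat_total = len(cheat_program) + 0

# Balanced split: a tiny generic decompressor plus a compressed payload.
honest_program = b"import sys,zlib;sys.stdout.buffer.write(zlib.decompress(sys.stdin.buffer.read()))"
honest_total = len(honest_program) + len(zlib.compress(data, 9))

# Hiding the data in the program doesn't help: the program is scored too.
print(cheat_total, honest_total)
```

The point is not that an evenly sized split is always optimal, only that under Matt's A+B scoring moving bits from the archive into the decompressor is never free.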
