@Matt and others: So far my enwik8 (100MB) lossless compression score is 20,085,564 bytes (using my AI that predicts/stores patterns). I haven't exhausted my ideas/insight yet, so expect it to improve a lot.
BTW: Many of the top entries on the Large Text Compression Benchmark use a pre-processor that often shaves off ~0.75MB (as shown on Matt's helpful long benchmark page). This includes even the new nncp, despite it being a Transformer architecture. The pre-processor does things like replacing common words with shorter codes that are easier to compress, so you could say my score is ~20MB - 0.75MB = ~19.25MB. While using one may be the right thing to do, I believe the AI solves this better, so I refuse to do it for now. Some entries also re-arrange enwik8 so that related articles sit together (by topic or by a high-D method). I don't think skipping that should hurt prediction: why can't the AI just detect the topic change after a few words?

Questions: Matt, I saw the methods you use listed on your page (below), and I don't fully understand many of them. I collect methods, so this is really interesting to me. Can you give an actual example for each one below, using a sentence to show how it works, so there is no doubt it's understood? For example, for "ISSE", show a toy example like this: "Sally walked the dog, the dog saw a cat, the cat saw a dog" > cat and dog have similar predictions and/or appear close together, so it makes sense to predict dog>meowed even if I've only seen cat>meowed, because cat and dog have been seen to be interchangeable; there is evidence. It would also be most awesome if you know roughly how many MB each one shaves off enwik8.

SSE (Secondary Symbol Estimation) --- like you can do this a 2nd time!!!
ISSE (Indirect Secondary Symbol Estimation) --- no you didn't
SEE (Secondary Escape Estimation) --- I may actually get this a bit, unsure though! Isn't this for when you exhaust all orders?
ICM (Indirect Context Model) --- as if this is not covered above :/(, no clue what this is

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/T23ab994ac902fe7e-Mb25840abc26bf56c4f0a53af
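To make the word-replacement pre-processor idea from the post concrete, here is a minimal sketch. It is not the transform used by nncp or any other benchmark entry. It assumes a dictionary built from the input's most frequent words (which would have to be stored or fixed so the decoder can use it), and a hypothetical escape character `ESC` followed by one code character per dictionary word. Literal `ESC` characters are doubled so the transform stays lossless. It works on Python strings for readability; a real pre-processor would work on bytes and choose code bytes that don't otherwise occur in the input.

```python
import re
from collections import Counter

ESC = "\x01"       # hypothetical escape marker (assumption, not from any real tool)
CODE_BASE = 0x100  # dictionary index i is written as the single char chr(CODE_BASE + i)

WORD_RE = re.compile(r"[A-Za-z]+")
# Alternating runs of letters / non-letters, so "".join(tokens) == text exactly.
TOKEN_RE = re.compile(r"[A-Za-z]+|[^A-Za-z]+")


def build_dictionary(text: str, size: int = 256, min_len: int = 3) -> list[str]:
    """Return the `size` most frequent words with at least `min_len` letters."""
    counts = Counter(w for w in WORD_RE.findall(text) if len(w) >= min_len)
    return [w for w, _ in counts.most_common(size)]


def encode(text: str, words: list[str]) -> str:
    """Replace each dictionary word with ESC + one code char; double literal ESCs."""
    index = {w: i for i, w in enumerate(words)}
    out = []
    for tok in TOKEN_RE.findall(text):
        if tok in index:
            out.append(ESC + chr(CODE_BASE + index[tok]))
        else:
            out.append(tok.replace(ESC, ESC + ESC))
    return "".join(out)


def decode(data: str, words: list[str]) -> str:
    """Invert encode(): ESC ESC -> literal ESC, ESC code -> dictionary word."""
    out = []
    i = 0
    while i < len(data):
        ch = data[i]
        if ch == ESC:
            nxt = data[i + 1]
            out.append(ESC if nxt == ESC else words[ord(nxt) - CODE_BASE])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    sample = "the cat saw the dog, the dog saw the cat " * 50
    words = build_dictionary(sample, size=16)
    packed = encode(sample, words)
    assert decode(packed, words) == sample  # the transform is lossless
    print(len(sample), len(packed))
```

Each 3+ letter dictionary word shrinks to 2 characters, so the text gets shorter. The bigger win for a context-mixing compressor, though, is that frequent words become short, regular symbols that are easier to predict. That is the part the post argues an AI model should learn by itself.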
