The input file doesn't have to be embedded in the code; it can be read from a file, but I haven't done that yet.
For now it handles just a-z and space, since I was only testing it. I can add all 256 byte values later, soon. Yes, it uses arithmetic coding. It updates the tree and its frequency counts using online learning, building a tree 17 letters deep (each branch is 17 letters long). It searches for matches of 1 to 17 letters before it saves the predicted byte, and mixes the 17 models into one set of predictions. Mixing starts from the longest match found, e.g. 14 letters, and each model eats away at the remaining probability space, so it may stop at, say, the 6-letter model, having mixed only the 14- down to 6-letter matches from the tree.

The a-z 0.07 values you see are the whole vocab, with every char given a small % in case the model is uncertain and has no counts yet for a char it needs to predict. I was going to give each char 0.000002, but it doesn't like that, and I think it ends up there from 0.07 anyway. For some reason it likes certain vocab chars to have e.g. 0.2 or 0.0003, and I don't know why! The data counts should work this out themselves, not the backup counts I set.

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/Tcfc4df5e57c62b43-M77dc0647da1e67861f05e22f
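The longest-match-first mixing described in the post could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual code: the names `ContextMixer`, `escape_k`, `min_remaining` and `train_and_predict` are hypothetical, a flat dictionary stands in for the tree, the confidence rule `total / (total + escape_k)` is a guess at how each model "eats away" the remaining space, and a uniform fallback stands in for the 0.07 backup values. The arithmetic coder itself is left out.

```python
from collections import defaultdict

VOCAB = "abcdefghijklmnopqrstuvwxyz "  # a-z and space, as in the post
MAX_ORDER = 17                         # longest context length


class ContextMixer:
    """Hypothetical sketch: mix order-1..17 context models, longest first."""

    def __init__(self, max_order=MAX_ORDER, escape_k=2.0, min_remaining=1e-3):
        self.max_order = max_order
        self.escape_k = escape_k            # assumed confidence constant
        self.min_remaining = min_remaining  # assumed early-stop threshold
        # counts[context][symbol] -> frequency (flat dict instead of a tree)
        self.counts = defaultdict(lambda: defaultdict(int))

    def predict(self, history):
        """Return a probability for every char in VOCAB (sums to 1)."""
        probs = {c: 0.0 for c in VOCAB}
        remaining = 1.0
        # Start at the longest available context and work down.
        for order in range(min(self.max_order, len(history)), 0, -1):
            seen = self.counts.get(history[-order:])
            if not seen:
                continue  # no match at this length, try a shorter one
            total = sum(seen.values())
            # The more counts a model has, the more space it takes.
            share = remaining * total / (total + self.escape_k)
            for sym, n in seen.items():
                probs[sym] += share * n / total
            remaining -= share
            if remaining < self.min_remaining:
                break  # shorter models are skipped entirely
        # Backup: spread whatever is left evenly over the vocab.
        for c in VOCAB:
            probs[c] += remaining / len(VOCAB)
        return probs

    def update(self, history, symbol):
        """Online learning: bump the count of symbol after each context."""
        for order in range(1, min(self.max_order, len(history)) + 1):
            self.counts[history[-order:]][symbol] += 1


def train_and_predict(text, query):
    """Feed text one char at a time, then predict what follows query."""
    mixer = ContextMixer()
    for i, ch in enumerate(text):
        mixer.update(text[max(0, i - MAX_ORDER):i], ch)
    return mixer.predict(query)
```

For example, `train_and_predict("the cat sat on the mat ", "the ")` gives `c` and `m` the largest shares, because they are the only chars seen after "the ". In a real compressor these probabilities would be handed to the arithmetic coder before `update` is called with the actual next char.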
