The input text doesn't have to be hard-coded; it can be read from a file, but I 
haven't done that yet.

Yes, for now it only handles a-z and space; I was just testing it. I can add 
all 256 byte values later, soon.
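
Reading the input from a file and restricting it to the current a-z and space 
alphabet could look roughly like the sketch below. The function name 
`load_text` and the cleanup rules (lowercase everything, turn other whitespace 
into a space, drop the rest) are assumptions for illustration, not what my 
code does today.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz "  # a-z and space, the current test alphabet


def load_text(path, alphabet=ALPHABET):
    """Read a text file and keep only symbols from `alphabet` (hypothetical helper)."""
    with open(path, encoding="utf-8", errors="replace") as f:
        raw = f.read().lower()
    allowed = set(alphabet)
    # ASSUMPTION: newlines and tabs become a space, any other symbol is dropped.
    return "".join(
        ch if ch in allowed else (" " if ch.isspace() else "")
        for ch in raw
    )
```

Moving to all 256 byte values would mean opening the file in binary mode and 
using `range(256)` as the alphabet instead of filtering.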

Yes, it uses arithmetic coding. It updates the tree and its frequency counts by 
online learning. The tree is 17 letters deep (each branch is up to 17 letters 
long), and before saving each predicted byte it searches for matches of 1 to 17 
letters, then mixes the 17 models into one set of predictions. Mixing starts 
from the longest match found, e.g. 14 letters, and each model eats away some of 
the remaining probability space; it may stop at the 6-letter model, having 
mixed only the 14- down to 6-letter matches from the tree. The 0.07 you see 
next to a-z is a small backup probability given to every character in the 
vocabulary, just in case the model is uncertain and has no counts yet for a 
character it has to predict. I was going to give each character 0.000002, but 
it doesn't like that, and I think it ends up around there from 0.07 anyway. For 
some reason it likes certain vocabulary characters to have e.g. 0.2 or 0.0003, 
and I don't know why! The data counts should work this out by themselves, not 
the backup values I set.
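
To make the mixing order concrete, here is a minimal sketch of the idea, not 
my actual code. It keeps counts for contexts of 1 to 17 letters, starts from 
the longest match, lets each model take a share of the remaining probability 
space, stops early once almost nothing is left, and spreads the leftover over 
the whole vocabulary. Everything named here is hypothetical: the class 
`MixedContextModel`, the flat dict standing in for the tree, the PPM-C-style 
confidence formula `total / (total + distinct)`, the `1e-3` stopping 
threshold, and reading 0.07 as the total mass reserved for the backup 
vocabulary. Instead of running a real arithmetic coder it sums `-log2(p)`, 
which is the size an ideal arithmetic coder would approach.

```python
import math
from collections import defaultdict

ALPHABET = "abcdefghijklmnopqrstuvwxyz "  # a-z and space, as in the post
MAX_ORDER = 17                            # longest context length, as in the post
BACKUP_MASS = 0.07                        # ASSUMPTION: total mass kept for the backup vocab


class MixedContextModel:
    """Hypothetical count model mixing contexts of length 1..max_order."""

    def __init__(self, alphabet=ALPHABET, max_order=MAX_ORDER, backup_mass=BACKUP_MASS):
        self.alphabet = alphabet
        self.max_order = max_order
        self.backup_mass = backup_mass
        # counts[context][symbol]; a flat dict standing in for the tree.
        self.counts = defaultdict(lambda: defaultdict(int))

    def predict(self, history):
        """Return {symbol: probability} for the next symbol; values sum to 1."""
        uniform = 1.0 / len(self.alphabet)
        # Every symbol starts with its slice of the backup mass.
        probs = {s: self.backup_mass * uniform for s in self.alphabet}
        remaining = 1.0 - self.backup_mass
        # Walk from the longest available context down to a single letter.
        for order in range(min(self.max_order, len(history)), 0, -1):
            seen = self.counts.get(history[-order:])
            if not seen:
                continue  # no match of this length
            total = sum(seen.values())
            # ASSUMPTION: PPM-C-style confidence; more counts take more space.
            share = remaining * total / (total + len(seen))
            for sym, n in seen.items():
                probs[sym] += share * n / total
            remaining -= share
            if remaining < 1e-3:
                break  # space is used up, shorter models are skipped
        # Whatever is left is spread evenly over the vocabulary.
        for s in self.alphabet:
            probs[s] += remaining * uniform
        return probs

    def update(self, history, symbol):
        """Online learning: count `symbol` after every suffix of `history`."""
        for order in range(1, min(self.max_order, len(history)) + 1):
            self.counts[history[-order:]][symbol] += 1

    def code_length_bits(self, text):
        """Predict-then-update over `text`; returns the ideal coded size in bits.

        Every symbol in `text` must be in the alphabet.
        """
        bits = 0.0
        for i, symbol in enumerate(text):
            history = text[max(0, i - self.max_order):i]
            bits -= math.log2(self.predict(history)[symbol])
            self.update(history, symbol)
        return bits


if __name__ == "__main__":
    model = MixedContextModel()
    sample = "the cat sat on the mat " * 20
    print(model.code_length_bits(sample) / len(sample), "bits per character")
```

In this sketch the backup mass only matters while counts are sparse: once the 
long contexts have been seen a few times they claim almost all of the space, 
so changing `BACKUP_MASS` mostly shifts the cost of the first few hundred 
symbols.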
------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/Tcfc4df5e57c62b43-M77dc0647da1e67861f05e22f