Machine learning loss functions aren't stated in bits of information, the terms that algorithmic information theory (AIT) works in, which makes it difficult to relate them to algorithmic information.
Taking a baby step, let's consider classification error in terms of bits. Stating it that way requires a vector with one element per pattern, where each element stores either a symbol meaning "correct" (say '0') or the index (from 1 to #Classes) of the correct class for that pattern. The first element would be ceil(log2(#Classes + 1)), giving the number of bits in each of the remaining elements (each must hold one of #Classes + 1 distinct symbols). When everything is correctly classified, every element is 0, and your "bits loss function" for classification is just the length of the run-length encoding of that run of '0's. This is a reasonable first cut at an AIT-friendly loss function for classification error.

Now let's go one step further, to numeric outputs. The typical approach is to sum some function of the errors, such as their squares or absolute values. But providing the corrections needed to reproduce the actual values from the outputs requires, again, a vector of corrections, this time deltas, whose precision must be adequate to represent the original data losslessly. If these deltas have a non-uniform distribution, they can be arithmetically encoded. That seems like a reasonable approach to another AIT-friendly loss function.

But now we get to the "model parameters" and find ourselves among well-defined but ill-founded notions like "L2 regularization", defined in terms of ill-defined "parameters". For example, "parameter counts" aren't given in bits, and it is rarely even specified that a 32-bit floating-point number is only half a "parameter" of a 64-bit floating-point number (or that the latter counts as 2 "parameters" and the former as 1).
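As a rough illustration of the bit counts described above, here is a minimal Python sketch. The function names, the fixed 8-bit header, the Elias gamma code for the run length, and the use of empirical entropy as a stand-in for the arithmetically encoded length are all assumptions made for illustration, not a standard API or the only reasonable encoding. It also ignores the cost of transmitting the delta distribution itself.

```python
import math
from collections import Counter

# Assumption: a fixed-size header that records how wide each element is.
HEADER_BITS = 8


def elias_gamma_bits(n: int) -> int:
    """Length in bits of the Elias gamma code for a positive integer n."""
    if n < 1:
        raise ValueError("Elias gamma codes only positive integers")
    return 2 * (n.bit_length() - 1) + 1


def entropy_bits(symbols) -> float:
    """Ideal code length, sum of log2(n / count), under the symbols' own
    empirical distribution. An arithmetic coder approaches this; the cost of
    sending the distribution itself is ignored here."""
    counts = Counter(symbols)
    n = len(symbols)
    return sum(c * math.log2(n / c) for c in counts.values())


def classification_bits(predicted, actual, num_classes: int) -> int:
    """Hypothetical bits loss: a correction vector holding 0 for a correct
    prediction, else the 1-based index of the true class (classes are 0-based
    here, so class a is stored as a + 1)."""
    corrections = [0 if p == a else a + 1 for p, a in zip(predicted, actual)]
    if not corrections:
        return HEADER_BITS
    width = math.ceil(math.log2(num_classes + 1))  # symbols 0..num_classes
    if not any(corrections):
        # All correct: header plus a run-length code for a single run of '0'.
        return HEADER_BITS + elias_gamma_bits(len(corrections))
    return HEADER_BITS + width * len(corrections)


def residual_bits(predicted, actual, precision: float) -> float:
    """Hypothetical bits loss for numeric outputs. Assumes actual values lie on
    a grid of spacing `precision`; predictions are snapped to the same grid so
    the integer deltas let a receiver rebuild `actual` exactly."""
    deltas = [round(a / precision) - round(p / precision)
              for p, a in zip(predicted, actual)]
    return entropy_bits(deltas)


def parameter_bits(groups) -> int:
    """Storage bits for parameters given as (count, bits_per_value) pairs,
    e.g. 32 for float32 and 64 for float64."""
    return sum(count * bits for count, bits in groups)


if __name__ == "__main__":
    print(classification_bits([0, 1, 2, 1], [0, 1, 2, 2], num_classes=3))  # 16
    print(classification_bits([0, 1, 2, 1], [0, 1, 2, 1], num_classes=3))  # 13
    print(residual_bits([0.0] * 4, [0.0, 0.1, 0.0, 0.1], precision=0.1))  # 4.0
    print(parameter_bits([(1000, 32), (1000, 64)]))  # 96000
```

Under this sketch, 1000 float64 values cost twice the bits of 1000 float32 values, which is exactly the distinction a raw "parameter count" hides. Note also that a constant offset in every prediction costs 0 bits in `residual_bits`, because all deltas are the same symbol; a fuller accounting would charge for the delta distribution the arithmetic coder uses.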
L2 regularization sounds like it is heading in the right direction by squaring the weights and summing them, but when one looks at what is actually being done, it is:
* applying additional functions to the sum, such as taking the mean
* asking for a scaling factor to apply
* applying the regularization per "layer" rather than to the entire model

I'm sure I missed some of the many ways L2 regularization fails to be AIT-friendly.

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/Tcf3881157c1bec31-M5ecb58f47772e723a4e77b14
