Machine learning loss functions aren't stated in the AIT-friendly terms of
bits of information, which makes it difficult to relate them to algorithmic
information.

Taking a baby-step, let's consider classification error in terms of bits:

For each pattern, store in a vector either a symbol meaning "correct" (say
'0') or the index (from 1 to #Classes) of the correct class for that
pattern.  A leading element holds ceil(log2(#Classes + 1)), the number of
bits per element in the rest of the vector; the "+ 1" accounts for the '0'
symbol alongside the #Classes indices.  When everything is correctly
classified, every element is 0 and the "Bits loss function" for
classification is just the length of the run-length encoding of that
string of 0s.
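
To make the bookkeeping concrete, here is a minimal Python sketch of that
correction vector and its bit count.  The details are my assumptions, not
part of the scheme above: class labels are 1-based integers, run lengths
and the leading width element are charged at Elias gamma code lengths, and
each run costs one symbol plus one run length.  The function names are
made up for illustration.

```python
def elias_gamma_bits(n: int) -> int:
    """Length in bits of the Elias gamma code for a positive integer n."""
    if n < 1:
        raise ValueError("Elias gamma codes positive integers only")
    return 2 * (n.bit_length() - 1) + 1


def correction_vector(predicted, actual):
    """0 where the prediction is right, else the correct class index."""
    return [0 if p == a else a for p, a in zip(predicted, actual)]


def classification_bits(predicted, actual, num_classes: int) -> int:
    """Bits to store the run-length-encoded correction vector.

    Assumed layout: a header giving the element width, then one
    (symbol, run length) pair per run of equal symbols.
    """
    corrections = correction_vector(predicted, actual)
    # Symbols range over 0..num_classes, so the width is
    # ceil(log2(num_classes + 1)), which equals num_classes.bit_length().
    width = num_classes.bit_length()
    bits = elias_gamma_bits(width)  # header: how wide each symbol is
    i = 0
    while i < len(corrections):
        j = i
        while j < len(corrections) and corrections[j] == corrections[i]:
            j += 1
        bits += width + elias_gamma_bits(j - i)  # symbol + run length
        i = j
    return bits


actual = [1, 2, 3, 1, 2, 3, 1, 2]
all_correct = classification_bits(actual, actual, num_classes=3)
one_wrong = classification_bits([1, 2, 3, 2, 2, 3, 1, 2], actual, num_classes=3)
print(all_correct, one_wrong)
```

Under these assumptions a single misclassification splits one long run of
0s into three runs, so the cost goes up by more than the width of one
symbol.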

This is a reasonable first cut at an AIT-friendly loss function for
classification error.

Now let's go one step further, to outputs that are numeric.  The typical
approach is to sum a function of the errors, such as their squares or
their absolute values.  But reproducing the actual values from the outputs
requires, again, a vector of corrections, this time deltas, whose
precision must be adequate to represent the original data losslessly.  If
these deltas have a non-uniform distribution they can be arithmetically
encoded.  So that seems like a reasonable approach to another AIT-friendly
loss function.
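
A rough Python sketch of that delta cost, under assumptions of my own: the
data and the predictions both sit on a grid of spacing `quantum` (so that
prediction + delta * quantum reproduces the data exactly), and the
arithmetic coder is idealized as charging -log2(p) bits per delta, with p
taken from the empirical frequencies.  The cost of transmitting that
frequency table is ignored here, which a full accounting would have to
include.  The names are hypothetical.

```python
from collections import Counter
from math import log2


def quantized_deltas(predicted, actual, quantum: float):
    """Integer corrections, in units of `quantum`, from predictions to data."""
    return [round((a - p) / quantum) for p, a in zip(predicted, actual)]


def ideal_code_length_bits(symbols) -> float:
    """Sum of -log2(p) over the symbols, using empirical frequencies.

    This is the length an ideal arithmetic coder approaches; it does not
    count the bits needed to describe the frequency table itself.
    """
    counts = Counter(symbols)
    n = len(symbols)
    return sum(-c * log2(c / n) for c in counts.values())


actual = [1.0, 2.0, 3.5, 4.0]
predicted = [1.0, 2.0, 3.0, 4.5]
deltas = quantized_deltas(predicted, actual, quantum=0.5)
delta_bits = ideal_code_length_bits(deltas)
print(deltas, delta_bits)
```

A model whose deltas are all zero costs nothing under this idealization,
and the cost grows as the deltas spread over more distinct values.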

But now we get to the "model parameters" and find ourselves among
well-defined but ill-founded notions like "L2 regularization", defined in
terms of ill-defined "parameters".  For example, "parameter counts" aren't
given in bits, and it is rarely even specified that a 32-bit floating
point number is only half a "parameter" relative to a 64-bit floating
point number (or, equivalently, that the latter is 2 "parameters" and the
former 1).

L2 regularization sounds like it is heading in the right direction by
squaring the weights and summing them up, but when one looks at what is
actually being done, it is

* applying additional functions to the sum such as mean
* asking for a scaling factor to apply
* applying the regularization on a "layer" rather than the entire model

I'm sure I missed some of the many ways L2 regularization fails to be
AIT-friendly.
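
To illustrate the mismatch, a small Python sketch with hypothetical helper
names.  The `scale` and `reduce` arguments stand in for the scaling factor
and the extra functions such as mean mentioned above; they are not any
particular library's API.  The raw bit count is only a crude upper bound on
the description length of the weights, not a compressed size.

```python
def l2_penalty(weights, scale: float = 1.0, reduce: str = "sum") -> float:
    """Sum (or mean) of squared weights times an arbitrary scale factor."""
    total = sum(w * w for w in weights)
    if reduce == "mean":
        total /= len(weights)
    return scale * total


def raw_parameter_bits(weights, bits_per_weight: int) -> int:
    """Bits needed to store the weights verbatim at a given width."""
    return len(weights) * bits_per_weight


weights = [0.5, -0.5, 1.0, 0.0]

# The same weights give three different "penalties" depending on knobs
# that have nothing to do with how many bits the model occupies.
penalty_sum = l2_penalty(weights)
penalty_mean = l2_penalty(weights, reduce="mean")
penalty_scaled = l2_penalty(weights, scale=0.5)

# The L2 penalty is blind to storage width, while the bit count doubles.
bits_fp32 = raw_parameter_bits(weights, 32)
bits_fp64 = raw_parameter_bits(weights, 64)
print(penalty_sum, penalty_mean, penalty_scaled, bits_fp32, bits_fp64)
```

None of the three penalty values is in bits, so none of them can be added
directly to the bit costs of the corrections discussed earlier.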

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/Tcf3881157c1bec31-M5ecb58f47772e723a4e77b14