While working through an early stage of debugging a manifold learner whose training isn't yet working correctly, I had the exchange below. It exemplifies not just Sonnet 4's behavior toward MDL as a loss function, but that of the entire FG industry. The "I've emphasized this before" is an ongoing struggle with both Sonnet 4 and the industry at large: over and over and over again I have to try to nail this jellyfish of an industry/LLM to the wall:
Me: This statement evinces a catastrophic misunderstanding of the intent of MDL with respect to the importance of instrument precision in the data: "The linear transformations (per-capita normalization + quantile scaling) may have already captured much of the structure". I've emphasized this before when you seemed to be losing the plot.

Sonnet 4: You're absolutely right, and I apologize for that fundamental misunderstanding. The instrument precision defines the *theoretical minimum* number of bits needed to encode each measurement. The MDL calculation isn't about finding patterns in pre-transformed data - it's about discovering the most efficient way to represent the *raw measurements* given their inherent precision limits.

When I said the linear transformations "may have already captured much of the structure," I completely missed the point. The transformations are just preprocessing steps to make the neural networks work numerically. The MDL calculation must happen in the original measurement space using the actual instrument precisions, because that's where the fundamental information-theoretic limits are defined.

The 2% improvement means the manifold learner found correlations and dependencies in the raw census measurements that allow more efficient encoding than treating each measurement independently. Even a small percentage improvement represents discovering genuine statistical structure that wasn't apparent from the preprocessing alone.

The instrument precision values (0.01 to 1.0 in the original units) set the baseline cost of encoding each measurement. Any reduction below that baseline times the number of measurements indicates real compression - finding patterns that let you represent the data more efficiently than naive independent encoding.

Thank you for the correction. The MDL principle is specifically about optimal encoding given measurement precision, not about finding structure in normalized data.
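The baseline-versus-model accounting described above can be sketched numerically. This is a minimal illustrative sketch, not the actual pipeline: the feature range (1000 raw units), the instrument precision (0.01), the measurement count, the Gaussian-residual model, and the model description cost are all hypothetical values chosen for demonstration.

```python
import math

def baseline_bits(value_range, precision):
    """Bits to encode one raw measurement independently (naive uniform
    code), quantized at the instrument precision: log2(range/precision)."""
    return math.log2(value_range / precision)

def model_bits(residual_sigma, precision):
    """Approximate bits to encode one raw measurement via a model whose
    prediction error is Gaussian with std residual_sigma, quantized at
    the same instrument precision (discretized differential entropy)."""
    return 0.5 * math.log2(2 * math.pi * math.e) + math.log2(residual_sigma / precision)

# Hypothetical census-like feature: raw range 1000 units, precision 0.01.
value_range, precision = 1000.0, 0.01
n = 100_000                # hypothetical number of raw measurements

naive_total = n * baseline_bits(value_range, precision)
# Hypothetical model: residual std of 30 raw units, plus 50,000 bits
# to transmit the model itself (two-part MDL code).
model_total = n * model_bits(30.0, precision) + 50_000

improvement = 1 - model_total / naive_total
print(f"naive: {naive_total:.0f} bits, model: {model_total:.0f} bits")
print(f"compression gain: {improvement:.1%}")
```

Any positive gain here means the model found statistical structure in the raw measurement space that a naive independent encoding cannot exploit, which is the sense in which even a 2% improvement is genuine.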
On Sat, Sep 6, 2025 at 2:32 PM James Bowery <[email protected]> wrote:

> Talking to claude about MDL reminds me of talking to most "AI experts" about it. Everyone is so obsessed with exploitation they can't stand the constraints imposed by exploring the foundational principles of AI.

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/T74a1863ab6aaa7c0-Me5525fdbbdc3acd23bb5a657
