On Sat, Aug 19, 2023 at 6:27 PM Matt Mahoney <[email protected]> wrote:

> Most of the compression comes from using the previous column as context.
> Similar columns are grouped, like 1990 population followed by 2000
> population, which is what makes these contexts useful. Obviously there
> are other things to try, like sorting rows and columns by mutual
> information, or predicting cells from previously coded cells and
> coding the difference. Stay tuned.


Thanks, Matt. Using existing software with minor tweaks to squeeze out the
low-hanging statistical fruit is an important first step toward a
meaningful competition.

Ordering columns so that adjacent ones correlate most closely (as you are
doing) is an indirect way of doing factor analysis as data compression, an
idea that goes back to at least 1973
<https://ttu-ir.tdl.org/bitstream/handle/2346/15912/31295004619267.pdf?sequence=1>
although that linked paper is a dead end, since no one cited it in
subsequent work.
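
To make the column-ordering idea concrete, here is a rough sketch of my
own, not Matt's actual method. It assumes numeric columns, uses absolute
Pearson correlation as a cheap stand-in for mutual information, and chains
the columns greedily. The names pearson and order_columns are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.
    Returns 0.0 when either sequence has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def order_columns(columns):
    """Greedy chain: start at column 0, then repeatedly append the unused
    column with the highest |correlation| to the last column placed."""
    order = [0]
    remaining = list(range(1, len(columns)))
    while remaining:
        last = columns[order[-1]]
        best = max(remaining, key=lambda j: abs(pearson(last, columns[j])))
        order.append(best)
        remaining.remove(best)
    return order

# Made-up toy data: 1990 population, an unrelated column, 2000 population.
cols = [[100, 200, 300, 400], [5, 1, 4, 2], [110, 215, 330, 420]]
print(order_columns(cols))  # [0, 2, 1]: the two population columns end up adjacent
```

A greedy chain is only a heuristic. It will not in general find the
ordering that minimizes compressed size, and correlation only captures
linear dependence between columns.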

There is some prior art involving DARPA cyberwar forensics for causal
analysis of enterprise logs, such as:

https://youtu.be/eK-E6242K-c?t=209

SEAL: Storage-efficient Causality Analysis on Enterprise Logs with
Query-friendly Compression
<https://www.usenix.org/system/files/sec21fall-fei.pdf>

But I don't see an open-source implementation of that or of subsequent
related work.

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T30092c5d8380b42f-Mac20a76482ba8016301b6574
Delivery options: https://agi.topicbox.com/groups/agi/subscription
