[agi] Re: I made a multi-file compressor that beats 7zip on real-world data

stefan.reich.maker.of.eye via AGI Wed, 27 May 2020 07:20:26 -0700

Here's an example to show how the file format works.

$ cat version1
pseudocode bla {
  print("This is version 1");
  String x = "hello";
  print(x);
}
$ cat version2
pseudocode bla {
  print("This is version 2");
  print("Starting now");

  String x = "hello";
  print(x);
}
$ linecomp c test.lc version1 version2
Adding file version1
Adding file version2
Compressing a total of 192 bytes...
Compression done [12 ms]

Archive /home/stefan/linecomp-demo/test.lc stats:
  1K of text compressed into 1K (2 files)
$ gunzip <test.lc
LINECOMP 8

  String x = "hello";
  print("Starting now");
  print("This is version 1");
  print("This is version 2");
  print(x);
pseudocode bla {
}
1 5
7 0
8 9
version1=6 3 10
version2=6 4 2 0 10

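As you can see, the file consists of these four parts:
 1. A magic header, followed by the number of literal lines that follow (n; 
here n = 8)
 2. All the literal lines in lexical order (here the first one is the empty 
line)
 3. A list of pairs of indices. Each index i points either into the literal 
table (if i < n) or into the table of pairs itself (if i >= n), where the k-th 
pair, counting from 0, has index n + k. This way we build larger chunks of 
lines. In the example, 8 is the pair (1, 5), 9 is (7, 0) and 10 is (8, 9), so 
index 10 expands to the tail that both files share.
 4. A list of the files, each given as the file's name plus a list of indices 
as defined above, which are followed and concatenated to yield the file's 
contents.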
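
To make the four parts concrete, here is a minimal Python sketch of a reader 
for the decompressed archive text shown above. It is not the actual linecomp 
code. The function names (parse_archive, expand, file_contents) are made up 
for illustration, and two things are assumptions read off the example rather 
than a spec: a line containing "=" after the literal table is taken to be a 
file entry, and the expanded lines are joined with "\n". With that joining 
rule the two files come out at 192 bytes in total, which matches the number 
printed by the tool.

```python
def parse_archive(text):
    """Split decompressed archive text into (literals, pairs, files)."""
    lines = text.split("\n")
    magic, count = lines[0].split()
    if magic != "LINECOMP":
        raise ValueError("not a LINECOMP archive")
    n = int(count)                      # part 1: number of literal lines
    literals = lines[1:1 + n]           # part 2: literals get index 0..n-1
    pairs, files = [], {}
    for line in lines[1 + n:]:
        if not line:
            continue                    # ignore a trailing empty line
        if "=" in line:
            # Part 4. Assumption: the index list follows the last "=".
            name, _, indices = line.rpartition("=")
            files[name] = [int(i) for i in indices.split()]
        else:
            # Part 3: the k-th pair (from 0) gets index n + k.
            left, right = line.split()
            pairs.append((int(left), int(right)))
    return literals, pairs, files


def expand(indices, literals, pairs):
    """Follow indices through the pair table down to literal lines."""
    n = len(literals)
    out = []
    stack = list(reversed(indices))     # explicit stack, no recursion limit
    while stack:
        i = stack.pop()
        if i < n:
            out.append(literals[i])
        else:
            left, right = pairs[i - n]
            stack.append(right)         # pushed first, so left pops first
            stack.append(left)
    return out


def file_contents(name, literals, pairs, files):
    # Assumption: lines are joined with "\n"; the empty literal at the
    # end of each file then supplies the final newline.
    return "\n".join(expand(files[name], literals, pairs))


# The archive from the example, exactly as printed by gunzip.
ARCHIVE = "\n".join([
    "LINECOMP 8",
    "",
    '  String x = "hello";',
    '  print("Starting now");',
    '  print("This is version 1");',
    '  print("This is version 2");',
    "  print(x);",
    "pseudocode bla {",
    "}",
    "1 5",
    "7 0",
    "8 9",
    "version1=6 3 10",
    "version2=6 4 2 0 10",
    "",
])

literals, pairs, files = parse_archive(ARCHIVE)
version1 = file_contents("version1", literals, pairs, files)
version2 = file_contents("version2", literals, pairs, files)
print(version1, end="")
```

To read a real archive you would first undo the gzip layer, for example with 
gzip.decompress from the Python standard library, since the example above 
uses gunzip to get at the text.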

Don't tell me that's not one of the simplest formats you have ever seen.

---

[An excursion: Can binary data be compressed this way? You need to define how 
to split it into "lines".

I believe you can actually coerce binary data into LINECOMP 0.1. It will still 
look for line breaks in the data, which might be a rather nonsensical exercise, 
but it should work nonetheless. The input is not interpreted as text except 
for the \n's. One might define a different content-splitting strategy for 
other file types and actually get usable results there too.]
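
As a rough illustration of what "a different splitting strategy" could mean, 
here is a small Python sketch with two splitters over raw bytes. split_lines 
mirrors the behaviour described above (only \n is interpreted). split_fixed 
is a purely hypothetical alternative for record-oriented binary data; nothing 
like it is claimed to exist in linecomp, and all names here are invented. The 
sketch only covers the splitting step, not how binary chunks would be stored 
in the archive.

```python
def split_lines(data):
    """Split raw bytes on \\n only; nothing else is interpreted."""
    return data.split(b"\n")


def join_lines(chunks):
    return b"\n".join(chunks)


def split_fixed(data, size=4):
    """Hypothetical alternative: cut the data into fixed-size records."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def join_fixed(chunks):
    return b"".join(chunks)


# Both strategies round-trip arbitrary bytes.
blob = bytes(range(256)) * 2
assert join_lines(split_lines(blob)) == blob
assert join_fixed(split_fixed(blob, 16)) == blob

# Two binary records that differ only in the middle share their outer chunks,
# which is what lets a chunk table deduplicate them.
record_a = b"HEAD" + b"\x00\x01\x02\x03" + b"TAIL"
record_b = b"HEAD" + b"\xff\xfe\xfd\xfc" + b"TAIL"
shared = set(split_fixed(record_a)) & set(split_fixed(record_b))
print(sorted(shared))
```

One caveat with fixed-size records: inserting a single byte shifts every later 
chunk boundary, so shared content after the insertion no longer lines up. 
Splitting on a delimiter such as \n does not have that problem.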
------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/Tb2cf064c700f181c-Ma6d39bd7be49108424be06e4
