Your idea is similar to
Compression by induction of hierarchical grammars
by Craig G. Nevill-Manning,
Ian H . Witten & David L. Maulsby,
October, 1993

It looks promising but I don't know of any practical implementations. You
have to make multiple passes to find the most frequent pair of symbols, but
it is faster (but suboptimal) to find many frequent pairs at once in fewer
passes.

On Thu, May 28, 2020, 8:58 AM stefan.reich.maker.of.eye via AGI <
[email protected]> wrote:

> I think I got the theoretical runtime down to O(n) if we skip sorting the
> literal lines in the output file (which doesn't make any difference to the
> core idea).
>
> So the algorithm basically compresses a list of ints to a compact binary
> graph in O(n) time. It does so like this:
>
> while true:
>   (i, j) = most often (and at least twice!) occurring adjacent pair in
> list; end loop if none found
>   k = index of new or existing table entry (i, j)
>   replace all occurrences of (i, j) in list with (k)
>
> Each step of the loop can be done in constant time by clever use of hash
> maps and linked lists. That almost sounds impossible, but there we are,
> just finished the code. In practical terms, for enwik8, it's just about as
> fast than the last version that was more O(n log n)-ish.
>
> I'm truly wondering if this algorithm has been discovered before.
> *Artificial General Intelligence List <https://agi.topicbox.com/latest>*
> / AGI / see discussions <https://agi.topicbox.com/groups/agi> +
> participants <https://agi.topicbox.com/groups/agi/members> + delivery
> options <https://agi.topicbox.com/groups/agi/subscription> Permalink
> <https://agi.topicbox.com/groups/agi/Tb2cf064c700f181c-Ma69c283a13a3bf51eb532bef>
>

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/Tb2cf064c700f181c-M8cdf4daa6a5627de0e2fb67e
Delivery options: https://agi.topicbox.com/groups/agi/subscription

Reply via email to