Paragraphs and sections in an article share mutual information. However, I saw on the forum that a transform that groups the footers linking to the article in other languages improves compression. If you reorder text within an article, you also have to save the original order. With articles, you can restore the original order by sorting by page ID, since the IDs are sequential in enwik9.
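To illustrate the point about restoring article order, here is a minimal sketch in Python. It assumes articles have already been parsed into (page_id, wikitext) pairs. The sample records, the `group_key` function, and the choice to group on a leading `#REDIRECT` are hypothetical and only for illustration; this is not the transform used by any actual entry.

```python
# Hypothetical, simplified article records: (page_id, wikitext).
# Real enwik9 pages are XML <page> elements; parsing is omitted here.
Article = tuple[int, str]

def group_key(article: Article) -> int:
    """Illustrative grouping key: redirects first, everything else after."""
    _, text = article
    is_redirect = text.lstrip().upper().startswith("#REDIRECT")
    return 0 if is_redirect else 1

def reorder(articles: list[Article]) -> list[Article]:
    # sorted() is stable, so articles keep their relative order within a group.
    return sorted(articles, key=group_key)

def restore(articles: list[Article]) -> list[Article]:
    # Assumes page IDs increase through the original file, so sorting by ID
    # recovers the original order without storing a permutation.
    return sorted(articles, key=lambda a: a[0])

# Made-up sample data, not taken from enwik9.
original = [
    (10, "'''Alpha''' is an example article."),
    (12, "#REDIRECT [[Alpha]]"),
    (25, "'''Beta''' is another example article."),
    (27, "#redirect [[Beta]]"),
]

grouped = reorder(original)
assert [page_id for page_id, _ in grouped] == [12, 27, 10, 25]
assert restore(grouped) == original
```

Because the inverse is just a sort on a field that is already in the data, the decompressor needs no side information for the article-level permutation. Reordering below the article level has no such key, which is why the order would have to be stored there.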
About a third of the articles are redirects. It is easy to group these together to improve compression. Another 10-15% are about places that were automatically generated from a US census table. These are highly compressible and can be grouped.

-- Matt Mahoney, [email protected]

On Sat, Jan 10, 2026, 8:37 AM James Bowery <[email protected]> wrote:

> On Fri, Jan 9, 2026 at 9:44 PM Matt Mahoney <[email protected]> wrote:
>
>> 2. Improved article sort order by Kaitz. I believe this is based on
>> k-means clustering on a 1K vector space model. I was never able to
>> produce the same result myself so I just used the list he supplied.
>
> I wonder to what extent in-line intra-article reordering may:
>
> 1) Be reasonably fast
> 2) Contribute to both speed and compression
> ?
>
> The most obvious granularity would be paragraphs.

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/T0518db1e3a0c25c5-Ma4ac8c726e0e7f28299f7acc
Delivery options: https://agi.topicbox.com/groups/agi/subscription
