Thanks for the link to PAKDOS, Brian. It's not surprising the exotic and
inscrutable file format was hard to figure out — I'm guessing James Yi
wrote a Lempel-Ziv <https://foldoc.org/LZ77> compressor!

I had been researching LZ compression recently to see if I could implement
a version for this very project. I noticed that the documentation
<https://github.com/LivingM100SIG/Living_M100SIG/blob/main/M100SIG/Lib-10-TANDY200/PACK.DOC>
for PACK.200
<https://github.com/LivingM100SIG/Living_M100SIG/blob/main/M100SIG/Lib-10-TANDY200/PACK.200>
(which James Yi also wrote) mentions using a "window" into the file as its
"dictionary", which is the hallmark of the LZ77 family of algorithms going
back to 1977 <https://dl.acm.org/doi/10.1109/TIT.1977.1055714>. If so, Yi
probably implemented LZSS <https://dl.acm.org/doi/10.1145/322344.322346>
(1982), a slightly tweaked version of LZ77 that emits a plain literal
whenever a match would be too short to be worth encoding.
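
In case it helps anyone else following along, here's a minimal Python
sketch of the LZSS sliding-window idea. To be clear, this is not Yi's
format (I haven't decoded it); the 4 KB window, the 3..18 byte match
lengths, and keeping tokens as Python tuples instead of packed flag bits
are all my own assumptions for illustration:

```python
# Minimal LZSS-style sketch. ASSUMPTIONS (not taken from PACK.200):
# a 4 KB window, matches of 3..18 bytes, and tokens stored as Python
# tuples rather than packed into a bitstream with one flag bit each.

WINDOW = 4096      # how far back a match is allowed to start
MIN_MATCH = 3      # assumed break-even: shorter matches go out as literals
MAX_MATCH = 18     # longest match a single token may describe


def compress(data: bytes) -> list:
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        # Brute-force search of the window (the "dictionary") for the
        # longest run that matches the bytes starting at position i.
        for j in range(max(0, i - WINDOW), i):
            length = 0
            while (length < MAX_MATCH and i + length < len(data)
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= MIN_MATCH:
            tokens.append(("copy", best_dist, best_len))  # back-reference
            i += best_len
        else:
            tokens.append(("lit", data[i]))  # literal byte
            i += 1
    return tokens


def decompress(tokens: list) -> bytes:
    out = bytearray()
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, dist, length = tok
            # Copy one byte at a time so overlapping matches
            # (dist < length) reproduce repeating patterns correctly.
            for _ in range(length):
                out.append(out[-dist])
    return bytes(out)
```

For example, compress(b"ABABABABAB") yields two literals followed by
("copy", 2, 8). That back-reference overlaps the bytes it is producing,
which is why the decompressor copies one byte at a time. A real 8085
version would pack the tokens into bits and use a far smarter match search
than this brute-force loop.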

Given the name, it's possible it is related to an incredibly old UNIX
utility called "pack <https://www.vidarholen.net/contents/blog/?p=691>",
but that format is based solely on Huffman coding
<http://compression.ru/download/articles/huff/huffman_1952_minimum-redundancy-codes.pdf>
(1952), which does not use a sliding window. Another possibility is that Yi
implemented the DEFLATE algorithm (LZSS compression followed by Huffman
coding), which is what PKZIP, gzip, and PNG use. Since PACK.200 was written
around 1990, LZW <https://ieeexplore.ieee.org/document/1659158> (as seen in
GIF and UNIX "compress") is also a possibility, as it wasn't until late 1994
that Unisys announced it would be enforcing its LZW patent
<https://burnallgifs.org/archives/>.
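
For contrast, here's a minimal Python sketch of building a textbook
Huffman code table from byte frequencies. There is no window at all;
frequent bytes simply get shorter bit strings. It is not the on-disk
format of UNIX pack (which also has to store the tree), and the function
name and the {byte: bitstring} representation are just my assumptions for
illustration:

```python
import heapq
from collections import Counter


def huffman_code(data: bytes) -> dict:
    """Return a prefix-free code {byte: bitstring} built from byte counts."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        # Degenerate case: a lone distinct symbol still needs one bit.
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tiebreak, {symbol: code so far}).
    # Byte values 0..255 serve as tiebreaks for the leaves.
    heap = [(w, sym, {sym: ""}) for sym, w in freq.items()]
    heapq.heapify(heap)
    next_id = 256  # unique tiebreaks for merged nodes, so dicts never compare
    while len(heap) > 1:
        # Repeatedly merge the two lightest subtrees, prefixing a 0 to
        # every code in one and a 1 to every code in the other.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]
```

For b"aaaabbc" this gives a = 1, b = 01, c = 00, so the seven bytes encode
in 10 bits instead of 56.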

It is quite astounding to find a gem of a program like this for the
Model-T! Is there a list somewhere of all the programs James Yi wrote?

—b9

P.S. A tangent on the birth of Huffman coding:

While enrolled as a graduate student at MIT in 1951 in a class taught by coding
pioneer Robert Fano, Huffman and his fellow students were told that they would
be exempted from the final exam if they solved a coding challenge as part of a
term paper. Not realizing that the task was an open problem that Fano had been
working on himself, Huffman elected to submit the term paper. After months of
unsuccessful struggle, and with the final exam just days away, Huffman threw his
attempts in the bin and started to prepare for the exam. But a flash of insight
the next morning had him realize that the paper he had thrown in the trash was
in fact a path to a solution to the problem. Huffman coding was born at that
moment, and following publication of his paper [...] in 1952, it quickly
replaced the previous suboptimal Shannon-Fano coding as the method of choice
for data compression applications.
*(A. Moffat, ACM Computing Surveys, Vol. 52, No. 4, Article 85, p. 85:4,
August 2019)*


On Tue, Feb 17, 2026 at 8:32 AM Brian K. White <[email protected]> wrote:

> Wow, very nice Steve!
>
> Previously the only more compact thing I'd seen was something exotic I
> couldn't figure out how to make use of. Well, let's say I didn't bother to
> figure it out. It's a 2-stage process where first a small ml program is
> encoded in a simple hex pair way, and then the actual payload is encoded
> some other way that the ml routine decodes. The ml routine also seems to
> do relocating.
>
> At least James Yi and Ron Wiesen used it.
>
> Even if I did figure it all out, I don't like for the loader to have
> that inscrutable binary part. If possible I want it to be fully readable
> and hackable by the end user in case they need to for some reason.
>
>
> https://github.com/LivingM100SIG/Living_M100SIG/blob/main/M100SIG/Lib-07-UTILITIES/PAKDOS.100
>
> https://github.com/bkw777/dl2/blob/master/clients/teeny/TEENY.100
>
> I'd have replaced the teeny loader with the much smaller and simpler hex
> pair one except Ron's loader does relocating and that's useful.
>
> It's one of those countless "someday" things to reinstall teeny a few
> times at different addresses and capture the resulting binaries and
> compare them to see just which bytes change and maybe make a fully BASIC
> installer and see how much larger that actually comes out. And if I get
> that far successfully, then maybe even port teeny to K85, the one
> machine that doesn't have it.
>
> I had also intended to sometime try making something that uses one of
> the newer schemes like z85 but most of them look like they will need a
> larger decoder than I'd like. Either needing more code or more ram or both.
> https://github.com/zeromq/rfc/blob/master/src/spec_32.c
>
>
> But Steve's method there is like yEnc.
> (Which, amazingly, apparently does not actually stand for "why encode?"
> according to the author, even though it's perfect.)
> http://www.yenc.org/
>
> Anyway thanks for the idea 8 years later!
>
> --
> bkw
>
> On 2/17/26 00:55, B 9 wrote:
> > Thanks! Stephen's encoding scheme
> > <https://www.mail-archive.com/[email protected]/msg06926.html>
> > from 2018 seems like it might be efficient enough to work. Just to keep
> > the record straight: I was confused about NULL being a problem; the
> > character that cannot exist in DO files is actually ASCII 26 (^Z). EDIT
> > has cursor positioning trouble with embedded ASCII 127 (DELETE), but
> > doesn't remove it. BASIC programs, when tokenized, cannot contain DELETE
> > or any character less than 32 (space), other than 9 (TAB).
> >
> > —b9
> >
> > On Sat, Feb 14, 2026 at 10:33 AM John R. Hogerhuis <[email protected]
> > <mailto:[email protected]>> wrote:
> >
> >     You can look up discussions on the list for "alternative relative
> >     branch", "execute in place", and "position independent code" for
> >     ideas.
> >
> >     -- John.
> >
> >     On Fri, Feb 13, 2026, 11:14 PM B 9 <[email protected]
> >     <mailto:[email protected]>> wrote:
> >
> >         On Thu, Feb 12, 2026 at 11:26 PM John R. Hogerhuis
> >         <[email protected] <mailto:[email protected]>> wrote:
> >
> >             I mean that programs only access memory they are documented
> >             to use. There is no API for this. One way would be shipping
> >             your program with a relocating loader which communicates to
> >             the user the memory range it will occupy for running and for
> >             data.
> >
> >                 I’m beginning to think a CO file is maybe too difficult
> >                 for my target audience, so I ought to look into how to
> >                 make a BASIC loader that takes as little space as
> >                 possible.
> >
> >             I think that's a good idea, though interestingly BASIC
> >             variants, particularly at the tokenized BASIC level, might be
> >             a bigger fork in the road than CO files.
> >
> >         Did someone here once tell me that any character, except for a
> >         NULL, could be stored in a BASIC string? What about DATA
> >         statements?
> >
> >
> >             Maybe only support .DO formatted BASIC.  Or generate
> >             tokenized basic versions for incompatible variants.
> >
> >         A single .DO file sounds like a good idea as I would like there
> >         to be one set of instructions so people don’t have to think and
> >         figure things out. A program that is supposed to identify the
> >         machine type should definitely not require people to know what
> >         kind of machine they have.
> >
> >         If Brian’s co2ba script is typical, BASIC’s filesize expansion
> >         is going to be killer at about 250%. Double that in order to
> >         actually run it. (It is mostly data, not tokenizable BASIC.) For
> >         example,
> >
> >         File         Bytes
> >         CRCPSH.CO     1054
> >         CRCPSH.DO     2691
> >         CRCPSH.BA     2588
> >
> >         (Brian: you may want to call it co2do. From the name, I had
> >         presumed it output only tokenized BASIC. Reading it into
> >         Virtual-T as a .BA file caused a segmentation fault. Minor typo:
> >         The co2ba script worked once I removed the extraneous quotation
> >         mark being added near the end of line 3. Feature request: it’d
> >         be swell if it did a quick check against MAXRAM and refused to
> >         crash the machine if the program length was too long.)
> >
> >             And I like the idea of embedding ML directly in BASIC REMs
> >             or strings, either with special position independent code or
> >             self-relocating. It can be very compact and require minimal
> >             "load time" overhead.
> >
> >         That sounds intriguing. Do you have any examples you can share?
> >         How do you handle NULLs in the data? How do you handle programs
> >         that are too long to fit in a single line or string?
> >
> >         —-b9
> >
