[Monotone-devel] Organizing a retrocomputing project

hendrik Sat, 28 Jun 2008 08:25:53 -0700

I've opened a new project, a68h, at mtn-host.prjek.net.

I'm looking for advice about how to do the initial checkin, preferably 
before I actually do it.  There are a few issues that may be of wider 
interest, and may relate to things planned (or not) for future versions 
of monotone.


The project is to restore an ancient Algol 68 compiler for the IBM/360, 
and make it run on today's popular hardware.

I'm starting from two development snapshots, taken approximately four 
years apart.  In the intervening period, several development directions 
were aborted because of limitations in the toolset being used.  But the 
final stages of each of these are still present in the second snapshot, 
though they were no longer in use.  I do not have complete development 
snapshots of these abandoned development directions -- just 
final snapshots of the files that were discarded.

Now clearly this history can be included in the monotone archive.  These
abandoned directions affect one major component of the system, the code
generator.  The rest of the compiler (the majority of the code) was just
improved in an orderly way between the two snapshots.  It is unlikely
that the discarded code will ever be of any use, since it is
machine-dependent and the hardware has changed radically in the meantime.

Large parts of the code generator that *was* in the final version are 
also slated for replacement, but it is possible that significant parts 
will remain.

So the first question is,

   Is it worthwhile to represent this ancient history in the repository?

Next, the code base is stored in EBCDIC in IBM's FB records, 
and some of it (mostly the test suite) is in IBM's VBS format.
For those not in the know, FB records are fixed-length records, 80
bytes each, now concatenated into one long Linux binary file.  In Linux,
you just read them 80 bytes at a time; each 80-byte record is one line
of the source code: 72 bytes of EBCDIC text followed by an 8-byte
sequence number.  Line boundaries are indicated only by counting bytes;
there are no newline characters of any kind.
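A minimal Python sketch of reading such a file, assuming the records are exactly 80 bytes with no block or record descriptors left in the transfer, and using Python's `cp037` codec (EBCDIC code page 037) purely as a stand-in for the original character set when displaying text. `fb_records` and `make_record` are hypothetical names for illustration only:

```python
from typing import Iterator, Tuple

RECORD_LEN = 80  # FB logical record length
TEXT_LEN = 72    # columns 1-72 hold source text; columns 73-80 the sequence number


def fb_records(data: bytes) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (text, sequence_number) raw byte pairs from a concatenated FB file."""
    if len(data) % RECORD_LEN != 0:
        raise ValueError(f"{len(data)} bytes is not a whole number of {RECORD_LEN}-byte records")
    for off in range(0, len(data), RECORD_LEN):
        record = data[off:off + RECORD_LEN]
        yield record[:TEXT_LEN], record[TEXT_LEN:]


def make_record(text: str, seq: int, codepage: str = "cp037") -> bytes:
    """Hypothetical test helper: build one blank-padded EBCDIC card image."""
    return (text[:TEXT_LEN].ljust(TEXT_LEN) + f"{seq:08d}").encode(codepage)


if __name__ == "__main__":
    sample = make_record("BEGIN", 10) + make_record("END", 20)
    for text, seq in fb_records(sample):
        # Decoding is only for display; the raw record bytes are untouched.
        print(repr(text.decode("cp037").rstrip()), seq.decode("cp037"))
```

Run as a script, this prints `'BEGIN' 00000010` and then `'END' 00000020`. Nothing is decoded unless asked for, so the raw bytes stay available for whichever storage choice is made.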
 
It's not hard to convert to ASCII, but any reasonable conversion does 
some damage to the data -- there are a few characters that don't have 
an ideal translation, and any sane change to Unix-style lines would 
involve the removal of trailing spaces and sequence numbers.
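The damage can be made concrete with a sketch of such a "sane" Unix-style conversion, again assuming code page 037 as the source character set (`to_unix_text` is a hypothetical name). Sequence numbers and trailing blanks are dropped, so two card images that differ only in those columns convert to identical text, and the original cannot be rebuilt from the result:

```python
RECORD_LEN = 80  # FB record length
TEXT_LEN = 72    # source text columns; 73-80 hold the sequence number


def to_unix_text(data: bytes, codepage: str = "cp037") -> str:
    """Lossy conversion of concatenated FB records to newline-terminated lines.

    Columns 73-80 (sequence numbers) are discarded and trailing blanks
    are stripped, so this conversion cannot be inverted.
    """
    if len(data) % RECORD_LEN != 0:
        raise ValueError("input is not a whole number of 80-byte records")
    lines = []
    for off in range(0, len(data), RECORD_LEN):
        text = data[off:off + TEXT_LEN].decode(codepage)
        lines.append(text.rstrip(" ") + "\n")
    return "".join(lines)


if __name__ == "__main__":
    a = ("X := 1;".ljust(TEXT_LEN) + "00000010").encode("cp037")
    b = ("X := 1;".ljust(TEXT_LEN) + "00000999").encode("cp037")
    # Different card images, identical converted text: information was lost.
    print(a != b, to_unix_text(a) == to_unix_text(b))
```

This prints `True True`. Characters without an exact ASCII counterpart are a separate loss that this sketch does not model.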

Does it make sense to try to store the EBCDIC files in the monotone 
repository?  Monotone, I understand, prefers to store everything 
internally in Unicode (possibly UTF-8 to save space).  Now there are 
reversible translations of EBCDIC to and from Unicode, but I don't think 
the standard one plays nicely with some of the weirder characters on the 
TN print train (such as corners for drawing boxes).  And there's still 
the matter of line endings -- counting bytes won't work after the 
conversion.  Is there a Unicode newline (say) that is not a translation 
of any EBCDIC character?
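One way to frame the newline question, sketched in Python with code page 037 assumed as a stand-in for the real character set: Python's `cp037` codec maps all 256 byte values to distinct code points in U+0000 to U+00FF, and byte 0x25 already decodes to U+000A, so `\n` cannot mark record boundaries without ambiguity. U+2028 LINE SEPARATOR lies outside that range, so no byte translates to it. The hypothetical helpers below use it as a record terminator, which makes the conversion byte-for-byte reversible. This does not make TN print train characters display correctly; it only guarantees that no bytes are lost:

```python
RECORD_LEN = 80
SEP = "\u2028"  # LINE SEPARATOR; no cp037 byte decodes to this code point


def ebcdic_to_unicode(data: bytes, codepage: str = "cp037") -> str:
    """Decode each whole 80-byte record and terminate it with SEP."""
    if len(data) % RECORD_LEN != 0:
        raise ValueError("input is not a whole number of 80-byte records")
    return "".join(
        data[off:off + RECORD_LEN].decode(codepage) + SEP
        for off in range(0, len(data), RECORD_LEN)
    )


def unicode_to_ebcdic(text: str, codepage: str = "cp037") -> bytes:
    """Invert ebcdic_to_unicode: split on SEP and re-encode every record."""
    records = text.split(SEP)
    if records[-1] != "":
        raise ValueError("text does not end with a record separator")
    return b"".join(r.encode(codepage) for r in records[:-1])


if __name__ == "__main__":
    every_byte = bytes(range(256))
    decoded = every_byte.decode("cp037")
    # 256 distinct characters, '\n' among them, U+2028 not among them.
    print(len(set(decoded)) == 256, "\n" in decoded, SEP in decoded)
    data = every_byte * 5  # 1280 bytes = 16 records, every byte value present
    assert unicode_to_ebcdic(ebcdic_to_unicode(data)) == data
```

The check prints `True True False`, and the round trip reproduces all 1280 input bytes. Whether a code page other than 037 matches the TN print train more closely is left open here.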

Are there any plans for monotone to address character set issues beyond 
CR-LF vs \n ?  Should there be?

Should I just check all this in as binary files?  Should I convert to 
Unicode as if monotone recognised the EBCDIC character set and unicoded 
it?

Or should I just abandon that bit of history and use a plain ASCII 
version of the latest snapshot and work from there?

---

There are some real questions here that are likely to be relevant to 
others trying to work in the archaeology of computing.  One that 
doesn't affect me in this project is:

  If you have reconstructed history from ancient snapshots and checked 
it in accordingly, what do you do when you discover *another* ancient 
snapshot that fits before all of them, or in between two existing ones?
 
-- hendrik




_______________________________________________
Monotone-devel mailing list
[email protected]
http://lists.nongnu.org/mailman/listinfo/monotone-devel
