I've opened a new project, a68h, at mtn-host.prjek.net. I'm looking for advice about how to do the initial checkin, preferably before I actually do it. There are a few issues that may be of wider interest, and may relate to things planned (or not) for future versions of monotone.
The project is to restore an ancient Algol 68 compiler for the IBM/360, and make it run on today's popular hardware. I'm starting from two development snapshots, taken approximately four years apart. In the intervening period, several development directions were aborted because of limitations in the toolset being used. The final stages of each of these are still present in the second snapshot, though they were no longer in use. I do not have complete development snapshots of these abandoned development directions -- just final snapshots of the files that were discarded.

Now clearly this history can be included in the monotone archive. The abandoned directions affect one major component of the system, the code generator. The rest of the compiler (the majority of the code) was just improved in an orderly way between the two snapshots. It is unlikely that the discarded code will ever be of any use, since it is machine-dependent and hardware has changed radically in the meantime. Large parts of the code generator that *was* in the final version are also slated for replacement, but it is possible that significant parts will remain.

So the first question is: Is it worthwhile to represent this ancient history in the repository?

Next, the code base is stored in EBCDIC in IBM's FB records, and some of it (mostly the test suite) is in IBM's VBS format. For those not in the know, the FB records are fixed-length records, 80 bytes each, now concatenated into a long Linux binary file. In Linux, you just read them 80 bytes at a time; each 80 bytes is a line of the source code: 72 bytes of EBCDIC text and 8 bytes of sequence number. Line boundaries are indicated by counting bytes; there are no newline characters of any kind. It's not hard to convert to ASCII, but any reasonable conversion does some damage to the data -- there are a few characters that don't have an ideal translation, and any sane change to Unix-style lines would involve the removal of trailing spaces and line numbers.
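To make the record layout concrete, here is a minimal Python sketch of the lossy conversion I have in mind. It is only an illustration: the function names are made up for this post, and the choice of Python's `cp037` codec is an assumption (the real files may need a different EBCDIC code page, especially for the TN print train characters).

```python
RECORD_LEN = 80  # one fixed-length FB record
TEXT_LEN = 72    # bytes 0-71: source text; bytes 72-79: sequence number


def split_fb_records(data: bytes, codec: str = "cp037"):
    """Yield (text, sequence) pairs from concatenated 80-byte records.

    The codec is an assumption; cp037 is just one common EBCDIC code page.
    """
    if len(data) % RECORD_LEN != 0:
        raise ValueError("length is not a multiple of 80 bytes")
    for offset in range(0, len(data), RECORD_LEN):
        record = data[offset:offset + RECORD_LEN]
        text = record[:TEXT_LEN].decode(codec)
        sequence = record[TEXT_LEN:].decode(codec)
        yield text, sequence


def to_unix_lines(data: bytes, codec: str = "cp037") -> str:
    """Lossy conversion: drop sequence numbers and trailing spaces."""
    return "".join(
        text.rstrip(" ") + "\n" for text, _ in split_fb_records(data, codec)
    )


if __name__ == "__main__":
    # Two synthetic records, built here only to exercise the functions.
    sample = (
        "BEGIN".ljust(TEXT_LEN) + "00000010"
        + "  END".ljust(TEXT_LEN) + "00000020"
    ).encode("cp037")
    print(to_unix_lines(sample), end="")
```

The damage is visible in `to_unix_lines`: once the sequence numbers and trailing blanks are gone, the original 80-byte records cannot be rebuilt exactly from the output.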
Does it make sense to try to store the EBCDIC files in the monotone repository? Monotone, I understand, prefers to store everything internally in Unicode (possibly UTF-8 to save space). Now there are reversible translations of EBCDIC to and from Unicode, but I don't think the standard one plays nicely with some of the weirder characters on the TN print train (such as corners for drawing boxes). And there's still the matter of line endings -- counting bytes won't work after the conversion. Is there a Unicode newline (say) that is not a translation of an EBCDIC character?

Are there any plans for monotone to address character set issues beyond CR-LF vs \n? Should there be?

Should I just check all this in as binary files? Should I convert to Unicode as if monotone recognised the EBCDIC character set and unicoded it? Or should I just abandon that bit of history and just use a plain ASCII version of the latest snapshot and work from there?

---

There are some real questions here that are likely to be of relevance to others trying to work in the archaeology of computing. One that doesn't affect me in this project is: If you have reconstructed history from ancient snapshots and checked it in accordingly, what do you do when you discover *another* ancient snapshot that fits before all of them, or in between two existing ones?

-- hendrik
