FWIW I'm experimenting with scripts that directly query the GitHub API.
See `external.rkt' in the repo. It might or might not be a cleaner
solution. We'll see!

Using <https://api.github.com/repos/arcfide/chez-srfi/contents> to get the file list in the repo is very nice! Can it get branches or tags other than the default branch (master)?
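For what it's worth, the GitHub contents endpoint does document a `ref` query parameter that selects a branch, tag, or commit other than the default. A minimal sketch of building such request URLs (the helper name is mine, not from any existing script):

```python
# Sketch: building GitHub contents-API URLs. The `ref` query parameter
# selects a branch, tag, or commit other than the default branch.
# (The helper name is mine, not part of any existing tooling.)

def contents_url(owner, repo, path="", ref=None):
    """Return the api.github.com URL listing `path` in `owner/repo`."""
    url = f"https://api.github.com/repos/{owner}/{repo}/contents"
    if path:
        url += "/" + path
    if ref:
        url += "?ref=" + ref
    return url

print(contents_url("arcfide", "chez-srfi"))
# https://api.github.com/repos/arcfide/chez-srfi/contents
print(contents_url("arcfide", "chez-srfi", ref="some-tag"))
# https://api.github.com/repos/arcfide/chez-srfi/contents?ref=some-tag
```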

To get the contents of a particular file in the repo, URLs of the form <https://raw.githubusercontent.com/arcfide/chez-srfi/master/%253a0.sls> let you do it without any JSON or Base64. "master" can be any other branch or tag as well.
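One wrinkle worth noting when constructing these raw URLs programmatically: SRFI library files in that repo are literally named things like `%3a0.sls`, so the `%` itself has to be percent-encoded, giving the `%25` in the URL above. A small sketch (helper name is mine):

```python
from urllib.parse import quote

# Sketch: building raw.githubusercontent.com URLs. Note the double
# percent-encoding: the file is literally named "%3a0.sls" in the repo,
# so the "%" character itself must be encoded in the URL.
def raw_url(owner, repo, ref, path):
    """Return the raw-content URL for `path` at branch/tag `ref`."""
    return (f"https://raw.githubusercontent.com/"
            f"{owner}/{repo}/{ref}/{quote(path)}")

print(raw_url("arcfide", "chez-srfi", "master", "%3a0.sls"))
# https://raw.githubusercontent.com/arcfide/chez-srfi/master/%253a0.sls
```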

I'm all in favour of a standardised, minimal way of expressing SRFI
support, or maybe even general features (e.g. 'this implementation has
native threads') for each implementation.

This is a good idea. We have started the work at <https://github.com/schemedoc/implementation-metadata/tree/master/schemes>; additions welcome! The schema is not stable yet but it's getting there. None of this data is scraped because it is all so idiosyncratic that there isn't much of anything we could scrape.

The return values of `(features)` could be scraped; the Docker containers at <https://github.com/scheme-containers> would make for a good foundation for that.
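A sketch of how such scraping could be driven. The image names, image prefix, and command-line invocations below are hypothetical placeholders; the actual scheme-containers images and each implementation's eval flag would need to be looked up:

```python
# Sketch of scraping (features) via containers. The image names, the
# "schemecontainers/" prefix, and the eval flags are all hypothetical
# placeholders, not the real scheme-containers conventions.
IMPLEMENTATIONS = {
    "some-scheme":  ["some-scheme", "-e", "(write (features))"],
    "other-scheme": ["other-scheme", "--eval", "(write (features))"],
}

def docker_command(name):
    """Build the `docker run` argv for one implementation."""
    return (["docker", "run", "--rm", f"schemecontainers/{name}"]
            + IMPLEMENTATIONS[name])

# A real scraper would run this with subprocess and parse the output:
#   subprocess.run(docker_command("some-scheme"), capture_output=True)
print(docker_command("some-scheme"))
```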

It's worth noting that this
won't be possible for unmaintained implementations, but fortunately their
respective feature sets aren't likely to change any time soon. :-)

But it could become possible if such information were gleaned not from
their tar files but from a URL.

The metadata consumer could be pointed to such a URL, which could be
hosted anywhere, and does not need the intervention of an absent maintainer.

Even if it were gleaned from their tar files, unmaintained
implementations could be unarchived and then rearchived with the
requisite metadata, so that future queries succeed and the way of
getting metadata stays consistent across implementations.

Sure - for old releases, we could create files with the missing information and store them in one place (for example, as a repo under schemedoc). Erkin's work on the srfi-metadata repo has already covered a good distance toward this.

I'd advise against re-archiving old releases of Scheme implementations. It's clearer policy for their authors' reputation as well as users' convenience if we always use the pristine source releases made by the authors; i.e. identical tar files. Nowadays security is also an increasing concern, and we routinely take hashes of packages. Repackaging changes the hash.

I'd also continue to advise storing the information for new releases in the Scheme implementation's own git repo. There's a fairly strong convention that one Scheme implementation = one repo; the same repo contains the C source files, the Scheme source files, the boot image, documentation, and everything else, and it is watched over constantly by that Scheme implementation's author and most active users. So there's a good basis for getting the information right and keeping it well maintained. We have quite a few places around Scheme where information is stored rather far from its origin, and it tends to get out of date quite easily, or its correctness is hard to verify.

A further benefit of storing in the Scheme implementation's repo is that future source releases (tar files) will automatically contain info that is up to date for that release. So even if we lose the version control history, anyone having the tarball for a release can retrieve the metadata. We've found and resurrected quite a few old Scheme implementations for the Docker containers, and empirically, the tarballs are by far the most resilient format for long-term archival.

If the standardized metadata files take off, a future aggregator could read them and then merge in the file containing missing information from old implementations. Does this sound reasonable?
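The merge step could be sketched roughly like this; the data shown is invented, and the precedence rule (implementation-repo data wins over the hand-maintained fallback file) is my assumption about how such an aggregator would behave:

```python
# Sketch of the aggregator idea: metadata read from each implementation's
# own repo takes precedence; a hand-maintained fallback file fills in
# gaps for old or unmaintained releases. All data below is invented.
def merge_metadata(from_repo, fallback):
    """Merge per-implementation metadata dicts, preferring repo data."""
    merged = {}
    for name in set(from_repo) | set(fallback):
        entry = dict(fallback.get(name, {}))
        entry.update(from_repo.get(name, {}))  # repo data wins on conflict
        merged[name] = entry
    return merged

repo_data = {"example-scheme": {"srfi": [0, 1]}}
fallback = {"example-scheme": {"threads": "native"},
            "defunct-scheme": {"srfi": [0]}}
print(merge_metadata(repo_data, fallback))
```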
