Hi All
As I understand it (which might be wrong!), Tim is generating a bunch of
reports on things in the corpora / how different tools analyse the corpora /
how Tika works on the stuff there, mostly as SQL databases.
Those databases are then available to anyone who is interested to download
and analyse locally, e.g. from
https://corpora.tika.apache.org/base/metadata/mimes/
(though that URL isn't working right now, hopefully fixed soon)
There's a fairly new project called Datasette, which is a really nice
publishing and exploring interface on top of SQL databases, especially
aimed at archivists, journalists etc -
https://github.com/simonw/datasette
I wonder (though I won't have time for a few weeks to try myself...) if
it'd be worth stuffing one or two of the SQL reports into a copy of
datasette hosted on the vm, to let people more easily explore the data?
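For anyone who wants to try it before I do, here is a rough sketch of the
idea. It assumes a report can be exported to a SQLite file, which is the
format Datasette reads. The table name, column names and sample rows are
made up for illustration and are not real figures from the corpora.

```python
import sqlite3


def build_report_db(db_path, rows):
    """Write (mime, file_count) rows into a SQLite database.

    The mime_counts table and its columns are hypothetical stand-ins
    for whatever one of the real reports contains.
    """
    conn = sqlite3.connect(db_path)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS mime_counts ("
            "mime TEXT PRIMARY KEY, file_count INTEGER)"
        )
        conn.executemany(
            "INSERT OR REPLACE INTO mime_counts (mime, file_count) VALUES (?, ?)",
            rows,
        )
    total = conn.execute("SELECT COUNT(*) FROM mime_counts").fetchone()[0]
    conn.close()
    return total


# Placeholder rows for illustration only, not real report data.
sample_rows = [("application/pdf", 3), ("text/html", 2)]
```

Calling build_report_db("reports.db", sample_rows) writes the file, and
running "datasette serve reports.db" should then give a browsable view of
it, by default on http://127.0.0.1:8001/.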
Cheers
Nick