Hi All

As I understand it (which might be wrong!), Tim is generating a bunch of reports on what's in the corpus, how different tools analyse it, and how Tika handles the files there, mostly as SQL databases.

Those databases are then available for anyone who is interested to download and analyse locally, e.g. from https://corpora.tika.apache.org/base/metadata/mimes/
(though that URL isn't working right now; hopefully it'll be fixed soon)

There's a fairly new project called Datasette, which is a really nice publishing and exploring interface on top of SQL databases, especially aimed at archivists, journalists etc - https://github.com/simonw/datasette

I wonder (though I won't have time for a few weeks to try it myself...) if it'd be worth stuffing one or two of the SQL reports into a copy of Datasette hosted on the VM, to let people explore the data more easily?
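
In case it helps whoever gets to this first, here's a rough sketch of a sanity check to run before pointing Datasette at a report. It assumes the report is a SQLite file, or has been converted to one (Datasette reads SQLite; I haven't checked what format the reports are actually published in). The file name report.db and the helper name summarise_tables are made up for illustration.

```python
import sqlite3


def summarise_tables(db_path):
    """Return {table_name: row_count} for each user table in a SQLite file.

    Assumes db_path points at a SQLite database; the path is hypothetical.
    """
    conn = sqlite3.connect(db_path)
    try:
        # sqlite_master lists every object in the file; skip SQLite's own tables.
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
                "ORDER BY name"
            )
        ]
        counts = {}
        for name in names:
            # Quote the identifier so unusual table names don't break the query.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(
                f"SELECT COUNT(*) FROM {quoted}"
            ).fetchone()[0]
        return counts
    finally:
        conn.close()


# Example use (hypothetical file name):
#   for table, rows in summarise_tables("report.db").items():
#       print(table, rows)
```

If the tables and row counts look sensible, running "datasette report.db" should then serve that file for browsing locally.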

Cheers
Nick
