On 8 September 2015 at 19:27, Dominic Oldman <[email protected]> wrote:
>
> I think there are various approaches you can take depending upon what your
> objectives are.
>
> 1. Identify (describe) the document and provide access to it. Using CRM
> this would harmonise with other CRM data.
>

This is really all I'm aiming to do, though I had to step outside of the CIDOC CRM (and use FRBRoo) to encode the relationship between the E31 Document and the associated HTML content. I'm slightly dissatisfied with that, but perhaps it's to be expected. I'm open to other options!

> 2. Identify particular fragments of the text (using FRBRoo).
> 3. Tag particular things in the text
>
> In terms of 3 there is TEI but also the option of using CRM in RDFa tags
> to identify entities and relationships in the text that would have
> correspondence in the data. This is an approach we have used at the BM.
> RDFa tags can be used to identify people, places, subjects etc, and can
> link these entities using CRM properties. These can operate on their own as
> an extension to the RDF store or be harvested into the RDF store.
>

In other projects I have used TEI as a source for RDF, with a workflow which harvests RDF from TEI documents and stores it in a SPARQL graph store. It's a powerful technique for aggregating data across a corpus of texts. I would be very interested to read more about how you have used TEI (or RDFa) in this way at the British Museum!

But in this particular project I'm trying out a workflow that doesn't involve an RDF store at all. I don't control the source of the data (I don't work for Museum Victoria); I am merely querying it and re-formatting it to produce RDF on the fly (i.e. as requested by a Linked Data client). Their API is not natively RDF, and I'm not harvesting or even caching the RDF data I generate, so there's actually no "RDF store" involved at all.

It's been an interesting experiment for me; the weakness of the approach is that any aggregation you need to do has to be quick enough to perform on the fly.
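
As a rough illustration of that kind of on-the-fly conversion, here is a minimal Python sketch. It is not the software discussed in this message: the record fields ("id", "title", "summary"), the base URI and the function names are all hypothetical, and it uses only CIDOC CRM terms (E31 Document, P3 has note) plus rdfs:label, leaving out the FRBRoo link to the HTML content. The record is a hard-coded dictionary standing in for a JSON response from a non-RDF API; nothing is stored or cached.

```python
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
CRM = "http://www.cidoc-crm.org/cidoc-crm/"


def literal(text):
    """Serialise a string as an N-Triples literal, escaping special characters."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
        .replace("\t", "\\t")
    )
    return '"' + escaped + '"'


def record_to_ntriples(record, base_uri):
    """Turn one API record (a dict with hypothetical field names) into N-Triples."""
    subject = "<" + base_uri + quote(str(record["id"])) + ">"
    lines = [f"{subject} <{RDF_TYPE}> <{CRM}E31_Document> ."]
    if record.get("title"):
        lines.append(f"{subject} <{RDFS_LABEL}> {literal(record['title'])} .")
    if record.get("summary"):
        lines.append(f"{subject} <{CRM}P3_has_note> {literal(record['summary'])} .")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Stand-in for a JSON record fetched from the API at request time.
    sample = {"id": "articles/1", "title": "An example article", "summary": "A short note."}
    print(record_to_ntriples(sample, "http://example.org/"), end="")
```

Because the function is a pure transformation of a single record, it can run inside the request handler for each Linked Data request, which is what keeps the store-free approach simple.
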
The Linked Data resources (RDF graphs) my software produces are all based on 1 or at most 2 queries to the Museum's API, and possibly 1 to DBpedia. On the positive side, the lack of caching and harvesting makes the whole thing very simple.

Cheers!

Conal

-- 
Conal Tuohy
http://conaltuohy.com/
@conal_tuohy
+61-466-324297
