Using Dublin Core, is there a way to express size measured in words?

I have a collection of more than 3,000 data sets. Each data set may include as 
many as a few thousand journal articles, a couple hundred books, or a myriad of 
Web pages. These data sets cover two very broad topic areas: COVID-19 and 
"great ideas" like love, honor, truth, justice, beauty, etc.

I am in the process of curating the collection, and I want to rigorously 
describe each item. Modeling the metadata in a relational database is easy, and 
because the data sets (by definition) are well-structured, it is almost trivial 
to fill the database with records. While the database will be my canonical 
container for the metadata, I will want to expose the metadata in a number of 
different ways. Examples may include: OAI-PMH, flavors of linked data (RDF/XML, 
JSON-LD), SPARQL, etc. Ultimately, I will have to map my metadata to something 
like Dublin Core, and for most metadata, the mapping is easy, especially if I 
exploit the terms (http://purl.org/dc/terms/) namespace, which I think used to 
be called "Qualified Dublin Core".

But a few characteristics are throwing me for a loop. The first is the number 
of words. The size of a data set, measured in words, is very useful 
information. For example, data sets smaller than 1,000,000 words do not lend 
themselves to semantic indexing, and this would be good to know before a data 
set is downloaded. The second is the number of items in the data set, which is 
an indicator of comprehensiveness. Finally, one data set could be a mere 10 MB 
in size whereas other data sets might come close to a gigabyte. I need/want to 
express extent in a number of ways. 
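
To make the three measures concrete, here is a minimal sketch in Python. It 
assumes, purely for illustration, that a data set is a directory of plain-text 
files. The function name measure_extent and the sample file names are 
hypothetical, and words are counted naively by splitting on whitespace:

```python
import tempfile
from pathlib import Path


def measure_extent(directory):
    """Return the number of items, words, and bytes in a data set.

    Assumption (hypothetical layout): each item is a UTF-8 plain-text
    file stored directly in `directory`.
    """
    files = sorted(p for p in Path(directory).iterdir() if p.is_file())
    # Naive word count: split each file's text on whitespace.
    words = sum(len(p.read_text(encoding="utf-8").split()) for p in files)
    # Size on disk, in bytes, summed over all items.
    size = sum(p.stat().st_size for p in files)
    return {"items": len(files), "words": words, "bytes": size}


if __name__ == "__main__":
    # Build a tiny throwaway data set and measure it.
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "iliad.txt").write_text("sing goddess the wrath", encoding="utf-8")
        Path(tmp, "odyssey.txt").write_text("tell me muse", encoding="utf-8")
        print(measure_extent(tmp))
```

Each of the three numbers is a different kind of extent with a different unit, 
which is exactly what needs to be carried through to the metadata.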

Using Dublin Core, how can I express the size of a data set measured in number 
of words, number of items, or size in bytes? Here is a snippet of RDF/XML where 
I express size in bytes, but I am not satisfied with the result because the units 
are not explicitly expressed:

  <dc:format>
    <dcterms:extent>
      <rdf:value>100</rdf:value>
      <rdfs:label>100 MB (compressed)</rdfs:label>
    </dcterms:extent>
  </dc:format>

What am I missing? How can this snippet be improved? How can I apply the same 
technique to denote size in words or number of items? Can I do this without 
creating my own namespace? Attached ought to be a valid RDF/XML file with bogus 
values for things like creator, title, subjects, etc.

(Once I figure out how to exploit extent, I will want to learn how to exploit 
table of contents notes.)

--
Eric Morgan
Navari Family Center for Digital Scholarship
Hesburgh Libraries
University of Notre Dame

Attachment: homer.xml
Description: XML document
