I believe that the value would be "100 MB" rather than "100". The units are
part of the value in extent; it has no meaning without the units.
<dcterms:extent> is equivalent to the 300$a in MARC--a text string specifying
the extent, including units.
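As a rough sketch of what I mean (the example.org URI and the word and item
counts are made-up placeholders, and repeating the property once per measure is
only my assumption about how to handle more than one kind of extent):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcterms="http://purl.org/dc/terms/">
  <!-- hypothetical data set URI, for illustration only -->
  <rdf:Description rdf:about="http://example.org/datasets/1">
    <!-- each extent is a plain text string that carries its own units -->
    <dcterms:extent>100 MB (compressed)</dcterms:extent>
    <dcterms:extent>1,250,000 words</dcterms:extent>
    <dcterms:extent>3,400 items</dcterms:extent>
  </rdf:Description>
</rdf:RDF>
```

A consumer that needs the bare number would have to parse it out of the string,
much as it would with a MARC 300$a.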
Steve McDonald
[email protected]
-----Original Message-----
From: Code for Libraries <[email protected]> On Behalf Of Eric Lease
Morgan
Sent: Monday, October 18, 2021 10:02 AM
To: [email protected]
Subject: [CODE4LIB] using dublin core to express size measured in words
Using Dublin Core, is there a way to express size measured in words?
I have a collection of more than 3,000 data sets. Each data set may include as
many as a few thousand journal articles, a couple hundred books, or a myriad of
Web pages. These data sets cover two very broad topic areas: COVID-19 and
"great ideas" like love, honor, truth, justice, beauty, etc.
I am in the process of curating the collection, and I want to rigorously
describe each item. Modeling the metadata in a relational database is easy, and
because the data sets (by definition) are well-structured, it is almost trivial
to fill the database with records. While the database will be my canonical
container for the metadata, I will want to expose the metadata in a number of
different ways. Examples may include: OAI-PMH, flavors of linked data (RDF/XML,
JSON-LD), SPARQL, etc. Ultimately, I will have to map my metadata to something
like Dublin Core, and for most metadata, the mapping is easy, especially if I
exploit the terms (http://purl.org/dc/terms/) namespace, which I think used to
be called "Qualified Dublin Core".
But a few characteristics are throwing me for a loop. The first is number of
words. The size of a data set, measured in words, is very useful information.
For example, a data set whose size is less than 1,000,000 words does not lend
itself to semantic indexing, and this would be good to know before the data set
is downloaded. The second is number of items in the data set, which is an
indicator of comprehensiveness. Finally, one data set could be a mere 10 MB
in size, whereas other data sets might come close to a gigabyte. I need/want to
express extent in a number of ways.
Using Dublin Core, how can I express the size of a data set measured in number
of words, number of items, or size in bytes? Here is a snippet of RDF/XML where
I express size in bytes, but I am not satisfied with the result because the units
are not explicitly expressed:
<dc:format>
  <dcterms:extent>
    <rdf:value>100</rdf:value>
    <rdfs:label>100 MB (compressed)</rdfs:label>
  </dcterms:extent>
</dc:format>
What am I missing? How can this snippet be improved? How can I apply the same
technique to denote size in words or number of items? Can I do this without
creating my own namespace? Attached ought to be a valid RDF/XML file with bogus
values for things like creator, title, subjects, etc.
(Once I figure out how to exploit extent, I will want to learn how to exploit
table of contents notes.)
--
Eric Morgan
Navari Family Center for Digital Scholarship
Hesburgh Libraries
University of Notre Dame