[dspace-tech] Re: DC, QDC, and DCTERMS: reviewing our metadata practices

Mark H. Wood Fri, 25 Jan 2019 06:24:19 -0800

On Thursday, January 24, 2019 at 9:10:06 AM UTC-5, Alan Orth wrote:
>
> We have started looking at our use of metadata across our repositories and 
> I have to say that it is very confusing! First some background as I 
> understand it, then the current state of affairs in DSpace 4/5/6, and then 
> my question(s). :)
>
> Dublin Core is the original specification of fifteen elements from 
> 1995[0]. It was amended in 2000 to add element qualifiers like 
> "dc.date.issued" as well as a few new elements[1]. These were both 
> superseded in 2008 with the introduction of the Dublin Core Terms (aka 
> DCTERMS) specification[2], which essentially combines both of them.
>
> By default DSpace makes heavy use of both simple and qualified Dublin Core 
> in its input forms, but also provides crosswalks to translate many of these 
> to DCTERMS that are then exposed as metadata in the XMLUI[3][4]. It is very 
> easy to change the input forms to use different fields and even custom 
> namespaces, though some core fields seem to be dangerous (like dc.date.* 
> and dc.contributor.author).
>
> Our repository is consumed ravenously by search engines, but also by 
> increasingly many harvesters via REST and OAI APIs. If we want to make sure 
> that the metadata these harvesters receive is also standards compliant and 
> interoperable, shouldn't we update our input-forms and existing item 
> metadata to take some of the crosswalks into account? For example: to start 
> using dc.language or dcterms.language instead of dc.language.iso (I would 
> of course update the crosswalks accordingly). Does any of this change in 
> DSpace 7? Is there any talk of moving away from a flat schema so that 
> authors and institutions could be related, for example?
>
>


I agree that too little attention is given to interchange, and to how careful 
we have to be to make machine-to-machine (M2M) communication meaningful 
without introducing errors and unwarranted assumptions.

As far as I can tell there is no such thing as dc.language.iso, so using 
dc.language makes sense.  There are several DSpace inventions 
masquerading as QDC, and some of them should be moved to a different 
namespace.  No, I haven't yet made a table of my recommended changes.
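As a rough illustration of what acting on such a table might look like, here is a minimal Python sketch that renames fields in a flat list of (field, value) pairs. The FIELD_RENAMES table and the rename_fields function are hypothetical, not part of DSpace; only the dc.language.iso entry comes from this thread, and the list-of-pairs representation is an assumption standing in for however item metadata is actually read and written.

```python
# Hypothetical mapping from DSpace-specific pseudo-QDC field names to the
# names we would prefer. Only the dc.language.iso entry comes from the
# discussion above; further entries would come from a reviewed table.
FIELD_RENAMES = {
    "dc.language.iso": "dc.language",
}


def rename_fields(values):
    """Return a new list of (field, value) pairs with renamed fields.

    `values` is a flat list of (field name, value) pairs, roughly how an
    item's metadata looks in a flat namespace. Fields not listed in
    FIELD_RENAMES pass through unchanged, and order is preserved.
    """
    return [(FIELD_RENAMES.get(name, name), value) for name, value in values]


# Hypothetical sample item.
item = [
    ("dc.title", "Soil carbon survey"),
    ("dc.language.iso", "en"),
]
print(rename_fields(item))
# [('dc.title', 'Soil carbon survey'), ('dc.language', 'en')]
```

Keeping the mapping as data rather than code means the same table could drive both a one-time migration of existing items and the matching crosswalk changes.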

There is always talk of moving away from a flat namespace.  I think this 
may be unnecessary.  Authors, for example, are not contained by 
institutions.  When an author writes for, or with the sponsorship of, an 
institution, the authorship of the work should be marked with a 
relationship to that institution; that is, one object references another 
by unique identifier.  It may be convenient to express this externally in 
a hierarchical form (e.g. in METS), but that is merely for interchange; 
internally we should represent knowledge more flexibly, so that we can 
produce whatever external representation is required without having to 
undo too many assumptions.  An author's employment and membership 
history would properly belong in a biographical repository, which (were 
such a thing to exist) would have quite a different sort of metadata 
structure.  Given a sufficiently rich set of simple types, most of what we 
know about an author, a work, or an institution should be usefully 
representable as simple lists.
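A minimal sketch of the idea, assuming hypothetical Author, Institution, Authorship and Work classes (these are not DSpace's data model): the affiliation hangs off the authorship as a reference by identifier, everything is kept in flat lookups, and the nested form exists only in the function that produces an interchange view.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Author:
    id: str
    name: str


@dataclass
class Institution:
    id: str
    name: str


@dataclass
class Authorship:
    # One author's part in one work. The institution is a reference by
    # identifier, not a container: the author is not "inside" it.
    author_id: str
    institution_id: Optional[str] = None


@dataclass
class Work:
    id: str
    title: str
    authorships: List[Authorship] = field(default_factory=list)


def to_nested(work: Work, authors: Dict[str, Author],
              institutions: Dict[str, Institution]) -> dict:
    """Build a hierarchical view of a work, for interchange only.

    Internally, works, authors and institutions live in flat lookups and
    refer to one another by identifier; the nesting is produced here.
    """
    creators = []
    for authorship in work.authorships:
        entry = {"name": authors[authorship.author_id].name}
        if authorship.institution_id is not None:
            entry["affiliation"] = institutions[authorship.institution_id].name
        creators.append(entry)
    return {"title": work.title, "creators": creators}


# Hypothetical sample data.
authors = {"a1": Author("a1", "Jane Doe"), "a2": Author("a2", "Ravi Patel")}
institutions = {"i1": Institution("i1", "Example University")}
work = Work("w1", "Soil carbon survey",
            [Authorship("a1", "i1"), Authorship("a2")])
print(to_nested(work, authors, institutions))
```

Because the affiliation belongs to the authorship rather than to the author, the same author can appear on another work with a different institution, or none, without any change to the author record.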

We also need to look at enriching our internal representation, but in a 
different way.  I think it was Mark Diggory who observed that we talk about 
"metadata schemas" as if we had them, but what we really have is 
namespaces.  A schema tells you not only what fields are defined and how 
they are named, but also what kind of data a field may hold and which 
values of that kind are acceptable.  A well-written schema guarantees that 
the value stored in a field makes sense.  If we declare that 
dc.date.accessioned is a date, then even if the UI mistakenly accepts 
"Louisiana" as a date, the metadatavalue service would reject it, because 
it can't be understood as a date.  We might store dc.date.accessioned as a 
string encoding of a date, or we might store it as a serialized Calendar; 
either way, the schema tells us how to interpret the stored bytes and tells 
external interfaces how they might represent the field's value.  Another 
way of looking at this is that we encode information in form definitions 
that could instead be pushed into the metadata schemas (if we had them) and 
fetched by the form interface, which would simplify the writing of forms.
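To make the schema idea concrete, here is a small Python sketch in which each field declares a parser for its kind of data, and values are checked against it before storage. SCHEMA, validate and SchemaError are hypothetical names, not DSpace's metadatavalue service, and the date parser accepts only plain YYYY-MM-DD values as a simplification.

```python
from datetime import date


class SchemaError(ValueError):
    """Raised when a value cannot be understood as its field's declared type."""


def parse_date(text):
    # Simplification: accept only ISO 8601 calendar dates (YYYY-MM-DD).
    return date.fromisoformat(text)


# A hypothetical schema: each field names a parser for its kind of data.
SCHEMA = {
    "dc.date.accessioned": parse_date,
    "dc.title": str,
}


def validate(field_name, raw):
    """Return the typed value for `raw`, or raise SchemaError."""
    try:
        parser = SCHEMA[field_name]
    except KeyError:
        raise SchemaError(f"{field_name} is not defined in the schema") from None
    try:
        return parser(raw)
    except ValueError as exc:
        raise SchemaError(f"{raw!r} is not a valid value for {field_name}") from exc


print(validate("dc.date.accessioned", "2019-01-25"))  # 2019-01-25
try:
    validate("dc.date.accessioned", "Louisiana")
except SchemaError as err:
    print(err)  # 'Louisiana' is not a valid value for dc.date.accessioned
```

Because the field's type lives in the schema rather than in a form definition, a form could look up SCHEMA to decide that dc.date.accessioned needs a date widget, which is the kind of simplification of form writing described above.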

-- 
All messages to this mailing list should adhere to the DuraSpace Code of 
Conduct: https://duraspace.org/about/policies/code-of-conduct/
--- 
You received this message because you are subscribed to the Google Groups 
"DSpace Technical Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to dspace-tech+unsubscr...@googlegroups.com.
To post to this group, send email to dspace-tech@googlegroups.com.
Visit this group at https://groups.google.com/group/dspace-tech.
For more options, visit https://groups.google.com/d/optout.
