I'd support some explicit clarification via the documentation at least.

If there is an intent that DC is metadata about Fedora objects, for
repository managers, rather than descriptive metadata about the content for
consumers of objects, then this should be clear; and there should be a
recommendation to use a separate datastream for OAI-PMH.

It does seem a little misleading to use an OAI-namespaced element as the
root element if that is the intention.  There's no particular recommendation
to do so[1], though equally there seems to be no recommendation on what to
use for the root element; and arguably it's better to reuse an existing
schema rather than create a new one.
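
For anyone unfamiliar with the layout being discussed, here is a rough sketch
(Python, standard library only) of a record in the shape the DC datastream
uses: simple Dublin Core dc:* elements wrapped in an oai_dc:dc root element.
The namespace URIs are the standard oai_dc and DC element set ones. The field
values are made up for illustration, and any schemaLocation attributes are
omitted.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for the OAI-PMH oai_dc container and the simple DC element set.
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Register prefixes so serialisation uses oai_dc: and dc: instead of ns0:/ns1:.
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)


def build_dc_record(fields):
    """Build a DC record wrapped in an oai_dc:dc root element.

    `fields` is a list of (element_name, value) pairs, e.g. ("title", "...").
    Repeated names (such as several identifiers) are allowed.
    """
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


# Illustrative values only.
record = build_dc_record([
    ("title", "Example object"),
    ("identifier", "demo:1"),
    ("identifier", "http://example.org/objects/1"),
])
print(ET.tostring(record, encoding="unicode"))
```

The root element is the only OAI-specific part; the children are plain DC
elements, which is why the same markup reads as "an OAI record" to newcomers.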

Any changes to the DC datastream's schema (or support for an
alternative/additional schema) would need to be carefully considered in
terms of backward compatibility and existing repositories to make sure we
cater for those that do use the DC datastream for OAI-PMH.
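
One way a harvester-facing service could cater for both cases is to prefer a
separate, consumer-facing datastream and fall back to DC only when that is
absent. A minimal sketch, assuming a hypothetical datastream ID "OAI_DC" (not
an established Fedora convention); the fallback keeps repositories that
already expose DC working unchanged.

```python
from typing import Iterable, Optional

# Hypothetical ID for a dedicated, consumer-facing DC datastream (assumption).
PREFERRED_DS_ID = "OAI_DC"
# Fedora's built-in DC datastream, used only as a backward-compatible fallback.
FALLBACK_DS_ID = "DC"


def choose_oai_dc_source(datastream_ids: Iterable[str],
                         preferred: str = PREFERRED_DS_ID) -> Optional[str]:
    """Return the datastream ID to expose as oai_dc, or None if neither exists."""
    ids = set(datastream_ids)
    if preferred in ids:
        return preferred        # dedicated datastream wins when present
    if FALLBACK_DS_ID in ids:
        return FALLBACK_DS_ID   # existing repositories keep their behaviour
    return None
```

For example, an object with DC, OAI_DC and RELS-EXT would expose OAI_DC, while
an object with only DC and RELS-EXT would still expose DC.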

I've just tested the DC dc:identifier requirements (on 3.4.2):

- if there is no dc:identifier containing the PID on ingest, then one is
added (alongside any other dc:identifier fields also present)
- however, on editing the content (in this case using the admin UI) this
requirement doesn't seem to be enforced: if there is no dc:identifier at all
then one is added with the PID, but if there is a non-PID dc:identifier
field then a PID dc:identifier is not added. This looks like a bug.
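
To make the difference concrete, here is a hypothetical model of the
behaviour I observed (not Fedora's actual code). It assumes "containing the
PID" means an exact match on the bare PID, and the function names are made up
for illustration.

```python
from typing import List


def ensure_pid_on_ingest(pid: str, identifiers: List[str]) -> List[str]:
    """Observed ingest behaviour: add the PID unless it is already present,
    keeping any other dc:identifier values."""
    if pid not in identifiers:
        return identifiers + [pid]
    return identifiers


def ensure_pid_on_edit_observed(pid: str, identifiers: List[str]) -> List[str]:
    """Observed edit behaviour (admin UI): the PID is added only when there
    are no dc:identifier values at all, so a non-PID identifier suppresses it."""
    if not identifiers:
        return [pid]
    return identifiers


# Same input, different outcome depending on the code path.
print(ensure_pid_on_ingest("demo:1", ["hdl:123"]))
print(ensure_pid_on_edit_observed("demo:1", ["hdl:123"]))
```

Under this model, applying the ingest rule on edit as well would make the two
paths consistent.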

DC datastreams can now be stored as managed content; however, there is an
outstanding JIRA issue on this (https://jira.duraspace.org/browse/FCREPO-849):
these datastreams currently must be versioned.

Regards
Steve

[1] http://www.dublincore.org/documents/dc-xml-guidelines/


> -----Original Message-----
> From: Benjamin Armintor [mailto:[email protected]] 
> Sent: 25 February 2011 16:16
> To: [email protected]
> Subject: Re: [fcrepo-dev] oai_dc, reserved namespace, and Fedora
> 
> 
> Deborah-
> 
> > (b) sharing a meaningless internal identifier via OAI-PMH as dc:identifier
> > is bad practice, and should never be a fallback default.
> 
> You make a good point: the pid satisfies the requirements for a unique
> identifier as per OAI-PMH section 2.4, but on its own it doesn't serve the
> purpose of a resource identifier very well.  It seems like a natural
> extension of the features requested in:
> https://jira.duraspace.org/browse/FCREPO-650
> https://jira.duraspace.org/browse/FCREPO-655
> ... to work towards having the OAI-PMH interfaces include a dereferenceable
> URI.  That seems like an appropriate default to me, but it still may
> not be what you want!
> 
> > The DC element's dc:identifier field is restricted to being the fedora pid.
> 
> This is a misunderstanding; as of Fedora 3.3 at least, you can add
> additional dc:identifier elements.  I'd have to check the code to see
> whether you can actually eliminate the pid's inclusion, and what the
> consequences of that would be (I wouldn't recommend it in any case).
> 
> > Thorny has repeatedly said that we shouldn't be loading up the DC
> > datastream with metadata because it's bad practice (mostly because of the
> > performance problems he discusses in that e-mail you cite). Is that not
> > true?
> 
> If you load inline DC datastreams up with large data, you may incur a
> performance penalty, especially if you version the DC datastream.  So
> there are three factors there: is it inline, is it large, is it
> versioned?  These three factors relate to any datastream in Fedora,
> though, not just the DC datastream.
> 
> Thorny also said you shouldn't try to include qualified Dublin Core
> data, but that is a separate issue.
> 
> Thanks for hashing this out! These lists are an important source of
> documentation for me, and it's important to revisit long-standing
> issues to make sure received wisdom is still aligned with the state of
> the application.
> 
> - Ben
> 
> On 2/25/11, Kaplan, Deborah <[email protected]> wrote:
> >> Fedora's "internal" datastream is not called oai_dc; it's called "DC",
> >> and uses an element with the namespace prefix "oai_dc" as a container.
> >
> > Fair enough; I was sloppy with my terminology and I shouldn't have been.
> >
> > That said, it's clear that this is a blocker for any number of people who
> > are getting set up either with Fedora itself for the first time, or with
> > Fedora and OAI-PMH. The namespace prefix oai_dc *means* something out in
> > the world, and for that matter so does the identifier "DC" (even if not as
> > formally). Asking it to mean something else in Fedora confuses new users;
> > it is one more roadblock between concept and production for a tool which
> > is nontrivial to configure at the best of times.
> >
> > Quoting Thorny: "I wish we had called the DC datastream "repoMeta" or
> > something. It was just intended to be the base metadata needed for the
> > repository manager to be able to function and was not intended to be
> > exposed externally."
> >
> > People setting up a Fedora instance, or implementing OAI-PMH for an
> > existing Fedora instance, see the apparently meaningful prefix oai_dc on
> > top of the apparently meaningful datastream name "DC", and get confused.
> > Yes, I understand that oai_dc is reserved in such a way that it can't be
> > used internally by Fedora; it's reserved for an OAI-PMH implementation.
> > Yet normal humans are going to see the namespace oai_dc and react
> > accordingly. At a minimum, the documentation should be updated to
> > highlight this potential confusion.
> >
> >> It is a default, so it seems appropriate for proai to fall back to that
> >> in the absence of other configuration.
> >
> > Except it doesn't, because (a) the message on the mailing lists for the
> > last year has repeatedly been "don't put any metadata into the DC
> > datastream, except what Fedora requires", and more importantly (b) sharing
> > a meaningless internal identifier via OAI-PMH as dc:identifier is bad
> > practice, and should never be a fallback default.
> >
> >> The "internal" character of the DC datastream isn't that dramatic:
> >> your objects can certainly have identifiers, titles, and formats that
> >> you define for them.
> >
> > Actually, that's not true. They can't have identifiers you define for
> > them. The DC element's dc:identifier field is restricted to being the
> > fedora pid. Unless there is something I am very much missing.
> >
> > Thorny has repeatedly said that we shouldn't be loading up the DC
> > datastream with metadata because it's bad practice (mostly because of the
> > performance problems he discusses in that e-mail you cite). Is that not
> > true?
> >
> >
> > -Deborah
> > ------------------------------------------------------------------------------
> > Free Software Download: Index, Search & Analyze Logs and other IT data in
> > Real-Time with Splunk. Collect, index and harness all the fast moving IT
> > data generated by your applications, servers and devices whether physical,
> > virtual or in the cloud. Deliver compliance at lower cost and gain new
> > business insights. http://p.sf.net/sfu/splunk-dev2dev
> > _______________________________________________
> > Fedora-commons-developers mailing list
> > [email protected]
> > https://lists.sourceforge.net/lists/listinfo/fedora-commons-developers
> >