Hello, a few points regarding this discussion and its relationship to the
"DSpace with Fedora Inside" aspirations:

1.) The Fedora DC datastream is really just there for indexing in the local
Fedora search. Everyone I've talked to has suggested not paying too much
attention to this DC datastream as a resource for use outside Fedora.
Likewise, as Gert and Scott point out, the real ambition is to get the
metadata into a more capable search engine, such as Solr.

2.) DSpace QDC metadata fields are used for UI search, and so they are not
very well aligned with Fedora's DC datastream or native query interface. In
fact, they are better aligned with Solr and with GSearch.

3.) That said, we know QDC will never give us the hierarchical levels we
need to capture hierarchical MODS, your specialized XML, or other highly
structured data. We need to come up with something better for DSpace than
flat, fixed QDC, and this has been a recent topic in developer meetings.

4.) We know that projects such as Hydra recommend the Hydra Common Model for
structuring metadata and content so that various types of metadata
(descriptive, structural, administrative, technical, provenance, etc.) can be
somewhat more predictable across Fedora instances.

https://wiki.duraspace.org/display/hydra/Hydra+content+models+and+disseminators

5.) We know we eventually want to align DSpace and Fedora to allow a "DSpace
with Fedora Inside".

Given all these points, my recommendation is the following: in both cases,
your original metadata should be stored as Bitstreams/Datastreams so that
both applications can use common tooling like GSearch to provide a
consistent transformation of that metadata into search engines such as
Solr. However, this leaves you in the position of having to replicate and map
that metadata to DSpace QDC on your own for the short term. In the longer
term, I believe that the following can be targeted and fixed to enhance
DSpace's capabilities and prepare us for an integration with Fedora.

a.) We need to write Curation Tools or MediaFilters that parse metadata
stored in Bitstreams and map it into DSpace Metadata fields.

b.) We need to come up with a simple means to update metadata bitstreams,
the simplest of which is replacing the metadata bitstream with new content.

c.) Rather than trying to map to QDC just so that content can be indexed
into Solr, we need a means to create custom Solr/Discovery indexers for your
metadata content type. These would take the bitstream and update the fields
of the Solr record being placed into Solr, enabling a direct crosswalk from
your metadata format to Solr's document format (similar to the behavior in
GSearch).
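
To make point c.) a little more concrete, here is a minimal sketch in Python
of the kind of direct crosswalk I mean. It is not DSpace, Discovery, or
GSearch code. The sample XML, the element paths, the Solr field names
(title_t, keyword_ss, bbox_west_s), and the function names are all
hypothetical and only there for illustration. The one thing it relies on is
Solr's XML update message shape (add/doc/field).

```python
import xml.etree.ElementTree as ET

# Hypothetical crosswalk: element path in the original metadata -> Solr field.
CROSSWALK = {
    "title": "title_t",
    "keyword": "keyword_ss",
    "coverage/west": "bbox_west_s",
}

# Hypothetical metadata bitstream content, kept in its original format.
SAMPLE = """<map>
  <title>Example map sheet</title>
  <keyword>topography</keyword>
  <keyword>hydrology</keyword>
  <coverage><west>-82.9</west></coverage>
</map>"""


def metadata_to_solr_fields(xml_text, crosswalk, record_id):
    """Read the metadata XML and return (solr_field, value) pairs."""
    root = ET.fromstring(xml_text)
    fields = [("id", record_id)]
    for path, solr_field in crosswalk.items():
        # Repeated elements become repeated (multi-valued) Solr fields.
        for node in root.findall(path):
            text = (node.text or "").strip()
            if text:
                fields.append((solr_field, text))
    return fields


def to_solr_add_xml(fields):
    """Serialize the pairs as a Solr XML update message."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields:
        field = ET.SubElement(doc, "field", {"name": name})
        field.text = value
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    print(to_solr_add_xml(metadata_to_solr_fields(SAMPLE, CROSSWALK, "item-1")))
```

The point of the sketch is only that the original XML goes straight to Solr
fields without passing through QDC. A real indexer would have to decide how
to read the bitstream and when to re-index, which this leaves out.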

Interestingly, I think we see that Fedora and GSearch may be able to provide
us with the solutions to a number of these points someday in the future.

Overall, I think the best advice I can give until we evolve DSpace in this
direction is to "Store your Original Metadata in your Original Format in
DSpace Bitstreams" so that you still have it when the applications have
attained these capabilities and, likewise, because it's just good
preservation practice all around.

Cheers,
Mark Diggory





On Thu, Oct 13, 2011 at 2:37 PM, Scott Hammel <sc...@clemson.edu> wrote:

> Yeah ... a vote of confidence for the work Gert and his team are doing:
> GSearch takes a lot of the headache out of indexing any XML datastream
> (or combination of them) on your objects into a powerful search index
> (and with Solr you get some geospatial index/query helpers).
>
> Scott
>
> On 10/13/2011 03:41 PM, Gert Schmeltz Pedersen wrote:
> > I could add that if you want to use Solr (with Lucene inside), the
> > straightforward way to make your Fedora objects searchable is to
> > generate Solr index documents with Fedora GSearch.
> >
> > Gert
> >
> >
> > On 13/10/2011, at 16.57, Kevin P. Foote wrote:
> >
> >> Thanks for the feed back ..
> >>
> >> Main goal is to make this data available and search-able to a larger
> >> audience.
> >>
> >> to browser - yes (needs the plugin)
> >>
> >> to specialized clients - yes
> >>
> >> georef - i believe so .. more detail shortly :-)
> >>
> >>
> >> ------
> >> thanks
> >>   kevin.foote
> >>
> >> On Thu, 13 Oct 2011, aj...@virginia.edu wrote:
> >>
> >> ->  Putting MrSid images into Fedora objects will not be particularly
> >> ->  hard, unless they are remarkably large. I suspect that your concerns
> >> ->  will end up being centered more around the issue of getting them out
> >> ->  to users in a useful way, because MrSid is not a very open format, to
> >> ->  say the least.
> >> ->
> >> ->  What is it that you need to do with this material? Is it to be
> >> ->  delivered to browsers? To specialized clients? Is it georeferenced
> >> ->  imagery for use with GIS software, or simply scans of maps?
> >> ->
> >> ->  ---
> >> ->  A. Soroka
> >> ->  Online Library Environment
> >> ->  the University of Virginia Library
> >> ->
> >> ->
> >> ->
> >> ->
> >> ->  On Oct 13, 2011, at 10:42 AM, aj...@virginia.edu wrote:
> >> ->
> >> ->  >  Fedora does include a simple DC metadata stream with each object
> in a repository. This is to support basic administration and maintenance. It
> is _not_ meant to provide a platform for discovery or search.
> >> ->  >
> >> ->  >  Fedora's abilities to store metadata for an object are amongst
> the most flexible you will find in the sphere of object repository software.
> Anything you like can be stored in a datastream. Many institutions prefer to
> use XML serializations, but that is not a constraint.
> >> ->  >
> >> ->  >  Fedora also offers special treatment for RDF data with automatic
> indexing to a triple store available.
> >> ->  >
> >> ->  >  If your use case amounts to storing some specialized geospatial
> metadata in an allocated datastream, you will have no problem doing that.
> You probably will _not_ want to rely on the repository-maintained DC
> metadata for anything other than administration and simple harvesting.
> Creating a discovery service around a repository is an entirely separate
> question, and there are lots of good resources and solution packages
> available. You may want to examine some of the web application frameworks
> for Fedora, like Islandora or Hydra.
> >> ->  >
> >> ->  >  ---
> >> ->  >  A. Soroka
> >> ->  >  Online Library Environment
> >> ->  >  the University of Virginia Library
> >> ->  >
> >> ->  >
> >> ->  >
> >> ->  >
> >> ->  >  On Oct 13, 2011, at 10:22 AM, Kevin P. Foote wrote:
> >> ->  >
> >> ->  >>  Hi all,
> >> ->  >>
> >> ->  >>  Apologies for the xpost .. but sort of relevant to both
> repository
> >> ->  >>  implementations (at least for me).
> >> ->  >>
> >> ->  >>  We are currently using DSpace (moving to latest version soon). I
> have
> >> ->  >>  a general high level type metadata question and did not know
> where else
> >> ->  >>  to post, so here goes. (perhaps someone can point me to a better
> list)
> >> ->  >>
> >> ->  >>  We have a largish (in our terms) project that involves map data
> or rather
> >> ->  >>  (.sid) images[1] produced from said map data.
> >> ->  >>
> >> ->  >>  We currently have an in-house application that catalogs these
> images and
> >> ->  >>  stores some crazy 90 field metadata info within it.
> >> ->  >>
> >> ->  >>  My question is what is the best way (read any way) to handle
> >> ->  >>  getting this content into dspace (or fedora commons) in an
> >> ->  >>  intelligent manner.
> >> ->  >>
> >> ->  >>  My understanding is that dspace and fedora use the dc-metadata
> >> ->  >>  standard to search, catalog, and provide a common way for
> libraries and
> >> ->  >>  repository software get at content.
> >> ->  >>
> >> ->  >>
> >> ->  >>  Would this additional metadata get in the way with operation?
> >> ->  >>
> >> ->  >>  Would it be best to create dc records for each item and then
> augment the
> >> ->  >>  dc info with this complete additional metadata set in a new type
> of metadata
> >> ->  >>  (not in the dc)?
> >> ->  >>
> >> ->  >>  Is there a common standard for map type metadata? (USGS?)
> >> ->  >>
> >> ->  >>
> >> ->  >>  Any help pointers appreciated..
> >> ->  >>
> >> ->  >>
> >> ->  >>  [1] images are 'Multi-resolution Seamless Image Database' files
> from
> >> ->  >>  what I gather. Related to ArcGIS, ERDAS software..
> >> ->  >>
> >> ->  >>  ------
> >> ->  >>  thanks
> >> ->  >>   kevin.foote
> >> ->  >>
> >> ->  >>
>  
> ------------------------------------------------------------------------------
> >> ->  >>  All the data continuously generated in your IT infrastructure
> contains a
> >> ->  >>  definitive record of customers, application performance,
> security
> >> ->  >>  threats, fraudulent activity and more. Splunk takes this data
> and makes
> >> ->  >>  sense of it. Business sense. IT sense. Common sense.
> >> ->  >>  http://p.sf.net/sfu/splunk-d2d-oct
> >> ->  >>  _______________________________________________
> >> ->  >>  Fedora-commons-users mailing list
> >> ->  >>  Fedora-commons-users@lists.sourceforge.net
> >> ->  >>
> https://lists.sourceforge.net/lists/listinfo/fedora-commons-users
> >> ->  >
> >> ->  >
> >> ->  >
> >> ->
> >> ->
> >> ->
> >> ->
> >>
> >>
> >
> >
> >
>
>
> --
> CCIT
> Clemson University
> 864-656-8118
> Free/Busy Calendar: http://bit.ly/dBeBzo
>
>
>
>



-- 
[image: @mire Inc.]
*Mark Diggory*
*2888 Loker Avenue East, Suite 305, Carlsbad, CA. 92010*
*Esperantolaan 4, Heverlee 3001, Belgium*
http://www.atmire.com