Ah, thank you, helix.  That's exactly what I needed.  I hadn't noticed
the command-line metadata export (I've only been using the web one).

After importing via AIP (to capture community/collection hierarchy and
bitstreams), I exported via CSV and reset the dc.date.accessioned and
dc.date.available to a sensible string manually, like
2014-01-21T18:58:46Z.  Minor detail, but will probably help some future
travelers.
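
For anyone following the same route, the manual CSV step could be scripted roughly as below. This is only a sketch under some assumptions: the CSV comes from the batch metadata export with all fields included (the -a flag mentioned in this thread), the date columns may carry a language suffix such as "[en_US]", and every non-empty cell in those columns should be replaced by one timestamp. The names `reset_dates`, `base_field`, and `DATE_FIELDS` are made up for illustration and are not part of DSpace.

```python
import csv
import io

# Columns to reset; headers may carry a language suffix like "[en_US]".
DATE_FIELDS = {"dc.date.accessioned", "dc.date.available"}


def base_field(header):
    """Strip an optional language suffix such as "[en_US]" from a header."""
    return header.split("[", 1)[0].strip()


def reset_dates(csv_text, timestamp):
    """Return csv_text with each non-empty date cell replaced by timestamp.

    Replacing the whole cell also collapses duplicated values, which are
    assumed to be joined with "||" inside a single cell.
    """
    rows = list(csv.reader(io.StringIO(csv_text, newline="")))
    if not rows:
        return csv_text
    header = rows[0]
    targets = [i for i, name in enumerate(header)
               if base_field(name) in DATE_FIELDS]
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    for row in rows[1:]:
        for i in targets:
            # Leave empty cells alone so no new values are introduced.
            if i < len(row) and row[i]:
                row[i] = timestamp
        writer.writerow(row)
    return out.getvalue()
```

The result would then be re-imported with the batch metadata editor; checking a few rows by eye before importing is probably wise.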

Cheers,

Alan

On 01/21/2014 12:13 AM, helix84 wrote:
>
> See the -a flag and the configuration option.
>
> https://wiki.duraspace.org/display/DSDOC4x/Batch+Metadata+Editing#BatchMetadataEditing-ExportFunction
>
> On Jan 20, 2014 8:25 PM, "Alan Orth" <[email protected]> wrote:
>
>     Hi,
>
>     I've just decided I will export the metadata (CSV) and clean it up
>     manually, then re-import before I export via AIP.  This works great
>     for the dc.identifier.uri (handle link), but I just realized that
>     dc.date.accessioned and dc.date.available aren't in the exported
>     metadata.
>
>     I assume these fields are in the database, so I'll have to use SQL to
>     clean them up after importing via AIP?  I'm not sure where to look in
>     the DB...
>
>     Thanks,
>
>     Alan
>
>     On 01/17/2014 09:12 AM, Alan Orth wrote:
>     > Thanks, both Tim and Helix.
>     >
>     > Yes, I initially looked into the "-r" mode, but then realized that,
>     > as Tim mentioned, our development instance doesn't necessarily
>     > create proper handles.  Our development instance is more of a
>     > code-testing ground, and we don't sync the content very frequently.
>     > Also, the date-related metadata isn't necessarily correct either, as
>     > the accession date in the development instance (for quality
>     > assurance) isn't necessarily the accession date we'd want.
>     >
>     > I think I'll have to rely on a two-step approach: first ingesting
>     > via AIP to get the community/collection hierarchy and bitstreams,
>     > then cleaning up the metadata of the resulting community to fix the
>     > "old" URIs, accession dates, etc.
>     >
>     > Thanks for bouncing some ideas around!
>     >
>     > Alan
>     >
>     > On 01/16/2014 06:39 PM, Tim Donohue wrote:
>     >> Hi Alan,
>     >>
>     >> On 1/16/2014 9:10 AM, Alan Orth wrote:
>     >>> Hi,
>     >>>
>     >>> I've got a development instance where we uploaded a few hundred
>     >>> items (in one community and several collections).  Our editors
>     >>> spent some time manually uploading bitstreams to many of these
>     >>> items.  Now I want to migrate the community and its hierarchy to
>     >>> the production instance.  We can't use the CSV via "Export
>     >>> Metadata" because of the bitstreams, so I've been looking at using
>     >>> AIP, i.e.:
>     >>>
>     >>> dspace packager -s -a -t AIP -e [email protected] -p 10568/0 33474.zip
>     >>>
>     >>> This works great, but the resulting items now have two of each of
>     >>> the following fields:
>     >>>      - dc.date.accessioned
>     >>>      - dc.date.available
>     >>>      - dc.identifier.uri
>     >>>
>     >>> I can't figure out a workflow that doesn't produce this effect...
>     >> These three fields are unfortunately auto-generated by DSpace
>     >> whenever you treat an AIP as a submission information package
>     >> (SIP), which is what the '-s' option does. Essentially, the '-s'
>     >> option assumes this is new content, so DSpace defines these fields
>     >> as:
>     >>     * dc.date.accessioned - the date this new content was added to
>     >>       DSpace
>     >>     * dc.date.available - the date this new content became
>     >>       available in DSpace (i.e. finished approval workflow)
>     >>     * dc.identifier.uri - the assigned Handle for this object
>     >>
>     >> For your situation, you may need to consider some metadata-related
>     >> questions.
>     >>
>     >> * Does your development instance assign proper Handles?  If not,
>     >> then you *need* Production to assign a new dc.identifier.uri.  This
>     >> may mean that you'll unfortunately have to do some metadata cleanup
>     >> after the import (perhaps via the Bulk Metadata Editor) to remove
>     >> the invalid "development" handles from the dc.identifier.uri
>     >> fields, because DSpace never overwrites or removes existing
>     >> metadata.
>     >>
>     >> * Do you want the "date.accessioned" and "date.available" fields to
>     >> be set to the dates the Item was added to *development* or to
>     >> *production*? If the latter, again, you may unfortunately need to
>     >> do some metadata cleanup after the import, as DSpace specifically
>     >> *never* removes/overwrites existing metadata fields.
>     >>
>     >>
>     >> Depending on your setup and your answers to these questions, there
>     >> are three possible AIP import options I can see:
>     >>
>     >> 1a) Use "Restore/Replace" option instead (-r) when migrating to
>     >> Production.
>     >>
>     >> If you treat this as an AIP "restoration" then DSpace will skip
>     >> creating "date.accessioned", "date.available" and "identifier.uri"
>     >> fields and assume that the provided values in the AIPs are correct
>     >> (as it assumes you are restoring a set of deleted objects).
>     >> WARNING: If the 'dc.identifier.uri' in the AIP does NOT correspond
>     >> to a valid Handle, then you will end up with invalid Handles in
>     >> Production! (See next option.)
>     >>
>     >> More on Restore/Replace:
>     >>
>     >> https://wiki.duraspace.org/display/DSDOC4x/AIP+Backup+and+Restore#AIPBackupandRestore-Restoring/ReplacingusingAIP(s)
>     >>
>     >>
>     >> 1b) When using "Restore/Replace", you may want/need to override
>     >> some of the default options. For example, restoration will always
>     >> assume the 'dc.identifier.uri' is a valid Handle (so a new Handle
>     >> will not be assigned). Restoration will also always attempt to
>     >> restore an object under the parent object *specified* in the AIP.
>     >> This means that if a Collection was under a Community with ID
>     >> "123456789/1" in your development instance, then it will be
>     >> restored under a Community with the *same ID* in Production.
>     >>
>     >> Luckily, these defaults can be overridden. See the 'ignoreHandle'
>     >> and 'ignoreParent' Advanced options documented here:
>     >>
>     >> https://wiki.duraspace.org/display/DSDOC4x/AIP+Backup+and+Restore#AIPBackupandRestore-AdditionalPackagerOptions
>     >>
>     >>
>     >> 2) The other option is to still use the Submission (-s) option, but
>     >> use one or more of the Advanced options (in 1b) to tweak the
>     >> defaults.
>     >>
>     >> I know this is a lot of info, but hopefully it gives you some ideas
>     >> to go on.
>     >>
>     >> - Tim
>

-- 
Alan Orth
[email protected]
http://alaninkenya.org
http://mjanja.co.ke
"I have always wished for my computer to be as easy to use as my telephone; my 
wish has come true because I can no longer figure out how to use my telephone." 
-Bjarne Stroustrup, inventor of C++
GPG Public Key: 0xf92c4bd91084bb5de14e20be9470dd588dd1026c


_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech
List Etiquette: https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette
