Re: [Dspace-general] Uploading 600+ CSV Records With "local" Definitions

helix84 Tue, 20 Aug 2013 13:53:38 -0700

On Tue, Aug 20, 2013 at 9:43 PM, Thomas Ronayne
<[email protected]> wrote:
> I have one question about the Metadata Schema Registry, specifically
> about "local" Namespace Elements. I have a bunch of date data types that
> are not found in the DC codes. What I'm wondering is is it possible to
> create a local element and qualifier? Something like
>
>     local:date:datepurchased or local:date:purchased
>
> without causing trouble (there's a bunch of date fields to be loaded)?


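Yes, that is certainly possible. DSpace uses the
namespace.element.qualifier "schema" internally, so it works for all
namespaces, not just dc. With a schema named "local", your example
would be written local.date.purchased (dots, not colons).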

Secondly, there's nothing special about dates; to DSpace they're just
strings, so use whatever format you need. The only exception is when
a date is used for indexing or sorting. By default only
dc.date.issued and dc.date.accessioned are:
webui.browse.index.1 = dateissued:item:dateissued
webui.itemlist.sort-option.2 = dateissued:dc.date.issued:date
webui.itemlist.sort-option.3 = dateaccessioned:dc.date.accessioned:date
recent.submissions.sort-option = dateaccessioned


> Now I have to upload 600+ book records that will contain both DC and
> "local" elements and qualifiers. I believe that I can simply list the
> fields in the correct order at the top of a CSV file, mixing DC and
> "local" in the order the fields appear in the CSV. They will be loaded
> into a "clean" data base because they're the starting records. There are
> no records other than books and there will be 60,000+ of those that will
> be manually entered (these are books dating from the 13th - 19th and
> early 20th centuries in multiple languages).
>
> I did create the local registry as namespace "HICL" and  name "local"
> rather than adding on to the DC registry.
>
> If that sounds about right, I'd like to know -- or, if it's not right,
> I'd really appreciate knowing that.
>
> I manually entered a couple of records  then exported them to CSV to see
> what the format needs to be; that leads to a question or two:
>
>     the first field, "id," is a number, 18 -- is there something special
>     about 18? The DC is 1, so why 18?

This is just the value of the item_id column of the item table, which
is autoincremented using the item_seq sequence. Use "+" in the id
column to insert new items.
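
To make the "+" convention concrete, here is a minimal sketch (in
Python rather than AWK) of writing an import file that mixes dc and
local columns. The field name local.date.purchased assumes a schema
registered with the short name "local", the title and purchase date
are made-up sample values, and build_import_csv is just a hypothetical
helper name, not part of DSpace.

```python
import csv
import io

# Assumed column set: "local.date.purchased" is a hypothetical local field,
# valid only if a schema with the short name "local" has been registered.
FIELDS = ["id", "collection", "dc.title", "dc.date.issued", "local.date.purchased"]


def build_import_csv(records, collection_handle):
    """Return CSV text in which every row is marked as a new item with "+"."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=FIELDS, quoting=csv.QUOTE_ALL, lineterminator="\n"
    )
    writer.writeheader()
    for rec in records:
        # "+" in the id column asks for a new item instead of an update.
        row = {"id": "+", "collection": collection_handle}
        row.update(rec)
        writer.writerow(row)
    return buf.getvalue()


# Made-up sample record; the collection handle is the one from this thread.
sample = [
    {
        "dc.title": 'A "Book" of Hours, Vol. 1',
        "dc.date.issued": "1375-06-30",
        "local.date.purchased": "1921-04-02",
    }
]
csv_text = build_import_csv(sample, "123456789/7")
print(csv_text)
```

Quoting every field is more than strictly necessary, but it means
embedded commas and double quotes in titles need no special handling.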

>     the collection is 123456789/7 (I was fiddling around trying to get
>     it work, probably 6 times); is that number going to be something
>     anything will care about later? Should I make it something else for
>     an initial load? Can it be reset to 123456789/1 somehow (or should I
>     just use some thing else)?

123456789 is the handle prefix, defined by "handle.prefix" in
dspace.cfg. You shouldn't change it unless you want to actually
register with Handle.net.

A handle is meant to be a globally unique identifier. 123456789 is
just the default prefix until you decide to set up the global part.

7 is the handle suffix, which is maintained locally. DSpace assigns
suffixes sequentially, so even if you delete an object, its suffix
never gets assigned again.

Check out the handle table and the handle_seq sequence. If you want to
start over, a handy shortcut is
dspace/etc/postgres/update-sequences.sql

> I'm going to be using AWK to "rewrite" the CSV data I have (exported
> from an old FoxBase data base) in proper form; i.e., strings in double
> quotes, put author names in first and last name fields and the like.
> That's not a big deal, it's actually pretty easy but I'd really like to
> know what the gotchas are beforehand (don't want to do this another 6 or
> 60 times). For example, all the publication dates are the year only --
> like 1375, 1749, 1810, etc. I'm going to append 06-30 (e.g., 1375-06-30)
> to every year-only date (so there's no problem with ISO date or the
> Gregorian Calendar switch at various times; just thinking ahead here.
>
> As an aside, I have preferred to use vertical bars (|) as separators in
> CSV files for bulk loading data; e.g., a 10,000 row file of geographic
> names (nothing to do with DSpace). Vertical bars do not appear in any
> known language and there's no need to enclose string in double quotes
> (with any DBMS I've ever used, including PostgreSQL). I'm wondering if
> there is some way to define the field separator with the CSV loading
> utility just to make my life a little, teeny bit easier?

No, DSpace just uses CSV with commas and double quotes. Internally, an
Apache Commons CSV library handles that. There are no config or
command-line options to change that; you'd have to dig into the code.
Anyway, I'm sure you know a dozen ways to convert one CSV format to
another.
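
For the record, one such way, sketched in Python instead of AWK. It
assumes the input is unquoted and that "|" never occurs inside a
value; pipe_to_comma_csv is a hypothetical helper name and the sample
row is made up.

```python
import csv
import io


def pipe_to_comma_csv(pipe_text):
    """Convert unquoted, '|'-separated text to comma-separated, double-quoted CSV."""
    # QUOTE_NONE makes the reader treat double quotes in the input as
    # ordinary characters rather than as field delimiters.
    reader = csv.reader(io.StringIO(pipe_text), delimiter="|", quoting=csv.QUOTE_NONE)
    out = io.StringIO()
    # QUOTE_ALL wraps every field in double quotes and doubles embedded quotes.
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Made-up sample row containing both a comma and double quotes in one value.
converted = pipe_to_comma_csv('+|123456789/7|Summa, part "one"|1375-06-30\n')
print(converted)
```

The csv module does the escaping on output, so commas and double
quotes inside the values survive the conversion.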


Regards,
~~helix84

Compulsory reading: DSpace Mailing List Etiquette
https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette

