On Tue, Nov 13, 2012 at 10:13 PM, Nick <[email protected]> wrote:
> Has anyone gone about the process of collecting open-access articles (from
> your university's faculty) from online repositories or databases and then
> bringing all that data to your institutional repository? I was looking
> around on Google and could find very little information about such projects.
> Can anyone provide a broad and simple overview of the process and scripts
> that they used?

Hi Nick,

We've been periodically importing data from Scopus and WoS. Both allow
you to export the results of a search query - in our case different
forms of the institution name, but it could just as well be a list of
authors, or anything, really. I then created some scripts that convert
the exported CSVs into a single CSV in a format that DSpace can ingest
using the Batch Metadata Editor. Importantly, the WoS UT and Scopus ID
identifiers are included, and my script looks up whether each record
is already in the repository (a lookup against a full export of the
repository in CSV format) and eliminates those from the current round
of import.
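The identifier lookup can be sketched roughly like this (the column
names and record shape here are just placeholders - adjust them to
whatever your DSpace and database exports actually use):

```python
import csv

# Hypothetical metadata column names -- match these to your DSpace export.
ID_COLUMNS = ("dc.identifier.scopus", "dc.identifier.ut")

def known_identifiers(dspace_export_path):
    """Collect every Scopus ID and WoS UT already present in the repository."""
    known = set()
    with open(dspace_export_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            for col in ID_COLUMNS:
                value = (row.get(col) or "").strip()
                if value:
                    known.add(value)
    return known

def filter_new_records(records, known):
    """Keep only records whose identifier is not yet in the repository."""
    return [r for r in records if r.get("id") not in known]
```

The point is that no live connection to DSpace is needed - a periodic
full CSV export of the repository is enough to deduplicate against.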

Then you _have to_ do some manual work. Namely, find duplicates by
title between the two databases (and between each of them and the
records already in the repository) - it's not always an exact match.
I've been thinking of making this easier by finding candidates using
the Levenshtein distance of titles. The next manual step is fixing
author names, which usually lose most non-ASCII characters in the
exports. You'll also fix other metadata (volume, issue, starting and
ending page, conference date format, etc.) and add links to the full
text online and the DOI. Finally, you have to ask the authors for any
preprints or postprints and upload the bitstreams to DSpace manually
(after the import).
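The candidate-finding idea could look something like the sketch below.
I use the standard library's difflib similarity ratio here as a
stand-in for a real Levenshtein implementation (which would need a
third-party package); the 0.9 threshold is an arbitrary starting point:

```python
from difflib import SequenceMatcher

def normalize(title):
    """Crude normalization so near-identical titles compare equal."""
    return " ".join(title.lower().split())

def duplicate_candidates(titles_a, titles_b, threshold=0.9):
    """Yield title pairs from two exports that are suspiciously similar.

    O(n*m) comparisons, which is fine for one import batch but would
    need blocking (e.g. by year or first word) for large exports.
    """
    for a in titles_a:
        for b in titles_b:
            ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
            if ratio >= threshold:
                yield a, b, ratio
```

This only produces candidates for a human to review - it doesn't
replace the manual deduplication, it just narrows it down.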

I also enrich the data from other sources, like the SNIP and SJR
citation metrics.
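The enrichment step is essentially a join on the journal's ISSN. A
minimal sketch, assuming you have a metrics CSV keyed by ISSN (the
column names "ISSN", "SNIP", "SJR" are assumptions - match them to
your actual metrics file):

```python
import csv

def load_metrics(path):
    """Map ISSN -> (SNIP, SJR) from a journal metrics CSV."""
    metrics = {}
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            metrics[row["ISSN"]] = (row["SNIP"], row["SJR"])
    return metrics

def enrich(records, metrics):
    """Attach SNIP/SJR to each record with a matching ISSN, in place."""
    for rec in records:
        snip_sjr = metrics.get(rec.get("issn"))
        if snip_sjr:
            rec["snip"], rec["sjr"] = snip_sjr
    return records
```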

My scripts are quite customized, so you'd have to edit them a little
to suit your needs, but I can share them with you if you want. They're
written in Python and work only with CSV files (no connection to the
repository necessary): they take the Scopus export, the WoS export,
and the last complete export from DSpace, and output the DSpace import
CSV. I can say it works to our satisfaction, and almost everything
that can be automated is (except the lookup and export from the
websites, which could be done with something like Mechanize). Does
that sound like something you're interested in?


Regards,
~~helix84

Compulsory reading: DSpace Mailing List Etiquette
https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette

_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech