On Tue, Nov 13, 2012 at 10:13 PM, Nick <[email protected]> wrote:
> Has anyone gone about the process of collecting open-access articles (from
> your university's faculty) from online repositories or databases and then
> bringing all that data to your institutional repository? I was looking
> around on Google and could find very little information about such projects.
> Can anyone provide a broad and simple overview of the process and scripts
> that they used?

Hi Nick,

we've been periodically importing data from Scopus and WoS. Both let you export the results of a search query. In our case the query is the different forms of the institution name, but it could also be a list of authors, or anything, really.

I then wrote some scripts that convert the exported CSVs into a single CSV in a format that DSpace can ingest using the Batch Metadata Editor. Importantly, the UT (WoS) and Scopus ID identifiers are included, and my script checks whether they are already in the repository (a lookup in the full CSV export from the repository) and drops those records from the current round of import.

Then you _have to_ do some manual work. First, find duplicates by title between the two databases (and between each database and the records already in the repository). The titles are not always an exact match, so I've been thinking of making this easier by finding candidates using the Levenshtein distance between titles. Next, fix the author names, which usually lose most of their non-ASCII characters. Also fix other metadata such as volume, issue, starting and ending page, conference date format etc., and add the DOI and links to the full text online. You also have to ask the authors for any preprints or postprints and upload the bitstreams to DSpace manually (after the import). I also enrich the data from other sources, such as the SNIP and SJR citation metrics.

My scripts are quite customized, so you'd have to edit them a little to suit your needs, but I can share them with you if you want. They're written in Python and work only with CSV files (no connection to the repository is necessary): they take the Scopus export, the WoS export and the last complete export of DSpace as input, and output the DSpace import CSV.

It works to our satisfaction, and almost everything that can be automated is automated (except the search and export from the websites, which could be done with something like Mechanize).

Does that sound like something you're interested in?

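To make the identifier lookup and the title-matching idea above a bit more concrete, here is a minimal sketch. It is not the actual scripts mentioned above, only an illustration under assumptions: the column names (`dc.identifier.scopus`, `dc.title`), the function names and the 10% distance threshold are all hypothetical and would have to be adapted to your own exports.

```python
def levenshtein(a, b):
    """Edit distance between two strings (dynamic programming, two rows)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalize_title(title):
    """Lowercase, turn punctuation into spaces, collapse whitespace."""
    kept = (ch if ch.isalnum() else " " for ch in title.lower())
    return " ".join("".join(kept).split())


def filter_new_records(import_rows, repo_rows, id_column):
    """Drop rows whose identifier already appears in the repository export.

    id_column is an assumed column name, e.g. "dc.identifier.scopus".
    Rows without an identifier are kept, since they cannot be matched here.
    """
    known = {row.get(id_column, "").strip() for row in repo_rows}
    known.discard("")
    return [row for row in import_rows
            if row.get(id_column, "").strip() not in known]


def duplicate_candidates(new_rows, repo_rows, title_column="dc.title",
                         max_ratio=0.1):
    """List (new title, repository title, distance) pairs for manual review.

    A pair is reported when the edit distance is at most max_ratio of the
    longer normalized title. The threshold is an arbitrary starting point.
    """
    candidates = []
    for new in new_rows:
        a = normalize_title(new.get(title_column, ""))
        for old in repo_rows:
            b = normalize_title(old.get(title_column, ""))
            longest = max(len(a), len(b))
            if longest == 0:
                continue
            distance = levenshtein(a, b)
            if distance / longest <= max_ratio:
                candidates.append((new[title_column], old[title_column],
                                   distance))
    return candidates
```

The rows here are plain dictionaries, such as those produced by `csv.DictReader` from the standard library. The comparison is quadratic in the number of records, which should be acceptable for a periodic import but may be slow against a very large repository export. The output is only a list of candidates, so the final decision on each pair remains manual.
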
Regards,
~~helix84

Compulsory reading: DSpace Mailing List Etiquette
https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette

