On Wed, Aug 13, 2008 at 8:13 AM, Mark H. Wood <[EMAIL PROTECTED]> wrote:
> Sad to say, there are probably as many automated ways of building
> batches as there are DSpace sites. What you do will depend on the
> form in which you can get the data.

This is my experience too. I wrote a tiny Python library of DSpace-automation-stuff (with classes for building a contents file, a dublin_core.xml file, a mapfile [yes, that's rare, but it has happened], breaking up a namelist from a citation or from HTML, and parsing a name) that I remix as needed for new projects. (Next on the list to add to it: better file/folder management, because I'm so error-prone when I write that stuff...)

I can see that I'll have to rewrite a lot of this to create SWORD packages instead. So be it; I think SWORD is a better way and I'll be able to do more with it. (I got to talking with some people long ago about drop boxes for the repository, and it just plain broke my brain, how hard that was going to be. SWORD makes it a good deal more feasible to write drop boxes and hands-off gateways, I think.)

For my sins, I do a lot of HTML screenscraping -- back issues of e-periodicals, mostly. That's all ad-hoc, as no two e-periodicals have the same HTML. It tends to be an 80/20 problem (give or take 10% based on HTML quality and consistency); I can whack out most of the metadata with regular expressions and my namelist/name parsers, but not all of it. Information is often lurking in PDFs, which means handwork.

I say all this to (I hope) help people understand what the bounds around what's feasible look like for untalented scripters.
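For readers who have not seen this kind of batch-building helper, here is a minimal sketch of what one might look like. It is not the library described above. The function names (parse_name, split_namelist, write_item) and the sample byline are hypothetical, and the name handling is deliberately naive. It assumes the usual DSpace Simple Archive Format layout: one directory per item, holding a dublin_core.xml of dcvalue elements and a contents file listing one bitstream filename per line.

```python
import re
from pathlib import Path
from xml.etree import ElementTree as ET


def parse_name(raw):
    """Naively turn 'First Middle Last' into 'Last, First Middle'.

    Assumption: the final word is the surname. Real bylines
    (particles, suffixes, corporate authors) need handwork.
    """
    parts = raw.split()
    if len(parts) < 2:
        return " ".join(parts)
    return f"{parts[-1]}, {' '.join(parts[:-1])}"


def split_namelist(namelist):
    """Break a byline such as 'A B, C D and E F' into parsed names."""
    pieces = re.split(r"\s*(?:,|;|&|\band\b)\s*", namelist)
    return [parse_name(p) for p in pieces if p.strip()]


def write_item(item_dir, title, authors, bitstreams):
    """Write dublin_core.xml and contents for one item directory."""
    item_dir = Path(item_dir)
    item_dir.mkdir(parents=True, exist_ok=True)

    root = ET.Element("dublin_core")

    def add(element, qualifier, value):
        dc = ET.SubElement(root, "dcvalue", element=element, qualifier=qualifier)
        dc.text = value

    add("title", "none", title)
    for author in authors:
        add("contributor", "author", author)

    ET.ElementTree(root).write(
        str(item_dir / "dublin_core.xml"), encoding="utf-8", xml_declaration=True
    )

    # The contents file lists one bitstream filename per line.
    contents = "".join(f"{name}\n" for name in bitstreams)
    (item_dir / "contents").write_text(contents, encoding="utf-8")
    return item_dir


if __name__ == "__main__":
    # Hypothetical scraped byline and file name, for illustration only.
    authors = split_namelist("Jane Q. Doe, John Smith and Ann Lee")
    write_item("batch/item_000", "An Example Article", authors, ["article.pdf"])
```

The scraping step itself is left out on purpose: the regular expressions that pull a title or byline out of a page depend entirely on that periodical's HTML, so only the parts that carry over between projects are sketched here.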
Dorothea

-- 
Dorothea Salo [EMAIL PROTECTED]
Digital Repository Librarian AIM: mindsatuw
University of Wisconsin
Rm 218, Memorial Library
(608) 262-5493

_______________________________________________
DSpace-tech mailing list
https://lists.sourceforge.net/lists/listinfo/dspace-tech

