Hi. I think you can use the mapfile and --resume to import only items not in the mapfile.
(The mapfile is just a list of handle/folder pairs, one for each item imported.) --replace may also be useful for updating items:

dsrun org.dspace.app.itemimport.ItemImport --replace [EMAIL PROTECTED] --collection=collectionID --source=items_dir --mapfile=mapfile

"Replacing items uses the map file to replace the old items and still retain their handles."

See http://dspace.org/technology/system-docs/application.html#itemimporter

I hope this helps.

Cheers,

Stephen

On 2/27/07, Pan Family <[EMAIL PROTECTED]> wrote:
> Yes, I can import items in batch mode now. Thanks!
> I have also tried to import two items under two directories,
> item_001 and item_002, and DSpace imported them all
> at once, which is what I wanted. But DSpace does not
> seem to know that the items are already in its database,
> and it will import them as many times as I ask it to.
> So it looks like, for automatically importing only the delta
> of a document collection spread out under directories and
> sub-directories, I'll need to write some code.
> Has anyone done this before?
>
> FYI, I am using DSpace for a distributed data center
> at JPL, a Caltech laboratory.
>
> Thanks,
>
> -Pan
>
> On 2/23/07, Jayan Chirayath Kurian <[EMAIL PROTECTED]> wrote:
> >
> > Your import is fine now?
> >
> > (1) It's fine if you have used none. I edited the metadata registry and added
> > the conference qualifier for a second creator element. You can refer to
> > w3schools.com for basic XML.
> > (2) No problem.
> >
> > (1) The mapfile stores the details of the files imported using batch import.
> > Note that in case you need to remove those imported items, this mapfile is
> > required.
> > (2) For each item we have created a directory structure in
> > archive_directory, i.e. item_001, item_002 etc.
> >
> > Are you using DSpace for individual use or for a corporate organization?
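Stephen's --resume suggestion depends on the mapfile. Before a run, you can check which item directories under --source have no mapfile entry yet, which is the "delta" Pan asks about. Below is a minimal sketch in Python. read_mapfile and pending_items are hypothetical helper names, not DSpace APIs. The sketch assumes that each mapfile line holds an item directory name, then whitespace, then a handle (e.g. "item_001 123456789/5"). It also assumes that only subdirectories of --source count as items. Check both assumptions against a mapfile from your own run.

```python
"""Preview which item directories are not yet recorded in an ItemImport mapfile.

Hypothetical helper, not part of DSpace. Assumes each mapfile line is
"<item_dir> <handle>", e.g. "item_001 123456789/5".
"""
import sys
from pathlib import Path


def read_mapfile(mapfile: Path) -> dict[str, str]:
    """Return {item_dir: handle}; a missing mapfile means nothing was imported yet."""
    mapping: dict[str, str] = {}
    if not mapfile.exists():
        return mapping
    for line in mapfile.read_text(encoding="utf-8").splitlines():
        parts = line.split()
        if len(parts) >= 2:  # skip blank or malformed lines
            mapping[parts[0]] = parts[1]
    return mapping


def pending_items(source: Path, mapfile: Path) -> list[str]:
    """Names of subdirectories of source that have no mapfile entry (the delta)."""
    done = read_mapfile(mapfile)
    return sorted(d.name for d in source.iterdir() if d.is_dir() and d.name not in done)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python pending_items.py SOURCE_DIR MAPFILE
    for name in pending_items(Path(sys.argv[1]), Path(sys.argv[2])):
        print(name)
```

Matching is by directory name only. This approach presumably works only if new material arrives under new directory names (item_004, item_005, ...) rather than reusing names that are already in the mapfile.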
> >
> > Jayan
> >
> > ________________________________
> > From: Pan Family [mailto:[EMAIL PROTECTED]
> > Sent: Sat 2/24/2007 12:27 PM
> > To: Jayan Chirayath Kurian
> > Cc: [email protected]
> > Subject: Re: [Dspace-tech] how can I find out the collectionID?
> >
> > Yes, it did help!!!
> >
> > Still two problems:
> > (1) ... element="creator" qualifier="conference" or qualifier="email" ...
> > caused some exception until I changed it to qualifier="none".
> > But in your example, "conference" was the qualifier.
> > Where can I find more info on how to write a good Dublin_core.xml?
> > (2) What is this about? Can I ignore it?
> > Processing handle file: handle
> > It appears there is no handle file -- generating one
> >
> > Questions:
> > (1) A map file is generated, but what is it for?
> > (2) What if I have several documents, each of which is an item,
> > under one directory, say Items_001? Do I prepare
> > multiple corresponding .xml files? Do I list all the
> > file names in the contents file?
> >
> > Thanks!
> >
> > -Pan
> >
> > On 2/23/07, Jayan Chirayath Kurian <[EMAIL PROTECTED]> wrote:
> > >
> > > I have DSpace 1.4.1 on Windows 2003.
> > >
> > > (1) My directory structure is C:\DSpace\bin\archive_directory
> > > (2) The "archive_directory" contains the folder Item_001
> > > (3) The Item_001 folder contains (1) Dublin_core.XML, (2) the contents file and
> > > (3) test.pdf.
> > > Please check the name of the file. It should be contents and not
> > > contents.txt. To rename contents.txt to contents, I used
> > > REN contents.txt contents at the command prompt.
> > > (4) dsrun org.dspace.app.itemimport.ItemImport -a [EMAIL PROTECTED] -c=123456789/2 -s=C:\DSpace\bin\archive_directory -m=mapfile10
> > >
> > > I hope this helps.
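Pan asks what to do with several documents, each of them an item. Jayan's layout is one directory per item, holding dublin_core.xml, a file named contents, and the bitstream. That layout can be generated instead of written by hand. The Python below is a minimal sketch. build_archive and dublin_core_xml are hypothetical names, and the only metadata written is a title taken from the file name. It uses element="title" qualifier="none". The thread suggests that combination is safe, because qualifiers such as "conference" first have to be added to the metadata registry, as Jayan did.

```python
"""Build a Simple Archive Format tree for ItemImport: one directory per file.

Hypothetical helper, not part of DSpace. Only a title (taken from the file
name) is written to dublin_core.xml; real records need more metadata.
"""
import shutil
import sys
from pathlib import Path
from xml.sax.saxutils import escape


def dublin_core_xml(title: str) -> str:
    """Minimal dublin_core.xml; escape() turns &, < and > into entities."""
    return (
        '<?xml version="1.0" encoding="utf-8"?>\n'
        "<dublin_core>\n"
        f'  <dcvalue element="title" qualifier="none">{escape(title)}</dcvalue>\n'
        "</dublin_core>\n"
    )


def build_archive(files: list[Path], archive_dir: Path) -> list[Path]:
    """Create archive_dir/item_001, item_002, ... and return the item directories."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    items = []
    for n, src in enumerate(sorted(files), start=1):
        item = archive_dir / f"item_{n:03d}"
        item.mkdir()  # fails if the directory exists, so earlier items are never overwritten
        shutil.copy2(src, item / src.name)
        # The file must be called "contents" (no .txt); one bitstream name per line.
        (item / "contents").write_text(src.name + "\n", encoding="utf-8")
        (item / "dublin_core.xml").write_text(dublin_core_xml(src.stem), encoding="utf-8")
        items.append(item)
    return items


if __name__ == "__main__" and len(sys.argv) >= 3:
    # Usage: python make_archive.py ARCHIVE_DIR FILE [FILE ...]
    for item in build_archive([Path(p) for p in sys.argv[2:]], Path(sys.argv[1])):
        print(item)
```

Numbering restarts at item_001 on every run. For repeated, incremental loads, each batch would need its own ARCHIVE_DIR and mapfile, or a naming scheme that never reuses a directory name.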
> > >
> > > Jayan
> > >
> > > ________________________________
> > > From: Pan Family [mailto:[EMAIL PROTECTED]
> > > Sent: Sat 2/24/2007 11:02 AM
> > > To: Jayan Chirayath Kurian
> > > Cc: [email protected]; [EMAIL PROTECTED]
> > > Subject: Re: [Dspace-tech] how can I find out the collectionID?
> > >
> > > Hi Jayan (or anyone who knows how to do batch submission):
> > >
> > > I am still unable to do batch submission. Here is what I did:
> > > (1) Created a directory, /Users/pan/tmp, and put 3 files under it:
> > > Content (a text file, attached); Dublin_core.xml (attached); and
> > > batch_import.pdf (the doc I wanted to submit to DSpace);
> > > (2) Ran:
> > > pan$ dsrun org.dspace.app.itemimport.ItemImport --add [EMAIL PROTECTED] --collection=123456789/2 --source=/Users/pan/tmp --mapfile=/Users/pan/test_map
> > > Destination collections:
> > > Owning Collection: PODAAC collection
> > > Adding items from directory: /Users/pan/tmp
> > > Generating mapfile: /Users/pan/test_map
> > >
> > > No error message was shown, but the pdf file was not imported.
> > > An empty test_map file was generated. I also ran filter-media
> > > and found that all bitstreams were skipped because no new
> > > doc had been added.
> > >
> > > I found out from the 1.4.1 beta 1 System Doc (p. 22) that
> > > there are batch tools and that registration is an alternate means
> > > to upload bitstreams, but no details or examples are provided.
> > > Can you provide links to more details or examples, please?
> > >
> > > Thanks a lot for your help!
> > >
> > > -Pan
> > >
> > > On 2/1/07, Jayan Chirayath Kurian <[EMAIL PROTECTED]> wrote:
> > > >
> > > > Have you solved your problem with importing documents, or are you using the
> > > > interface to upload documents into the repository?
> > > >
> > > > Jayan
> > > >
> > > > ________________________________
> > > > From: Pan Family [mailto:[EMAIL PROTECTED]
> > > > Sent: Friday, February 02, 2007 5:19 AM
> > > > To: Jayan Chirayath Kurian
> > > > Subject: Re: [Dspace-tech] how can I find out the collectionID?
> > > >
> > > > Thanks a lot!
> > > >
> > > > -Pan
> > > >
> > > > On 1/31/07, Jayan Chirayath Kurian <[EMAIL PROTECTED]> wrote:
> > > >
> > > > <?xml version="1.0" encoding="iso-8859-1"?>
> > > > <!-- title of pdf AMIC_1984_10_CM_03.pdf -->
> > > > <dublin_core>
> > > >   <dcvalue element="creator" qualifier="conference">AMIC-Chiangmai University Refresher Course on Communication Research Methodology : Chiangmai, Oct 29-Nov 2, 1984.</dcvalue>
> > > >   <dcvalue element="title" qualifier="none">The Logic of Social Science Research.</dcvalue>
> > > >   <dcvalue element="contributor" qualifier="author">Atal, Yogesh.</dcvalue>
> > > >   <dcvalue element="date" qualifier="issued">1984-10-29</dcvalue>
> > > > </dublin_core>
> > > >
> > > > ________________________________
> > > > From: Pan Family [mailto:[EMAIL PROTECTED]
> > > > Sent: Thursday, February 01, 2007 3:52 AM
> > > > To: Jayan Chirayath Kurian
> > > > Cc: [email protected]
> > > > Subject: Re: [Dspace-tech] how can I find out the collectionID?
> > > >
> > > > Could you please kindly provide a sample Dublin_core.xml?
> > > >
> > > > I assumed that dsrun would recursively go through the
> > > > directories and index all the files under them. Apparently
> > > > I was wrong. The requirement of Dublin_core.xml and
> > > > the content file makes the process much less automatic.
> > > > Is there a way around this?
> > > >
> > > > Thanks a lot!
> > > >
> > > > -Pan
> > > >
> > > > On 1/30/07, Jayan Chirayath Kurian <[EMAIL PROTECTED]> wrote:
> > > >
> > > > ________________________________
> > > > From: Pan Family [mailto:[EMAIL PROTECTED]
> > > > Sent: Wednesday, January 31, 2007 1:15 PM
> > > > To: Jayan Chirayath Kurian
> > > > Cc: Dorothea Salo; [email protected]
> > > > Subject: Re: [Dspace-tech] how can I find out the collectionID?
> > > >
> > > > Ok. I will give this a try.
> > > >
> > > > Still two questions:
> > > > (1) Where can I get the file Dublin_core.XML?
> > > >
> > > > Dublin_core.xml contains the metadata descriptions of the resource
> > > > (e.g. title, date published etc.). You have to create the XML file
> > > > yourself, for example in Notepad.
> > > >
> > > > (2) Let's say I only want to index one file named foo.pdf, and I put
> > > > it under /Users/pan/tmp/foo.pdf and pass src=/Users/pan to dsrun.
> > > > Is foo.pdf considered the content file or the resource? And which is
> > > > the third type of file?
> > > >
> > > > foo.pdf is the resource (e.g. a pdf, ppt or jpeg...).
> > > >
> > > > The contents file is a text file that just contains the name of the
> > > > resource, i.e. foo.pdf.
> > > >
> > > > Thanks a lot!
> > > >
> > > > -Pan
> > > >
> > > > On 1/30/07, Jayan Chirayath Kurian <[EMAIL PROTECTED]> wrote:
> > > >
> > > > I feel the tmp directory should have (1) the Dublin_core.XML, (2) the
> > > > contents file and (3) the actual resource. The tmp directory should have all
> > > > these files without any more subdirectories for these files. Can you try
> > > > with source=/Users/pan/, removing all subdirectories under tmp and having
> > > > only the 3 files listed above? Hope it works.
> > > >
> > > > My structure is src = C:\DSpace\bin\archive_directory
> > > >
> > > > The archive_directory contains the directory Item_001.
> > > >
> > > > Item_001 contains (1) Dublin_core.XML, (2) the contents file and (3) the actual
> > > > resource.
> > > >
> > > > There are no more subdirectories under Item_001.
> > > >
> > > > Thanks,
> > > >
> > > > Jayan
> > > >
> > > > ________________________________
> > > > From: Pan Family [mailto:[EMAIL PROTECTED]
> > > > Sent: Wednesday, January 31, 2007 4:06 AM
> > > > To: Jayan Chirayath Kurian
> > > > Cc: Dorothea Salo; [email protected]
> > > > Subject: Re: [Dspace-tech] how can I find out the collectionID?
> > > >
> > > > Thanks for your help!
> > > >
> > > > I am working on Mac OS X. Yes, "pan" contains "tmp".
> > > >
> > > > It seems that for me the dir that I give to source= cannot contain any
> > > > subdirs. For example, if I give it "/Users/pan/", I get an error
> > > > complaining about the missing file ".fvwm/dublin_core.xml"
> > > > (.fvwm is a subdir under "/Users/pan/").
> > > >
> > > > If I give it "/Users/pan/tmp/",
> > > > then it complains about the same missing file under the subdirs
> > > > of "tmp" until I remove all the subdirs under "tmp".
> > > > But I still don't get the files under "tmp" imported to my collection,
> > > > even though no error shows after I removed all subdirs.
> > > >
> > > > bubba:$ dsrun org.dspace.app.itemimport.ItemImport --add [EMAIL PROTECTED] --collection=123456789/2 --source=/Users/pan/ --mapfile=/Users/pan/test_map --test
> > > > **Test Run** - not actually importing items.
> > > > Destination collections:
> > > > Owning Collection: PODAAC collection
> > > > Adding items from directory: /Users/pan/
> > > > Generating mapfile: /Users/pan/test_map
> > > > Adding item from directory .fvwm
> > > > java.io.FileNotFoundException: /Users/pan/.fvwm/dublin_core.xml (No such file or directory)
> > > >     at java.io.FileInputStream.open(Native Method)
> > > >     at java.io.FileInputStream.<init>(FileInputStream.java:106)
> > > >     at java.io.FileInputStream.<init>(FileInputStream.java:66)
> > > >     at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:70)
> > > >     at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:161)
> > > >     at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
> > > >     at org.apache.xerces.impl.XMLVersionDetector.determineDocVersion(Unknown Source)
> > > >     at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
> > > >     at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
> > > >     at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
> > > >     at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
> > > >     at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
> > > >     at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:172)
> > > >     at org.dspace.app.itemimport.ItemImport.loadXML(ItemImport.java:1269)
> > > >     at org.dspace.app.itemimport.ItemImport.loadDublinCore(ItemImport.java:795)
> > > >     at org.dspace.app.itemimport.ItemImport.loadMetadata(ItemImport.java:780)
> > > >     at org.dspace.app.itemimport.ItemImport.addItem(ItemImport.java:626)
> > > >     at org.dspace.app.itemimport.ItemImport.addItems(ItemImport.java:498)
> > > >     at org.dspace.app.itemimport.ItemImport.main(ItemImport.java:407)
> > > > java.io.FileNotFoundException: /Users/pan/.fvwm/dublin_core.xml (No such file or directory)
> > > > ***End of Test Run***
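The test run above shows how ItemImport treated the source directory. Every subdirectory of --source, including a hidden one like .fvwm, was handled as an item. Pan also reports that plain files placed directly under --source were silently not imported. The --test run appears to stop at the first bad directory, so a pre-flight check that lists all problems at once may save some round trips. This is a minimal Python sketch. check_source is a hypothetical name, and the skip-plain-files behaviour is inferred from this thread rather than from the DSpace source.

```python
"""Pre-flight check of an ItemImport --source directory before a real run.

Hypothetical helper, not part of DSpace. Assumption drawn from this thread:
every subdirectory of --source (hidden ones such as .fvwm included) is
treated as an item, while plain files directly under --source are skipped.
"""
import sys
from pathlib import Path


def check_source(source: Path) -> list[str]:
    problems = []
    for entry in sorted(source.iterdir()):
        if not entry.is_dir():
            problems.append(f"{entry.name}: plain file, will not be imported on its own")
            continue
        if not (entry / "dublin_core.xml").is_file():
            problems.append(f"{entry.name}: missing dublin_core.xml")
        contents = entry / "contents"
        if not contents.is_file():
            problems.append(f"{entry.name}: missing contents file")
            continue
        for line in contents.read_text(encoding="utf-8").splitlines():
            # Each line names one bitstream; anything after a tab (e.g. a bundle) is ignored here.
            name = line.split("\t")[0].strip()
            if name and not (entry / name).is_file():
                problems.append(f"{entry.name}: contents lists {name}, which does not exist")
    return problems


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python check_source.py SOURCE_DIR
    for problem in check_source(Path(sys.argv[1])):
        print(problem)
```

Run against Pan's original /Users/pan/tmp, this would have flagged batch_import.pdf and the other two files as loose plain files. That is consistent with the empty mapfile he got.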
> > > > On 1/29/07, Jayan Chirayath Kurian <[EMAIL PROTECTED]> wrote:
> > > >
> > > > Can you please try with source=/Users/pan/ ?
> > > >
> > > > I encountered the same problem on the Windows platform. It was rectified
> > > > by giving the main folder name in the import command. I assume that "pan"
> > > > contains the subfolder "tmp", which in fact contains the pdf file. Please
> > > > let me know if this works for you.
> > > >
> > > > Thanks,
> > > >
> > > > Jayan
> > > >
> > > > ________________________________
> > > > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Pan Family
> > > > Sent: Tuesday, January 30, 2007 8:02 AM
> > > > To: Dorothea Salo
> > > > Cc: [email protected]
> > > > Subject: Re: [Dspace-tech] how can I find out the collectionID?
> > > >
> > > > Hi Dorothea:
> > > >
> > > > Thanks a lot for your help!
> > > > In my case, the handle is 123456789/2.
> > > > So I used the following command to add
> > > > a pdf file under /Users/pan/tmp, but somehow
> > > > the pdf file was not added into the collection
> > > > and the file test_map is empty. No error
> > > > message was shown either. I wonder what
> > > > I did wrong. Could you give me some ideas
> > > > on how to debug?
> > > >
> > > > Thanks again,
> > > >
> > > > -Pan
> > > >
> > > > bubba:~/dspace-1.4.1-source/bin pan$ dsrun org.dspace.app.itemimport.ItemImport --add [EMAIL PROTECTED] --collection=123456789/2 --source=/Users/pan/tmp/ --mapfile=/Users/pan/tmp/test_map
> > > > Destination collections:
> > > > Owning Collection: PODAAC collection
> > > > Adding items from directory: /Users/pan/tmp/
> > > > Generating mapfile: /Users/pan/tmp/test_map
> > > >
> > > > On 1/29/07, Dorothea Salo <[EMAIL PROTECTED]> wrote:
> > > >
> > > > Pan Family wrote:
> > > > > dsrun org.dspace.app.itemimport.ItemImport --add
> > > > > [EMAIL PROTECTED] --collection=collectionID --source=items_dir
> > > > > --mapfile=mapfile
> > > > >
> > > > > Hi,
> > > > >
> > > > > The above command for batch import requires
> > > > > the collectionID as input. I wonder how
> > > > > I can find out this ID? Is it the string
> > > > > that I used to name my collection, or an ID
> > > > > that DSpace uses internally?
> > > >
> > > > You can use the collection's handle for this; go to the collection's home page
> > > > and use the numbers after "handle/" in the URL.
> > > >
> > > > If you should need the internal DSpace collection ID for some reason, though,
> > > > log in, surf to the collection page, and then use the "Edit" button under Admin
> > > > Tools. From there, choose "Collection's Authorizations," and DSpace will pop up
> > > > the "DB ID" in the title of the page.
> > > >
> > > > (I hope there's an easier way to do this! There certainly should be.)
> > > >
> > > > Dorothea
> > > >
> > > > --
> > > > Dorothea Salo, Digital Repository Services Librarian
> > > > (703)993-3742 [EMAIL PROTECTED] AIM: gmumars
> > > > MSN 2FL, Fenwick Library
> > > > George Mason University
> > > > 4400 University Drive, Fairfax VA 22031
> > > >
> > > > -------------------------------------------------------------------------
> > > > Take Surveys. Earn Cash.
> > > > Influence the Future of IT
> > > > Join SourceForge.net's Techsay panel and you'll get the chance to share your
> > > > opinions on IT & business topics through brief surveys - and earn cash
> > > > http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
> > > > _______________________________________________
> > > > DSpace-tech mailing list
> > > > [email protected]
> > > > https://lists.sourceforge.net/lists/listinfo/dspace-tech

--
Stephen De Gabrielle
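Dorothea's tip in the quoted thread is to take the numbers after "handle/" in the collection's URL and pass them to --collection=. That step can be scripted. Below is a minimal Python sketch. handle_from_url is a hypothetical name, and the host name in the examples is made up. It assumes a handle is the two path segments (prefix/suffix) that follow "/handle/".

```python
"""Extract a collection handle such as 123456789/2 from a DSpace page URL.

Hypothetical helper; the host name used in the examples is made up.
"""
from urllib.parse import urlparse


def handle_from_url(url: str) -> str:
    """Return the two path segments that follow "/handle/" in url."""
    path = urlparse(url).path
    marker = "/handle/"
    start = path.find(marker)
    if start < 0:
        raise ValueError(f"no /handle/ segment in {url!r}")
    segments = path[start + len(marker):].strip("/").split("/")
    if len(segments) < 2 or not all(segments[:2]):
        raise ValueError(f"incomplete handle in {url!r}")
    return "/".join(segments[:2])


if __name__ == "__main__":
    print(handle_from_url("http://repository.example.org/dspace/handle/123456789/2"))
```

The printed value is what goes after --collection=, as in Pan's commands with 123456789/2.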

