Thanks, Stephen!
I used --add --resume and it worked: If the items under my archive_dir
are the same, nothing is added. But if I add new items under
the archive_dir, only the new items are added.
I assume that I can use the same mapfile in this way, and as
I grow the number of items under the archive_dir, my mapfile
will have more and more items listed in the file. Correct?
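
A rough way to preview that delta before running the importer is sketched below in Python. It is only a sketch under stated assumptions: it assumes each mapfile line is a whitespace-separated "<item directory name> <handle>" pair (consistent with the handle/folder description quoted below, but not verified against the DSpace source), and the function names read_mapfile and pending_items are made up for illustration.

```python
import os
import tempfile


def read_mapfile(path):
    """Return {item_dir_name: handle} read from a mapfile.

    Assumption: one whitespace-separated "<dir name> <handle>" pair per line.
    """
    mapping = {}
    if not os.path.exists(path):
        return mapping  # no mapfile yet, so nothing has been imported
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            parts = line.split()
            if len(parts) >= 2:
                mapping[parts[0]] = parts[1]
    return mapping


def pending_items(archive_dir, mapfile):
    """List item sub-directories of archive_dir that the mapfile does not mention."""
    done = read_mapfile(mapfile)
    return sorted(
        name
        for name in os.listdir(archive_dir)
        if os.path.isdir(os.path.join(archive_dir, name)) and name not in done
    )


if __name__ == "__main__":
    # Throwaway demo data; the directory names and handle are illustrative only.
    with tempfile.TemporaryDirectory() as tmp:
        archive = os.path.join(tmp, "archive_dir")
        for name in ("item_001", "item_002"):
            os.makedirs(os.path.join(archive, name))
        mapfile = os.path.join(tmp, "mapfile")
        with open(mapfile, "w", encoding="utf-8") as fh:
            fh.write("item_001 123456789/18\n")
        print(pending_items(archive, mapfile))
```

The script only reads the mapfile and the directory listing; the import itself is still done by dsrun with --add --resume.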
--replace did not work for me. I got a NullPointerException,
as shown below. What is the right way to use --replace?
Thanks,
-Pan
-------- error from --replace -------------
dsrun org.dspace.app.itemimport.ItemImport --replace --eperson=[EMAIL PROTECTED] --collection=123456789/2 --source=/Users/pan/tmp/ --mapfile=/Users/pan/matfile2.txt
Destination collections:
Owning Collection: PODAAC collection
Replacing: 123456789/18
java.lang.NullPointerException
    at org.dspace.app.itemimport.ItemImport.deleteItem(ItemImport.java:692)
    at org.dspace.app.itemimport.ItemImport.replaceItems(ItemImport.java:567)
    at org.dspace.app.itemimport.ItemImport.main(ItemImport.java:411)
java.lang.NullPointerException
On 2/27/07, Stephen De Gabrielle <[EMAIL PROTECTED]> wrote:
Hi.
I think you can use the mapfile and --resume to import only items not
in the mapfile.
(mapfile is just a list of handle/folder pairs - one for each item
imported)
--replace may also be useful for updating items
dsrun org.dspace.app.itemimport.ItemImport --replace
[EMAIL PROTECTED] --collection=collectID --source=items_dir
--mapfile=mapfile
"Replacing items uses the map file to replace the old items and still
retain their handles."
See http://dspace.org/technology/system-docs/application.html#itemimporter
I hope this helps.
Cheers,
Stephen
On 2/27/07, Pan Family <[EMAIL PROTECTED]> wrote:
> Yes, I can import items in batch mode now. Thanks!
> I have also tried to import two items under two directories,
> item_001 and item_002, and DSpace imported them all
> at once, which is what I wanted. But DSpace does not
> seem to know that the items are already in its database
> and it will import them as many times as I asked it to.
> So it looks like, to automatically import only the delta
> of a document collection spread out under directories and
> sub-directories, I'll need to write some code.
> Has anyone done this before?
>
> FYI, I am using DSpace for a distributed data center
> at JPL, a Caltech laboratory.
>
>
> Thanks,
>
> -Pan
>
>
> On 2/23/07, Jayan Chirayath Kurian <[EMAIL PROTECTED]> wrote:
> >
> >
> >
> > Is your import fine now?
> >
> > (1) It's fine if you have used "none". I edited the metadata registry and
> > added the conference qualifier for a second creator element. You can refer
> > to w3schools.com for basic XML.
> > (2) No problem.
> >
> > (1) The mapfile stores the details of the files imported using batch import.
> > Note that this mapfile is required in case you need to remove those
> > imported files.
> > (2) For each item we have created a directory structure in
> > archive_directory, i.e. item_001, item_002, etc.
> >
> > Are you using DSpace for individual use or for a corporate organization?
> >
> > Jayan
> >
> > ________________________________
> From: Pan Family [mailto:[EMAIL PROTECTED]
> > Sent: Sat 2/24/2007 12:27 PM
> > To: Jayan Chirayath Kurian
> >
> > Cc: [email protected]
> > Subject: Re: [Dspace-tech] how can I find out the collectionID?
> >
> >
> >
> > Yes, it did help!!!
> >
> > Still two problems:
> > (1) ... element="creator" qualifier="conference" or qualifier="email" ...
> > caused some exception until I changed it to qualifier="none"
> > But in your example, "conference" was the qualifier.
> > Where can I find more info. on how to write good Dublin_core.xml?
> > (2) what is this about? Can I ignore it?
> > Processing handle file: handle
> > It appears there is no handle file -- generating one
> >
> > Questions:
> > (1) A map file is generated, but what is it for?
> > (2) What if I have several documents, each is an item,
> > under one directory, say Items_001? Do I prepare
> > multiple corresponding .xml files? Do I list all the
> > file names in the file contents?
> >
> > Thanks!
> >
> > -Pan
> >
> >
> >
> >
> >
> >
> >
> >
> > On 2/23/07, Jayan Chirayath Kurian < [EMAIL PROTECTED]> wrote:
> > >
> > >
> > >
> > > I have DSpace 1.4.1 on Windows 2003.
> > >
> > > (1) My directory structure is C:\DSpace\bin\archive_directory
> > > (2) The "archive_directory" contains the folder Item_001
> > > (3) The Item_001 folder contains (1) Dublin_core.XML, (2) the contents file, and
> > > (3) test.pdf.
> > > Please check the name of the file. It should be contents and not
> > > contents.txt.
> > > To rename contents.txt to contents, I used REN contents.txt contents at the
> > > command prompt.
> > > (4) dsrun org.dspace.app.itemimport.ItemImport -a [EMAIL PROTECTED] -c=123456789/2 -s=C:\DSpace\bin\archive_directory -m=mapfile10
> > >
> > > I hope this helps.
> > >
> > > Jayan
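
The per-item layout described in the message above (one directory per item holding the metadata file, a contents file, and the resource) can be generated by a short script instead of by hand. The following Python is a minimal sketch, not the importer's own tooling: the function name make_item_dir and the (element, qualifier, value) tuples are hypothetical, the lowercase file name dublin_core.xml is taken from the importer's error message elsewhere in this thread, the dcvalue structure follows the sample XML quoted further down, and UTF-8 is an arbitrary encoding choice.

```python
import os
import shutil
import tempfile
from xml.sax.saxutils import escape, quoteattr


def make_item_dir(archive_dir, item_name, resource_path, fields):
    """Create archive_dir/item_name with dublin_core.xml, contents, and the resource.

    fields is a list of (element, qualifier, value) tuples (hypothetical format).
    """
    item_dir = os.path.join(archive_dir, item_name)
    os.makedirs(item_dir, exist_ok=True)

    # Build the metadata file; quoteattr adds the surrounding quotes itself.
    lines = ['<?xml version="1.0" encoding="utf-8"?>', "<dublin_core>"]
    for element, qualifier, value in fields:
        lines.append(
            "  <dcvalue element=%s qualifier=%s>%s</dcvalue>"
            % (quoteattr(element), quoteattr(qualifier), escape(value))
        )
    lines.append("</dublin_core>")
    with open(os.path.join(item_dir, "dublin_core.xml"), "w", encoding="utf-8") as fh:
        fh.write("\n".join(lines) + "\n")

    # Copy the resource next to the metadata and list its name in "contents".
    resource_name = os.path.basename(resource_path)
    shutil.copy(resource_path, os.path.join(item_dir, resource_name))
    with open(os.path.join(item_dir, "contents"), "w", encoding="utf-8") as fh:
        fh.write(resource_name + "\n")
    return item_dir


if __name__ == "__main__":
    # Demo with a placeholder file standing in for a real PDF.
    with tempfile.TemporaryDirectory() as tmp:
        pdf = os.path.join(tmp, "test.pdf")
        with open(pdf, "wb") as fh:
            fh.write(b"placeholder")
        archive = os.path.join(tmp, "archive_directory")
        item = make_item_dir(
            archive, "Item_001", pdf,
            [("title", "none", "The Logic of Social Science Research.")],
        )
        print(sorted(os.listdir(item)))
```

Calling it once per document yields Item_001, Item_002, and so on under a single archive directory, which is then passed to the importer as the source.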
> > >
> > >
> > > ________________________________
>
> > > From: Pan Family [mailto:[EMAIL PROTECTED]
> > > Sent: Sat 2/24/2007 11:02 AM
> > > To: Jayan Chirayath Kurian
> > > Cc: [email protected] ;
> [EMAIL PROTECTED]
> > >
> > > Subject: Re: [Dspace-tech] how can I find out the collectionID?
> > >
> > >
> > >
> > > Hi Jayan (or anyone who knows how to do batch submission):
> > >
> > > I am still unable to do batch submission. Here is what I did:
> > > (1) Created a directory, /Users/pan/tmp and put 3 files under it:
> > > Content (a text file, attached); Dublin_core.xml (attached); and
> > > batch_import.pdf (the doc I wanted to submit to DSpace);
> > > (2) Ran:
> > > pan$ dsrun org.dspace.app.itemimport.ItemImport --add [EMAIL PROTECTED] --collection=123456789/2 --source=/Users/pan/tmp --mapfile=/Users/pan/test_map
> > > Destination collections:
> > > Owning Collection: PODAAC collection
> > > Adding items from directory: /Users/pan/tmp
> > > Generating mapfile: /Users/pan/test_map
> > >
> > > No error message was shown, but the pdf file was not imported.
> > > An empty test_map file was generated. I also ran filter-media
> > > and found that all bitstreams were skipped because no new
> > > doc has been added.
> > >
> > > I found out from the 1.4.1 beta 1 System Doc (p. 22) that
> > > there are batch tools and that registration is an alternate means
> > > to upload bitstreams, but no details or examples are provided.
> > > Can you provide links to more details or examples please?
> > >
> > > Thanks a lot for your help!
> > >
> > > -Pan
> > >
> > >
> > >
> > >
> > >
> > > On 2/1/07, Jayan Chirayath Kurian <[EMAIL PROTECTED]> wrote:
> > > >
> > > >
> > > >
> > > >
> > > > Have you solved your problem importing documents, or are you using the
> > > > interface to upload documents into the repository?
> > > >
> > > >
> > > >
> > > > Jayan
> > > >
> > > >
> > > >
> > > > ________________________________
>
> > > >
> > > > From: Pan Family [mailto:[EMAIL PROTECTED]
> > > > Sent: Friday, February 02, 2007 5:19 AM
> > > > To: Jayan Chirayath Kurian
> > > >
> > > > Subject: Re: [Dspace-tech] how can I find out the collectionID?
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > Thanks a lot!
> > > >
> > > > -Pan
> > > >
> > > >
> > > > On 1/31/07, Jayan Chirayath Kurian <[EMAIL PROTECTED]> wrote:
> > > >
> > > >
> > > >
> > > > <?xml version="1.0" encoding="iso-8859-1"?>
> > > > <!-- title of pdf AMIC_1984_10_CM_03.pdf -->
> > > > <dublin_core>
> > > >   <dcvalue element="creator" qualifier="conference">AMIC-Chiangmai University Refresher Course on Communication Research Methodology : Chiangmai, Oct 29-Nov 2, 1984.</dcvalue>
> > > >   <dcvalue element="title" qualifier="none">The Logic of Social Science Research.</dcvalue>
> > > >   <dcvalue element="contributor" qualifier="author">Atal, Yogesh.</dcvalue>
> > > >   <dcvalue element="date" qualifier="issued">1984-10-29</dcvalue>
> > > > </dublin_core>
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > ________________________________
>
> > > >
> > > > From: Pan Family [mailto: [EMAIL PROTECTED]
> > > > Sent: Thursday, February 01, 2007 3:52 AM
> > > > To: Jayan Chirayath Kurian
> > > > Cc: [email protected]
> > > >
> > > >
> > > >
> > > > Subject: Re: [Dspace-tech] how can I find out the collectionID?
> > > >
> > > >
> > > >
> > > >
> > > > Could you please kindly provide a sample Dublin_core.xml?
> > > >
> > > > I assumed that dsrun would recursively go through the
> > > > directories and index all the files under them. Apparently
> > > > I was wrong. The requirement of Dublin_core.xml and
> > > > the content file makes the process much less automatic.
> > > > Is there a way around this?
> > > >
> > > > Thanks a lot!
> > > >
> > > > -Pan
> > > >
> > > >
> > > >
> > > >
> > > > On 1/30/07, Jayan Chirayath Kurian <[EMAIL PROTECTED]> wrote:
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > ________________________________
>
> > > >
> > > > From: Pan Family [mailto: [EMAIL PROTECTED]
> > > > Sent: Wednesday, January 31, 2007 1:15 PM
> > > > To: Jayan Chirayath Kurian
> > > > Cc: Dorothea Salo; [email protected]
> > > > Subject: Re: [Dspace-tech] how can I find out the collectionID?
> > > >
> > > >
> > > >
> > > > Ok. I will give this a try.
> > > >
> > > > Still two questions:
> > > > (1) Where can I get the file Dublin_core.XML?
> > > >
> > > > Dublin_core.xml contains the metadata descriptions of the resource
> > > > (e.g. title, date published, etc.). You have to create the XML file
> > > > using Notepad.
> > > >
> > > > (2) Let's say I only want to index one file named foo.pdf, and I put
> > > > it under /Users/pan/tmp/foo.pdf and pass src=/Users/pan to dsrun.
> > > > Is foo.pdf considered the content file or the resource? And which is
> > > > the third type of file?
> > > >
> > > > foo.pdf is the resource (e.g. a pdf, ppt, or jpeg file).
> > > >
> > > > The contents file is a text file that just contains the name of the
> > > > resource, i.e. foo.pdf
> > > >
> > > >
> > > >
> > > > Thanks a lot!
> > > >
> > > > -Pan
> > > >
> > > >
> > > > On 1/30/07, Jayan Chirayath Kurian <[EMAIL PROTECTED]> wrote:
> > > >
> > > >
> > > >
> > > > I feel the tmp directory should have (1) the Dublin_core.XML, (2) the
> > > > contents file, and (3) the actual resource. The tmp directory should have all
> > > > these files without any more subdirectories for these files. Can you try
> > > > with source=/Users/pan/, removing all subdirectories under tmp and
> > > > keeping only the 3 files listed above? Hope it works.
> > > >
> > > >
> > > >
> > > > My structure is src = C:\DSpace\bin\archive_directory
> > > >
> > > > The archive_directory contains the directory Item_001
> > > >
> > > > Item_001 contains (1) Dublin_core.XML, (2) the contents file, and (3) the
> > > > actual resource.
> > > >
> > > > There are no more subdirectories under Item_001.
> > > >
> > > >
> > > >
> > > > Thanks,
> > > >
> > > > Jayan
> > > >
> > > >
> > > >
> > > > ________________________________
>
> > > >
> > > > From: Pan Family [mailto: [EMAIL PROTECTED]
> > > > Sent: Wednesday, January 31, 2007 4:06 AM
> > > > To: Jayan Chirayath Kurian
> > > > Cc: Dorothea Salo; [email protected]
> > > >
> > > >
> > > >
> > > > Subject: Re: [Dspace-tech] how can I find out the collectionID?
> > > >
> > > >
> > > >
> > > >
> > > > Thanks for your help!
> > > >
> > > > I am working on Mac OS X. Yes, "pan" contains "tmp"
> > > >
> > > > It seems that for me the dir that I give to source= cannot contain any
> > > > subdirs. For example, if I give it "/Users/pan/" I get an error
> > > > complaining about the missing file ".fvwm/dublin_core.xml".
> > > > .fvwm is a subdir under "/Users/pan/".
> > > >
> > > > If I give it "/Users/pan/tmp/"
> > > > then it complains about the same missing file under the subdirs
> > > > of "tmp" until I remove all the subdirs under "tmp".
> > > > But I still don't get the files under "tmp" imported to my collection,
> > > > even though no error shows after I removed all subdirs.
> > > >
> > > > bubba:$ dsrun org.dspace.app.itemimport.ItemImport --add [EMAIL PROTECTED] --collection=123456789/2 --source=/Users/pan/ --mapfile=/Users/pan/test_map --test
> > > > **Test Run** - not actually importing items.
> > > > Destination collections:
> > > > Owning Collection: PODAAC collection
> > > > Adding items from directory: /Users/pan/
> > > > Generating mapfile: /Users/pan/test_map
> > > > Adding item from directory .fvwm
> > > > java.io.FileNotFoundException: /Users/pan/.fvwm/dublin_core.xml (No such file or directory)
> > > >     at java.io.FileInputStream.open(Native Method)
> > > >     at java.io.FileInputStream.<init>(FileInputStream.java:106)
> > > >     at java.io.FileInputStream.<init>(FileInputStream.java:66)
> > > >     at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:70)
> > > >     at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:161)
> > > >     at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
> > > >     at org.apache.xerces.impl.XMLVersionDetector.determineDocVersion(Unknown Source)
> > > >     at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
> > > >     at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
> > > >     at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
> > > >     at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
> > > >     at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
> > > >     at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:172)
> > > >     at org.dspace.app.itemimport.ItemImport.loadXML(ItemImport.java:1269)
> > > >     at org.dspace.app.itemimport.ItemImport.loadDublinCore(ItemImport.java:795)
> > > >     at org.dspace.app.itemimport.ItemImport.loadMetadata(ItemImport.java:780)
> > > >     at org.dspace.app.itemimport.ItemImport.addItem(ItemImport.java:626)
> > > >     at org.dspace.app.itemimport.ItemImport.addItems(ItemImport.java:498)
> > > >     at org.dspace.app.itemimport.ItemImport.main(ItemImport.java:407)
> > > > java.io.FileNotFoundException: /Users/pan/.fvwm/dublin_core.xml (No such file or directory)
> > > > ***End of Test Run***
> > > >
> > > >
> > > > On 1/29/07, Jayan Chirayath Kurian <[EMAIL PROTECTED]> wrote:
> > > >
> > > >
> > > >
> > > > Can you please try with source=/Users/pan/
> > > >
> > > > I encountered the same problem on the Windows platform. This was rectified
> > > > by giving the main folder name with the import command. I assume that "pan"
> > > > contains the subfolder "tmp", which in fact contains the pdf file. I hope you
> > > > will let me know if this works for you.
> > > >
> > > >
> > > >
> > > > Thanks,
> > > >
> > > > Jayan
> > > >
> > > >
> > > >
> > > > ________________________________
>
> > > >
> > > > From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] ] On
> Behalf Of Pan Family
> > > > Sent: Tuesday, January 30, 2007 8:02 AM
> > > > To: Dorothea Salo
> > > > Cc: [email protected]
> > > > Subject: Re: [Dspace-tech] how can I find out the collectionID?
> > > >
> > > >
> > > >
> > > >
> > > > Hi Dorothea:
> > > >
> > > > Thanks a lot for your help!
> > > > In my case, the handle is 123456789/2.
> > > > So I used the following command to add
> > > > a pdf file under /Users/pan/tmp, but somehow
> > > > the pdf file was not added into the collection
> > > > and the file test_map is empty. No error
> > > > message was shown either. I wonder what
> > > > I did wrong. Could you give me some ideas
> > > > on how to debug?
> > > >
> > > > Thanks again,
> > > >
> > > > -Pan
> > > >
> > > > bubba:~/dspace-1.4.1-source/bin pan$ dsrun org.dspace.app.itemimport.ItemImport --add [EMAIL PROTECTED] --collection=123456789/2 --source=/Users/pan/tmp/ --mapfile=/Users/pan/tmp/test_map
> > > > Destination collections:
> > > > Owning Collection: PODAAC collection
> > > > Adding items from directory: /Users/pan/tmp/
> > > > Generating mapfile: /Users/pan/tmp/test_map
> > > >
> > > >
> > > > On 1/29/07, Dorothea Salo <[EMAIL PROTECTED]> wrote:
> > > >
> > > > Pan Family wrote:
> > > > > dsrun org.dspace.app.itemimport.ItemImport --add [EMAIL PROTECTED] --collection=collectionID --source=items_dir --mapfile=mapfile
> > > > >
> > > > > Hi,
> > > > >
> > > > > The above command for batch import requires
> > > > > the collectionID as input. I wonder how
> > > > > I can find out this ID? Is it the string
> > > > > that I used to name my collection, or an ID
> > > > > that DSpace uses internally?
> > > >
> > > > You can use the collection's handle for this; go to the collection's home page
> > > > and use the numbers after "handle/" in the URL.
> > > >
> > > > If you should need the internal DSpace collection ID for some reason, though,
> > > > log in, surf to the collection page, and then use the "Edit" button under Admin
> > > > Tools. From there, choose "Collection's Authorizations," and DSpace will pop up
> > > > the "DB ID" in the title of the page.
> > > >
> > > > (I hope there's an easier way to do this! There certainly should be.)
> > > >
> > > > Dorothea
> > > >
> > > > --
> > > > Dorothea Salo, Digital Repository Services Librarian
> > > > (703)993-3742 [EMAIL PROTECTED] AIM: gmumars
> > > > MSN 2FL, Fenwick Library
> > > > George Mason University
> > > > 4400 University Drive, Fairfax VA 22031
> > > >
> > > >
>
-------------------------------------------------------------------------
> > > > Take Surveys. Earn Cash. Influence the Future of IT
> > > > Join SourceForge.net's Techsay panel and you'll get the chance to
> share your
> > > > opinions on IT & business topics through brief surveys - and earn
cash
> > > >
>
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
> > > > _______________________________________________
> > > > DSpace-tech mailing list
> > > > [email protected]
> > > >
> https://lists.sourceforge.net/lists/listinfo/dspace-tech
> > > >
--
--
Stephen De Gabrielle