I think Magnus is looking for a tool that reads a PDF and extracts subject, title, and author metadata from it. We do not have anything like that at this time. However, someone could write a MediaFilter, a curation task, or an event consumer that parses the documents and pulls out candidate metadata fields. This gets heuristically complex when the information is not already stored in the PDF or DOC document metadata, a bibliography, or similar structured data. I believe there are some third-party tools that attempt to extract such data; they would need a review first, but we could engineer something that brings this capability to DSpace.
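A rough sketch of the parse-and-map step described above, in Python for brevity (a real DSpace MediaFilter or curation task would be Java). It assumes the PDF's document-information dictionary (keys such as `/Title`, `/Author`, `/Subject`, `/Keywords`) has already been read by some PDF library and handed over as a plain dict, and that the extracted full text is available for a heuristic fallback. The mapping table `PDF_TO_DC` and both function names are hypothetical; the output follows the `dublin_core.xml` layout used by DSpace batch import.

```python
from __future__ import annotations

import re
import xml.etree.ElementTree as ET

# Hypothetical mapping from PDF document-information keys to
# (Dublin Core element, qualifier) pairs; adjust to local practice.
PDF_TO_DC = {
    "/Title": ("title", "none"),
    "/Author": ("contributor", "author"),
    "/Subject": ("subject", "none"),
    "/Keywords": ("subject", "none"),
}


def guess_title(full_text: str) -> str | None:
    """Heuristic fallback: treat the first non-empty line as the title."""
    for line in full_text.splitlines():
        if line.strip():
            return line.strip()
    return None


def pdf_info_to_dublin_core(info: dict[str, str], full_text: str = "") -> str:
    """Return a dublin_core.xml string built from PDF document-info fields."""
    values = dict(info)
    # Only guess a title from the text when the PDF metadata has none.
    if not values.get("/Title", "").strip():
        title = guess_title(full_text)
        if title:
            values["/Title"] = title
    root = ET.Element("dublin_core")
    for key, (element, qualifier) in PDF_TO_DC.items():
        raw = values.get(key, "")
        # Keywords are usually a comma- or semicolon-separated list.
        parts = re.split(r"[;,]", raw) if key == "/Keywords" else [raw]
        for part in parts:
            part = part.strip()
            if part:
                dc = ET.SubElement(root, "dcvalue", element=element, qualifier=qualifier)
                dc.text = part
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    info = {"/Author": "Norberg, Magnus", "/Keywords": "DSpace; batch import"}
    print(pdf_info_to_dublin_core(info, full_text="\n  A Study of Batch Import\n..."))
```

The "first non-empty line" title guess is exactly the kind of heuristic that breaks on cover pages and running headers, which is why the sketch only uses it when the embedded metadata is empty.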
http://www.zotero.org/support/retrieve_pdf_metadata
http://meta-extractor.sourceforge.net/
http://www.pdfa.org/doku.php?id=artikel:en:pdfa_metadata

For other formats:
http://www.forensicswiki.org/wiki/Document_Metadata_Extraction

Magnus, we do already extract the full text of each document, and that text is indexed into the DSpace search system.

Cheers,
Mark

On Wed, Aug 10, 2011 at 7:39 AM, Tom De Mulder <[email protected]> wrote:
> On Wed, 10 Aug 2011, Magnus Norberg wrote:
>
> > does anyone know if there are any tools for automatic creation of dublin
> > core files and "contents" files?
> >
> > One need these files for batch import, one for each object. But if I
> > have like a thousand files (for example PDF files) on my harddrive that I
> > want to import into DSpace in a batch import, I do not want to create all
> > these "Item1", "Item2" and so on directories one by one, and then create
> > dublin core and content files one by one for each object, it would take
> > too much time...
>
> We created a tool that will do that work for you, all you need is the list
> of filenames and the metadata in a csv file, such as can be created by any
> spreadsheet program (Excel or OpenOffice, for example). It'll then create
> the batch import structure for you. This might be one way to help with
> your problem.
>
> http://tools.dspace.cam.ac.uk/metadatamapper/
>
>
> Best,
>
> --
> Tom De Mulder <[email protected]> - Cambridge University Computing Service
> +44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH
> -> 10/08/2011 : The Moon is Waxing Gibbous (75% of Full)
>
>
> ------------------------------------------------------------------------------
> uberSVN's rich system and user administration capabilities and model
> configuration take the hassle out of deploying and managing Subversion and
> the tools developers use with it. Learn more about uberSVN and get a free
> download at: http://p.sf.net/sfu/wandisco-dev2dev
> _______________________________________________
> DSpace-tech mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dspace-tech
>

--
Mark R. Diggory
@mire - www.atmire.com
2888 Loker Avenue East - Suite 305 - Carlsbad - CA - 92010
Esperantolaan 4 - Heverlee 3001 - Belgium
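The CSV-driven approach Tom describes for building the batch import structure can be approximated in a few lines. The sketch below is not the Cambridge metadatamapper tool; it assumes a hypothetical CSV with a `filename` column plus one column per Dublin Core field named `element` or `element.qualifier`, and writes one `item_NNNN` directory per row containing the file, a `contents` file listing it, and a `dublin_core.xml`. The function names are illustrative only.

```python
from __future__ import annotations

import csv
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical CSV layout: a "filename" column naming the file to ingest, plus
# one column per field named "element" or "element.qualifier"
# (e.g. "title", "contributor.author", "date.issued").


def write_dublin_core(item_dir: Path, row: dict[str, str]) -> None:
    """Write item_dir/dublin_core.xml from the non-empty metadata columns."""
    root = ET.Element("dublin_core")
    for column, value in row.items():
        # Skip the filename column, surplus unnamed fields, and empty cells.
        if column in (None, "filename") or not value:
            continue
        element, _, qualifier = column.partition(".")
        dc = ET.SubElement(root, "dcvalue", element=element, qualifier=qualifier or "none")
        dc.text = value
    ET.ElementTree(root).write(str(item_dir / "dublin_core.xml"), encoding="utf-8", xml_declaration=True)


def build_archive(csv_path: Path, source_dir: Path, out_dir: Path) -> list[Path]:
    """Create one item_NNNN directory per CSV row and return their paths."""
    items = []
    with open(csv_path, newline="", encoding="utf-8") as fh:
        for n, row in enumerate(csv.DictReader(fh), start=1):
            item_dir = out_dir / f"item_{n:04d}"
            item_dir.mkdir(parents=True)  # raises if it already exists, so nothing is overwritten
            filename = row["filename"]
            shutil.copy2(source_dir / filename, item_dir / filename)
            # The contents file lists the item's files, one per line.
            (item_dir / "contents").write_text(filename + "\n", encoding="utf-8")
            write_dublin_core(item_dir, row)
            items.append(item_dir)
    return items
```

With a thousand PDFs, the spreadsheet row per file is the only manual work; the resulting output directory is what DSpace's batch item importer would then be pointed at.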

