I think Magnus is looking for a tool that reads a PDF and extracts
subject, title, and author metadata from it. We do not have anything like
that at this time. However, someone could write a media filter, curation
task, or event consumer that parses the documents and pulls out candidate
metadata fields, though this would be heuristically complex whenever the
values are not already present in the embedded PDF or DOC metadata,
bibliographies, etc. I believe there are some third-party tools that
attempt to extract such data; they would need to be reviewed before we
could engineer something that supports this capability in DSpace.

http://www.zotero.org/support/retrieve_pdf_metadata
http://meta-extractor.sourceforge.net/
http://www.pdfa.org/doku.php?id=artikel:en:pdfa_metadata

Other formats...

http://www.forensicswiki.org/wiki/Document_Metadata_Extraction

Magnus, we do run a full-text extraction on each document, and the
extracted text is indexed by the DSpace search system.
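
On the batch-import side of the question, the per-item directories could
also be generated with a short script. The following is only a rough
sketch, not the Cambridge tool and not anything shipped with DSpace. It
assumes a CSV with a "filename" column plus columns named after Dublin
Core elements (for example "title" or "contributor.author"); those column
names, the item_NNN directory naming, and the build_archive function are
all made up for illustration.

```python
import csv
import io
import shutil
from pathlib import Path
from xml.sax.saxutils import escape, quoteattr


def build_archive(csv_text, source_dir, out_dir):
    """Create one item_NNN directory per CSV row and return the directories.

    Assumed (hypothetical) CSV layout: a 'filename' column, and every other
    column named 'element' or 'element.qualifier'.
    """
    source_dir, out_dir = Path(source_dir), Path(out_dir)
    items = []
    for n, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=1):
        item = out_dir / f"item_{n:03d}"
        item.mkdir(parents=True, exist_ok=True)
        filename = row.pop("filename")

        # Build dublin_core.xml, skipping empty cells and surplus columns.
        lines = ["<dublin_core>"]
        for column, value in row.items():
            if column is None or not value:
                continue
            element, _, qualifier = column.partition(".")
            lines.append(
                f"  <dcvalue element={quoteattr(element)} "
                f"qualifier={quoteattr(qualifier or 'none')}>"
                f"{escape(value)}</dcvalue>"
            )
        lines.append("</dublin_core>")
        (item / "dublin_core.xml").write_text(
            "\n".join(lines) + "\n", encoding="utf-8"
        )

        # The contents file lists the bitstreams for this item, one per line.
        (item / "contents").write_text(filename + "\n", encoding="utf-8")
        shutil.copy(source_dir / filename, item / filename)
        items.append(item)
    return items
```

Calling build_archive(open("items.csv").read(), "pdfs", "archive") would
then give one directory per row, each holding dublin_core.xml, contents,
and a copy of the file, which is the layout the batch importer expects.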

Cheers,
Mark

On Wed, Aug 10, 2011 at 7:39 AM, Tom De Mulder <[email protected]> wrote:

> On Wed, 10 Aug 2011, Magnus Norberg wrote:
>
> > does anyone know if there are any tools for automatic creation of
> > dublin core files and "contents" files?
> >
> > One needs these files for batch import, one set for each object. But
> > if I have, say, a thousand files (for example PDF files) on my hard
> > drive that I want to import into DSpace in a batch import, I do not
> > want to create all these "Item1", "Item2" and so on directories one by
> > one, and then create dublin core and contents files one by one for
> > each object; it would take too much time...
>
> We created a tool that will do that work for you, all you need is the list
> of filenames and the metadata in a csv file, such as can be created by any
> spreadsheet program (Excel or OpenOffice, for example). It'll then create
> the batch import structure for you. This might be one way to help with
> your problem.
>
> http://tools.dspace.cam.ac.uk/metadatamapper/
>
>
> Best,
>
> --
> Tom De Mulder <[email protected]> - Cambridge University Computing Service
> +44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH
> -> 10/08/2011 : The Moon is Waxing Gibbous (75% of Full)
>
>
> ------------------------------------------------------------------------------
> uberSVN's rich system and user administration capabilities and model
> configuration take the hassle out of deploying and managing Subversion and
> the tools developers use with it. Learn more about uberSVN and get a free
> download at:  http://p.sf.net/sfu/wandisco-dev2dev
> _______________________________________________
> DSpace-tech mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dspace-tech
>



-- 
Mark R. Diggory
@mire - www.atmire.com
2888 Loker Avenue East - Suite 305 - Carlsbad - CA - 92010
Esperantolaan 4 - Heverlee 3001 - Belgium