Hi Stuart,
Example assetstore file:
${dspace.dir}/assetstore/95/80/98/95809816172544348784747013964495251419
The filename itself is bitstream.internal_id in the DSpace database, and the
three directory names are the first six digits of the internal ID, taken two
at a time.
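As a rough sketch of that mapping (not something we run here), it could look
like this in Python. The function name and the /dspace install directory are
made up for illustration; substitute your own ${dspace.dir}:

```python
import os


def assetstore_path(internal_id, assetstore_dir):
    """Return the on-disk path for a bitstream's internal_id."""
    # The first six digits, taken two at a time, name the three directory levels.
    levels = [internal_id[0:2], internal_id[2:4], internal_id[4:6]]
    return os.path.join(assetstore_dir, *levels, internal_id)


# "/dspace/assetstore" is a hypothetical location standing in for ${dspace.dir}/assetstore.
print(assetstore_path("95809816172544348784747013964495251419", "/dspace/assetstore"))
```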
Here's an SQL query that resolves internal_ids to item_id (aka record ID) and
handle (which should tie into the item URL):
select item.item_id, handle.handle, bitstream.internal_id
from item, item2bundle, bundle2bitstream, bitstream, handle
where item.item_id = item2bundle.item_id
  and item2bundle.bundle_id = bundle2bitstream.bundle_id
  and bundle2bitstream.bitstream_id = bitstream.bitstream_id
  and handle.resource_id = item.item_id
  and handle.resource_type_id = 2;  -- 2 = item; skips collection/community handles
I've never looked at writing a script based on this (we are just doing the
standard checksum checking at the moment), but it shouldn't be too difficult.
(If you want to cut down on analysing non-PDFs with 'file', you could also use
bitstream.bitstream_format_id to build a list of PDFs before running the
filesystem-level tools.)
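Untested against a real repository, but a script along these lines might be a
starting point. It assumes you have already fetched (item_id, handle,
internal_id) rows from the query above, ideally limited to PDF formats, and it
only checks for the "%PDF-" header as a crude stand-in for 'file'; it does not
validate the PDF structure, so you would still want to run pdftops on whatever
passes. All function names here are made up for illustration:

```python
import os

PDF_MAGIC = b"%PDF-"


def assetstore_path(assetstore_dir, internal_id):
    """Build the on-disk path: three two-digit directories, then the full ID."""
    levels = [internal_id[0:2], internal_id[2:4], internal_id[4:6]]
    return os.path.join(assetstore_dir, *levels, internal_id)


def looks_like_pdf(path):
    """Crude header check only; a stand-in for 'file', not a validity test."""
    with open(path, "rb") as f:
        return f.read(len(PDF_MAGIC)) == PDF_MAGIC


def report_suspect_pdfs(rows, assetstore_dir, check=looks_like_pdf):
    """Yield (item_id, handle, path) for each file that is missing or fails check.

    rows is an iterable of (item_id, handle, internal_id) tuples.
    """
    for item_id, handle, internal_id in rows:
        path = assetstore_path(assetstore_dir, internal_id)
        if not os.path.isfile(path) or not check(path):
            yield item_id, handle, path
```

Passing a different check function (for example, one that shells out to
pdftops and looks at the exit status) would let the same loop report files
that have a PDF header but are otherwise malformed.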
Cheers,
Kim.
--
Kim Shepherd
IRR Technical Specialist
ITS Systems & Development
The University of Waikato
DDI +64 7 838 4025
> -----Original Message-----
> From: stuart yeates [mailto:[email protected]]
> Sent: Wednesday, 25 February 2009 9:03 a.m.
> To: [email protected]
> Subject: [Dspace-tech] script to validate all PDFs ?
>
> Does anyone have a script that checks all of the previously uploaded
> PDFs and find ones that are malformed and reports their URLs/record IDs?
>
> I can see how to write a script that uses the unix command line 'file'
> and 'pdftops' tools to check that every file that looks like a PDF is a
> good and valid PDF. Going from a file on the disk to a database record
> I'm not too sure of.
>
> cheers
> stuart
> --
> Stuart Yeates
> http://www.nzetc.org/ New Zealand Electronic Text Centre
> http://researcharchive.vuw.ac.nz/ Institutional Repository
>
> -----------------------------------------------------------------------
> -------
> Open Source Business Conference (OSBC), March 24-25, 2009, San
> Francisco, CA
> -OSBC tackles the biggest issue in open source: Open Sourcing the
> Enterprise
> -Strategies to boost innovation and cut costs with open source
> participation
> -Receive a $600 discount off the registration fee with the source code:
> SFAD
> http://p.sf.net/sfu/XcvMzF8H
> _______________________________________________
> DSpace-tech mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dspace-tech