Hi Stuart,

Example assetstore file:

${dspace.dir}/assetstore/95/80/98/95809816172544348784747013964495251419

The filename itself is stored in bitstream.internal_id in the DSpace database, and
the three directory names are just the first six digits of the internal ID, taken
two at a time (95/80/98 above).
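A rough Python sketch of that mapping, assuming a single assetstore under
${dspace.dir}/assetstore and the three-level, two-digits-per-level layout
described above ("/dspace" below is just a placeholder for dspace.dir):

```python
import os


def assetstore_path(dspace_dir: str, internal_id: str) -> str:
    """Return the on-disk path for a bitstream's internal_id.

    Assumes three directory levels, each named after the next two digits
    of the internal ID (e.g. 95/80/98/...).
    """
    levels = [internal_id[i:i + 2] for i in range(0, 6, 2)]
    return os.path.join(dspace_dir, "assetstore", *levels, internal_id)


print(assetstore_path("/dspace", "95809816172544348784747013964495251419"))
# prints /dspace/assetstore/95/80/98/95809816172544348784747013964495251419
```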

Here's a SQL query that resolves internal_ids to item_id (aka record ID) and
handle (which should map onto the item's URL):

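select item.item_id, handle.handle, bitstream.internal_id
  from item, item2bundle, bundle2bitstream, handle, bitstream
 where item.item_id = item2bundle.item_id
   and item2bundle.bundle_id = bundle2bitstream.bundle_id
   and bundle2bitstream.bitstream_id = bitstream.bitstream_id
   and handle.resource_id = item.item_id
   -- resource_type_id 2 = item; without this, community and collection
   -- handles whose numeric ID happens to equal an item_id also match
   and handle.resource_type_id = 2;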
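One way to use that query from a script without a database driver is to run it
through psql in unaligned, tuples-only mode (e.g. psql -At -F '|' -c "select ...")
and parse the output. A minimal Python sketch, assuming '|'-separated rows in the
query's column order; the base URL is a placeholder for your dspace.url, and the
/handle/<handle> form is the usual DSpace item URL but worth checking on your site:

```python
from typing import Dict, List, Tuple


def parse_rows(psql_output: str) -> List[Tuple[int, str, str]]:
    """Parse 'item_id|handle|internal_id' lines (psql -At -F '|' output)."""
    rows = []
    for line in psql_output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        item_id, handle, internal_id = line.strip().split("|")
        rows.append((int(item_id), handle, internal_id))
    return rows


def build_lookup(rows: List[Tuple[int, str, str]],
                 base_url: str) -> Dict[str, Tuple[int, str]]:
    """Map each internal_id to (item_id, item URL)."""
    base = base_url.rstrip("/")
    return {internal_id: (item_id, f"{base}/handle/{handle}")
            for item_id, handle, internal_id in rows}


# Made-up sample output; the item ID and handle are illustrative only.
sample = "42|123456789/17|95809816172544348784747013964495251419\n"
lookup = build_lookup(parse_rows(sample), "http://repository.example.org/")
print(lookup)
```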

I've never written a script based on this (we are just doing the standard
checksum checking at the moment), but it shouldn't be too difficult.
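For what it's worth, the core of such a script might look like the Python sketch
below. It assumes a lookup of internal_id -> (item_id, url) built from the query
above, and treats a file as bad if pdftops exits non-zero or writes anything to
stderr; that stderr rule is my own, deliberately strict, assumption (pdftops can
print syntax errors and still exit 0), so loosen it if it flags too much:

```python
import os
import subprocess
from typing import Callable, Dict, List, Tuple


def pdftops_ok(path: str) -> bool:
    """Good only if pdftops exits 0 and prints nothing on stderr (assumed policy)."""
    result = subprocess.run(["pdftops", path, os.devnull],
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.PIPE)
    return result.returncode == 0 and not result.stderr.strip()


def find_bad_pdfs(dspace_dir: str,
                  lookup: Dict[str, Tuple[int, str]],
                  check: Callable[[str], bool] = pdftops_ok,
                  ) -> List[Tuple[int, str, str]]:
    """Return (item_id, url, path) for every bitstream that fails `check`.

    Paths follow the assetstore/NN/NN/NN/<internal_id> layout.
    """
    bad = []
    for internal_id, (item_id, url) in sorted(lookup.items()):
        levels = [internal_id[i:i + 2] for i in range(0, 6, 2)]
        path = os.path.join(dspace_dir, "assetstore", *levels, internal_id)
        if not check(path):
            bad.append((item_id, url, path))
    return bad


# Demo with a stand-in check so nothing is read from disk; IDs are made up.
demo = {"11223344": (1, "http://repository.example.org/handle/123456789/1"),
        "55667788": (2, "http://repository.example.org/handle/123456789/2")}
print(find_bad_pdfs("/dspace", demo, check=lambda p: not p.endswith("55667788")))
```

Passing the check in as a parameter keeps the pdftops call swappable, e.g. for
a different validator or for testing without real files.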

(If you want to avoid running 'file' over non-PDFs, you could use
bitstream.bitstream_format_id to build a list of PDFs before running the
filesystem-level tools.)
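A minimal Python sketch of that pre-filter, assuming bitstream.bitstream_format_id
is added as a fourth column to the query above and the PDF format ID(s) are looked
up separately. The PDF_FORMAT_SQL string assumes a bitstreamformatregistry table
with a mimetype column, so check it against your schema; the format IDs in the
sample rows are made up:

```python
from typing import Iterable, List, Set, Tuple

# Assumed lookup for the PDF format ID(s); the IDs differ between installations.
PDF_FORMAT_SQL = ("select bitstream_format_id from bitstreamformatregistry "
                  "where mimetype = 'application/pdf';")


def only_pdfs(rows: Iterable[Tuple[int, str, str, int]],
              pdf_format_ids: Set[int]) -> List[Tuple[int, str, str]]:
    """Keep (item_id, handle, internal_id) for rows whose format ID is a PDF one."""
    return [(item_id, handle, internal_id)
            for item_id, handle, internal_id, format_id in rows
            if format_id in pdf_format_ids]


# Hypothetical rows: (item_id, handle, internal_id, bitstream_format_id).
rows = [(42, "123456789/17", "95809816172544348784747013964495251419", 3),
        (43, "123456789/18", "11223344556677889900", 7)]
print(only_pdfs(rows, {3}))
```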

Cheers,

Kim.

--
Kim Shepherd
IRR Technical Specialist
ITS Systems & Development
The University of Waikato
DDI +64 7 838 4025




> -----Original Message-----
> From: stuart yeates [mailto:[email protected]]
> Sent: Wednesday, 25 February 2009 9:03 a.m.
> To: [email protected]
> Subject: [Dspace-tech] script to validate all PDFs ?
> 
> Does anyone have a script that checks all of the previously uploaded
> PDFs, finds ones that are malformed, and reports their URLs/record IDs?
> 
> I can see how to write a script that uses the unix command line 'file'
> and 'pdftops' tools to check that every file that looks like a PDF is a
> good and valid PDF. Going from a file on the disk to a database record
> I'm not too sure of.
> 
> cheers
> stuart
> --
> Stuart Yeates
> http://www.nzetc.org/       New Zealand Electronic Text Centre
> http://researcharchive.vuw.ac.nz/     Institutional Repository
> 
> -----------------------------------------------------------------------
> -------
> Open Source Business Conference (OSBC), March 24-25, 2009, San
> Francisco, CA
> -OSBC tackles the biggest issue in open source: Open Sourcing the
> Enterprise
> -Strategies to boost innovation and cut costs with open source
> participation
> -Receive a $600 discount off the registration fee with the source code:
> SFAD
> http://p.sf.net/sfu/XcvMzF8H
> _______________________________________________
> DSpace-tech mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dspace-tech
