I also agree that modifying/discarding the originals is a bad idea. FWIW, in
our workflow (a small DSpace instance for institutional e-records) we convert
documents to PDF/A prior to ingest, then submit both copies in a single bundle
with "original" and "preservation copy" in their respective description fields.
We also include a note in dc.description.provenance that contains the basic
PREMIS semantic units describing the reformatting -- date, action taken, staff
member (Agent) who did the conversion, etc.
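For illustration, a provenance note of this kind could be assembled along the lines below. This is a minimal Python sketch, not our actual tooling: the function name, the sample values, and the exact note layout are hypothetical, and the labels only loosely borrow PREMIS Event/Agent unit names (eventType, eventDateTime, eventDetail, linkingAgentIdentifier).

```python
from datetime import date


def build_provenance_note(event_type, event_date, detail, agent):
    """Return a one-line note suitable for a dc.description.provenance value.

    The label names loosely follow PREMIS semantic units; the layout
    (label: value pairs joined by semicolons) is an assumption.
    """
    parts = [
        f"eventType: {event_type}",
        f"eventDateTime: {event_date.isoformat()}",
        f"eventDetail: {detail}",
        f"linkingAgentIdentifier: {agent}",
    ]
    return "; ".join(parts)


# Hypothetical example values
note = build_provenance_note(
    "migration",
    date(2014, 9, 17),
    "Converted original PDF to PDF/A prior to ingest",
    "J. Doe (staff)",
)
print(note)
```

Keeping the note as a single delimited string makes it easy to paste into one metadata field while still being parseable later if the data ever moves into a proper PREMIS record.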
What tools do you plan to use to create and verify the PDF/As? When I was
setting up our workflow (~2 years ago), I found that only Acrobat XI could
consistently create valid (as per the Preflight compliance check) PDF/A out of
arbitrary PDF input. In our particular case, we receive PDFs from our
communications department which were created in InDesign and contain extensive
data on things like prepress color spaces. These require a lot of subtle
tweaking to produce compliant PDF/A, and even earlier versions of Acrobat tend
to choke on them.
Manual conversion in Acrobat doesn't scale, of course, so in the long run I'd
like to find an alternative option. I haven't experimented with the method
described in this Stack Exchange thread, but it might be helpful if you want to
build a curation task out of open source components:
http://unix.stackexchange.com/questions/79516/converting-pdf-to-pdf-a
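Approaches of this kind are typically built on Ghostscript's pdfwrite device. The untested Python sketch below shows how a curation step might assemble such a command. The helper name and file paths are hypothetical; the flags are standard Ghostscript options for PDF/A output, and a real run generally also needs a customized PDFA_def.ps that names an ICC profile. A successful run does not guarantee compliance, so the output would still need to be validated (e.g. with Preflight).

```python
def pdfa_command(src, dest, pdfa_def="PDFA_def.ps"):
    """Build a Ghostscript argument list for a PDF -> PDF/A-1 conversion.

    Assumes Ghostscript is installed as "gs" and that pdfa_def points to
    a PDFA_def.ps file edited for the local ICC profile.
    """
    return [
        "gs",
        "-dPDFA=1",                       # request PDF/A-1 output
        "-dBATCH",                        # exit when processing is done
        "-dNOPAUSE",                      # do not prompt between pages
        "-sColorConversionStrategy=RGB",  # convert colors to a single space
        "-sDEVICE=pdfwrite",
        "-dPDFACompatibilityPolicy=1",    # drop features that break PDF/A
        f"-sOutputFile={dest}",
        pdfa_def,
        src,
    ]


# Hypothetical file names
cmd = pdfa_command("report.pdf", "report_pdfa.pdf")
print(" ".join(cmd))

# To actually run the conversion (requires Ghostscript on PATH):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Building the argument list separately from running it keeps the command easy to log, which is useful when recording the conversion in provenance metadata.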
Nicholas Webb
Digital Archivist
Icahn School of Medicine at Mount Sinai
Box 1102 - One Gustave L. Levy Place
New York, NY 10029-6574
(o) 212-241-7239
(f) 212-241-7864
(e) [email protected]
From: emilio lorenzo [mailto:[email protected]]
Sent: Wednesday, September 17, 2014 12:43 PM
To: [email protected]
Subject: Re: [Dspace-tech] Reconverting PDF's in assetstore to PDF/A
Seconding Peter's opinion: you should never lose the original.
On the point of contention, I would rather move the original PDF to another
bundle and leave the converted PDF/A in the ORIGINAL bundle.
Also add the metadata needed to document the change; at the least, add a few
lines to a new dc.description.provenance field (I know there are better
solutions, but this is simpler). They would serve as a pseudo
preservation-metadata record.
Thanks
Emilio Lorenzo
On 17/09/2014 16:39, Peter Dietz wrote:
I wouldn't recommend altering/damaging your existing PDFs. That just sounds
like it's inviting risk.
A better route would be to build a DSpace curation task that does the
automated PDF conversion, leaving the old PDFs in the ORIGINAL bundle and
putting the PDF/As in a bundle such as CONVERTED.
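To make the proposed layout concrete, here is a toy Python model of it. This is not DSpace's Java curation API: the dict-based item, the function name, the file names, and the injected converter are all hypothetical. It only illustrates the rule that ORIGINAL is never touched and each PDF gets a PDF/A counterpart in CONVERTED.

```python
def add_converted_copies(item, convert):
    """Add a PDF/A copy of each PDF in ORIGINAL to the CONVERTED bundle.

    item    -- dict mapping bundle names to lists of bitstream names
    convert -- callable taking a PDF name and returning the PDF/A name
    """
    converted = item.setdefault("CONVERTED", [])
    for name in item.get("ORIGINAL", []):
        if not name.lower().endswith(".pdf"):
            continue  # leave non-PDF bitstreams alone
        target = convert(name)
        if target not in converted:  # makes the task safe to re-run
            converted.append(target)
    return item


item = {"ORIGINAL": ["thesis.pdf", "data.csv"]}
# Stand-in converter: a real task would call an external tool here.
add_converted_copies(item, lambda name: name[:-4] + "_pdfa.pdf")
print(item)
```

Because the function only appends to CONVERTED and checks for existing entries, running it twice over the same item leaves the result unchanged.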
Also, what conversion tools are you looking at, e.g. a Java library or a
remote web service? I'd also take a look at the license; hopefully it is
compatible with DSpace's (i.e. not GPL or AGPL).
________________
Peter Dietz
Longsight
www.longsight.com
[email protected]
p: 740-599-5005 x809
On Wed, Sep 17, 2014 at 7:56 AM, helix84
<[email protected]> wrote:
I forgot - you'd also need to update the "size_bytes" column.
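For reference, the values involved could be recomputed with something like the Python sketch below. The helper name and the demo file are hypothetical, and the assumption (worth checking against your DSpace version's schema) is that the bitstream table stores an MD5 checksum alongside size_bytes. The sketch only computes the values; it does not touch the database.

```python
import hashlib
import os
import tempfile


def bitstream_stats(path, chunk_size=65536):
    """Return (size in bytes, MD5 hex digest) for the file at path."""
    md5 = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in chunks so large PDFs are not loaded into memory at once
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
    return os.path.getsize(path), md5.hexdigest()


# Demo on a throwaway file standing in for a converted PDF
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
size, checksum = bitstream_stats(tmp.name)
os.remove(tmp.name)
print(size, checksum)
```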
Regards,
~~helix84
Compulsory reading: DSpace Mailing List Etiquette
https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette
_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech
List Etiquette: https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette