Thanks a lot to the down-under whistle-blower!
This issue reminds me of the TIFF format, seen decades ago as a good
preservation format and also as an envelope for a myriad of other formats.
It turned out that TIFF files became poorly supported by newer Microsoft OSs,
and in the end we had to convert all of them to PDF.
In a decade, we will have to do it all over again, I suppose! Millions of
files...
Archivists/Librarians are supposed to at least cope with the "40 years
disinterest time range": not an easy job and very difficult to fund in
these days of info-obesity!
Christophe Dupriez
DESTIN-Informatique.com
Twitter @ChristopheDupri
On 2/01/2014 17:16, Hilton Gibson wrote:
Ok. I have done my awareness thing. Good luck for future researchers.
Cheers
hg
*Hilton Gibson*
Ubuntu Linux Systems Administrator
JS Gericke Library
Room 1025D
Stellenbosch University
Private Bag X5036
Stellenbosch
7599
South Africa
Tel: +27 21 808 4100 | Cell: +27 84 646 4758
http://library.sun.ac.za
http://za.linkedin.com/in/hiltongibson
On 2 January 2014 18:11, Graham Triggs wrote:
On 2 January 2014 13:59, Hilton Gibson wrote:
PDF/A-3 makes only a single, fairly monumental change. In the
PDF/A-2 specification users were allowed to embed files, but
only PDF/A files. PDF/A-3 now allows the embedding of any
arbitrary file format, including XML, CSV, CAD, images and any
others.
At first glance this sounds like a gigantic betrayal of
everything that the format has stood for. Why define a subset
of PDF attributes to ensure the long-term comprehension of the
file if you're going to turn around and allow the kitchen sink
to be embedded within it? (You can follow some of the original
discussion of this change here.)
http://blogs.loc.gov/digitalpreservation/2012/11/all-in-embedded-files-in-pdfa/?loclr=blogsig
This is very bad news for digital preservation because it is
now possible to "hide" proprietary digital formats inside the
PDF/A container. What will future researchers think when
they stumble upon these "hidden" closed formats that they will
not be able to use?
What were they thinking??
There are probably nice, inventive ways to abuse this, most likely by
having a proprietary application that uses the format as a
container but keeps all the meat of what it's doing in
embedded files - although that wouldn't really be usable as a
PDF/A in the standard way, anyway. But taking a step back: if
embedding arbitrary file data is not allowed, the alternative is
that all of that data must be held separately. Yes, that means you
can easily perform preservation activities around those files. But
it also increases the likelihood that someone will get the PDF/A
file, and not the additional arbitrary files.
Given the choice between not having the files at all, and having
the files embedded in the PDF/A - albeit possibly in a 'dead'
format - then for many people having the files will be a clear
winner. Dead formats can generally still be resurrected by some
means (get an emulator, run a file conversion, etc.). It's still
more useful than having no file.
If you are actively involved in preserving PDF/A files, then the
"static readable" component remains the same regardless. You've
just got the possibility of extra, arbitrary files inside the
PDF/A - in which case, treat it like an archive (like zip, tar,
etc.). Index the embedded files, extract the embedded files and
run preservation tasks against them as necessary. Create new
PDF/A-3 bundles.
At no point have you degraded what is comprehensible about the
PDF/A - you've just added stuff that might not be.
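The "treat it like an archive" step described above could be sketched roughly as follows. This is only an illustration, not a DSpace feature or a real PDF API: it assumes the embedded files have already been pulled out of the PDF/A-3 by whatever PDF library you use and handed over as a mapping of file name to bytes. The function name build_inventory and the OPEN_EXTENSIONS allowlist are hypothetical and stand in for a local preservation policy.

```python
import hashlib
from pathlib import PurePath

# Assumption: an illustrative allowlist, not an official preservation policy.
OPEN_EXTENSIONS = {".csv", ".txt", ".xml", ".pdf", ".png"}


def build_inventory(attachments):
    """Index embedded files so preservation tasks can be run against them.

    `attachments` maps an embedded file name to its raw bytes; extracting
    that mapping from the PDF/A-3 itself is outside this sketch.
    """
    inventory = []
    for name in sorted(attachments):
        data = attachments[name]
        extension = PurePath(name).suffix.lower()
        inventory.append({
            "name": name,
            "size": len(data),
            # A fixity checksum, recorded before any format migration.
            "sha256": hashlib.sha256(data).hexdigest(),
            # Anything outside the allowlist is flagged for a closer look.
            "needs_review": extension not in OPEN_EXTENSIONS,
        })
    return inventory


if __name__ == "__main__":
    example = {"data.CSV": b"a,b\n1,2\n", "model.dwg": b"\x00\x01"}
    for entry in build_inventory(example):
        print(entry["name"], entry["size"], entry["needs_review"])
```

Entries flagged for review are the candidates for the extra format-migration hoop; the readable PDF/A part is not touched by any of this.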
Rule No 1 in digital preservation - capture everything. If you
don't capture it, you can't preserve it. To that end, this should
be a good thing for preservation. We just need to be aware of an
extra hoop that we can / should jump through for format migration.
G
_______________________________________________
Dspace-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-general