August 6, 2011
When Data Disappears

http://www.nytimes.com/2011/08/07/opinion/sunday/when-data-disappears.html

By KARI KRAUS

Kari Kraus is an assistant professor in the College of Information Studies and 
the English department at the University of Maryland.


LAST spring, the Harry Ransom Center at the University of Texas acquired the 
papers of Bruce Sterling, a renowned science fiction writer and futurist. But 
not a single floppy disk or CD-ROM was included among his notes and 
manuscripts. When pressed to explain why, the prophet of high-tech said digital 
preservation was doomed to fail. “There are forms of media which are just 
inherently unstable,” he said, “and the attempt to stabilize them is like the 
attempt to go out and stabilize the corkboard at the laundromat.”

Mr. Sterling has a point: for all its many promises, digital storage is 
perishable, perhaps even more so than paper. Disks corrode, bits “rot” and 
hardware becomes obsolete.

But that doesn’t mean digital preservation is pointless: if we’re going to save 
even a fraction of the trillions of bits of data churned out every year, we 
can’t think of digital preservation in the same way we do paper preservation. 
We have to stop thinking about how to save data only after it’s no longer 
needed, as when an author donates her papers to an archive. Instead, we must 
look for ways to continuously maintain and improve it. In other words, we must 
stop preserving digital material and start curating it.

At first glance, digital preservation seems to promise everything: nearly 
unlimited storage, ease of access and virtually no cost to making copies. But 
the practical lessons of digital preservation contradict the notion that bits 
are eternal. Consider those 5 1/4-inch floppies stockpiled in your basement. 
When you saved that unpublished manuscript on them, you figured it would be 
accessible forever. But when was the last time you saw a floppy drive?

And even if you could find the right drive, there’s a good chance the disk’s 
magnetic properties will have decayed beyond readability. The same goes, 
generally speaking, for CD-ROMs, DVDs and portable drives.

Even the software needed to read the bits may prove elusive. Like Egyptian 
hieroglyphs, whose code was indecipherable until the rediscovery of the Rosetta 
Stone, the string of 1s and 0s on a floppy is meaningless in the absence of a 
set of computer instructions for translating them. If you don’t have a copy of 
WordPerfect 2 around, you’re out of luck. No wonder preservationists often wax 
ominous about the “digital dark ages.”

Of course, there’s always the option of migrating data from old to new media. 
But migration isn’t as simple as copying files — it’s more like translating 
from Japanese to Hungarian. Information is invariably lost; do it enough times 
and the result will be like the garbled message at the end of a game of 
telephone.

Another option is emulation, in which a software program impersonates a retro 
hardware environment; essentially, an emulator temporarily “downgrades” a 
modern computer to act like an old one. But over time, emulation becomes 
unwieldy: because the host systems for which emulators are designed will 
themselves become obsolete, emulators must eventually be moved to new computer 
platforms — emulators to run emulators, ad infinitum.

Nor is the problem just with the medium. We generate over 1.8 zettabytes of 
digital information a year. By some estimates, that’s nearly 30 million times 
the amount of information contained in all the books ever published. Even if we 
had perfectly stable storage, could we ever have enough to preserve everything?

The short answer is no — but only because we’re trying to replicate the 
practices used for decades to maintain paper archives. In this model, 
preservation begins only after a record is past its use. With data, 
intervention needs to happen earlier, ideally at an object’s creation. And 
tough decisions need to be made, early on, regarding what needs to be saved. We 
must replace digital preservation with digital curation.

Perhaps the most impressive effort to curate digital information is taking 
place in the realm of video games. In the face of negligence from the game 
industry, fans of “Super Mario Bros.” and “Pac-Man” have been creating 
homegrown solutions to collecting, documenting, reading and rendering games, 
creating an evolving archive of game history. They coordinate efforts and share 
the workload — sometimes in formal groups, sometimes as loose collectives. Nor 
does the data just sit around. These are gamers, after all, so they are 
constantly engaged with the files. In the process, they update them, create 
duplicates and fix bugs.

Despite often operating in legal gray areas, such curatorial activism can be a 
model for other digital domains. A similar pattern is emerging in 
data-intensive fields like genetics, where published data sets are often 
“cleaned” by third-party curators to purge them of inaccuracies.

It might seem silly to look to video-game fans for lessons on how to save our 
informational heritage, but in fact complex interactive games represent the 
outer limit of what we can do with digital preservation. By figuring out how to 
keep a complex game, like a classic first-person shooter, alive, we develop a 
better idea of how to preserve simulations of genetic evolution or the behavior 
of star systems.

True, not all data is worth saving. But that’s as true for bits as it is for 
sheets of paper. In this model, at least, the decisions on what to save are 
informed by a deep knowledge of the field, while the cost is shared by everyone 
involved.

Above all, the model allows us to see preservation as active and continuing: 
managing change to data rather than trying to prevent it, while viewing data as 
a living resource for the future rather than a relic of the past.


 