Study finds that we could lose science if publishers go bankrupt

A scan of archives shows that lots of scientific papers aren't backed up.

By JOHN TIMMER - 3/9/2024
https://arstechnica.com/science/2024/03/study-finds-that-we-could-lose-science-if-publishers-go-bankrupt/


Back when scientific publications came in paper form, libraries played a key 
role in ensuring that knowledge didn't disappear.

Copies went out to so many libraries that any failure—a publisher going 
bankrupt, a library getting closed—wouldn't put us at risk of losing 
information.

But, as with anything else, scientific content has gone digital, which has 
changed what's involved with preservation.

Organizations have devised systems that should provide options for preserving 
digital material. But, according to a recently published survey, lots of 
digital documents aren't consistently showing up in the archives that are meant 
to preserve them.

And that puts us at risk of losing academic research—including science paid for 
with taxpayer money.

Tracking down references

The work was done by Martin Eve, a developer at Crossref. That's the 
organization that organizes the DOI system, which provides a permanent pointer 
toward digital documents, including almost every scientific publication. If 
updates are done properly, a DOI will always resolve to a document, even if 
that document gets shifted to a new URL.

But the system also has a way of handling documents disappearing from their 
expected location, as might happen if a publisher went bankrupt. There is a set 
of what are called "dark archives" that the public doesn't have access to, but 
that should contain copies of anything that's had a DOI assigned. If anything 
goes wrong with a DOI, it should trigger the dark archives to open access, and 
the DOI should be updated to point to the copy in the dark archive.

For that to work, however, copies of everything published have to be in the 
archives. So Eve decided to check whether that's the case.

Using the Crossref database, Eve got a list of over 7 million DOIs and then 
checked whether the documents could be found in archives. He included 
well-known ones, like the Internet Archive at archive.org, as well as some 
dedicated to academic works, like LOCKSS (Lots of Copies Keeps Stuff Safe) and 
CLOCKSS (Controlled Lots of Copies Keeps Stuff Safe).
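To make the idea of that check concrete, here is a minimal sketch in Python. It is not Eve's code: the function names, the sample DOIs, and the archive names are all hypothetical, and it assumes each archive's holdings are already available as a set of DOIs rather than being queried over the network.

```python
def normalize(doi: str) -> str:
    # DOIs are case-insensitive, so compare them in lowercase.
    return doi.strip().lower()


def archive_counts(dois, holdings):
    """Return {doi: number of archives whose holdings include it}."""
    normalized = {
        name: {normalize(d) for d in held} for name, held in holdings.items()
    }
    return {
        normalize(doi): sum(normalize(doi) in held for held in normalized.values())
        for doi in dois
    }


# Hypothetical sample data, not taken from the study.
sample_dois = ["10.1234/example.1", "10.1234/example.2", "10.1234/EXAMPLE.3"]
sample_holdings = {
    "archive_a": {"10.1234/example.1", "10.1234/example.3"},
    "archive_b": {"10.1234/example.1"},
}

counts = archive_counts(sample_dois, sample_holdings)
unarchived = [doi for doi, n in counts.items() if n == 0]
```

In this toy data, one DOI sits in both archives, one sits in a single archive, and one is in neither. The last group is the one the study is worried about.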

The results were... not great.

Not well-preserved

When Eve broke down the results by publisher, less than 1 percent of the 204 
publishers had put the majority of their content into multiple archives. (The 
cutoff was 75 percent of their content in three or more archives.) Fewer than 
10 percent had put more than half their content in at least two archives. And a 
full third seemed to be doing no organized archiving at all.
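The thresholds quoted above can be written out as a short Python sketch. The tier labels and function names are hypothetical, and reading "no organized archiving" as "no works found in any archive" is an assumption made here for illustration, not the study's definition.

```python
def share_in_at_least(counts, k):
    """Fraction of a publisher's works found in k or more archives."""
    if not counts:
        return 0.0
    return sum(n >= k for n in counts) / len(counts)


def classify(counts):
    """Classify a publisher from its per-work archive counts."""
    # 75 percent of content in three or more archives.
    if share_in_at_least(counts, 3) >= 0.75:
        return "well archived"
    # More than half of content in at least two archives.
    if share_in_at_least(counts, 2) > 0.50:
        return "partly archived"
    # Assumption: nothing found in any archive.
    if share_in_at_least(counts, 1) == 0:
        return "no archiving found"
    return "other"
```

Each entry in `counts` is the number of archives holding one of the publisher's works, so `classify([3, 3, 3, 1])` just meets the top cutoff while `classify([0, 0])` falls into the bottom group.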

At the individual publication level, under 60 percent were present in at least 
one archive, and over a quarter didn't appear to be in any of the archives at 
all. (Another 14 percent were published too recently to have been archived or 
had incomplete records.)

The good news is that large academic publishers appear to be reasonably good 
about getting things into archives; most of the unarchived issues stem from 
smaller publishers.

Eve acknowledges that the study has limits, primarily in that there may be 
additional archives he hasn't checked.

There are some prominent dark archives that he didn't have access to, as well 
as things like Sci-hub, which violates copyright in order to make material from 
for-profit publishers available to the public. Finally, individual publishers 
may have their own archiving system in place that could keep publications from 
disappearing.

Should we be worried?

The risk here is that, ultimately, we may lose access to some academic 
research. As Eve phrases it, knowledge gets expanded because we're able to 
build upon a foundation of facts that we can trace back through a chain of 
references. If we start losing those links, then the foundation gets shakier. 
Archiving comes with its own set of challenges: It costs money, it has to be 
organized, consistent means of accessing the archived material need to be 
established, and so on.

But, to an extent, we're failing at the first step. "An important point to 
make," Eve writes, "is that there is no consensus over who should be 
responsible for archiving scholarship in the digital age."

A somewhat related issue is ensuring that people can find the archived 
material—the issue that DOIs were designed to solve. In many cases, the authors 
of the manuscript deposit copies in repositories like arXiv and bioRxiv, or the 
NIH's PubMed Central (this sort of archiving is increasingly being made a 
requirement by funding bodies).

The problem here is that the archived copies may not include the DOI that's 
meant to ensure they can be located. That doesn't mean they can't be identified 
through other means, but it definitely makes finding the right document much 
more difficult.

Put differently, if you can't find a paper or can't be certain you're looking 
at the right version of it, it can be just as bad as not having a copy of the 
paper at all.

None of this is to say that we've already lost important research documents.

But Eve's paper serves a valuable function by highlighting that the risk is 
real. We're well into the era where print copies of journals are irrelevant to 
most academics, and digital-only academic journals have proliferated.

It's long past time for us to have clear standards in place to ensure that 
digital versions of research have the endurance that print works have enjoyed.



Journal of Librarianship and Scholarly Communication, 2024. DOI: 
10.31274/jlsc.16288

John is Ars Technica's science editor. He has a Bachelor of Arts in 
Biochemistry from Columbia University, and a Ph.D. in Molecular and Cell 
Biology from the University of California, Berkeley. When physically separated 
from his keyboard, he tends to seek out a bicycle, or a scenic location for 
communing with his hiking boots.