Re: Central vs. Distributed Archives
PubMed Central will host individual OA articles

PubMed Central http://www.pubmedcentral.gov/index.html has launched an About Open Access page http://www.pubmedcentral.gov/about/openaccess.html drawing attention to the journals that provide open access to their contents through PMC. The page also announces an important new policy: [I]n October 2003, PMC began accepting individual open access articles from journals that do not participate in PMC on a routine basis. For the specific conditions under which PMC accepts these articles, see the relevant PMC agreement (in Microsoft Word format) http://www.pubmedcentral.gov/pmcdoc/pmc-openaccs-agree.doc . The offer is open to all authors in the life sciences willing to release their work to open access as defined by the Bethesda Statement on Open Access Publishing http://www.earlham.edu/~peters/fos/bethesda.htm. (Thanks to George Porter.)

Posted to Open Access News 12 November 2003 by Peter Suber
http://www.earlham.edu/~peters/fos/2003_11_09_fosblogarchive.html#a106866889488739033

Relevant Prior Subject Threads:

E-Biomed: Very important NIH Proposal
http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/0240.html
http://www.nih.gov/about/director/ebiomed/com0509.htm

NIH's Public Archive for the Refereed Literature: PUBMED CENTRAL
http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/0372.html

Just two comments:

(1) More central open-access archives in which authors can self-archive their articles are always welcome and helpful (especially if they are OAI-interoperable) and it is gratifying to see what was originally the E-Biomed proposal -- which at first unfortunately backed away from individual author self-archiving of toll-access journal articles -- now ready to accept author self-archiving at last!
It has to be added, though, that since 1999, with the advent of distributed eprint archiving, integrated by the glue of OAI-interoperability http://www.openarchives.org/ , it has become apparent that institutional self-archiving is a more promising route than central self-archiving, because researchers and their institutions share the benefits of maximizing the impact of their own research output, and share the costs of impact-loss because of toll-based access-denial to would-be users everywhere. Institutions also wield the carrot/stick of publish or perish over their own researchers and are hence in the position to mandate and monitor compliance with their own self-archiving policy. Central archives share no such common costs/benefits with researchers, and are not in a position to mandate self-archiving or to monitor compliance.

http://www.ecs.soton.ac.uk/~harnad/Temp/archpolnew.html
http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving_files/Slide0043.gif
http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving_files/Slide0023.gif
http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving_files/Slide0044.gif
http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving_files/Slide0005.gif
http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving_files/Slide0006.gif
http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving_files/Slide0013.gif
http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving_files/Slide0015.gif
http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving_files/Slide0016.gif
http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving_files/Slide0018.gif
http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving_files/Slide0022.gif

(2) The Bethesda statement on open access publishing http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/2878.html is indeed a statement on open-access *publishing* and not on *open access,* i.e., only on the golden and not the green (self-archiving) road to open access.
http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/3147.html

It is a potentially useful document, but only if this one-sidedness is conscientiously and decisively remedied, for as it stands, the Bethesda Statement is simply missing out on 95% of the immediate potential for open access. (In addition, the Bethesda definition of open is over-determined, again because of its one-sided focus on open-access journal publishing alone. All that research and researchers need is free online full-text access to all research; the rest comes automatically with the online territory: See the subject-thread: Free Access vs. Open Access http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/2956.html )

http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving_files/Slide0021.gif
http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving_files/Slide0024.gif
http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving_files/Slide0026.gif
http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving_files/Slide0027.gif
http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving_files/Slide0028.gif
http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving_files/Slide0029.gif

Stevan Harnad

NOTE: Complete archive of the ongoing discussion of providing open access to
Re: Central vs. Distributed Archives
Yet another piece of evidence has appeared that seems to confirm that whereas central archiving was historically the way in which self-archiving began, it is not the fastest or best form for it to grow and spread today: The Nature headline is (as usual for the press) an exaggeration: Critical comments threaten to open libel floodgate for physics archive http://www.nature.com/cgi-taf/Dynapage.taf?file=/nature/journal/v426/n6962/full/426007b_fs.html and so is SciDevNet's: Legal concerns plague open access physics archive http://www.scidev.net/news/index.cfm?fuseaction=readnewsitemid=1087language=1 but the facts seem to be that, across the years, some papers that contained plagiarism or libel might have found their way into ArXiv's vast (250,000 papers) and unvetted collection. http://www.arxiv.org I said unvetted, but of course almost all those papers are also submitted to peer-reviewed journals, which *do* vet them, and when there have been any corrections to the unrefereed preprint, the authors self-archive the refereed postprint too: http://opcit.eprints.org/tdb198/opcit/ So the (tiny) problem of plagiarism and libel is with papers that have *not* been peer-reviewed. ArXiv can make an effort to vet its daily submissions for plagiarism or libel, but at nearly 4000 per month, this would be quite a task: http://arxiv.org/show_monthly_submissions So the natural conclusions to draw from this seem to be the following: (1) OAI-interoperability has now made all OAI-compliant archives equivalent: They can all be harvested and jointly searched. 
It no longer makes any difference which archive a paper is actually deposited in: http://oaister.umdl.umich.edu/o/oaister/ (2) Not only are institutions in the best position to vet their own research output before approving deposits in their own institutional archives (probably on a departmental basis, optimally) http://www.ecs.soton.ac.uk/~harnad/Temp/archpolnew.html but this vetting load is much better shouldered in a distributed way, rather than having one centralized vettor for all of the planet's research output (in physics, mathematics, or other disciplines). (3) Having institutional self-archived research output housed in the institution's own archives also immunizes the archive from external liabilities (such as plagiarizers from other institutions) but it also makes it even more clear that -- contrary to what the Nature article says it is, and perhaps contrary even to what the Physics ArXiv *thinks* it is -- open-access archives are not *publishers*! They are merely a means of providing open access to (refereed) publications (as well as to their precursor unrefereed preprints). Garfield: 'Acknowledged Self-Archiving is Not Prior Publication' http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/2239.html For those who needed a reminder of it, research's publish or perish mandate is *not* self-archive or perish! Publication refers to certification as having met the known peer-review quality standards of a journal, not to having pressed the click button to self-archive an unrefereed draft in an open-access archive! That meets the (trivial) legal definition of publishing, to be sure -- even hand-writing it on paper once and showing it to someone does! But it certainly doesn't meet the definition of what the research community (and promotion/salary committees, and research-funding councils) means by publication, which is to be certified by a qualified, neutral third-party as having met its known standards of peer review. 
At best, the self-archiving of an unrefereed draft qualifies as vanity-press *self-publication* -- but that is precisely what researchers' institutions and their publish or perish mandates are there in order to *protect* their researchers from doing! (Or rather, to ensure that they go on to get their papers properly peer-reviewed and certified as having met the peer-review standards of the particular journal that accepted the paper.) By the same token, it is each researcher's own institution -- not a centralized entity like ArXiv -- that is in the best position to prevent its own researchers (and themselves) from self-archiving plagiarized or libellous papers -- and to take action if they do. Having said that, the Physics ArXiv's legal concerns are all a tempest in a teapot anyway. A central archive is a service provider. The service it provides is to operate an archive for authors to self-archive in. If an author self-archives a piece of plagiarism or libel therein, the only legal responsibility of the archive is to *remove* that item as soon as it is drawn to its attention. This is exactly the same rule as the one applied to other Internet service providers: If someone posts or emails pornography in an AOL discussion list or bulletin board, AOL does not become liable as a pornographer if it immediately removes the item as soon as it is drawn to its attention and blocks further postings from the poster. (The poster,
Re: Central vs. Distributed Archives
I agree with Stevan: ArXiv just needs a note clarifying that it is only a time-stamp and archiving machine, and takes no legal responsibility for its content because it does not 'read the content' (as referees do). It acts as a gateway provider. So the risk stays with the author.

Within-arXiv plagiarism can easily be checked within the arXiv. Plagiarized papers will have a later time stamp, and thus the original author can be spotted and the later one(s) blamed.

In contrast, scientific journals, which serve to 'read and referee and check the content of the paper' and which gain ownership of it, are responsible if the paper turns out to be plagiarized. So journal publishers run a real legal risk, in that they do not check for plagiarism, and they would have to check this across all journals of all publishers, since they claim the paper is new.

The Schoen case and many others confirm: plagiarism in the e-age is a real and formidable problem because it is so easy to do. Plagiarism only seemed to be rare, because it was not checked by the journals.

A still more widespread abuse is self-plagiarism, copy-and-pasting from one's own older papers. Easy, 'legal', but a piece of misconduct by the author from the standpoint of the reader.

http://www.iupap.org lists the recent London conference on plagiarism and misconduct of authors, referees, and journal editors.

Ebs

Eberhard R. Hilf, Dr. Prof.; CEO (Geschaeftsfuehrer)
Institute for Science Networking Oldenburg GmbH
an der Carl von Ossietzky Universitaet
Ammerlaender Heerstr.121; D-26129 Oldenburg
ISN-home: http://www.isn-oldenburg.de/
homepage: http://isn-oldenburg.de/~hilf
email: h...@isn-oldenburg.de
tel: +49-441-798-2884
fax: +49-441-798-5851

On Thu, 6 Nov 2003, Stevan Harnad wrote:

Yet another piece of evidence has appeared that seems to confirm that whereas central archiving was historically the way in which self-archiving began, it is not the fastest or best form for it to grow and spread today: The Nature headline is (as usual for the press) an exaggeration: Critical comments threaten to open libel floodgate for physics archive http://www.nature.com/cgi-taf/Dynapage.taf?file=/nature/journal/v426/n6962/full/426007b_fs.html Legal concerns plague open access physics archive http://www.scidev.net/news/index.cfm?fuseaction=readnewsitemid=1087language=1 but the facts seem to be that, across the years, some papers that contained plagiarism or libel might have found their way into ArXiv's vast (250,000 papers) and unvetted collection. http://www.arxiv.org

I said unvetted, but of course almost all those papers are also submitted to peer-reviewed journals, which *do* vet them, and when there have been any corrections to the unrefereed preprint, the authors self-archive the refereed postprint too: http://opcit.eprints.org/tdb198/opcit/ So the (tiny) problem of plagiarism and libel is with papers that have *not* been peer-reviewed. ArXiv can make an effort to vet its daily submissions for plagiarism or libel, but at nearly 4000 per month, this would be quite a task: http://arxiv.org/show_monthly_submissions

So the natural conclusions to draw from this seem to be the following:

(1) OAI-interoperability has now made all OAI-compliant archives equivalent: They can all be harvested and jointly searched.
It no longer makes any difference which archive a paper is actually deposited in: http://oaister.umdl.umich.edu/o/oaister/ (2) Not only are institutions in the best position to vet their own research output before approving deposits in their own institutional archives (probably on a departmental basis, optimally) http://www.ecs.soton.ac.uk/~harnad/Temp/archpolnew.html but this vetting load is much better shouldered in a distributed way, rather than having one centralized vettor for all of the planet's research output (in physics, mathematics, or other disciplines). (3) Having institutional self-archived research output housed in the institution's own archives also immunizes the archive from external liabilities (such as plagiarizers from other institutions) but it also makes it even more clear that -- contrary to what the Nature article says it is, and perhaps contrary even to what the Physics ArXiv *thinks* it is -- open-access archives are not *publishers*! They are merely a means of providing open access to (refereed) publications (as well as to their precursor unrefereed preprints). Garfield: 'Acknowledged Self-Archiving is Not Prior Publication' http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/2239.html For those who needed a reminder of it, research's publish or perish mandate is *not* self-archive or perish! Publication refers to certification as having met the known peer-review quality standards of a journal, not to having pressed the click button to self-archive an unrefereed draft in an open-access archive! That meets the (trivial) legal
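Hilf's point in this message, that a within-archive copy necessarily carries a later time stamp than the paper it copies, can be sketched in a few lines. This is only an illustration under stated assumptions: the `Deposit` record, the `flag_later_duplicates` function and the 0.8 threshold are hypothetical, and the text comparison uses Python's generic `difflib.SequenceMatcher` as a stand-in, not any method that ArXiv is known to use.

```python
from dataclasses import dataclass
from datetime import datetime
from difflib import SequenceMatcher
from itertools import combinations


@dataclass
class Deposit:
    """Hypothetical record of one self-archived paper."""
    paper_id: str
    author: str
    deposited: datetime  # time stamp assigned by the archive on deposit
    text: str


def flag_later_duplicates(deposits, threshold=0.8):
    """Return (earlier_id, later_id) pairs whose texts are near-identical.

    The earlier time stamp identifies the presumed original; the later
    deposit is the one to be examined. The threshold is an arbitrary
    illustrative value, not a calibrated one.
    """
    flagged = []
    for a, b in combinations(deposits, 2):
        similarity = SequenceMatcher(None, a.text, b.text).ratio()
        if similarity >= threshold:
            earlier, later = sorted((a, b), key=lambda d: d.deposited)
            flagged.append((earlier.paper_id, later.paper_id))
    return flagged
```

Comparing every pair of papers grows quadratically with the size of the archive, so at the submission volumes discussed in this thread a real check would need indexing or fingerprinting rather than this brute-force loop.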
Re: Central vs. Distributed Archives
Trends in Self-Posting of Research Material Online by Academic Staff

Theo Andrew supplies a case study from the University of Edinburgh. http://www.ariadne.ac.uk/issue37/andrew/

This is a survey preceding a series of SHERPA eprint self-archiving projects http://www.sherpa.ac.uk/ to be implemented at Edinburgh.

Prior to the implementation of these projects at the University of Edinburgh, it was decided that a baseline survey of research material already held on departmental and personal Web pages in the ed.ac.uk domain [...]

The main conclusion of this advance survey was that: (1) an unexpectedly high volume of research material (over 1000 peer-reviewed journal articles) exists online in the ed.ac.uk domain and (2) there is a direct correlation between willingness to self-archive and the [prior] existence of subject-based [non-Edinburgh] repositories

It is perhaps unsurprising that the Edinburgh disciplines that are the most advanced in self-archiving are the ones that are also most advanced globally, having their own central, discipline-based archives (elsewhere). That said, 1000 is still a small number (relative to Edinburgh's annual output), and now going on to establish departmental eprint archives at Edinburgh will further promote self-archiving at Edinburgh, especially if Edinburgh and the UK Research Funding Councils adopt a systematic open-access policy along the lines of the Berlin Declaration:

http://www.ecs.soton.ac.uk/~harnad/Temp/berlin.htm
http://www.ecs.soton.ac.uk/~harnad/Temp/archpolnew.html
http://www.eprints.org/self-faq/#institution-facilitate-filling
http://www.ariadne.ac.uk/issue35/harnad/

The article goes on to note:

The big problem is that this material is widely dispersed and therefore not easily found. This is not very useful for the wider dissemination of scholarly work. Also, personal Web sites tend to be ephemeral...

This refers to the 1000 articles self-archived at Edinburgh *before* the forthcoming Edinburgh eprint archives are implemented.
The upcoming archives will presumably be OAI-compliant -- http://www.openarchives.org -- thereby solving the problem of dispersal and interoperability that besets arbitrary websites. As these self-archived articles will be duplicates of the published version, self-archived in order to provide immediate open access, the primary preservation problem will not be theirs; it will be the problem of the producers and purchasers of the publishers' proprietary version. The self-archived versions in the Physics ArXiv, for example, have lasted twelve years now, and been successfully retrofitted for OAI-compliance. There is every reason to believe that the growth of self-archived content itself will be the best guarantor that we will see for its perennity.

Oddly, there is no reference in this article to Edinburgh's own most important existing eprint archive, already OAI-compliant, and containing 10% of Edinburgh's current self-archived articles: http://archive.ling.ed.ac.uk/ (There seems to be some confusion of its contents with those of a non-Edinburgh archive -- http://cogprints.ecs.soton.ac.uk/ -- which overlaps with it in subject matter). There is also no reference to any prior usage surveys, such as:

http://www.eprints.org/results/
http://opcit.eprints.org/opcitevaluation.shtml

It is unfortunate that the title refers to self-posting whereas the more widely used term self-archiving is used throughout the text itself: Why proliferate needless and confusing synonyms?
[The title may have been an unwise editorial suggestion that the author should have declined!]

Stevan Harnad

NOTE: Complete archive of the ongoing discussion of providing open access to the peer-reviewed research literature online is available at the American Scientist September Forum (98 99 00 01 02 03):
http://amsci-forum.amsci.org/archives/American-Scientist-Open-Access-Forum.html
http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/index.html

Posted discussion to: american-scientist-open-access-fo...@amsci.org

Dual Open-Access Strategy:
BOAI-2: Publish your article in a suitable open-access journal whenever one exists.
BOAI-1: Otherwise, publish your article in a suitable toll-access journal and also self-archive it.
http://www.soros.org/openaccess/read.shtml
http://www.eprints.org/signup/sign.php
Re: Central vs. Distributed Archives
Stevan Harnad wrote:

> Just as it was counterproductive to vilify toll-access publishers (instead of either founding open-access journals or self-archiving), so it is counterproductive to vilify open-access publishers (instead of either founding competing open-access journals or self-archiving).

It is also counterproductive to ignore the authors from the developing world who have always been kept away from the mainstream. I am not against the author pays model, but just against the lack of flexibility in operation. The majority of researchers in developing countries have never had the luxury of being funded [our own study (unpublished) on authors publishing in top Indian journals indexed in MEDLINE shows that more than 90% have had no funding for their research, and those who had it had a minuscule fraction of what is considered *funding* in the developed countries]. This would simply mean they would never be able to pay from their funds! There could be other viable models, like paying a fixed percentage of funds for publishing. This would sound more aesthetic to researchers too. It would also mean publishers could easily subsidize research from developing countries as well as researchers from developed countries who are not funded.

> So is the monopolistic objection that BMC and PLoS have more start-up support, giving them an advantage over journals without that support, or is the objection that they have an author pays model, unaffordable for some authors?

The heavy start-up support gives them a clear edge over new and existing publishers. Without it, PLoS Biology would not have received the popularity and access it did [the traffic nearly brought down their elegant homepage to just a couple of links on the day of inauguration]. And the PLoS fund would have been better used to support lobbying -- http://bmj.bmjjournals.com/cgi/content/full/326/7392/766#art -- rather than entering into a neck-and-neck fight with existing publishers. If it were really interested in supporting Open Access, it should have supported the Journal of Biology, an Open Access journal from BMC.

> And the same can be said about volunteer-service-based journals: It is too early to say whether they can last on volunteerism alone, let alone whether volunteerism can scale up to all 24,000 refereed journals!

Just imagine the scalability if the Internet were monopolised by some company! The whole spectrum of resources we access with a click was created by volunteerism, donations and public money. Does PubMed/PubMedCentral make any profit?

> Perhaps a far better choice would have been to require all your authors to (1) try to self-archive their articles at their own institutions, and only in those cases where that failed, (2) to self-archive them in CogPrints or another suitable OAI-compliant archive. Offloading the self-archiving task onto the distributed authorship instead of the journal staff would take some of the load off the volunteer efforts (hence costs) involved! That policy would also have the benefit of spreading the practice of self-archiving by authors, as well as archive-provision by their institutions.

And yes! We actually plan to provide the authors with PDF reprints which they could archive on their own. We did it ourselves just because we needed to see the whole thing get started. We are also encouraging authors to republish them on their institutional websites/repositories or their own websites in addition to our existing archive at Cogprints.

> These are the vulnerabilities of new journals; they have nothing to do with open-access.

The sudden disappearance of a journal website would not have been so desperate a situation if the journal had been open access and someone had copied it somewhere [some of the JMIR articles are available at http://www.cybermedicine.netfirms.com [I own and maintain this site] after it became open. I have also seen a number of similar websites offering JMIR content]. This would mean one could access it just by searching for the keywords on Google or any major search engine for that matter. That would not be the situation with a toll-access journal.

Dr. Vinod Scaria
http://www.drvinod.netfirms.com
MAIL: vinodsca...@yahoo.co.in
Tel: +91 98474 65452
Re: Central vs. Distributed Archives
> a couple of months back that they would go Open. [I am in the Editorial board of OJHAS from Sept 2003]. OJHAS is edited and published by a small group of scholars with no external support. Everything from Web Design to Editing and Review is done voluntarily by the Editorial team. It also stands as a fine example of the fact that Open Access Journals can indeed be successfully organised and can indeed survive without an author pays model.

There exists no firm evidence at all at the moment as to whether or not Open Access journals can survive, with or without an author pays model. Subsidized journals are subsidized journals, and depend on the survival of the subsidy, not the journal. Author pays journals have been around for far too short a time for us to know whether they can survive. And the same can be said about volunteer-service-based journals: It is too early to say whether they can last on volunteerism alone, let alone whether volunteerism can scale up to all 24,000 refereed journals!

> Now coming to the Archival, Cogprints was our first choice for many reasons
> 1] It offers interoperability [as mentioned by Harnad]
> 2] It offers unmatched popularity
> 3] It has been there for years and we can be sure of the permanence
> 4] It is of course FREE.

Perhaps a far better choice would have been to require all your authors to (1) try to self-archive their articles at their own institutions, and only in those cases where that failed, (2) to self-archive them in CogPrints or another suitable OAI-compliant archive. Offloading the self-archiving task onto the distributed authorship instead of the journal staff would take some of the load off the volunteer efforts (hence costs) involved! That policy would also have the benefit of spreading the practice of self-archiving by authors, as well as archive-provision by their institutions.

> And as Harnad suggested, there is no reason why Journals should not be archived at Open Archives, be it self maintained repositories or Centralised ones. In fact Open Archiving of electronic journals is the need of the hour because our own studies [unpublished] show that Electronic journals are just as ephemeral as websites. Scholarly communication should never be lost at the cost of copyright restrictions. Many of these journals have perhaps done more harm than good by locking the access by copyright restrictions.

This is too vague: For toll-access journals, the preservation burden for their contents (both the paper version and the online version) is squarely on the shoulders of the journals that sell them and the libraries that buy them. The self-archived versions of toll-access journal articles are merely *duplicates,* provided for access, and it is a strategic mistake to make an issue of concerns about their long-term preservation. Those duplicates have lasted over 12 years already and they will continue to last long enough to be retrofitted with whatever solution the open-access era may eventually generate, if/when it prevails.

But the fact that new journals (whether paper or online) come and go is a different problem. Journals should be archival in the sense that they continue to exist. If they just make an appearance for a few months or years and then vanish, then they are merely scattered collections of items, and the preservation of such orphan items is a problem independent of the problem of open access.

> Moreover, electronic journals are equally vulnerable to the vagaries of the Internet. For example, JMIR www.jmir.org went suddenly offline some time back [I think it was a year or so] making the whole content inaccessible. [But it reappeared later and now is an Open Access Journal].

These are the vulnerabilities of new journals; they have nothing to do with open-access.

> Thus in short, Open Archiving of Journals as a whole is perhaps to be discussed in a wider perspective than just making it OPEN. The major emphasis should be the PERMANENCE of Open Archiving. I hope this post will surely trigger a debate on the topic.

Preservation and access are -- for the time being -- very different matters. The pressing problem for authors of the toll-access literature today is access-denial and impact-loss, not preservation. It is a mistake to conflate the open access problem with the digital preservation problem, and it helps neither open access nor digital preservation.

Stevan Harnad

> Kind regards
> Dr. Vinod Scaria
> Executive Editor: Calicut Medical Journal
> Assoc Editor: Online Journal of Health and Allied Sciences
> Editor in Chief: Internet He@lth
> WEB: www.drvinod.netfirms.com
> MAIL: vinodscaria@yahoo.co.in
> Mobile: +91 98474 65452

- Original Message -
From: Stevan Harnad
To: AMERICAN-SCIENTIST-OPEN-ACCESS-FORUM@LISTSERVER.SIGMAXI.ORG
Sent: Wednesday, October 29, 2003 3:38 AM
Subject: Re: Central vs. Distributed Archives

The two items that follow below are by Vinod Scaria from Peter Suber's Open Access News http://www
Re: Central vs. Distributed Archives
I would like you to defend your claim that PLoS is crunching small publishers. Can you provide an example?

- Original Message -
From: Dr. Vinod Scaria drvi...@hotpop.com
To: american-scientist-open-access-fo...@listserver.sigmaxi.org
Sent: Thursday, October 30, 2003 9:07 AM
Subject: Re: Central vs. Distributed Archives

CALICUT MEDICAL JOURNAL http://www.calicutmedicaljournal.org ARCHIVES AT COGPRINTS ***

As we all know, Open Access Publishing is not gaining momentum as far as Journals published from Developing Countries are concerned [with reference to western Journals]. Many reasons can be given, such as:

1. Monopolistic nature of Open Access Publishers like BioMedCentral http://www.biomedcentral.com which pursues the author pays model and would drive away any author from Developing countries. Thus obviously publishers from Developing countries would have second thoughts before starting one at BMC. By monopolistic, I mean the almost complete control over open access publishing, say about 75% of Open Access Journals in Medicine. And mega organisations like PLoS are crunching the small publishers, as they can easily override the smaller ones with the mega funding they have. See: http://bmj.bmjjournals.com/cgi/content/full/326/7392/766#art

2. As I previously stated in my Editorial in Internet Health, www.virtualmed.netfirms.com/internethealth/articleapril03.html , the fear of losing revenue, which is the sole source of sustenance of many Journals [though some make a meagre profit].

3. Lack of sufficient expertise and exposure to Open Access Publishing. www.virtualmed.netfirms.com/internethealth/opinion0303.html http://bmj.com/cgi/eletters/326/7382/182/b

But recent developments are worth mentioning - at least from India. Online Journal of Health and Allied Sciences www.ojhas.org , India's first Online BioMedical journal, declared a couple of months back that they would go Open. [I am in the Editorial board of OJHAS from Sept 2003]. OJHAS is edited and published by a small group of scholars with no external support. Everything from Web Design to Editing and Review is done voluntarily by the Editorial team. It also stands as a fine example of the fact that Open Access Journals can indeed be successfully organised and can indeed survive without an author pays model.

Now coming to the Archival, Cogprints was our first choice for many reasons
1] It offers interoperability [as mentioned by Harnad]
2] It offers unmatched popularity
3] It has been there for years and we can be sure of the permanence
4] It is of course FREE.

And as Harnad suggested, there is no reason why Journals should not be archived at Open Archives, be it self maintained repositories or Centralised ones. In fact Open Archiving of electronic journals is the need of the hour because our own studies [unpublished] show that Electronic journals are just as ephemeral as websites. Scholarly communication should never be lost at the cost of copyright restrictions. Many of these journals have perhaps done more harm than good by locking the access by copyright restrictions.

Moreover, electronic journals are equally vulnerable to the vagaries of the Internet. For example, JMIR www.jmir.org went suddenly offline some time back [I think it was a year or so] making the whole content inaccessible. [But it reappeared later and now is an Open Access Journal].

Thus in short, Open Archiving of Journals as a whole is perhaps to be discussed in a wider perspective than just making it OPEN. The major emphasis should be the PERMANENCE of Open Archiving. I hope this post will surely trigger a debate on the topic.

Kind regards
Dr. Vinod Scaria
Executive Editor: Calicut Medical Journal
Assoc Editor: Online Journal of Health and Allied Sciences
Editor in Chief: Internet He@lth
WEB: www.drvinod.netfirms.com
MAIL: vinodscaria@yahoo.co.in
Mobile: +91 98474 65452

- Original Message -
From: Stevan Harnad
To: AMERICAN-SCIENTIST-OPEN-ACCESS-FORUM@LISTSERVER.SIGMAXI.ORG
Sent: Wednesday, October 29, 2003 3:38 AM
Subject: Re: Central vs. Distributed Archives

The two items that follow below are by Vinod Scaria from Peter Suber's Open Access News http://www.earlham.edu/~peters/fos/fosblog.html

It provides an interesting and inspiring example of the power and value of OAI-interoperability http://www.openarchives.org/ and the interdependence of the two open-access strategies (open-access self-archiving and open-access journal publishing) that this new online open-access journal, produced in India, is being made accessible by archiving it http://calicutmedicaljournal.org/archives.html in a specially created sector of CogPrints in the UK, http://cogprints.ecs.soton.ac.uk/view/subjects/JOURNALS.html
Re: Central vs. Distributed Archives
The two items that follow below are by Vinod Scaria from Peter Suber's Open Access News http://www.earlham.edu/~peters/fos/fosblog.html

They provide an interesting and inspiring example of the power and value of OAI-interoperability http://www.openarchives.org/ and of the interdependence of the two open-access strategies (open-access self-archiving and open-access journal publishing): this new online open-access journal, produced in India, is being made accessible by archiving it http://calicutmedicaljournal.org/archives.html in a specially created sector of CogPrints in the UK, http://cogprints.ecs.soton.ac.uk/view/subjects/JOURNALS.html a multidisciplinary central archive created in 1997 for author self-archiving (which is now being done more via distributed institutional eprint archives -- to which the CogPrints software was adapted by Rob Tansley, creator of eprints http://software.eprints.org/#ep2 and then of dspace http://www.dspace.org/ -- rather than via central ones like CogPrints).

Yet there is no reason a central archive like CogPrints (or, for that matter, any of the distributed institutional archives) cannot provide a locus for open-access journals too! OAI-interoperability means that they will all be picked up and integrated by cross-archive harvesters like OAIster! http://oaister.umdl.umich.edu/o/oaister/

1. The Editorial of the inaugural issue of Calicut Medical Journal, "Online, open access journals: the only hope for the future" http://calicutmedicaljournal.org/2003;1(1)e1.htm, discusses in detail how and why Calicut Medical Journal supports the Open Access initiatives. In his editorial, Dr Ramachandran stresses the need to disseminate knowledge in the widest possible sphere, especially among scholars of other developing countries, and asserts that Open Access is the best possible solution to achieve this goal. The Editorial also criticises the widely publicised author-pays model as discouraging for scholars from the developing world and states that it would badly affect the already low level of publications from these countries. It also discusses the various advantages of being Online and Open. He also asserts the need for more regional Open Access journals to meet the specific demands of scholars and clinicians and to maintain and enhance the quality of health services. The editorial concludes with the statement that Calicut Medical Journal would play a dual role: being international by being online, Open, and upholding the highest standards of publication, and at the same time catering to the needs of Indian scholars and clinicians. Posted by Vinod Scaria at 12:27 PM.

2. The Calicut Medical Journal is Online http://calicutmedicaljournal.org/ The much-awaited Calicut Medical Journal is online. The new Open Access biomedical journal, published by the Calicut Medical College Alumni Association, is the second Indian Open Access biomedical journal. With new Open Access medical journals coming up in India, existing publishers are already feeling the heat of competition. While these two Open Access journals offer online acceptance of manuscripts, speedy peer review and almost instant publication, with a host of utilities, and of course without a price tag, other publishers are still in the dark with their outdated modes of peer review and publication. The web statistics of these journals are telltale signs that Open Access publications are widely embraced. Being Open Access, these journals also aim to have an international impact, which was hitherto virtually impossible in the conventional publishing model. Posted by Vinod Scaria at 12:22 PM.
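The OAI-interoperability described in this message rests on a simple HTTP protocol (OAI-PMH): a cross-archive harvester sends each archive the same kind of request and merges the metadata it gets back. A minimal sketch in Python of how such requests are formed follows. The base URLs are hypothetical placeholders, not the real endpoints of CogPrints or of any institutional archive; only the `verb`, `metadataPrefix` and `from` arguments are taken from the protocol itself.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical OAI-PMH endpoints, for illustration only; real archives
# publish their own base URLs.
ARCHIVES = {
    "central-archive": "https://central.example.org/oai",
    "institutional-archive": "https://eprints.example.edu/oai",
}


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build an OAI-PMH ListRecords request URL.

    'verb' and 'metadataPrefix' are standard OAI-PMH arguments; 'from'
    optionally restricts the harvest to records changed since a date.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


# The same request shape is sent to every compliant archive, central or
# institutional, which is what lets one harvester cover them all.
harvest_urls = {
    name: list_records_url(url, from_date="2003-10-01")
    for name, url in ARCHIVES.items()
}

if __name__ == "__main__":
    for name, url in harvest_urls.items():
        print(name, url)
```

Nothing is fetched here; the sketch only shows that a harvester needs no archive-specific logic beyond the base URL.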
Re: Central vs. Distributed Archives
Dear Stevan and the list members,

here are some arguments for the following two claims:

1. All physicists will publish in the ArXiv not before the year 2050, although the ArXiv's size is growing quadratically, not linearly, with time. Earlier estimates [St. Harnad, http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving.htm slide 25] are to be revised.

2. Usage of repositories seems to be proportional to their size, but independent of absolute size.

The full text can be found at http://www.isn-oldenburg.de/~hilf/publications/arxiv-analyis.ps

All physicists will publish in the ArXiv not before the year 2050

Here are some more elaborate but rather audacious, risky estimates (P. Ginsparg would know better). The ArXiv is unique in that it serves its own usage and submission logs. At present (after 146 months of service) there are 246,555 documents stored. The monthly rate of incoming new documents is at present 3,500. It rises linearly with time, see http://arxiv.org/show_monthly_submissions : next month, 24 more papers per month will be handed in than this month. This allows us to integrate the rate to estimate the future time at which virtually all physicists would send their prime papers to the ArXiv. Let us estimate the number of physicists worldwide to be 1,000,000, of which 10% might be active as researchers, producing, say, 2 papers per year. Then we have 200,000 prime physics papers per year. Integrating, we would see them all in the ArXiv 44 years and six months from now, that is, in the year 2050. Clearly, by then we will have passed through more technical revolutions, so this steady-state extrapolation is not likely to hold. Other new developments may spread much more steeply, notably self-archiving by the authors, their institutes or universities, and their libraries, forming a distributed net of repositories.
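The extrapolation above can be retraced with a few lines of arithmetic. The sketch below is one possible reading, not necessarily the calculation behind the quoted figure: it assumes that full coverage is reached when the monthly intake equals one twelfth of the estimated yearly output, and that the intake keeps rising by 24 papers per month each month. Under those assumptions it gives roughly 46 years, the same order as the 44.5 years quoted.

```python
# Figures quoted in the message; treating "all papers" as a monthly
# intake target is an assumption made for this sketch.
CURRENT_MONTHLY_INTAKE = 3_500        # papers per month at present
MONTHLY_INCREASE = 24                 # extra papers per month, each month
PHYSICISTS = 1_000_000                # estimated physicists worldwide
ACTIVE_FRACTION = 0.10                # share active as researchers
PAPERS_PER_RESEARCHER_PER_YEAR = 2


def yearly_output():
    """Estimated number of physics papers produced per year."""
    return PHYSICISTS * ACTIVE_FRACTION * PAPERS_PER_RESEARCHER_PER_YEAR


def months_until_full_coverage():
    """Months until the linearly rising intake matches the monthly output."""
    target_monthly = yearly_output() / 12
    return (target_monthly - CURRENT_MONTHLY_INTAKE) / MONTHLY_INCREASE


if __name__ == "__main__":
    months = months_until_full_coverage()
    print(f"yearly output: {yearly_output():.0f} papers")
    print(f"time to full coverage: {months:.0f} months ({months / 12:.1f} years)")
```

The result is sensitive to every input: halving the assumed number of active researchers, for instance, shortens the horizon considerably, which is why the message itself calls the estimate audacious.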
The advantages of such a distributed net are its scalability, its flexibility, its business model (distributed funding by the institutions of the creators of the documents), the retention of the authors' rights, the possibility of updates, and the way acceptance spreads: convincing a large body such as a learned community to set up a central service such as the ArXiv for physics is much harder than convincing a percentage of local, distributed institutions and institutes (many small barriers versus one large one).

The challenges are to set up the needed international standards, to allow intelligent search engines to serve retrieval, and to stimulate discussion and communication between the authors, who are known from the past to be very conservative but not reflective about their working habits, not very communicative about them, and used to being taken care of and to someone else paying.

At present, the ArXiv is still unique in providing unconditional time stamps and long-term readability.

Is usage proportional to the size of a repository?

The reach of a repository and the satisfaction of its users may be estimated by the ratio of pageviews per month to the number of documents. This ratio is astonishingly similar for different repositories, even of widely different size, whether they contain documents or links:

Marenet, with its 1,595 links: 1.9
MPIV, with its 3,027 links: 3.6
Physnet, with its 5,759 links: 4.2
VAB, with its 2,655 links: 10.4
ArXiv, with its 245,056 docs: 16.3

All numbers are astonishingly low, as we know from library usage of journals and books.

Eberhard Hilf, h...@isn-oldenburg.de
Institute for Science Networking Oldenburg GmbH at the Carl von Ossietzky University
http://www.isn-oldenburg.de

On Tue, 9 Sep 2003, Stevan Harnad wrote:

> On Mon, 8 Sep 2003, Eberhard R. Hilf wrote:
>
> > the physics ArXiv has a linear increase of the number of papers put in per month, this gives a quadratic acceleration of the total content (growth rate of Data base), not linear.
>
> Maybe so. But slide 25 of http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving.htm (slide 25) still looks pretty linear to me. And it looks as if 100% was not only *not* reached at this rate 10 years after self-archiving started in physics in 1991, but it won't be reached for another 10 years or so...
>
> > Total amount by now may be at 10-15 % of all papers in physics.
>
> I count that as appallingly low, considering what is so easily feasible (though stunningly higher than any other field!)...
>
> > Linear growth of input rate means the number of physicists and fields using it rises, while in each field (and physicist) a saturation is reached after a first exponential individual rise.
>
> Interesting, but the relevant target is 100% of physics (and all other disciplines) -- yesterday!
>
> > Never there will be a saturation such that all papers will go this way, since in different fields culture and habits and requirements are different. --
>
> I couldn't follow that: Never 100%? Even at this rate? I can't
Re: Central vs. Distributed Archives
Ebs Hilf -- who will host a meeting on the subject next week: http://physnet.physik.uni-oldenburg.de/projects/SINN/sinn03/programme.html -- confirms that the rate of growth of the biggest and oldest open-access archive -- the Physics Arxiv -- is still far, far too slow. I entirely agree. This does not diminish from the credit from Arxiv's having been the first; but now, 12 years down the road, this unchangingly slow rate suggests that something more may be needed than what has been feeding Arxiv across the years, and my own guess (and Ebs's) is that that something more may well be distributed institution-based self-archiving, instead of Arxiv's central discipline-based self-archiving. http://www.ecs.soton.ac.uk/~harnad/Temp/archpolnew.html The reason institutional self-archiving is more likely to speed up self-archiving and to generalize it across disciplines is that researchers and their institutions both share the benefits of the impact of their research output, whereas researchers and their disciplines do not. It is not the discipline that exercises the incentive of the publish-or-perish carrot-and-stick on researchers, it is their research institutions. As the co-investor in and co-beneficiary of the rewards of research impact (research funding, overheads, reputation, prizes) the researcher's institution is in a position to mandate not only publish or perish but publish with maximal impact -- which means maximal access, which means open access, which means self-archiving. http://www.ariadne.ac.uk/issue35/harnad/ I think on all this we agree with Ebs Hilf. Ebs too notes the likely remedy for the sluggish growth rate of self-archiving in physics: institutional (indeed, departmental) self-archiving. What is needed to accelerate that is compelling empirical demonstrations of the correlation between access and impact, to make researchers and their institutions realize that self-archiving is in their own interest (and how much so) -- in all disciplines. 
There is, however, in Ebs's summary below, a rather important and potentially misleading ambiguity: He conflates self-archiving with publishing -- referring to depositing papers in Arxiv as publishing them, in contrast to self-archiving them in institutional eprint archives. But surely *both* of these are self-archiving and not publishing! The publishing is done in the journals (in both cases). The self-archiving is merely the provision of a supplementary version of the paper, its full text accessible online toll-free for all would-be users webwide (in either a central discipline-based eprint archive or distributed institution-based eprint archives).

Both central disciplinary archives like Arxiv and distributed institutional archives include, in addition to the all-important peer-reviewed, published version of each article (the postprint), also the pre-peer-review preprint version(s) and sometimes also post-publication updated and enhanced versions (post-postprints). But the critical version, and the one that counts as the publication, is of course the published postprint: http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/2239.html . That (and not unpublished preprints or revisions) is what publish-or-perish is all about! http://www.ecs.soton.ac.uk/~harnad/Tp/resolution.htm#1.4

But apart from these minor points, I don't think Ebs and I disagree. Here is the quote/commentary:

On Wed, 10 Sep 2003, Eberhard R. Hilf wrote:

Dear Stevan and the list members, here are some arguments for 1. All physicists will publish in the ArXiv not before the year 2050, although the arxiv size is growing quadratically, not linearly with time. Earlier estimates [St. Harnad, http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving.htm slide 25 are to be revised]. [see http://isn-oldenburg.de/~hilf/ ]

If readers look at slide 25 above, they will find that according to Ebs's estimate (which I accept!), it would have to be revised so that the linear growth extends to 2050 instead of 2020.
According to Ebs, at the present growth rate, 2050 would be the first year in which *all* physics articles published in that year are self-archived in Arxiv. But note that that's *self-archived* in Arxiv, not *published* in Arxiv: There is absolutely no reason to believe that all those articles will not continue (*exactly* as they all do now) being published in the appropriate peer-reviewed journal for their area and their quality-level. (Publication will continue to mean, as it does now, peer-review and certification of having met that journal-name's quality standards.) And the rate of growth of the portion of total annual published journal article output in physics that is self-archived will grow (linearly!) from now till it reaches 100% in 2050, at exactly the same unchanging rate at which it has been growing for 12 years now. 2. Usage of repositories seems to be proportional to their size, but independent of absolute size. The full text you find at
Re: Central vs. Distributed Archives
On Mon, 8 Sep 2003, Eberhard R. Hilf wrote:

> the physics ArXiv has a linear increase of the number of papers put in per month, this gives a quadratic acceleration of the total content (growth rate of Data base), not linear.

Maybe so. But slide 25 of http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving.htm (slide 25) still looks pretty linear to me. And it looks as if 100% was not only *not* reached at this rate 10 years after self-archiving started in physics in 1991, but it won't be reached for another 10 years or so...

> Total amount by now may be at 10-15 % of all papers in physics.

(10-15% of the annual output, I assume.) I count that as appallingly low, considering what is so easily feasible (though stunningly higher than any other field!)...

> Linear growth of input rate means the number of physicists and fields using it rises, while in each field (and physicist) a saturation is reached after a first exponential individual rise.

Interesting, but the relevant target is 100% of the annual output of physics (and all other disciplines) -- yesterday!

> Never there will be a saturation such that all papers will go this way, since in different fields culture and habits and requirements are different. --

I couldn't follow that: Never 100%? Even at this rate? I can't imagine why not.

Cultural differences? Do any of the cultural differences between fields correspond to indifference or antipathy toward research impact -- toward having their research output read, used, cited? Unless the cultural differences are specifically with respect to that, then they are irrelevant.

Requirement differences? Are any universities or research funders indifferent or averse to their researchers' impact? Unless they are, any remaining requirement-differences are irrelevant.

Habit differences? Well, yes, there are certainly those. But that is just what this is all about *changing*! Are any field's current access/impact practises optimal? or unalterable for some reason? If not, then habit-change is (and always has been) the target! And the point is that the rate of habit-change is still far too slow -- relative to what is not only possible, but easily done, and immensely beneficial to research, researchers, etc. -- in all disciplines.

> [That is why it is e.g. best, to keep letter distribution by horses at a remote island (Juist) alive since the medieval times].

That I really couldn't follow! If you mean paper is still a useful back-up, sure. But we're not talking about back-up. We are talking about open online access, which has been reachable for at least a decade and a half now, and OAI-interoperably since 1999. What more is the research cavalry waiting for, before it will stoop to drink?

Stevan Harnad

NOTE: A complete archive of the ongoing discussion of providing open access to the peer-reviewed research literature online is available at the American Scientist September Forum (98 99 00 01 02 03): http://amsci-forum.amsci.org/archives/American-Scientist-Open-Access-Forum.html or http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/index.html

Discussion can be posted to: american-scientist-open-access-fo...@amsci.org
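The "linear" versus "quadratic" disagreement in this exchange is largely about which quantity is being plotted: a monthly submission rate that rises linearly gives a cumulative total that grows quadratically. A small sketch follows, using figures quoted elsewhere in this thread (an increase of about 24 papers per month each month, over 146 months of service) and assuming, purely for simplicity, that the intake started from zero.

```python
from itertools import accumulate

MONTHLY_INCREASE = 24    # papers/month added to the intake each month (from the thread)
MONTHS_OF_SERVICE = 146  # age of the archive in months (from the thread)

# Monthly intake rising linearly; the start at zero is an assumption.
monthly_intake = [MONTHLY_INCREASE * m for m in range(1, MONTHS_OF_SERVICE + 1)]

# Running total of deposited papers: the discrete integral of the intake.
cumulative_total = list(accumulate(monthly_intake))


def closed_form_total(months, increase=MONTHLY_INCREASE):
    """Sum of increase*1 + ... + increase*months, quadratic in months."""
    return increase * months * (months + 1) // 2


if __name__ == "__main__":
    print("intake in final month:", monthly_intake[-1])
    print("cumulative total:", cumulative_total[-1])
    print("closed form:", closed_form_total(MONTHS_OF_SERVICE))
```

With these inputs the final monthly intake comes to 3,504 and the cumulative total to 257,544, close to the 3,500 papers per month and 246,555 documents reported for the ArXiv earlier in the thread. That suggests a linearly rising rate is at least compatible with the quoted figures, while a plot of the annual share of papers self-archived could still look linear.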
Re: Central vs. Distributed Archives
Hugo Fjelsted Alrøe writes:

> By community-building, I mean that such archives can contribute to the creation or development of the identity of a scholarly community in research areas that go across the established disciplinary matrix of the university world.

This is crucial if self-archiving is to take off.

> I know the same thing can in principle be done with OAI-compliant university archives and a disciplinary hub or research area hub, and in ten years time, we may not be able to tell the difference. But today, it is still not quite the same thing.

Correct. This is a point that is too often overlooked. RePEc (see http://repec.org) provides an example of this in the area of economics. RePEc archives are not OAI-compliant, but an OAI gateway exports all the RePEc data. Many RePEc services are in the business of community building. The crucial part, though, is RePEc's author registration service.

Cheers,
Thomas Krichel mailto:kric...@openlib.org
from Espoo, Finland http://openlib.org/home/krichel
RePEc:per:1965-06-05:thomas_krichel
Re: Central vs. Distributed Archives
Stevan Harnad wrote: Those are all OAI-compliant archives, and they include both central, discipline-based archives and distributed institutional archives. With OAI-interoperability, it doesn't matter which kind of OAI archive a paper is in, but I am promoting university archives http://www.eprints.org/self-faq/#institution-facilitate-filling http://www.eprints.org/ rather than central ones (even though I founded a central one myself http://cogprints.ecs.soton.ac.uk/ ) because researchers' institutions (and their research funders) all share in the joint publish-or-perish interests (and rewards) of maximizing the impact of their research output. Central repositories and disciplines do not. (They are the common locus for research that is competing for impact.) Hence research institutions (and their funders) are in a position to encourage, facilitate, and even mandate (through an extension of the publish-or-perish carrot-and-stick) open-access self-archiving of their own research output in their own OAI archive by their researchers, whereas disciplines and central organizations (e.g., WTO, WHO, UNESCO) are not: http://www.ecs.soton.ac.uk/~harnad/Temp/archpolnew.html http://www.ariadne.ac.uk/issue35/harnad/ I think it is still too early to write off any of the possible paths to open access within the field of self-archiving (not that you do that). I see a potentially very fruitful role for community-building archives that focus on certain research areas. These could be facilitated or mandated by some of the specialized public research institutions that, together with universities and private companies, inhabit the research landscape. I think of research institutions oriented towards applied research within for instance environmental research, agriculture, public health, education, community development, etc. Here, there is a clear two-sided research communication: towards the public and towards other researchers in the field. 
Open access thus serves two communicative purposes, improving scholarly communication and improving public access to research results, besides the complementary purpose of institutional self-promotion.

By community-building, I mean that such archives can contribute to the creation or development of the identity of a scholarly community in research areas that go across the established disciplinary matrix of the university world. I have myself initiated an archive for research in organic agriculture (http://orgprints.org), which we hope will become a center for international communication and cooperation in this area. Scientific papers from research in organic agriculture are published in many different specialized disciplinary journals as well as in general scientific journals and journals focused on organic agriculture, and it is not easy for researchers to keep track of all that is being published.

I know the same thing can in principle be done with OAI-compliant university archives and a disciplinary hub or research area hub, and in ten years' time, we may not be able to tell the difference. But today, it is still not quite the same thing. Contributing to the community would be detached from the usage of what is there, since the depositing of papers would take place somewhere outside the hub. This makes it dependent on the widespread existence of university archives. So if one wants to establish such an open-archive-based scholarly community hub, the way to do it is to make an eprint archive with the scope that one wants.

Having said that, it is still a historical fact that the first and still-biggest open-access OAI archive is a central, discipline-based one, the Physics Archive founded in 1991 http://arxiv.org/. But Arxiv's growth rate has been steadily linear since 1991, and shows no sign of either accelerating or generalizing to all the other disciplines. So clearly something else was needed to hasten the open-access era, and my own hunch is that a concerted policy of university-based archiving was what was needed. http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving.ppt

What's wrong with linear growth? It must be the SIZE of the growth rate that is important, and how long it will take to realize some satisfying level of open access with this growth rate. When you are looking for exponential growth, I take it that you are looking for something that MIGHT turn out to have a higher maximum growth rate than, for instance, arXiv. And that is all well, but it might be exponential and still have a slower maximum growth than the linear growth we see in arXiv.

In the presentation that you refer to above, you write: "At that rate, it would still take a decade before we reach the first year that all physics papers for that year are openly accessible." I think that this is an impressive and very satisfying growth. And I don't think that a decade is too long - the great news is that physics is getting there!

Kind regards
Hugo Alroe,
Re: Central vs. Distributed Archives
Dear Colleagues,

the physics ArXiv has a linear increase in the number of papers put in per month; this gives a quadratic acceleration of the total content (growth rate of the database), not a linear one. The total amount by now may be at 10-15% of all papers in physics. Linear growth of the input rate means the number of physicists and fields using it rises, while in each field (and for each physicist) a saturation is reached after a first exponential individual rise. There will never be a saturation such that all papers go this way, since culture, habits and requirements differ between fields. -- [That is why it is best, e.g., to keep alive the letter distribution by horse on a remote island (Juist), as it has been since medieval times.]

Ebs

Eberhard R. Hilf, Dr. Prof.; CEO (Geschaeftsfuehrer)
Institute for Science Networking Oldenburg GmbH an der Carl von Ossietzky Universitaet
Ammerlaender Heerstr.121; D-26129 Oldenburg
ISN-home: http://www.isn-oldenburg.de/
homepage: http://isn-oldenburg.de/~hilf
email : h...@isn-oldenburg.de
tel : +49-441-798-2884
fax : +49-441-798-5851

On Mon, 8 Sep 2003, Hugo Fjelsted Alrøe wrote:

Stevan Harnad wrote: Those are all OAI-compliant archives, and they include both central, discipline-based archives and distributed institutional archives. With OAI-interoperability, it doesn't matter which kind of OAI archive a paper is in, but I am promoting university archives http://www.eprints.org/self-faq/#institution-facilitate-filling http://www.eprints.org/ rather than central ones (even though I founded a central one myself http://cogprints.ecs.soton.ac.uk/ ) because researchers' institutions (and their research funders) all share in the joint publish-or-perish interests (and rewards) of maximizing the impact of their research output. Central repositories and disciplines do not. (They are the common locus for research that is competing for impact.)
Re: Central vs. Distributed Archives
On Wed, 3 Sep 2003, [identity deleted] wrote: Dear Mr. Harnad, I am also one of these stressed diploma-writers -- but very curious and enthusiastic. My subject is the future of institutional archives. I would be very pleased, if you could answer my questions: 1) Do you know anything about non university archives, such as NonGovernmentOrganisations (i.e., WTO, WHO, UNESCO). Do these kinds of repositories already exist? There are countless digital archives. You have to specify what *content* you have in mind. This Forum (soon to be re-named the American Scientist Open-Access Forum) is concerned *only* with scientific and scholarly *research*, before and after peer-review (preprints and postprints). Assuming that that is the content you are inquiring about, I suggest that you have a look at the archives listed by the Open Archives Initiative: http://oaisrv.nsdl.cornell.edu/Register/BrowseSites.pl as well as those indexed by http://oaister.umdl.umich.edu/o/oaister/viewcolls.html Those are all OAI-compliant archives, and they include both central, discipline-based archives and distributed institutional archives. With OAI-interoperability, it doesn't matter which kind of OAI archive a paper is in, but I am promoting university archives http://www.eprints.org/self-faq/#institution-facilitate-filling http://www.eprints.org/ rather than central ones (even though I founded a central one myself http://cogprints.ecs.soton.ac.uk/ ) because researchers' institutions (and their research funders) all share in the joint publish-or-perish interests (and rewards) of maximizing the impact of their research output. Central repositories and disciplines do not. (They are the common locus for research that is competing for impact.) 
Hence research institutions (and their funders) are in a position to encourage, facilitate, and even mandate (through an extension of the publish-or-perish carrot-and-stick) open-access self-archiving of their own research output in their own OAI archive by their researchers, whereas disciplines and central organizations (e.g., WTO, WHO, UNESCO) are not: http://www.ecs.soton.ac.uk/~harnad/Temp/archpolnew.html http://www.ariadne.ac.uk/issue35/harnad/ Having said that, it is still a historical fact that the first and still-biggest open-access OAI archive is a central, discipline-based one, the Physics Archive founded in 1991 http://arxiv.org/. But Arxiv's growth rate has been steadily linear since 1991, and shows no sign of either accelerating or generalizing to all the other disciplines. So clearly something else was needed to hasten the open-access era, and my own hunch is that a concerted policy university-based archiving was what was needed. http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving.ppt 2) I read about the Ingenta-Southampton cooperation concerning eprints-software in 2002. What has happend so far? Is there a result yet? It's still there on paper, but Ingenta has not yet made any move to implement or promote it. The idea had been that the Ingenta option would be for those universities that did not want to be bothered with maintaining their own OAI archives, and preferred to outsource it to Ingenta. This is still a good idea, but the ball is in Ingenta's court; Southampton has plenty to do already, with optimizing and maintaining the GNU eprints.org archive-creating software it provides free to universities, with creating tools for measuring and demonstrating the impact of open-access research (to help induce researchers and their institutions to self-archive) http://citebase.eprints.org/cgi-bin/search and with trying to shape national and international self-archiving policy. 
Other archive-creating software has since appeared too, but what is needed now is not more software, but more self-archiving, and a clear, focused rationale, agenda and policy for it. http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/2670.html

3) Is there any other serious method of preservation except OAIS?

Serious method of preservation for *what*? As noted, the Physics Arxiv, which is OAI-compliant but not OAIS http://www.rlg.ac.uk/longterm/oais.html is alive and well, and has been since 1991. But the first, second and third objective of open-access self-archiving is *access*, right now. The main preservation burden for all the physics journal articles that are self-archived in Arxiv as preprints and postprints is not on Arxiv but on each physics journal publisher's primary corpus. http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/2676.html http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/2678.html Please do not conflate the problem of open-access -- which is a *supplement* to publishing in journals, not a *substitute* for it -- with the problem of digital preservation of journal content -- which is a problem for journals, not for authors' institutional OAI archives. And, in the same breath, don't conflate institutional OAI archives whose purpose is to provide
Re: Central vs. Distributed Archives
Subject Threads: http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/1583.html http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/0293.html From: [identity removed] What I wish to emphasize... is the big difference between posting one's production on line in one's personal site, and sending it to an international server such as ArXiv... Yes, you are quite right that there is this difference. See: Open Letter to Philip Campbell, Editor, Nature http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/2601.html in which this point is explicitly discussed. Let me point out that this point (about central-disciplinary versus distributed-institutional self-archiving) is one of the three reasons I switched my own support several years ago from central, discipline-based archiving (back) to local, institution-based archiving (where I had started: http://www.arl.org/sc/subversive/ ). My three reasons for switching back were: (1) OAI-interoperability has made central and distributed self-archiving interoperable, hence jointly harvestable, searchable and navigable, hence equivalent. (2) Researchers and their institutions share a common interest in maximizing their (shared) research impact (and its rewards), whereas researchers and their disciplines do not. Institutions are hence in a position to use publish or perish carrots and sticks to encourage institutional self-archiving. Disciplines cannot (although of course any disciplinary culture of self-archiving can be equally directed toward central or institutional self-archiving). Hence institutional self-archiving, once it catches on, can grow far faster than disciplinary self-archiving. http://www.ecs.soton.ac.uk/~harnad/Temp/archpolnew.html http://www.ecs.soton.ac.uk/~harnad/Temp/Ariadne-RAE.htm (3) Institutional self-archiving is truly *self*-archiving -- by the author, of his own institutional research output, in his own institution's research archive. 
And it is restricted *only* to the output from researchers of that institution, made openly accessible purely to maximize its impact. It is hence in a position to benefit from the growing number of progressive self-archiving policies on the part of publishers: http://www.lboro.ac.uk/departments/ls/disresearch/romeo/Romeo%20Publisher%20Policies.htm In contrast, a central, 3rd-party archive runs the risk of falling under the (understandable) efforts of the publisher not to let *other* publishers re-publish the work to which the original publisher has added the value. (Of course, in the online and interoperable age this is moot for give-away open-access research, because if something is openly accessible to one and all on the web, it makes no difference whatsoever whether it is openly accessible from this website or from that website!) But central, 3rd-party archives are a psychological deterrent because, being 3rd-party rather than self, as the author's institution is, it makes them -- in principle, but so far of course never in practice -- open to publishers' claims of 3rd-party copyright-infringement by a rival publisher. The author himself (and hence his own institution) is immune to this, and hence can be the beneficiary of the retention of the *self-archiving* right where a 3rd-party, central archive is not.

Anyway, since all OAI archives are interoperable and equivalent, I see no reason at a time when self-archiving is still growing much too slowly (compared to what would so easily be possible) to retard its growth in any unnecessary way: Focussing on central discipline-based archives and self-archiving is no longer necessary. Distributed institution-based archives and self-archiving achieve the exact same end, with at least one fewer obstacle (and at least one more incentive).
Yes, as you say, most publishers allow authors to do the first thing [institution-based but not central self-archiving]: the APS, for instance, changed its copyright transfer form a few years ago to make this perfectly legal. I think that EPS did the same. But sending a document to a more general server such as ArXiv is another matter, and this is not permitted - at least for the moment (APS does not allow it for instance). APS does not (yet) allow their *PDF files* to be self-archived in ArXiv, but it does allow the final, revised text to be self-archived. So this problem is trivial. http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/0749.html http://forms.aps.org/author/copytrnsfr.pdf What is less trivial (because it is *perceived* by authors as a deterrent) is publishers' expressed opposition to 3rd-party (i.e., central) self-archiving. The simple and obvious solution is distributed institutional self-archiving, linked by the glue of OAI. Most private websites are not permanent; experience shows that they are often not updated, not stable, and that their url sometimes disappear after a few years. This is, by the way, why we need centralized structure to ensure long term preservation of all
Re: Central vs. Distributed Archives
[Thread: http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/0293.html]

Dear Stevan,

Just a question of clarification. I have noticed that you lately recommend exclusively institutional eprint archives and not (inter)disciplinary archives. Why is that? What are the reasons for not recommending disciplinary archives? As you well know, the most successful archive we have seen (arxiv.org) is disciplinary, and there are a few others on the way.

If I am to guess, you might be thinking that authors can be pressured to place their papers in institutional archives by making it a condition in their employment contracts, or something similar. This pressure can also be applied in at least some kinds of disciplinary archives (such as http://orgprints.org), by way of making the condition in the research grant. And the motivation is straightforward: what the public pays for should be made publicly available.

One possible benefit of (inter)disciplinary archives is that they can better support a kind of 'community feeling' (which a journal can also sometimes offer), and that this community feeling can help improve research communication.

kind regards
Hugo Alroe

-----Original Message-----
From: Stevan Harnad [mailto:har...@ecs.soton.ac.uk]
Sent: 19 February 2003 16:32
To: american-scientist-open-access-fo...@listserver.sigmaxi.org
Subject: Re: STM Talk: Open Access by Peaceful Evolution

What researchers can and should do right now for OA is to self-archive their own refereed research output (Self-Archive Unto Others As Ye Would Have Them Self-Archive Unto You) in their own institutional Eprint Archives, rather than to keep scolding publishers for not doing it for them -- *especially* as publishers (e.g., Elsevier) are now coming round to recognizing their own responsible role in all this, by formally supporting author/institution self-archiving: http://www.lboro.ac.uk/departments/ls/disresearch/romeo/Romeo%20Publisher%20Policies.htm

Stevan Harnad
Re: Central vs. Distributed Archives
On Mon, 24 Feb 2003, Hugo Fjelsted Alrøe wrote: [Thread: http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/0293.html]

I have noticed that you lately recommend exclusively institutional eprint archives and not (inter)disciplinary archives. Why is that? What are the reasons for not recommending disciplinary archives? As you well know, the most successful archive we have seen (arxiv.org) is disciplinary, and there are a few others on the way.

Both institutional self-archiving and central self-archiving are welcome and valuable contributions to open-access. Moreover, because of OAI-compliance, they are all interoperable. So the short answer is that it makes no difference. But there is a bit more:

Strategically, several years ago, I could see no reason why large central archives like the Physics ArXiv should not subsume all of the literature, in all disciplines. But gradually two problems became apparent, along with their solutions:

Problem 1: ArXiv itself, though the biggest, is still growing too slowly, even in Physics: It is growing linearly, which means it will still be another decade before we arrive at a year when *all* of that year's physics publications are self-archived. http://arxiv.org/show_monthly_submissions

Problem 2: The central-archiving of ArXiv was generalizing even more slowly to other disciplines: CogPrints (at 5+ years), another central archive, still only has about 1500 papers, compared to ArXiv's (at 11+ years) 200,000. http://cogprints.ecs.soton.ac.uk/ http://www.earlham.edu/~peters/fos/timeline.htm

Solution 1: The Open Archives Initiative in 1999 provided an interoperability protocol that effectively made all compliant archives equivalent, whether they were central or institutional. http://www.openarchives.org

Solution 2: What is needed to accelerate self-archiving is an *incentive*, and it is clear that that incentive is something that is shared by a researcher and his own institution, not a researcher and his discipline or a central archive.
http://software.eprints.org/#ep2 The purpose of self-archiving is to maximize the visibility, accessibility, usage and impact of one's research. In a word, to maximize research impact. The benefits of research impact are shared by researchers and their institutions. It is one of the main factors in determining salaries, promotion tenure, research-funding, prizes and prestige. These are all shared interests for researchers and their institutions. They are behind the publish or perish injunction. This means that the institution is not only a natural ally in self-archiving, but it can even be the provider of the carrot and the stick, as an extension of exactly the same considerations as those underlying publish-or-perish: Maximize research impact. http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving.ppt http://www.ecs.soton.ac.uk/~harnad/Temp/unto-others.doc It is for this reason that I think institutional self-archiving holds greater promise for propelling open-access to critical mass than central archiving -- or, as the effect is additive, I should really say: than central archiving alone. If I am to guess, you might be thinking that authors can be pressured to place their papers in institutional archives by making it a condition in their employment contracts, or something similar. This pressure can also be applied in at least some kinds of disciplinary archives (such as http://orgprints.org), by way of making the condition in the research grant. And the motivation is straight forward: what the public pays for should be made publicly available. I agree. And both of these pressures are welcome. But the institutional self-archiving solution is more general, and pan-disciplinary. It is easier to create and fill institutional archives (using local carrots and sticks) than to create a central archive for each discipline and get all researchers to fill it. 
Institutional self-archiving also benefits from a wider institutional interest in making institutional digital output and holdings (not just refereed research) openly accessible (though I confess that this double mandate has been a 2-edged sword, also causing confusion about what the target contents of institutional archives should be, and thereby slowing rather than hastening the self-archiving of refereed research output). I would say that when an institution has adopted a policy of mandatory self-archiving for all its researchers, it is easier and more general to also provide the local archives to do it in, rather than to rely on their being spawned and sustained by some external central entity for each discipline. The policy is then also a uniform, self-contained and self-sufficient one, whereas "self-archive somewhere" would have been too vague and would not fit most disciplines yet (rather the way "publish in an open-access journal" would be a premature injunction in most disciplines and specialties today). Last, there is a link between self-archiving and research
Re: Central vs. Distributed Archives
Dear Stevan, thanks a lot for your summary of the topic up to now. I agree with what you say. All paths lead to the same destination. Indeed, we work on all three lines: encouraging the authors, encouraging the institutions to set up self-archiving (with our help or gateway, or not), and promoting central archives. I have now seen your img files. Ebs
Re: Central vs. Distributed Archives
The current topic thread begins with: Central vs. Distributed Archives http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/0950.html See also the earlier thread: Central vs. Distributed Archives http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/0294.html

On Sat, 17 Nov 2001, Eberhard R. Hilf wrote: http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/1655.html

eh Steve said the only way is using OAi-compliance by the author to
eh self-archive his documents before and through refereeing.
eh
eh The word "only" is too much of a load.
eh
eh In Physics (and Mathematics) since a long time authors can self-archive
eh their documents, without having to install any software or learn about
eh OAi. They are automatically included into the OAi scheme by the
eh OAi compliant service providers by using PhysDoc (or Math-Net) as gateways
eh who take care of their document being included.

My comrade-at-arms Ebs Hilf has misinterpreted the sense of my "only". He is of course quite right that central, discipline-based self-archiving (in OAI-compliant Eprints Archives) is likewise an effective and welcome form of self-archiving. However, as I wrote in the very next posting: http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/1654.html

sh The Physics Archive [http://arxiv.org], for example, has over 150,000
sh articles, but cumulated across 10 years! At that rate, even for this
sh most advanced of all the self-archiving disciplines, the year 2011 will
sh be the first in which ALL the articles published in physics that
sh year will be accessible for free for all:
sh
sh http://www.ecs.soton.ac.uk/~harnad/Tp/Digitometrics/img001.htm
sh
sh http://www.ecs.soton.ac.uk/~harnad/Tp/Digitometrics/img002.htm
sh
sh This is why institution-based self-archiving now needs to be vigorously
sh supported and promoted to fast-forward us all to the optimal and
sh inevitable for research and researchers.
It was with this fact in mind that I had written the earlier "only" passage: http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/1653.html

sh The only sure way to free access to the entire refereed research
sh literature online, right now, is for researchers themselves to take the
sh initiative and self-archive it (in their own institutions' OAI-compliant
sh Eprint Archives: http://www.arl.org/sparc/pubs/enews/aug01.html#6 )

The force of the "only" was coupled with the sense of the "right now"! A researcher in any particular discipline today (other than Physics, Mathematics, or Cognitive Sciences) cannot take the initiative and self-archive his refereed research in a central archive for his discipline, because such central archives do not yet exist for most disciplines! Nor, were they to exist, are they filling anywhere near fast enough (see the 2 Digitometrics links above). Researchers' individual (and thereby collective) leverage (and rewards for publication and impact) operates largely at the level of their own institutions. Researchers need not install any software themselves, nor learn anything about OAi. They need only encourage their own universities to do so, out of shared self-interest in research visibility, uptake and impact: 7. What you can do now to free the refereed literature online http://www.ecs.soton.ac.uk/~harnad/Tp/resolution.htm#7 Online or Invisible?
(Steve Lawrence) http://www.neci.nec.com/~lawrence/papers/online-nature01/

By way of OAI-interoperable central Eprint Archives, physicists and mathematicians today have http://arxiv.org and Ebs's PhysDoc (or Math-Net) http://physnet.uni-oldenburg.de/PhysNet/physdoc.html and Cognitive Scientists have http://cogprints.soton.ac.uk/ But for all the other disciplines, the fastest and surest path today is to have their own institutions install their own OAI-compliant Institutional Eprint Archives (using the free http://www.eprints.org software) as a growing number of universities and research institutions are now doing:

Institute of Education, University of London, London, England
University Library System, University of Pittsburgh http://philsci-archive.pitt.edu
Centre pour la Communication Scientifique Directe http://eprinttheses.in2p3.fr
Media Studies, University of Ulster, Coleraine, Northern Ireland: Formations Media Studies Archive http://formations2.ulst.ac.uk/
California Institute of Technology http://caltechcstr.library.caltech.edu/
Instituto Brasileiro de Informacao em Ciencia e Tecnologia http://www.sbg.ibict.br
Institut Jacques Monod, Paris
Department of Philosophy, University of Vienna http://eprints.philo.at
University of Southampton, Southampton, UK http://demoprints.eprints.org/
RIACS, NASA Ames, Moffett Field CA http://horus.riacs.edu
University of Nottingham, Nottingham http://www-db.library.nottingham.ac.uk/ep1
University of Rochester Libraries http://128.151.45.180/
Sissa Multimedia Database http://mmdb.sissa.it/
University of California Digital Libraries http://www.escholarship.cdlib.org
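The growth argument quoted above (over 150,000 articles cumulated across 10 years, with 2011 as the first year of full coverage) can be made concrete with a little arithmetic. What follows is only a sketch under stated assumptions, not the calculation behind the Digitometrics figures: it assumes the annual deposit rate rises linearly from zero in 1991, and the annual physics output of 60,000 articles used in the example is a hypothetical figure chosen for illustration, not a number from this thread. The function names are likewise made up for the example.

```python
import math


def linear_rate_slope(cumulative_total: float, years_elapsed: float) -> float:
    """Slope a of the deposit rate rate(t) = a * t.

    Assumption: deposits per year grow linearly from zero, so the
    cumulative total after T years is a * T**2 / 2.
    """
    return 2.0 * cumulative_total / years_elapsed ** 2


def annual_rate(slope: float, years_since_start: float) -> float:
    """Deposits per year, a given number of years after the archive started."""
    return slope * years_since_start


def first_full_coverage_year(start_year: int, slope: float,
                             annual_output: float) -> int:
    """First calendar year in which the deposit rate reaches annual_output.

    annual_output (articles published per year) is NOT given in the
    thread; callers must supply their own estimate.
    """
    return start_year + math.ceil(annual_output / slope)


if __name__ == "__main__":
    # Figures from the thread: about 150,000 articles over 10 years (1991-2001).
    slope = linear_rate_slope(150_000, 10)
    print("rate in 2001:", annual_rate(slope, 10), "articles/year")
    # Hypothetical annual output, for illustration only.
    print("full coverage in:", first_full_coverage_year(1991, slope, 60_000))
```

Under these assumptions the model gives a deposit rate of 30,000 articles per year in 2001 and reaches a hypothetical output of 60,000 per year in 2011. A different output estimate, or a different growth model, would move that date, which is why the output figure is left as a parameter.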
Re: Central vs. Distributed Archives
On Fri, 2 Feb 2001, Greg Kuperberg wrote:

On Sun, Dec 31, 2000 at 09:57:50PM +, Stevan Harnad wrote: http://www.ecs.soton.ac.uk/~harnad/Tp/resolution.htm Physicists have already shown the way, but at their current self-archiving rate, even they will take another decade to free the entire Physics literature

Of course you are entitled to your opinion that institution-based open archiving (sorry, I won't call it self-archiving) is the bugle call of the revolution.

Terminology is terminology, but calling one's own archiving of one's own papers "self-archiving" sure sounds like calling a spade a spade... Besides, the Open Archives Initiative (OAI http://www.openarchives.org) has informed me in no uncertain terms that I should NOT characterize self-archiving as open-archiving or vice versa. The OAI is a much broader initiative than the self-archiving initiative. OAI is dedicated to providing shared interoperability standards for the entire on-line digital literature, whether self-archived or not, whether for-free or for-fee, whether journal, book or other, whether full-text or not, whether centralized or distributed. It is true that the OAI was originally proposed as the UPS (Universal Preprint Service), which was indeed a form of self-archiving (though a limited form, focussing on the unrefereed preprint rather than on both the unrefereed preprint and the refereed postprint, as self-archiving does). But UPS was quickly dropped and the OAI has since vastly outgrown those limited original objectives.

In my opinion, institution-based archives are, o in physics, all but superseded by the arXiv,

On-Line archives (apart from the Physics arXiv) are all but non-existent. The hope is that institution-based, distributed self-archiving (perhaps with the newfound help of the http://www.eprints.org archive-creating software) will now remedy this. And, as I said above, even in Physics, self-archiving is still growing too slowly to free the Physics literature in less than a decade.
It seems to me that the central self-archiving model, admirable and welcome though it is, can use all the help it can get. o in mathematics, a politically appealing distraction, and I have no idea why you mention politics. The only appeal is to researchers, that they should free their refereed research from their obsolete access- and impact-barriers by self-archiving it, now. I have no political preference for their doing it the central way or the distributed way: We should all just go ahead and DO it! I used to lean towards central self-archiving myself, seeing no reason why it should not all be subsumed under arXiv; but that just isn't happening, and the clock is ticking; so it's time to add more powerful and general means of self-archiving. Besides, the whole point of OAI-compliance and interoperability is that it should no longer MATTER which way you self-archive: centrally or institutionally. It's all harvestable into the same global virtual archive anyway, thanks to the OAI protocol. Unless one's political objective becomes, publisher-like, to protect one's own proprietary (centralized?) turf instead of to free the research literature... o in computer science and economics, the inadequate status quo. I have no idea what you mean by the above. As I said before, I know that NCSTRL and RePEc, which are the efforts in computer science and economics to make institutional archives interoperable, are important major projects. I don't mean to slight them. But they are not a panacea and they do not match the arXiv. Nobody is trying to match anything. We are trying to free the research literature, as quickly and as effectively as possible. Computer science has a second important project, ResearchIndex/CiteSeer, which has some good features that the arXiv does not. But (a) it doesn't match the arXiv either, (b) it relies on search engine intelligence and not bureaucratic standards, and (c) an arXiv search facility could be made as intelligent as CiteSeer. 
I really can't follow any of this, and I have no idea who you think is competing with whom for what: ResearchIndex/CiteSeer is a wonderful tool, harvesting and citation-linking papers on the Web, whether in OAI-compliant archives or not. As the OAI-compliant corpus grows (with the growth of central and distributed self-archiving), ResearchIndex/CiteSeer's harvest will grow, and surely we all welcome that! I don't know what you have in mind with bureaucratic standards, but you need not sell me on search-engine intelligence: I love it already. Moreover, as the OAI-compliant corpus grows, it will spawn still further and more powerful Open Archive Service Providers (e.g., OpCit http://opcit.eprints.org and ARC http://arc.cs.odu.edu/). But the main goal now is to do whatever can be done to make that corpus grow into the full refereed literature in all disciplines as soon as possible. This is not the time to squabble over who has the best
Re: Central vs. Distributed Archives
On Sat, Feb 03, 2001 at 10:28:19AM +, Stevan Harnad wrote:

Terminology is terminology, but calling one's own archiving of one's own papers self-archiving sure sounds like calling a spade a spade...

In my opinion, if I submit a paper to the arXiv or to a hypothetical UC Davis archive, that is them archiving my papers, not me archiving my own. The arXiv has a technical staff, admittedly small, and you could fairly call the staff members archivists. The authors are not archivists.

Besides, the Open Archives Initiative (OAI http://www.openarchives.org) has informed me in no uncertain terms that I should NOT characterize self-archiving as open-archiving or vice versa.

I suspect that that's because you don't take into account considerations that they consider important. In any case in your paper you do still imply that the arXiv is an example of self-archiving. Anyway, my *main* comment last time is that you don't even mention these points of disagreement in your article. Your article has the bias that if people agree with you on the ends, it doesn't matter if they agree with you on the means.

On-Line archives (apart from the Physics arXiv) are all but non-existent.

That's not true at all. In mathematics alone the AMS has a list of 60+ department-based and research-institute-based archives, http://www.ams.org/global-preprints/dept-server.html and 16 subdiscipline-based archives, http://www.ams.org/global-preprints/special-server.html Maybe a dozen of these independent archives are bigger, as measured by new submissions per month, than your CogPrints archive. The biggest one, mp_arc, gets 30 new papers a month. If you put them all together they are comparable in size to the math arXiv. But they're not growing as quickly as the math arXiv, not even those in Germany that enjoy an interoperable metadata standard and a common search engine called MPRESS, http://mathnet.preprints.org . MPRESS even includes everything in the math arXiv.
MPRESS can be useful, but it is not the panacea that you seem to expect it to be.

o in mathematics, a politically appealing distraction, and

I have no idea why you mention politics.

Because deciding who gets to maintain the archives is political. People get service credit for it and they don't want to give that up. Some of the Europeans don't trust projects that they perceive as American. In mathematics, the numerous institution-based archives tend to satisfy administrators more and readers less. They are useful, but they grow less quickly than the arXiv because they are less useful. They aren't by any means the arXiv's savior.

Besides, the whole point of OAI-compliance and interoperability is that it should no longer MATTER which way you self-archive: centrally or institutionally. It's all harvestable into the same global virtual archive anyway, thanks to the OAI protocol.

There lies MPRESS, the global virtual archive in mathematics, and it still does matter.

--
  /\  Greg Kuperberg (UC Davis)
 /  \
 \  / Visit the Math ArXiv Front at http://front.math.ucdavis.edu/
  \/  * All the math that's fit to e-print *
Re: Central vs. Distributed Archives
On Sat, 3 Feb 2001 Greg Kuperberg g...@math.ucdavis.edu wrote: if I submit a paper to the arXiv... that is them archiving my papers, not me archiving my own. Sorry, Greg, I don't find these details useful. This is terminological niggling. (As long as we're at it, I prefer the word depositing to an archive, because I submit to a journal.) The arXiv has a technical staff, admittedly small, and you could fairly call the staff members archivists. The authors are not archivists. And authors are not publishers either. Yet it is quite common to say I've published that paper. What was needed was a term to describe the act of depositing a paper into a free on-line archive for yourself, rather than relying on someone else (e.g., a publisher) to do it for you. Self-archiving describes that quite transparently. (If I had to vote on it, I'd say most of the work of archiving itself was being done by the software and the hardware, not the staff. But the supporting staff are certainly essential, as they are even for personal web-pages...) in your paper you do still imply that the arXiv is an example of self-archiving. And so it is. Authors can self-archive in centralized OAI-compliant archives like arXiv or distributed institutional OAI-compliant archives like the ones being set up using eprints.org software. Anyway, my *main* comment last time is that you don't even mention these points of disagreement in your article. Your article has the bias that if people agree with you on the ends, it doesn't matter if they agree with you on the means. Well it seems to me that in my article (1) I recommend self-archiving to free the refereed research literature, and (2) I recommend self-archiving in distributed institutional OAI-compliant Archives to complement self-archiving in centralized OAI-compliant Archives. Now in recommending this, what exactly do you think I should add? That there are some people who think it's not worth complementing the former with the latter? 
that they think we should just carry on with the former as if there were no new possibilities for broadening and accelerating the growth of self-archiving? Why would I want to say that? Why would anyone want to say that? On-Line archives (apart from the Physics arXiv) are all but non-existent. That's not true at all. In mathematics alone the AMS has a list of 60+ department-based and research-institute-based archives, Perhaps I should have said interoperable OAI-compliant archives. And if they exist, that's splendid. I hope there will be many more. Maybe a dozen of these independent archives are bigger, as measured by new submissions per month, than your CogPrints archive. The biggest one, mp_arc, gets 30 new papers a month. If you put them all together they are comparable in size to the math arXiv. Good. Let them go OAI-compliant (perhaps by installing eprints.org software!) and they will be making a valuable contribution to freeing the refereed research literature (assuming they are not just for unrefereed preprints!). But they're not growing as quickly as the math arXiv So what? I have no idea why you mention politics. Because deciding who gets to maintain the archives is political. People get service credit for it and they don't want to give that up. Pity. Especially if it ever engenders a conflict of interest (as it has done in journal publishing) between what's in the best interest of research and researchers (maximizing free access) and what's in the interests of archivists. Some of the Europeans don't trust projects that they perceive as American. In mathematics, the numerous institution-based archives tend to satisfy administrators more and readers less. They are useful, but they grow less quickly than the arXiv because they are less useful. They aren't by any means the arXiv's savior. Make 'em all OAI-compliant and it will no longer make a bit of difference... 
Stevan Harnad                     har...@cogsci.soton.ac.uk
Professor of Cognitive Science    har...@princeton.edu
Department of Electronics and     phone: +44 23-80 592-582
Computer Science                  fax: +44 23-80 592-865
University of Southampton         http://www.ecs.soton.ac.uk/~harnad/
Highfield, Southampton            http://www.princeton.edu/~harnad/
SO17 1BJ UNITED KINGDOM

NOTE: A complete archive of the ongoing discussion of providing free access to the refereed journal literature online is available at the American Scientist September Forum (98 99 00 01): http://amsci-forum.amsci.org/archives/American-Scientist-Open-Access-Forum.html You may join the list at the site above. Discussion can be posted to: american-scientist-open-access-fo...@amsci.org
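The claim in the message above, that once archives are OAI-compliant it no longer matters whether a paper sits in a central or an institutional archive, can be illustrated with a small harvesting sketch. It is offered as an illustration only, under several assumptions: it uses the element names and namespaces of the later OAI-PMH 2.0 protocol with unqualified Dublin Core (`oai_dc`) rather than the protocol version current when this thread was written; the archive base URL, identifiers and titles are hypothetical; and the responses are embedded sample strings, so nothing is fetched over the network.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces used by OAI-PMH 2.0 responses carrying Dublin Core metadata.
OAI_NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build a ListRecords request URL for an archive's OAI base URL."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def parse_records(xml_text: str, archive_name: str) -> list:
    """Extract (identifier, title) pairs from one ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", OAI_NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=OAI_NS)
        title = record.findtext(".//dc:title", namespaces=OAI_NS)
        records.append({"id": identifier, "title": title, "archive": archive_name})
    return records


def merge_archives(*record_lists: list) -> list:
    """Merge harvested records into one 'virtual archive', keyed by identifier."""
    merged = {}
    for records in record_lists:
        for rec in records:
            merged.setdefault(rec["id"], rec)
    return sorted(merged.values(), key=lambda r: r["title"] or "")


def sample_response(identifier: str, title: str) -> str:
    """Hypothetical minimal ListRecords response, for demonstration only."""
    return f"""<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>{identifier}</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>{title}</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


if __name__ == "__main__":
    central = parse_records(
        sample_response("oai:central.example.org:1", "Paper A"), "central")
    institutional = parse_records(
        sample_response("oai:eprints.example.edu:7", "Paper B"), "institutional")
    for rec in merge_archives(central, institutional):
        print(rec["title"], "from", rec["archive"])
```

In the merged list a reader searches one index and the archive of origin is just another field, which is the sense in which harvesting is meant to make the locus of deposit irrelevant. Whether real service providers achieve that in practice is exactly what is disputed in this thread.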
Re: Central vs. Distributed Archives
Greg, I honestly don't know what the substantive issue is that you are disagreeing with me about. We are both for freeing the research literature. We are both for self-archiving. We are both for interoperability. We both agree that the Physics arXiv was the first to show the way. We both agree that it would be good if the pace of self-archiving were accelerated. We both agree that it would be good if self-archiving spread to all disciplines. So what is at issue here? That I have suggested that distributed OAI-compliant self-archiving may help accelerate and spread self-archiving whereas you think it won't? Well let's just wait and see. You seem to have some reason for wanting to nip distributed self-archiving in the bud, a reason that I can't fathom. Could it be because it is competing with arXiv in mathematics? Who cares? Self-archiving is self-archiving, and free is free. As for interoperability, the reason I stress it is that that is what will make the locus-differences between the individual archives irrelevant. It will all be harvested into global virtual archives, and those, not the individual archives, will be the locus classicus for the research literature. On Sat, 3 Feb 2001, Greg Kuperberg wrote: You don't just recommend institution-based archives, you hype them as superior to discipline-based archives. You describe them as a powerful and natural complement that you hope will broaden and accelerate the self-archiving literature. I think you should add, more clearly than you have, that that part is only your opinion, and not that of the physicists and others who have shown the way. Greg, it seems to me hope is already at least as subjective and hypothetical a descriptor as opinion. Nor does hope equal hype. Nor do I say anything about superior. I simply state the facts (and hopes). The facts are that it started in Physics, in the form of centralized self-archiving; but this is only growing linearly and not generalizing across disciplines. 
Enter OAI-interoperability and the possibility of complementing central self-archiving with distributed self-archiving. Why, one wonders, would any disinterested party (or rather, one with an interest solely in freeing the literature, not in characterizing one form of self-archiving as superior) fail to welcome a complementary form of archiving, rather than trying to dismiss it as hype and opinion, or as contrary to the opinion of physicists?

The freeing of their present and future refereed research from all access- and impact-barriers forever is now entirely in the hands of researchers. Posterity is looking over our shoulders, and will not judge us flatteringly if we continue to delay the optimal and inevitable needlessly, now that it is clearly within our reach. Physicists have already shown the way, but at their current self-archiving rate, even they will take another decade to free the entire Physics literature (http://www.ecs.soton.ac.uk/~harnad/Tp/Tim/sld002.htm) -- with the Cognitive Sciences (http://cogprints.soton.ac.uk) 39 times slower still, and most of the remaining disciplines not even started: http://www.ecs.soton.ac.uk/~harnad/Tp/Tim/sld004.htm

This is why it is hoped that (with the help of the eprints.org institutional archive-creating software) distributed, institution-based self-archiving, as a powerful and natural complement to central, discipline-based self-archiving, will now broaden and accelerate the self-archiving initiative, putting us all over the top at last, with the entire distributed corpus integrated by the glue of interoperability (http://www.openarchives.org).

sh Perhaps I should have said interoperable OAI-compliant archives. sh And if they exist, that's splendid. I hope there will be many more.

This sounds like the Western leftists who insisted that China and the Soviet Union didn't practice true Communism.
If it is utterly irrelevant that many of the mathematical archives are interoperable and DC-compliant, why will making them interoperable and OAI-compliant make all the difference? Granted, the OAI group may have made a better standard than the Dublin Core. It's still insane to dismiss one as paganism and embrace the other as gospel. Greg, I don't care! One of the purposes of interoperability is to make sure it can all be harvested into global virtual archives like ARC http://arc.cs.odu.edu/ thereby making the individual archive locus irrelevant (and empowering distributed archiving). If DC-compliance is enough to vouchsafe that, that's fine with me! Let 1000 flowers bloom! *You* (not the Western leftists) are the one who seems to have some sort of animus against these other archives! And I think we are beginning to repeat ourselves (again). We have bet on our respective horses. Can we now wait and see how they do in the self-archiving sweepstakes? (I have the advantage that I win either way, just as long as they
Re: Central vs. Distributed Archives
On Wed, 8 Nov 2000, Greg Kuperberg wrote: While libraries certainly should help preserve e-prints, I do not trust any one library, nor any other sole institution, to archive material single-handedly. Any caretaker can lose or destroy a unique copy of any document... That is why it is important to redundantly and openly mirror an archive and not just allow third-party searches. The arXiv has 18 mirror sites on six continents Who is disagreeing with this? All requisite redundancy is just as desirable, and feasible, and inevitable, with institution-based distributed archiving as with discipline-based archiving. I think there is an incorrect analogy at the heart of Greg's frequent use of the term fragmented in speaking about the institution-based approach to self-archiving: I think Greg continues to equate (1) archiving with publishing, and (2) institutional digital collections with localized books-on-shelves (ripe for a Library-of-Alexandria catastrophe; hence his example of the lost/destroyed unique document). And (3) (unrefereed, unpublished) PREprints continue to be treated as the paradigm for it all, whereas it is much more informative and representative to see it in terms of (refereed, published) POSTprints: We are, after all, aiming at freeing the REFEREED literature -- with the prepublication embryological stages merely an added bonus, rather than the focus of it all. So, to summarize: Whilst, our refereed papers are already, as they are, safely in the hands of journals and libraries, blissfully mirrored (though unblissfully unfree), we need not fret about Alexandria. Freeing a postprint (sic) via self-archiving (whether central or institutional, interoperable or not) is a bonus, a plus, a freebie, a way to make it accessible to those multitudes worldwide who cannot access it because of the S/L/P firewalls surrounding the safe, Alexandria versions. 
It is inviting Zeno's Paralysis (again) to say: Keep waiting till you have an Alexandria-proof centralized, mirrored, redundant arXiv-style Archive to self-archive them in before you dare to self-archive your (already safely mirrored) postprints. Nay! Release them from their hostagehood behind obsolete, impact-blocking, and completely surmountable access barriers online today through self-archiving, addict fellow-researchers the world over to that new, free form of access to it all, and the redundancies and mirrors will come tomorrow, in plenty of time to keep the freed corpus aloft in the skies. (And nothing is at risk: the firewalled version remains as safe -- from catastrophic loss as well as illicit access -- as it ever was.)

If that is now transparent for postprints, it should be equally transparent that the same applies to preprints: They are destined to become postprints (hence secure, for the above reasons) anyway. Being available online early is a bonus; a freebie. Moreover, it is a bonus that has no prior history of enjoying the safe/secure status of postprints anyway: access to preprints was always restricted and evanescent, destined to be superseded by the secure postprint once it was available. Now the redundancy and mirroring that will be accorded the freed postprint corpus, once it is freed, will also be inherited by the preprint corpus.

So there is nothing to lose, and everything to be gained, by self-archiving all preprints and postprints now, in either the centralized OAI-compliant (http://www.openarchives.org) archives like arXiv (http://arXiv.org), or in institutional OAI-compliant archives, like Eprints (http://www.eprints.org). Ignore Cassandras: Preservation problems are eminently soluble, once the goods are up there: the real problem now is how to get researchers to put them up there, at long last. Central archives have gone part of the distance but are proving too slow.
Institutional archives are natural allies in hastening us on the road to the optimal and inevitable. As a rule, it is better for web sites to share the same archive than to each have fragments. It is better for Oxford and Cambridge to each have all of Shakespeare's plays than for Oxford to have only the comedies and Cambridge to have only the tragedies. That is why I favor shared interoperability, which is in some ways centralized, to fragmented interoperability, which is optimistically called decentralized. Massive redundancy is one of the few strengths of the existing paper-based system; I am not an expert on digital storage, coding or preservation, but I am not at all sure that Greg is technically right above (and I'm certain that the Oxford/Cambridge hard-copy analogy is fallacious). I would like to hear from specialists in localized vs. distributed digital coding, redundancy, etc. -- bearing in mind that in the case of the refereed literature, this is all moot anyway, because free access now, is infinitely preferable to no access, no matter how short-lived it risks being. The locus classicus is still safely ensconced behind the toll
Re: Central vs. Distributed Archives
Greg Kuperberg writes:

But I disagree entirely with the claim that distributed interoperability has never been tried before. It has been tried several times, whole-heartedly with these two projects: MPRESS - mathnet.preprints.org NCSTRL - ncstrl.org And it has been a factor in many other projects, including Hypatia and the AMS preprint server. Some of these projects are more successful than others, but *all* of them suffer from inconstancy of the underlying archives.

The largest project that has been done with distributed interoperability is RePEc. RePEc catalogs 11 items now. While there is the occasional case that an archive may become obsolete, from about 140 archives, I think 5 have been made obsolete, i.e. have been moved to a place outside the original archive maintainer's control. Thus while it is a problem, it is not a major one. It is by far outweighed by other advantages, such as distributed costs, minimum quality control, and wide community participation.

Cheers, Thomas Krichel http://openlib.org/home/krichel RePEc:per:1965-06-05:thomas_krichel 2000-10-05 to 2001-01-06: Institute for Economic Research / Hitotsubashi University 2-1 Naka / Kunitachi / Tokyo 186-8603 / Japan / +81(0)42 580 8349 tho...@micro.ier.hit-u.ac.jp
Re: Central vs. Distributed Archives
Steve, I think you misunderstand Greg's concern (and mine). We do not disagree with what you want to do; we want to add to it. We are assuming, I think, that something similar to the plan you advocate will be the basic process. I do not think it enough to say distributed=secure. It's only the first step to security. In addition to being distributed, there also needs to be a reliable caretaker--not just to do the housekeeping, but to ensure that the archive is kept compatible with changing technology.

I suggested that the archives be organized redundantly both by discipline and by university (and possibly by geographic/political entity, as well as what anyone wants to do). There are undoubtedly well-organized academic departments that can do this. There are also academic departments that cannot be relied on to do this right, because of size, interest, or finances. The same goes for professional societies. Certainly no individual can be relied on: all humans are mortal. All of this goes as well for refereed as for unrefereed, preprint as for reprint, officially published as for unpublished.

As a librarian, I do not assume it is good enough that "our refereed papers are already, as they are, safely in the hands of journals and libraries, ..." There are very few library copies of many journals, and though there is excellent backup from national libraries, even their collections are incomplete. The literature published up to now will be much more secure when it too has been digitized and placed on free publicly available mirrored servers, with all the additional precautions. Besides security, this will also make them generally available with all the additional advantages of plans such as yours.
Re: Central vs. Distributed Archives
Greg: As a rule, it is better for web sites to share the same archive than to each have fragments. It is better for Oxford and Cambridge to each have all of Shakespeare's plays than for Oxford to have only the comedies and Cambridge to have only the tragedies. That is why I favor shared interoperability, which is in some ways centralized, to fragmented interoperability, which is optimistically called decentralized. Massive redundancy is one of the few strengths of the existing paper-based system; Stevan: I am not an expert on digital storage, coding or preservation, but I am not at all sure that Greg is technically right above (and I'm certain that the Oxford/Cambridge hard-copy analogy is fallacious). I would like to hear from specialists in localized vs. distributed digital coding, redundancy, etc. -- bearing in mind that in the case of the If I may separate the political issues from the technical. Political: There is a fear that a decentralised system will result in no overall responsibility for archive continuity. But, equally, a centralised body can decide that a system is no longer useful or is too expensive to be free - what happens if XXX goes pay-per-view? What rights do mirrors have to store XXX if they are told to remove their archive? Technical: The fear is that there will be only one copy of a paper stored in an institution department or library and if that archive is lost that paper disappears into digital oblivion. Data storage is very cheap - there is little difference between storing 1 or 100 copies. Oxford and Cambridge could farm all world physics archives and store their contents. This is not currently done because Open Archives include pay-per-view archives, where only the abstract can be farmed - and hence there is no provision for farming of texts. 
I may also point out that there are already archives that perform distributed mirroring - math arXiv is primarily made up of papers that have been archived elsewhere (judging by the lack of associated meta data and updates). Tim Brody Computer Science, University of Southampton email: tdb...@soton.ac.uk Web: http://www.ecs.soton.ac.uk/~tdb198/
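To make the harvesting point above concrete, here is a minimal sketch of how a site could "farm" records from several archives into one local store. It is only an illustration under stated assumptions: it uses the XML shape of the later OAI-PMH 2.0 protocol (which postdates this thread), the archive names and the record are invented, and the names `parse_records` and `harvest_into` are hypothetical. It handles metadata only; mirroring full texts, which is the point at issue above, would additionally require the archives to expose them.

```python
import xml.etree.ElementTree as ET

# Namespaces of OAI-PMH 2.0 and its unqualified Dublin Core format (assumed here).
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical ListRecords response from an invented archive; not real data.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:archive-a.example.org:1</identifier>
        <datestamp>2000-11-09</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A hypothetical e-print</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_records(xml_text):
    """Return {identifier: {"datestamp": ..., "title": ...}} for one response."""
    root = ET.fromstring(xml_text)
    records = {}
    for rec in root.iterfind("oai:ListRecords/oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        records[identifier] = {
            "datestamp": rec.findtext("oai:header/oai:datestamp", namespaces=NS),
            "title": rec.findtext("oai:metadata/oai_dc:dc/dc:title", namespaces=NS),
        }
    return records


def harvest_into(mirror, xml_text):
    """Add every record in the response to a local mirror (a plain dict)."""
    mirror.update(parse_records(xml_text))
    return mirror


# One site harvesting from two (invented) archives ends up holding both records.
mirror = {}
harvest_into(mirror, SAMPLE)
harvest_into(mirror, SAMPLE.replace("archive-a", "archive-b"))
print(sorted(mirror))
```

Because the store is keyed by record identifier, harvesting the same archive twice does not create duplicates, which is one reason keeping 1 or 100 copies costs little beyond storage.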
Re: Central vs. Distributed Archives
On Thu, 9 Nov 2000, David Goodman wrote: Steve, I think you misunderstand Greg's concern (and mine) We do not disagree with what you want to do; we want to add to it. We are assuming, I think, that something similar to the plan you advocate will be the basic process. I do not think it enough to say distributed = secure. It's only the first step to security. In addition to being distributed, there also needs to be a reliable caretaker--not just to do the housekeeping, but to ensure that the archive is kept compatible with changing technology. I agree completely. I didn't say distributed = secure (there's a lot more to security than that). I said being freely accessible now, in distributed institutional Eprint archives is a powerful new way to complement being freely accessible in centralized Eprint archives, which are still growing much too slowly. It should not be delayed for one moment by security concerns, not one moment. I suggested that the archives be organized redundantly both by discipline and by university (and possibly by geographic/political entity, as well as what anyone wants to do). Again, complete agreement. There are undoubtedly well-organized academic departments that can do this. There are also academic departments that cannot be relied on to do this right, because of size, interest, or finances. The same goes for professional societies. Certainly no individual can be relied on: all humans are mortal. All of this goes as well for refereed as for unrefereed, preprint as for reprint, officially published as for unpublished. Agreed, and digital librarians are clearly the pertinent experts. As a librarian, I do not assume it is good enough that our refereed papers are already, as they are, safely in the hands of journals and libraries, ... Yes, but let us not again mix up agendas. 
There could have been -- independent of any movement to free the refereed literature online -- a movement to increase the security of the on-paper corpus (both papers and books) on-line. That's fine, desirable, but unrelated to this Forum's agenda, which is to FREE the refereed corpus online. Concerns about strengthening the paper literature's current security should not be wrapped into the freeing (now!) initiative for the refereed literature; nor should freeing (now!) be made in any way conditional on first meeting a priori security concerns. Although it is an oversimplification, it is best to treat the freeing initiative as a pure freebie, a windfall, over and above what we have already. We are talking about archiving, not publishing, an extra version of what is already published (on-paper). This face-valid, immediate goal should be kept as distinct from preservation concerns as it should be kept from peer-review-reform concerns (likewise worthy, but orthogonal, and indeed even at cross-purposes if yoked in any way to the freeing initiative). There are very few library copies of many journals, and though there is excellent backup from national libraries, even their collections are incomplete. The literature published up to now will be much more secure when it too has been digitized and placed on free publicly available mirrored servers, with all the additional precautions. Besides security, this will also make them generally available with all the additional advantages of plans such as yours. David, the securing issue is a separate one from the freeing! The material on the shelves now is not free; nor is it, let us agree, as secure as it might be. Increasing its security by distributed digital back-up is one thing (and need not be freely accessible either); freeing it online is quite another. Please, please keep these two separate or you will only encourage more Zeno's Paralysis! 
Stevan Harnad har...@cogsci.soton.ac.uk
Professor of Cognitive Science har...@princeton.edu
Department of Electronics and Computer Science
phone: +44 23-80 592-582 fax: +44 23-80 592-865
University of Southampton http://www.ecs.soton.ac.uk/~harnad/
Highfield, Southampton http://www.princeton.edu/~harnad/
SO17 1BJ UNITED KINGDOM

NOTE: A complete archive of the ongoing discussion of providing free access to the refereed journal literature online is available at the American Scientist September Forum (98 99 00): http://amsci-forum.amsci.org/archives/American-Scientist-Open-Access-Forum.html You may join the list at the site above. Discussion can be posted to: american-scientist-open-access-fo...@amsci.org
Re: Central vs. Distributed Archives
On Thu, Nov 09, 2000 at 11:16:11AM +, Stevan Harnad wrote: Nay! Release them from their hostagehood behind obsolete, impact-blocking, and completely surmountable access barriers online today through self-archiving, addict fellow-researchers the world over to that new, free form of access to it all, and the redundancies and mirrors will come tomorrow, in plenty of time to keep the freed corpus aloft in the skies. Entirely aside from whether your proposals are the best ones, you have previously described them as being nothing other than the Ginsparg model. Well I think of myself as devoted to the Ginsparg model, but my interpretation of it is significantly different from the one that you give here. In 1997 my thinking was much more like yours, but three years of direct experience with the arXiv has changed it. My creed is, build a large, integrated, immortal archive now, and the e-prints will come tomorrow. I won't insist that this approach is right for your discipline, because maybe you know your own community better than I do. But I do feel strongly that it is right for my discipline. And I can't speak for Paul Ginsparg either, but I would be surprised if he contradicted me outright, since he has influenced my thinking a great deal through direct correspondence. In general your liberation terminology doesn't sit so well with me. I do hint at liberation terminology from time to time; in fact the name of my front end, Front for the Mathematics arXiv, is a deliberate allusion. If the math arXiv is revolutionary, I would liken it to the American revolution. We are building a new system on new territory and letting immigrants come. I see a lot of Alexander Hamilton in our approach, and somewhat less of Thomas Jefferson. Your comments have some character of Jefferson, but very little of Hamilton, and often they sound almost Marxist. I might compare your overall vision to the Communards of Paris. But hey, you could be right in your own society. 
You have also correctly picked up that I don't accept the dichotomy between preprints and postprints. My view is that the preprint and the postprint are Tweedledum and Tweedledee. But that is a topic for another posting.

-- Greg Kuperberg (UC Davis)
Visit the Math ArXiv Front at http://front.math.ucdavis.edu/
* All the math that's fit to e-print *
Re: Central vs. Distributed Archives
On Thu, Nov 09, 2000 at 05:58:14PM +, Tim Brody wrote:

I may also point out that there are already archives that perform distributed mirroring - math arXiv is primarily made up of papers that have been archived elsewhere (judging by the lack of associated meta data and updates).

I don't understand this comment. Most of the papers in the math arXiv are eventually published, and many are in preprint series of one sort or another. However I conjecture that at least half of the submissions in the most recent three months are not on any other web site, not even on a home page. And for those that are not published or not yet published, the arXiv is the only project that explicitly promises to keep them permanently.

-- Greg Kuperberg (UC Davis)
Visit the Math ArXiv Front at http://front.math.ucdavis.edu/
* All the math that's fit to e-print *
Re: Central vs. Distributed Archives
On Thu, Nov 09, 2000 at 07:16:47PM +, Stevan Harnad wrote:

I don't think sublinear or linear growth is right for your discipline (maths) either...

Of course more growth is better than less. Several of us (both the arXiv staff led by Paul Ginsparg and the math advisory committee chaired by Dave Morrison, on which I serve) have worked hard to accelerate the growth of the math arXiv. I can report a partial victory. The archives that we glued together were at best growing linearly with a low slope and were showing some signs of sublinearity. After we put them together there was a discontinuous increase in new submissions, and linear growth commenced with a higher slope. I don't have a chart but the numbers are there at http://front.math.ucdavis.edu/math

After we had changed so much, I was surprised that growth was still linear. (Paul Ginsparg wasn't surprised.) I now believe that linear growth in e-prints is inherent. But both the discontinuity and the one-time change in slope were heartening. That is a realistic goal when you change the system.

-- Greg Kuperberg (UC Davis)
Visit the Math ArXiv Front at http://front.math.ucdavis.edu/
* All the math that's fit to e-print *
Re: Central vs. Distributed Archives
This is what University Presses need to become -- the formatter, keeper, and distributor (with the university library) of the intellectual goods. If that were to happen, funded of course by the university, then the university could avoid paying twice (once to the researcher and again to the publisher) for intellectual property. The university would also save money in the long term. I believe it will come to this model within the next five years.

Thomas Bacher, Director, Purdue Press 1207 SCC-E, W. Lafayette, IN 47907-1207 (765)494-2038 Fax: (765)496-2442 www.thepress.purdue.edu Be at your life-long-learning best. Read from a University Press.
Re: Central vs. Distributed Archives
Departments are not the place, for exactly the reasons John explains. More than one of the academic depts. in more than one major university I have been affiliated with has managed to lose unique copies of Ph.D. theses, as well as every other possible type of item. I think this is an appropriate role for libraries in two dimensions: each university library should take the responsibility for all publications by its faculty and students, AND appropriate major libraries or groups of libraries could also take the responsibility for specific research areas that are not being otherwise covered. If university presses wanted to participate I think most libraries would welcome the partnership. The systems are inexpensive enough for redundancy to be affordable, and this might be one solution to the refereed/nonrefereed controversy. It only requires adequate cross-archive indexing. A part of the savings could be used to increase the number of librarians helping the other members of the university navigate the new system. Most users need help in navigating the present system (the higher the academic level the more likely they are to request it, because they know enough to realize they need it). They will need it all the more during the period of transition. Nothing in the prior course of human-developed systems gives reason to suppose they will need it less even after the transition is complete. (If the AI people think they can compete in this, I encourage them to keep trying.) John MacColl wrote: Greg Kuperberg wrote: So should we mathematicians trust individual math departments to permanently preserve their e-prints? I don't think so. Our own math preprint series at UC Davis is an arXiv overlay - all articles are automatically contributed to the math arXiv. One of my arguments for this arrangement is that we can't promise to babysit these preprints forever. We could easily forget our obligation. 
Stevan Harnad replied: The Department could easily forget; the institutional library is unlikely to do so. It has a lot of prior practise with stability/permanence! (And it has a good deal to gain from maintaining robust institutional Eprint Archives: The prospects of serials-crisis relief, as other institutional libraries do the same thing, with their own Eprint archives -- I would concur with this response, and would wish to develop a couple of points about why libraries are important in the freed literature scenario. Interestingly, the notion of 'forgetting' gives a new dimension to the notion of libraries as 'memory organisations'. They are no longer simply memory organisations in the sense of storing knowledge, as in a memory, but particularly as that knowledge becomes networked they are becoming organisers of access, for which function their contribution to their parent institution is to understand information structures, sources and presentations. This requires that they are memory joggers as well as memory fillers. That has always been true, but internet publication has increased both the complexity of these structures, and the rate of publication. More and more the challenge for academic libraries is to preserve the roles of hunters and collectors of knowledge in the age of internet publishing: that requires that they take a much more active approach to identifying and maintaining knowledge than was required in the age of print, when libraries had adapted to the culture of publishers, and had settled into a role which was primarily passive. But as Stevan says, interoperability in the world of eprint archives has not been tried before (and therefore cannot be criticised as the wrong model). 
More than that, it is at present the only model really capable of surviving in the world of internet publishing, and it conforms to the way librarians see publishing culture moving, which is why the library profession is so concerned with metadata - the key to the knowledge structures which are in transition. In the passive model, academics and researchers ordered books and journals via the library, and the library sought to ensure that the material which arrived in the form of physical product was organised optimally. Now, we find academics and researchers creating web sites with links to internet sources, and themselves interacting with such sources (as they will with open archives) without needing to act via the library. Our role as librarians is to keep pace with these changes and evolve new methods for providing not only 'permanence and stability', but also description and classification to ensure that sources are findable by other researchers, students and teachers. So - to take Greg's point about centralisation - whether an institution wishes to create an open archive for itself as an institution, or whether a single department wishes to do it, is a matter for them to decide, but either way it is in their interest to let the library know that
Re: Central vs. Distributed Archives
On Wed, Nov 08, 2000 at 12:30:39PM -0400, David Goodman wrote: Departments are not the place, for exactly the reasons John explains. More than one of the academic depts. in more than one major university I have been affiliated with has managed to lose unique copies of Ph.D. theses, as well as every other possible type of item. The fact is that most math papers on the web (excluding those in the arXiv) are on department, and not campus-wide, web servers. This is even true of papers that are organized into preprint series. One of the dangers of an interoperability approach is to hoist the e-print vision on such an accidental foundation. I also agree with John MacColl's position that libraries are more reliable archivists than departments in principle. But I disagree entirely with the claim that distributed interoperability has never been tried before. It has been tried several times, whole-heartedly with these two projects: MPRESS - mathnet.preprints.org NCSTRL - ncstrl.org And it has been a factor in many other projects, including Hypatia and the AMS preprint server. Some of these projects are more successful than others, but *all* of them suffer from inconstancy of the underlying archives. While libraries certainly should help preserve e-prints, I do not trust any one library, nor any other sole institution, to archive material single-handedly. Any caretaker can lose or destroy a unique copy of any document. (Just last year the Boston Public Library lost thousands of books in a flood, for example.) That is why it is important to redundantly and openly mirror an archive and not just allow third-party searches. The arXiv has 18 mirror sites on six continents, listed at: http://arxiv.org/servers.html That is not as many copies of the arXiv as I would like to see, although it is enough full-fledged active mirrors. 
More significantly, anyone who wants to can maintain yet another copy of the arXiv following the instructions at: http://front.math.ucdavis.edu/scripted

As a rule, it is better for web sites to share the same archive than to each have fragments. It is better for Oxford and Cambridge to each have all of Shakespeare's plays than for Oxford to have only the comedies and Cambridge to have only the tragedies. That is why I favor shared interoperability, which is in some ways centralized, to fragmented interoperability, which is optimistically called decentralized. Massive redundancy is one of the few strengths of the existing paper-based system; let's not tear up the road in addition to scrapping the horse carriage.

-- Greg Kuperberg (UC Davis)
Visit the Math ArXiv Front at http://front.math.ucdavis.edu/
* All the math that's fit to e-print *
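The worry about unique copies can be stated as a simple check: given what each site holds, which items exist at fewer than some minimum number of sites? The sketch below is an illustration only; the function name `at_risk`, the `min_copies` threshold, and the holdings are hypothetical, with the Oxford/Cambridge example above used as toy input.

```python
from collections import Counter


def at_risk(holdings, min_copies=2):
    """Return the sorted titles held by fewer than min_copies sites."""
    counts = Counter()
    for titles in holdings.values():
        counts.update(set(titles))  # count each site at most once per title
    return sorted(title for title, n in counts.items() if n < min_copies)


comedies = ["As You Like It", "Twelfth Night"]
tragedies = ["Hamlet", "Macbeth"]

# Fragmented: each site holds only its own part of the corpus.
fragmented = {"Oxford": comedies, "Cambridge": tragedies}
# Shared: each site holds the whole corpus.
shared = {"Oxford": comedies + tragedies, "Cambridge": comedies + tragedies}

print(at_risk(fragmented))  # every title has a single copy
print(at_risk(shared))      # nothing is down to a single copy
```

The check says nothing about who is responsible for keeping the copies alive; it only makes visible which items would be lost if one caretaker failed.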
Re: Central vs. Distributed Archives
On Tue, Nov 07, 2000 at 03:15:36PM +, Stevan Harnad wrote:

So the answer is: Sure I'd have been happy to have CogPrints subsumed by arXiv if that had proved to be the way to get the entire refereed corpus online and free. But now it looks as if OAI-compliant distributed Eprint Archiving (including arXiv) will instead be subsumed into the global virtual Eprint Archive.

I have learned not to claim that the arXiv is the Philosopher's Stone, much as I would like it to be. But if you're serious about merging with the arXiv, let's see how well OAI is doing in a year, as measured by the number of search queries at multiarchive OAS agents.

-- Greg Kuperberg (UC Davis)
Visit the Math ArXiv Front at http://front.math.ucdavis.edu/
* All the math that's fit to e-print *
Re: Central vs. Distributed Archives
On Fri, 3 Nov 2000, Greg Kuperberg wrote: It is not really a neutral statement to declare that it no longer matters whether a paper is in a central archive or a distributed one. Each archive is in a way an entrenched interest. Each archive maintainer has put a lot of work into his or her project, and therefore wouldn't want it assimilated into a larger archive without a very good reason. I am afraid I cannot follow this at all. Are you saying that the maintainer of a free public archive of refereed research has an interest in NOT having that research assimilated into still larger public archives, if it increases their visibility, accessibility and impact? (If there really do exist such entrenched archive-maintainer interests, they begin to resemble the conflict of interest that has emerged between researchers and journal publishers, when it comes to access-barriers to their work!) The maintainers I have in mind are those whose interest is in freeing this research from needless access/impact barriers, not in adding to them! In particular, neither universities who provide distributed institutional Eprint Archives for self-archiving the refereed research of their researchers, nor Learned Societies who do so for the sake of their disciplines, in a centralized archive, have anything to gain from preventing their respective archive contents from being harvested by Open Archive Services into still larger virtual archives, all seamlessly interoperable (e.g., http://arc.cs.odu.edu/). As to justifying access-barriers on the grounds that the archive maintainer has put a lot of work into his or her project, the Eprints software should now make that work so minimal that this dubious rationale becomes moot anyway: http://www.eprints.org This is overconfidence. The biggest reason that it is overconfidence is that it defers the permanence question. But there are other reasons as well. 
One is that one of the most useful features of the arXiv (and similar services such as CogPrints) is immediate notification of new results.

There is no (not-readily-solvable) permanence question. At this point, getting the literature on-line and free is the most important thing to do, now. The collective interests that this will generate in KEEPING it all on-line and free will ensure that all proper steps are taken to ensure permanence.

The OAI-compliant archive-creating/maintaining Eprints software has the same notification service as CogPrints -- indeed, it is a generic adaptation of the CogPrints software! http://cogprints.soton.ac.uk

Another is non-redundancy: the arXiv almost completely eliminates the disarray of having many copies of a paper which may or may not be different versions. The OAI standard does not address, and perhaps cannot address, either of these important advantages of a centralized system.

The OAI standard has not yet addressed version control (it will), but the OAI-compliant Eprints software has. Moreover, version-sorting is a natural function for an Open Archives Service that harvests all versions of a paper and sorts them the way you like (date, archive, use, etc.). Such a service is a natural one to go hand in hand with citation-linking (which likewise has to sort versions): http://opcit.eprints.org

interoperability keeps getting reinvented.

The OAI protocol is steadily being optimized (and the OAI-compliant Archives with it): Is this a bad thing?

Precedent suggests that if OAI succeeds, it will fade into a transparent layer, and that beyond it people will see incompatibility at a new level and invent another standard.

This sounds unduly pessimistic (and could be said against any attempt to create interoperability standards).

HTTP is already an interoperability standard, originally invented for the purpose of distributing research documents. And there are already HTTP-based search engines, including CiteSeer, which searches only for research papers. So it's important to explain how OAI would go beyond HTTP+CiteSeer.

I suggest that this question be re-directed to the OAI discussion list, which is concerned with the technical details: u...@vole.lanl.gov http://vole.lanl.gov/pipermail/ups/

Stevan Harnad                     har...@cogsci.soton.ac.uk
Professor of Cognitive Science    har...@princeton.edu
Department of Electronics and     phone: +44 23-80 592-582
Computer Science                  fax:   +44 23-80 592-865
University of Southampton         http://www.ecs.soton.ac.uk/~harnad/
Highfield, Southampton            http://www.princeton.edu/~harnad/
SO17 1BJ UNITED KINGDOM

NOTE: A complete archive of the ongoing discussion of providing free access to the refereed journal literature online is available at the American Scientist September Forum (98 99 00): http://amsci-forum.amsci.org/archives/American-Scientist-Open-Access-Forum.html You may join the list at the site above. Discussion can be
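The version-sorting service described in this message (an Open Archives Service that harvests every version of a paper from several archives and sorts them by date or archive) can be sketched roughly as follows. This is only an illustration in Python, not the Eprints or OAI implementation: the record fields (`paper_id`, `archive`, `deposited`) and the sample records are hypothetical, and the sketch assumes that all versions of a paper already share one identifier, which a real harvesting service would have to establish for itself.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    """One harvested metadata record (all field names are hypothetical)."""
    paper_id: str    # assumed to be shared by all versions of the same paper
    archive: str     # archive the record was harvested from
    deposited: date  # deposit date reported by that archive


def group_versions(records, key=lambda r: (r.deposited, r.archive)):
    """Group records by paper and sort each paper's versions by `key`."""
    groups = defaultdict(list)
    for record in records:
        groups[record.paper_id].append(record)
    return {pid: sorted(versions, key=key) for pid, versions in groups.items()}


# Hypothetical harvest from two archives holding overlapping papers.
harvested = [
    Record("paper-A", "institutional", date(2000, 9, 1)),
    Record("paper-A", "central", date(2000, 6, 15)),
    Record("paper-B", "central", date(2000, 10, 2)),
    Record("paper-A", "central", date(2000, 10, 20)),
]

by_paper = group_versions(harvested)

# The most recent version of each paper is the last one in its sorted group.
latest = {pid: versions[-1] for pid, versions in by_paper.items()}
```

Passing a different `key` (for example, sorting by archive name first) gives the "sort them the way you like" behaviour without changing the grouping step.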
Re: Central vs. Distributed Archives
On Mon, Nov 06, 2000 at 05:46:57PM +, Stevan Harnad wrote:

I am afraid I cannot follow this at all. Are you saying that the maintainer of a free public archive of refereed research has an interest in NOT having that research assimilated into still larger public archives, if it increases their visibility, accessibility and impact?

My position is born entirely out of practical experience and not theory, and I am not saying exactly that. For a subject-based archive (as opposed to institutional), the maintainer has an interest in retaining credit for his efforts. He may also have at least a perceived interest in retaining control over the archival procedures. If an outside archive is assimilated into the huge arXiv, certainly it increases the visibility, accessibility, you-name-it-ability, of the individual papers. However, the former maintainer's name may well fade into the background. At best, asking a maintainer to merge with the arXiv is asking him to change his duties (if he stays on as an arXiv moderator or an overlay maintainer). At worst, it's asking him to retire.

The math advisory committee has had dozens of negotiations to merge material into the arXiv. We consider all such negotiations to be delicate. After all, Stevan, suppose that we told you that CogPrints would be better off as part of the arXiv and you should surrender your collection and your responsibilities. Would you immediately agree, or would you want some time to think about it?

Some might ask, what is there to decide about how to run an archive? For example, the arXiv's policy is that DVI is unreliable as an input format, although it does offer it as output. The arXiv requires TeX source for new submissions if they are written in TeX. There are other subject-based archives out there that accept *only* DVI as a submission format. The maintainers of these archives feel that TeX source is an unreliable input format, and moreover that TeX source is confidential for some authors. It is very difficult to defuse this seemingly minor issue, and it is only one of several such issues.

For institutional preprint series the issues are a little different, but they are equally obstructive. Usually an institutional maintainer is less interested in retaining credit, but more concerned, sometimes correctly, about following his mandate. If we suggest to university U that they contribute their papers to the arXiv, the maintainer at U may say "our faculty gave permission for me to list their papers in our preprint series, but not to contribute them to your arXiv." That can lead to yet another bureaucratic thicket.

Right behind these superficial issues are more significant ones like permanence. The fact is that many institutional and subject-based archives do not want the responsibility of permanence. Some of them explicitly repudiate it. A standards-based virtual archive approach, such as OAI, aspires to please every side and sweep all such issues under the rug. I wonder if this is rushing in where angels fear to tread.

There is no (not-readily-solvable) permanence question. At this point, getting the literature on-line and free is the most important thing to do, now. The collective interests that this will generate in KEEPING it all on-line and free will ensure that all proper steps are taken to ensure permanence.

Again, experience tells me otherwise. Thousands of math preprints have come and gone on the web. Let me also give you a quote from a help page of a non-arXiv math archive: "When your paper is ultimately published we would greatly appreciate being informed. At that time we will remove the preprint and leave a pointer to the journal in which it was published." This flatly contradicts your vision of freeing the literature. But OAI itself does not pass judgement on such policies.

The OAI-compliant archive-creating/maintaining Eprints software has the same notification service as CogPrints -- indeed, it is a generic adaptation of the CogPrints software!

Yes, but it *only* notifies the subscribers of that one little archive. The OAI standard leaves OAS agents with no clear notification mechanism, because there is no guarantee that the agent will be notified in a timely manner by the foundational archives.

--
  /\  Greg Kuperberg (UC Davis)
 /  \
 \  / Visit the Math ArXiv Front at http://front.math.ucdavis.edu/
  \/  * All the math that's fit to e-print *
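The notification concern raised at the end of this message can be made concrete with a small simulation. The sketch below assumes a pull model in which a service agent learns about new deposits only by polling an archive for records newer than its last visit. The class and method names (`Archive`, `Agent`, `records_since`, `poll`) are invented for illustration and are not part of the OAI standard or of any real software; times are in hypothetical hours.

```python
from dataclasses import dataclass, field


@dataclass
class Archive:
    """Hypothetical archive holding (deposit_time, title) pairs."""
    records: list = field(default_factory=list)

    def deposit(self, time, title):
        self.records.append((time, title))

    def records_since(self, time):
        """Return records deposited strictly after `time`."""
        return [r for r in self.records if r[0] > time]


@dataclass
class Agent:
    """Hypothetical service agent that learns of deposits only by polling."""
    last_poll: float = 0.0
    announced: list = field(default_factory=list)

    def poll(self, archive, now):
        """Announce everything new since the last poll, with its delay."""
        for deposited, title in archive.records_since(self.last_poll):
            self.announced.append((title, now - deposited))
        self.last_poll = now


archive = Archive()
agent = Agent()

archive.deposit(time=1.0, title="paper one")
agent.poll(archive, now=24.0)    # first daily poll
archive.deposit(time=25.0, title="paper two")
archive.deposit(time=47.0, title="paper three")
agent.poll(archive, now=48.0)    # second daily poll

# Map each title to the delay between deposit and announcement.
delays = dict(agent.announced)
```

Under these assumptions the delay before subscribers hear of a paper is bounded by the agent's polling interval, not by how quickly the archive itself could have announced it, which is one way of reading the "timely manner" objection.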
Re: Central vs. Distributed Archives
On Thu, 2 Nov 2000, Greg Kuperberg wrote:

We have had much more success by moving in the opposite direction, i.e., by strengthening distributed open archival with a centralized foundation.

And continued good success to the math arXiv project! But why restrict efforts to centralized ones only? The whole point of OAI interoperability is that it should no longer make any difference whether a refereed paper is archived in a central archive or a distributed archive or both! (The only alternative we want to avoid is neither!)

By way of example of how it no longer makes any difference, CogPrints http://cogprints.soton.ac.uk is a centralized archive for cognitive science -- but it is using EXACTLY the same OAI-compliant Eprints architecture as has been developed for distributed, institution-based archiving by http://www.eprints.org. In fact, the OAI-compliant Eprints software was DERIVED from the prior centralized CogPrints software! And institutions are institutions, whether they mount centralized archives or institutional archives. And mirroring and harvesting for reliability and permanence are available to both.

So why keep repeating that centralized archiving helped accelerate math archiving more quickly than the prior (pre-OAI) distributed archiving? True, but things didn't stop there. And linear growth is still linear growth, whereas what we need is exponential growth, across all disciplines, if we are to reach the optimal and inevitable before we expire! So let 1000 flowers bloom, central and distributed. Interoperability will harvest them all.

The MPRESS project (http://mathnet.preprints.org/) has a lot in common with OAI, and it was started before the universal math arXiv. It has its own metadata standard, Dublin Core, and it has a number of institutional preprint series among its data feeds. But it hasn't yet caught on.

Maybe that was because it was going it alone, instead of distributing its efforts across disciplines, as the Open Archives Initiative is doing. It's one thing to adopt a standard, quite another to get others to adopt it too. (This is why your advocacy of centralized archiving and anti-advocacy of distributed archiving is divisive and counterproductive: We should be supporting every effort that gets all the refereed literature up there, online, accessible, searchable, navigable, and free for all. Centralized archiving has not managed this alone, so let it now benefit from the help of Distributed Archiving!)

It doesn't seem to make much difference to authors whether a preprint series is indexed by MPRESS or not.

I don't understand this point. It may be another symptom of the conflation between publishing and archiving, and between preprints and postprints: What authors are choosing when they PUBLISH a paper is a journal, i.e., a quality-certifier with a known level of quality, a trusted brand. What authors are choosing when they ARCHIVE their eprint -- whether the journal-certified, refereed POSTprint or the unrefereed PREprint -- is a means of making their paper maximally visible and accessible online, for free for all. OAI-interoperability provides that, provided the metadata-protocol is shared by all archives, irrespective of whether they are centralized or institutional.

MPRESS apparently did not become such a universal (we might even call it distributed) standard. Perhaps this was in part because it did not initially adopt OAI's strategy of minimalism: Pick the minimal functional metadata set, to maximize the ease of compliance, rather than going all the way to Dublin Core from the outset. (OAI is inching towards Dublin Core too, but thanks to minimalism and proselytising across disciplines, it may manage to bring everyone else along with it.)

Part of the trouble with MPRESS is that not all of its sources are providing as good metadata as they promised.
Ironically the lion's share of good metadata in MPRESS comes from the math arXiv. I would like to know where OAI thinks that MPRESS went wrong. In fact since I maintain a service provider for the math arXiv, I looked into using OA-compliant metadata instead of the ad hoc metadata that I get from the arXiv. I discovered that the OA standard is an oversimplification of the full arXiv metadata record, to the point that I can't use the OA format. I will have to leave this to OAI experts to reply to. But don't get me wrong. I am in favor of fragmented interoperability if you really can't hope for something better. And as I said, the overall STM literature might well have to be fragmented, for now, down to the level of individual disciplines (e.g. chemistry) or small groups of disciplines (physics+math+cs). Fragmented interoperability is a tautology: The whole point of interoperability is shared metadata standards unifying distributed (fragmented) systems. As to hopes: The only pertinent hope is the freeing of the entire refereed literature online. Centralized self-archiving alone
Re: Central vs. Distributed Archives
On Fri, Nov 03, 2000 at 08:24:44AM +, Stevan Harnad wrote:

But why restrict efforts to centralized ones only? The whole point of OAI interoperability is that it should no longer make any difference whether a refereed paper is archived in a central archive or a distributed archive or both! (The only alternative we want to avoid is neither!)

It is not really a neutral statement to declare that it no longer matters whether a paper is in a central archive or a distributed one. Each archive is in a way an entrenched interest. Each archive maintainer has put a lot of work into his or her project, and therefore wouldn't want it assimilated into a larger archive without a very good reason. So saying that it no longer matters whether it is centralized or distributed is like saying that it no longer matters whether states answer to Washington.

This is overconfidence. The biggest reason that it is overconfidence is that it defers the permanence question. But there are other reasons as well. One is that one of the most useful features of the arXiv (and similar services such as CogPrints) is immediate notification of new results. Another is non-redundancy: the arXiv almost completely eliminates the disarray of having many copies of a paper which may or may not be different versions. The OAI standard does not address, and perhaps cannot address, either of these important advantages of a centralized system. A more balanced point of view would be to recognize that while a standards-based distributed system may be much better than anarchy, it doesn't finish the job.

I also note that interoperability keeps getting reinvented. Precedent suggests that if OAI succeeds, it will fade into a transparent layer, and that beyond it people will see incompatibility at a new level and invent another standard. HTTP is already an interoperability standard, originally invented for the purpose of distributing research documents. And there are already HTTP-based search engines, including CiteSeer, which searches only for research papers. So it's important to explain how OAI would go beyond HTTP+CiteSeer.

--
  /\  Greg Kuperberg (UC Davis)
 /  \
 \  / Visit the Math ArXiv Front at http://front.math.ucdavis.edu/
  \/  * All the math that's fit to e-print *
Re: Central vs. Distributed Archives
I have been skimming the September98 forum on and off for a few months. As a cursory Internet search will demonstrate, I strongly support what I consider the Ginsparg model, especially in my own discipline, mathematics. I would call it the arXiv model. But while I agree in outline with Stevan Harnad et al, I disagree in some of the details. (And that's where the devil is.) Here is my take on three issues in particular.

1) I have mixed feelings about the grass-roots connotations of the Open Archives Initiative and even more about Harnad's phrase self-archiving. I do believe that the research literature should be electronic and free, and it is possible that each discipline must pass through an anarchic, do-it-yourself phase of open archival before moving on to a more organized stage. However, when I started archive work in mathematics, we already had an array of separate preprint servers cum e-print archives. The effort since then has been to reorganize much of this jumble into the math arXiv. Having many copies of one huge archive is superior to having many little archives, no matter how interoperable. Serious permanence and stability require closer cooperation than that.

At the overall STM level the literature may have to be divided into single-discipline or few-discipline fragments for some time. The Los-Alamos based arXiv works well for the TeX-based e-print culture in mathematics, physics, and parts of computer science. But it is not clear how to extend that particular system to the rest of science. If you have to have disjoint archives, fragmented interoperability is then a good goal to work towards. But you have to realize that it is only a partial solution. And I have reservations about encouraging every tenth researcher to set up yet another archive, because that can lead to entrenched Lilliputian fiefdoms of e-prints. By my standards the physics part of the arXiv, with 130,000 e-prints, is large; the math arXiv, with 13,000, is medium-sized; and an archive with 1,300 or fewer is tiny.

2) I have been accused, sometimes correctly, of being overzealous in my support of the arXiv. I see that Stevan Harnad has about as much enthusiasm as I do, and I can't criticize that. But if the September98 forum has strong advocacy in favor of open archives, it doesn't make sense to limit criticism. Because then you're just preaching to the choir. If you don't want to debate whether or not open archives are a good idea, maybe that makes sense. But then you shouldn't dwell on how fantastic open archives are; instead you should steer the discussion to practical plans.

3) I also can't criticize Elsevier's Chemistry Preprint Server project. In a way I can't even criticize commercial publishers with high journal prices, even though I believe that the mathematical literature should be free. A for-profit company is entitled to maximize profit. If it is publicly traded, it is legally required to do so up to a point. (By the same token, the customer, academia, is entitled to minimize expenses.) I'm against Napster-style copyright infringement and I have mixed feelings about journal boycotts. My approach is less confrontational. My own recent papers lie permanently in the arXiv, I keep the copyright, and I will publish in any journal that wants the papers on those terms.

From this point of view, I am not sure about the Chemistry Preprint Server, because I don't see the business model for it. But then, I don't see the business model for Google either, and I think that Google is great. It is possible that the Chemistry Preprint Server will be an important gift from Elsevier to the chemistry research community. Arguably the chemists should have done it for themselves, but maybe they lack leadership and need Elsevier to do it for them.

--
  /\  Greg Kuperberg (UC Davis)
 /  \
 \  / Visit the Math ArXiv Front at http://front.math.ucdavis.edu/
  \/  * All the math that's fit to e-print *
Re: Central vs. Distributed Archives
On Thu, 2 Nov 2000, Greg Kuperberg wrote: 1) I have mixed feelings about the grass-roots connotations of the Open Archives Initiative and even more in Harnad's phrase self-archiving. You have to distinguish between the Open Archives Initiative (OAI) and the (Author/Institution) Self-Archiving (Sub-)Initiative. OAI has now evolved into an initiative for shared standards and interoperability in the metadata tagging of the contents of online archives -- WHETHER OR NOT the contents (i.e., apart from the metadata) of the archives are full-text or free: http://www.openarchives.org A commercial publisher, for example, can establish an OAI-compliant Open Archive as readily as any other institution or individual, and would benefit from the increased visibility provided by the OAI-compliant interoperability for the contents of the Archive, even if the full-texts were kept behind an S/L/P financial firewall. A journal publisher can also establish an OAI-compliant FREE Open Archive, if they do wish to give away their full-text contents at this time (as around 400 biomedical publishers are currently willing to do, as indicated in a very recent posting: http://www.freemedicaljournals.com -- although most of those archives are not yet OAI-compliant). Nor is the OAI particularly committed to either centralized, discipline-based Open Archiving (e.g. ArXiv, CogPrints) or distributed, institution-based Open Archiving (Eprints): It is developing interoperability standards that apply to both, with the objective of making the difference between them less significant, eventually perhaps even irrelevant. 
The (Author/Institution) Self-Archiving (Sub-)Initiative, however, is SPECIFICALLY concerned with freeing the refereed research literature through author/institution self-archiving (in OAI-compliant Open Archives): http://www.eprints.org I do believe that the research literature should be electronic and free, and it is possible that each discipline must pass through an anarchic, do-it-yourself phase of open archival before moving on to a more organized stage. It is not at all clear why you describe open archiving as anarchic! It was precisely in order to put order into distributed online digital archiving resources through interoperability that the OAI was initiated! And the other aspect of the order is the order already provided by the refereed journals, in the form of peer review and its certification. That order is medium-independent, and will be preserved in a well-tagged Open Archive: Journal-Name will be a field, etc. The only do-it-yourself issue is self-archiving itself. And the issue is very clear: If researchers want the refereed literature freed, now, then they can do it themselves, by self-archiving, now. Otherwise, they have to wait until someone else (the journal publishers?) decides to free it for them -- and that could prove to be a very long wait indeed. Harnad, S. (1999) Free at Last: The Future of Peer-Reviewed Journals. D-Lib Magazine 5(12) December 1999 http://www.dlib.org/dlib/december99/12harnad.html However, when I started archive work in mathematics, we already had an array of separate preprint servers cum e-print archives. The effort since then has been to reorganize much of this jumble into the math arXiv. Having many copies of one huge archive is superior to having many little archives, no matter how interoperable. Serious permanence and stability requires closer cooperation than that. 
Again, it is a question of how long the researcher community is willing to wait for the optimal and inevitable: It is now within immediate reach to eliminate all the research access/impact-barriers, now, through self-archiving. Interoperability will integrate the results into a global Archive of the entire refereed research literature, in all disciplines, as searchable as the Institute for Scientific Information's Current Contents Database -- but including the full-texts themselves (and free). (See ARC as a prototype and fore-taste of this capability: http://arc.cs.odu.edu/)

But note that arXiv-style centralized, discipline-based self-archiving in Physics, the most advanced self-archiving on the planet -- with 130,000 archived papers in 10 years -- has only freed 30-40% of the Physics literature so far, and will take 10 more years to free it all at the present steady linear growth rate: http://arXiv.org/cgi-bin/show_monthly_submissions

Note that I used to cite the above graph repeatedly as evidence that the self-archiving cup is half-full. But it is also evidence that it is still half-empty -- and taking another 10 years to fill.

So the idea is that distributed, pan-disciplinary, institution-based self-archiving (OAI-compliant, of course) may be what is needed to get this growth rate into the exponential range for Physics, as well as to carry it over into all the other disciplines.

Of course multiple copies and mirroring (and harvesting and caching) will be as important for
Re: Central vs. Distributed Archives
On Thu, Nov 02, 2000 at 03:07:58PM +, Stevan Harnad wrote:

It is not at all clear why you describe open archiving as anarchic! It was precisely in order to put order into distributed online digital archiving resources through interoperability that the OAI was initiated!

I certainly think that a standard for interoperability could be useful, but it is wishful thinking to suppose that it can tame an anarchy of many tiny little e-print archives. In my discipline, when the literature is excessively decentralized, as it was entirely before 1998 and still largely is, neither authors nor readers have any confidence that papers floating around on the Net are permanent. And they are right, because no one could promise to keep those papers forever with any credibility. Any given paper could be erased accidentally if it is in one tiny archive somewhere. Or maybe the maintainer of that particular archive never explicitly promised permanence anyway; if so he could shut down his archive when he gets tired. The fact that the arXiv is so large and so widely used and mirrored is a necessary ingredient for assuring permanence.

The only do-it-yourself issue is self-archiving itself. And the issue is very clear: If researchers want the refereed literature freed, now, then they can do it themselves, by self-archiving, now.

The self in self-archiving could mean individuals acting for themselves, or it could mean the research community acting for itself by directly supporting one or a few archives. I have the feeling that you don't see this as an important distinction. I'll give you an analogy to show you what I mean. I use Linux, which is an open, standards-based operating system. It would be absurd to call my use of Linux self-programming, even though Linux is maintained by some of its users. I see the arXiv as highly analogous to Linux. This is why I am reluctant to use the phrase self-archiving.

Again, it is a question of how long the researcher community is willing to wait for the optimal and inevitable: It is now within immediate reach to eliminate all the research access/impact-barriers, now, through self-archiving.

I can't say that this ambitious goal is within immediate reach in mathematics, because many of us have worked hard to make it happen and we see a lot of work ahead. We can't expect all mathematicians to change their minds in one day. I have no desire to believe, as I once did, that the exponential rocket is about to blast off. If you think that encouraging many small archives to spring up is the magic step, then I simply disagree. Because when we glued together many small archives into the math arXiv, the whole was much more than the sum of the parts. Even though the math arXiv has only 5% of new math papers, and even though it will take years for it to get to even 50%, it is at least growing more quickly than all of the Lilliputian mathematical archives put together.

The Los-Alamos based arXiv works well for the TeX-based e-print culture in mathematics, physics, and parts of computer science. But it is not clear how to extend that particular system to the rest of science.

Why? This formula has been repeated so many times that people are actually believing it, without anyone ever having explained why it should be thought to be true!

I don't mean to say that other disciplines can't have an open archive that's *like* the arXiv. I certainly think that they can. I mean that other disciplines are sufficiently different that their open archives might need separate administration. And that would lead to fragmentation, which concerns me more than it does you.

--
  /\  Greg Kuperberg (UC Davis)
 /  \
 \  / Visit the Math ArXiv Front at http://front.math.ucdavis.edu/
  \/  * All the math that's fit to e-print *
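The disagreement over growth rates in this exchange can be illustrated with some back-of-the-envelope arithmetic. The sketch below takes the 5% and 50% coverage figures from the message above; the growth rates themselves (3 percentage points per year for the linear case, a 2-year doubling time for the exponential case) are purely hypothetical placeholders, not measurements of any archive.

```python
import math


def years_linear(start, target, points_per_year):
    """Years to go from `start` to `target` coverage at a constant yearly gain."""
    return (target - start) / points_per_year


def years_exponential(start, target, doubling_time):
    """Years to go from `start` to `target` coverage if coverage doubles
    every `doubling_time` years."""
    return doubling_time * math.log2(target / start)


START, TARGET = 5.0, 50.0   # percent of new math papers (figures from the thread)
LINEAR_RATE = 3.0           # hypothetical: percentage points gained per year
DOUBLING_TIME = 2.0         # hypothetical: years for coverage to double

linear_years = years_linear(START, TARGET, LINEAR_RATE)
exponential_years = years_exponential(START, TARGET, DOUBLING_TIME)
```

With these made-up rates the linear case takes 15 years and the exponential case a little under 7. The arithmetic does not settle the question argued in the thread, which is what kind of growth each archiving strategy would actually produce.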
Re: Central vs. Distributed Archives
On Thu, 2 Nov 2000, Greg Kuperberg wrote: I certainly think that a standard for interoperability could be useful, but it is wishful thinking to suppose that it can tame an anarchy of many tiny little e-print archives. In my discipline, when the literature is excessively decentralized, as it was entirely before 1998 and still largely is, neither authors nor readers have any confidence that papers floating around on the Net are permanent. And they are right, because no one could promise to keep those papers forever with any credibility. ... The fact that the arXiv is so large and so widely used and mirrored is a necessary ingredient for assuring permanence. (1) Archives meeting the conditions to be registered OAI-compliant data-providers http://www.openarchives.org/sfc/sfc_archives.htm are not likely to be tiny little ones (though it is no problem if some of them are). (2) Most Eprints Archives are likely to be university-based archives, for all the university's refereed research, in all its disciplines. That's hardly tiny (or impermanent) either. (3) The goal is to free the refereed literature, across disciplines, now. Once the literature is thus freed the process will be irreversible. (4) The mechanisms for preserving and navigating it will continue to evolve and improve, with the whole world's refereed assets in this distributed basket (suitably mirrored, harvested, cached, backed up, etc.). (5) The immediate issue is hence not the PERMANENCE of the self-archived drafts but their EXISTENCE, free for all, now. The permanence will take care of itself. The self in self-archiving could mean individuals acting for themselves, or it could mean the research community acting for itself by directly supporting one or a few archives. I have the feeling that you don't see this as an important distinction. You are right; I think it is a red herring. 
Most of the individuals in question (the authors of the refereed literature) are researchers at universities and research institutions. In principle each of them could set up his own Eprints Archive and register it with the OAI (and that would be fine as a start, and would free the literature irreversibly). But of course the likely, practical strategy is for the researchers' universities and research institutions (or, more specifically, their libraries) to create and administer their institutional Eprint Archives for all their researchers' refereed output, in all disciplines. (We can have at least as abiding a faith in the durability of the collections on universities' airwaves, then, as we now have in the durability of the collections on their shelves). I can't say that this ambitious goal is within immediate reach in mathematics, because many of us have worked hard to make it happen and we see a lot of work ahead. We can't expect all mathematicians to change their minds in one day. You are now talking about something else: You are talking about what it will take to induce the research cavalry to drink, once they have been led to the waters of self-archiving. There's no second-guessing human nature, but my own hunch is that the motivational structure at the researchers' own institution -- the one that benefits from (and rewards) the impact of its own researchers' refereed output, and the one that is today weighed down by the serials crisis and the limitations that that puts on its own researchers' access to the refereed output of researchers at other institutions -- may provide just the kind of local incentive for self-archiving that a centralized, discipline based entity so far seems unable to provide. In any case, these two routes to the liberation of the refereed corpus (centralized and distributed) are complementary (and interoperable!). If you think that encouraging many small archives to spring up is the magic step, then I simply disagree. 
Because when we glued together many small archives into the math arXiv, the whole was much more than the sum of the parts. Even though the math arXiv has only 5% of new math papers, and even though it will take years for it to get to even 50%, it is at least growing more quickly than all of the Lilliputian mathematical archives put together. I am not a mathematician, but this whole is greater than the sum of its parts argument does not add up for me! Centralized archiving in maths is at 5% and will take years to get to 50%. What possible reason would there be not to encourage complementing it by institutional Eprint Archives immediately -- given that they will all be co-harvested (and mirrored, and cached, etc.) in global virtual archives anyway, thanks to interoperability? other disciplines are sufficiently different that their open archives might need separate administration. And that would lead to fragmentation, which concerns me more than it does you. My concern is freeing the refereed literature online, now. There is no reason it should stay hostage to S/L/P barriers for another
Re: Central vs. Distributed Archives
On my other points: On Thu, Nov 02, 2000 at 03:07:58PM +, Stevan Harnad wrote: I have, as moderator, terminated discussion on a few irrelevant or saturated topics (is there a conspiracy of university administrators to control researchers' intellectual property? is the library serials crisis simply a consequence of under-funding the libraries? how can we reform or abandon peer review?), but comments, whether supportive or critical, on the Forum's central theme -- How to free the refereed literature online, now? -- have never been suppressed. You may see it as closing discussion of all sides of a topic, but I see some character of closing down just one side of a debate. Obviously you are referring to Al Henderson's argument that free scholarly communication is a stress response to penny-pinching by university administrations. I'll grant that he has said that many times, and I'll also grant that the argument sounds absurd to me. (I am one of the researchers supposedly bullied by the administration, and if anything my complaint is that the higher-ups are biased in favor of the historical subscription-based system.) But even though I don't agree with him at all, he is no more repetitive than you are or I am. Invoking cloture strikes me as an overreaction. I couldn't agree with you more! But what gives you the impression that this Forum is trying to prevent companies from doing whatever they like? What you said originally was: The Elsevier policy of publicly archiving pre-refereeing preprints could be a good first step towards the optimal and inevitable, but it is also possible that it is intended as a Trojan Horse,... I think it's divisive to speculate that someone else's e-print archive is a Trojan Horse. It's true that I'm not sure that the CPS is compatible with Elsevier's mission of maximizing profit. But let's give it the benefit of the doubt. 
--
Greg Kuperberg (UC Davis)
Visit the Math ArXiv Front at http://front.math.ucdavis.edu/
* All the math that's fit to e-print *
Re: Central vs. Distributed Archives
On Thu, 2 Nov 2000, Greg Kuperberg wrote: what gives you the impression that this Forum is trying to prevent companies from doing whatever they like? What you said originally was: "The Elsevier policy of publicly archiving pre-refereeing preprints could be a good first step towards the optimal and inevitable, but it is also possible that it is intended as a Trojan Horse,..." I think it's divisive to speculate that someone else's e-print archive is a Trojan Horse. It's true that I'm not sure that the CPS is compatible with Elsevier's mission of maximizing profit. But let's give it the benefit of the doubt.

Good. Both sides of the question have been aired. (Please distinguish my actions as moderator, when I invoke cloture, from the expression of my own views on this topic -- which carry no more weight than anyone else's ex officio.)

Stevan Harnad
Re: Central vs. Distributed Archives
Stevan Harnad wrote: (3) The goal is to free the refereed literature, across disciplines, now. Once the literature is thus freed the process will be irreversible.

Do you mean free as in liberty or free as in free beer? This particular bone of contention has effectively split what used to be known as a free software movement, but is now known as the free software/open source movement.

--stuart yeates s.yea...@cs.waikato.ac.nz
Re: Central vs. Distributed Archives
On Fri, 3 Nov 2000, Stuart A Yeates wrote: (3) The goal is to free the refereed literature, across disciplines, now. Once the literature is thus freed the process will be irreversible. Do you mean free as in liberty or free as in free beer? This particular bone of contention has effectively split what used to be known as a free software movement, but is now known as the free software/open source movement.

Free in the way advertisements are free (which I suppose is more like free beer -- when you're giving away your own home-brew). But this refereed brew is definitely not free in the sense of liberty (that would be the vanity press). It is constrained by and answerable to peer review. Hence it is not relevantly like software either. But once it successfully passes that quality-control process, and is certified as such, the author can and should maximize the access to, and hence the impact of this give-away refereed research by self-archiving it online, free for all. http://www.arl.org/sc/subversive/

Stevan Harnad
Professor of Cognitive Science
Department of Electronics and Computer Science
University of Southampton
Highfield, Southampton SO17 1BJ, UNITED KINGDOM
har...@cogsci.soton.ac.uk, har...@princeton.edu
phone: +44 23-80 592-582, fax: +44 23-80 592-865
http://www.ecs.soton.ac.uk/~harnad/
http://www.princeton.edu/~harnad/
Re: Central vs. Distributed Archives
I like Greg Kuperberg's postings, even though we disagree. Greg too is an advocate of freeing the literature through author self-archiving, but he prefers centralized archives, whereas I think both centralized and distributed archiving are welcome and should be encouraged, as both can hasten the freeing of the refereed literature. Centralized archiving has been with us for over 10 years, and at its current rates it will take 10 more years to free the Physics literature alone, where it is most advanced. In Greg's own field of mathematics, it might be going even more slowly. It looks to me as if centralized self-archiving can now use the help of distributed institutional self-archiving. By way of counterevidence, Greg cites the fact that in mathematics institutional self-archiving predated centralized self-archiving and was unreliable. It was centralized self-archiving that accelerated and stabilized the process. What Greg seems to overlook is that the institutional self-archiving he describes PRE-DATED the Open Archives Initiative (OAI), with its interoperability. Hence the question of whether or not distributed self-archiving in OAI-compliant Institutional Eprint Archives will accelerate the freeing of the literature has not yet been tested. Greg also seems to conflate, at some junctures, the self-archiving of unrefereed preprints with the self-archiving of refereed postprints, as if self-archiving were in some sense a rival to or substitute for refereed publication (which I certainly do not think it is); self-archiving is merely a way to free the refereed literature. On Thu, 2 Nov 2000, Greg Kuperberg wrote: In 1997, the year before the universal math arXiv was started, there were already some 10 or 20 thousand research papers freely available on the web. Most of them were on personal home pages, but thousands were in institutional and subject-based preprint series. This is irrelevant, as noted above. 
These archives were not OAI-compliant and hence could not be integrated or navigated in a useful way.

Nonetheless the vast majority of these papers were still eventually sold as published papers.

This too is irrelevant. The initiative to free the refereed literature is a PRO-RESEARCHER and PRO-RESEARCH initiative, not an anti-publisher initiative (nor even particularly a pro-library initiative): The goal is to free the refereed literature for one and all online. That is what self-archiving does. The goal is NOT to prevent other versions of the refereed literature from being sold, on-paper or on-line, if there is a market for them. (Why would we want to do that?)

So what were the publishers selling? Not peer review, because you can learn from Math Reviews where a paper has been published without subscribing to the journal. To a large extent the journal system was selling, and is still selling, stability and permanence.

Fine. Let it continue to do so (whether the stability/permanence is real or merely imagined). As long as another version is online and free, the goal is met.

So that has been the fundamental question of open archival in mathematics for years. That is why some of the recalcitrant math publishers say that the arXiv is just a preprint server and not a permanent e-print archive. Of course I don't agree with them; I choose the arXiv over subscription journals as the future route to permanent archival.

I'm afraid that this is not making sense to me. What is the argument? That the jeering of some publishers nullifies the fact that that portion of the refereed literature that has been freed is indeed free? The substantive question is: Are the refereed papers online and free? If they are, who cares if some people keep calling them preprints, when in reality they include both pre-refereeing preprints + post-refereeing postprints (= eprints)?
But I sense another point of disagreement with Greg: Earlier he said it's not the peer-review that makes people keep paying for the for-fee (refereed) version despite the availability of the for-free (refereed) version, but the stability and permanence. Perhaps. But if the implementation of the peer-review were no longer paid for by the continued support for the publishers' version, perhaps the true value and causal role of peer-review in all of this would become clearer. Moreover, for now, it is not true stability/permanence that distinguishes the publishers' for-fee version and the archives' for-free version, but mere PERCEIVED stability/permanence. With time, that may change. But for now it certainly isn't any reason to deter us from self-archiving, either centrally or institutionally. On the contrary; as long as the publishers' for-fee version is seen as the guarantor of the stability/permanence, there is no reason whatever NOT to SUPPLEMENT that with the self-archived free version -- without giving the stability/permanence issue another thought! As a practical matter most of the institutional preprint series in mathematics
Re: Central vs. Distributed Archives
On Fri, 3 Nov 2000, Stuart A Yeates wrote: So if I hear you correctly OAI will have no traffic with technical reports or technical report servers? these _are_ vanity press. Incorrect. Eprints Archives are for both unrefereed preprints and refereed postprints, suitably tagged as such. Stevan Harnad
Re: Central vs. Distributed Archives
On Thu, Nov 02, 2000 at 09:29:24PM +, Stevan Harnad wrote: Centralized archiving has been with us for over 10 years, and at its current rates it will take 10 more years to free the Physics literature alone, where it is most advanced. In Greg's own field of mathematics, it might be going even more slowly. It looks to me as if centralized self-archiving can now use the help of distributed institutional self-archiving.

Actually the main difference in math is that we in effect started later than physics did. Part of the reason for that is that some of the mathematicians involved, including me but not mainly me by any means, instead devoted effort to umbrella archive projects (i.e., global virtual archives) that ultimately failed. We have had much more success by moving in the opposite direction, i.e., by strengthening distributed open archival with a centralized foundation.

What Greg seems to overlook is that the institutional self-archiving he describes PRE-DATED the Open Archives Initiative (OAI), with its interoperability.

This is partly untrue. The MPRESS project (http://mathnet.preprints.org/) has a lot in common with OAI, and it was started before the universal math arXiv. It has its own metadata standard, Dublin Core, and it has a number of institutional preprint series among its data feeds. But it hasn't yet caught on. It doesn't seem to make much difference to authors whether a preprint series is indexed by MPRESS or not. Part of the trouble with MPRESS is that not all of its sources are providing as good metadata as they promised. Ironically the lion's share of good metadata in MPRESS comes from the math arXiv. I would like to know where OAI thinks that MPRESS went wrong. In fact since I maintain a service provider for the math arXiv, I looked into using OA-compliant metadata instead of the ad hoc metadata that I get from the arXiv.
I discovered that the OA standard is an oversimplification of the full arXiv metadata record, to the point that I can't use the OA format. But don't get me wrong. I am in favor of fragmented interoperability if you really can't hope for something better. And as I said, the overall STM literature might well have to be fragmented, for now, down to the level of individual disciplines (e.g. chemistry) or small groups of disciplines (physics+math+cs).

--
Greg Kuperberg (UC Davis)
Visit the Math ArXiv Front at http://front.math.ucdavis.edu/
* All the math that's fit to e-print *
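[Editorial illustration] The complaint above, that a simplified interchange format cannot carry everything in a richer native record, can be sketched roughly as follows. This is only a sketch under stated assumptions: the record layout and field names (`primary_subject`, `cross_lists`, `versions`) are hypothetical and are not the arXiv's actual schema, and the flat element names are borrowed from the Dublin Core element set purely for illustration. It does not reproduce any real OAI format.

```python
# Hypothetical "rich" record. The field names are illustrative assumptions,
# not the arXiv's actual metadata schema.
rich_record = {
    "id": "example/0001",
    "title": "An Example Paper",
    "authors": [
        {"name": "A. Author", "affiliation": "Example University"},
        {"name": "B. Writer", "affiliation": "Sample Institute"},
    ],
    "primary_subject": "math.GT",
    "cross_lists": ["math.QA"],
    "versions": [
        {"number": 1, "date": "2000-01-10"},
        {"number": 2, "date": "2000-03-02"},
    ],
}


def to_flat_record(record):
    """Flatten a rich record into repeatable, string-valued elements.

    The element names come from the Dublin Core element set; the mapping
    itself is an assumption made for this sketch.
    """
    return {
        "identifier": [record["id"]],
        "title": [record["title"]],
        "creator": [author["name"] for author in record["authors"]],
        "subject": [record["primary_subject"], *record["cross_lists"]],
        "date": [version["date"] for version in record["versions"]],
    }


def lost_information(record):
    """List the distinctions the flat form above can no longer express."""
    lost = []
    if any(author.get("affiliation") for author in record["authors"]):
        lost.append("author affiliations")
    if record["cross_lists"]:
        lost.append("primary vs. cross-listed subject")
    if len(record["versions"]) > 1:
        lost.append("which date belongs to which version")
    return lost


flat = to_flat_record(rich_record)
print(flat["subject"])
print(lost_information(rich_record))
```

A service provider that only receives the flat form would have to guess at the dropped distinctions, which is one plausible reading of why a richer ad hoc feed might be preferred.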
Re: Central vs. Distributed Archives
At 21:29 02/11/00 +, Stevan Harnad wrote: Obviously I'm not a conservative offering rationales for inaction. And my worry is not a priori. NCSTRL and MPRESS are two long-standing attempts at standards-based fragmented interoperability. Neither one has as much readership as the younger, fully integrated math arXiv. They pre-dated OAI and Eprints. Have just a bit more patience; but be prepared to set aside prior prejudices or you will obstruct precisely what we both want to facilitate! NCSTRL was effectively the model for OAi. Greg Kuperberg suggests that NCSTRL has not been successful. It would be useful to have some meaningful measure of whether NCSTRL has been successful or not, and to hear the views of the NCSTRL developers (who are also involved in OAi). Maybe real evidence will yield clues to the ultimate destiny of OAi - central or distributed. The Harnad-Kuperberg dialogue has been fascinating but, to my mind, hasn't resolved the issue conclusively. It will be critical to understand what the user wants. Steve
Re: Central vs. Distributed Archives
(note: I'm not sure this will get through all the aliases -- I don't think this email addr is registered with the UPS list, for example)

On Thu, 2 Nov 2000, Steve Hitchcock wrote: NCSTRL was effectively the model for OAi. Greg Kuperberg suggests that NCSTRL has not been successful. It would be useful to have some meaningful measure of whether NCSTRL has been successful or not, and to hear the views of the NCSTRL developers (who are also involved in OAi). Maybe real evidence will yield clues to the ultimate destiny of OAi - central or distributed.

Just a point of clarification: NCSTRL was not directly the model for OAI, at least architecturally. OAI has more in common with:

- RePEc (http://www.repec.org/)
- SODA (http://www.dlib.org/dlib/march99/maly/03maly.html)

and similar architectures. A subset of the Dienst protocol gave us a starting ground for defining a harvesting protocol, but even that has been relaxed to allow Dienst and OAI to progress independently. Most OAI service providers will probably assume a distributed storage model, because it is certainly easier to build. But technically OAI is agnostic with respect to centralized vs. distributed storage of data. OAI focuses only on metadata.

Regarding centralized vs. distributed, I would submit CiteSeer http://citeseer.nj.nec.com/cs as an exemplary DL that seems to have resolved the tension between the two models - providing both links to distributed copies and cached centralized copies.

regards, Michael

The Harnad-Kuperberg dialogue has been fascinating but, to my mind, hasn't resolved the issue conclusively. It will be critical to understand what the user wants. Steve

--
UPS mail list
Mail submissions to u...@vole.lanl.gov
To subscribe or unsubscribe visit http://vole.lanl.gov/mailman/listinfo/ups

---
Michael L. Nelson
207 Manning Hall, School of Information and Library Science
University of North Carolina
Chapel Hill, NC 27599
m...@ils.unc.edu
http://ils.unc.edu/~mln/
+1 919 966 5042
+1 919 962 8071 (f)
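[Editorial illustration] The arrangement described above, in which a service provider harvests only metadata and the full text may live either at the originating archive or in a central cache, can be sketched as a small in-memory model. This is a toy under stated assumptions, not the OAI harvesting protocol: the class and method names (`Archive`, `ServiceProvider`, `list_records`, `harvest`, `cache_copy`) are hypothetical, no network requests are made, and the URLs are placeholders.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    """Metadata only: the full text stays wherever the archive keeps it."""
    identifier: str
    title: str
    full_text_url: str


@dataclass
class Archive:
    """A data provider (institutional or central). Hypothetical stand-in."""
    name: str
    records: list = field(default_factory=list)

    def list_records(self):
        # A real provider would answer a harvesting request over the network.
        return list(self.records)


class ServiceProvider:
    """Builds one searchable index over many archives' metadata."""

    def __init__(self):
        self.index = {}  # identifier -> Record (harvested metadata)
        self.cache = {}  # identifier -> text (optional central copy)

    def harvest(self, archive):
        for record in archive.list_records():
            self.index[record.identifier] = record

    def cache_copy(self, identifier, text):
        if identifier not in self.index:
            raise KeyError(identifier)
        self.cache[identifier] = text

    def search(self, word):
        word = word.lower()
        return sorted(
            ident for ident, rec in self.index.items()
            if word in rec.title.lower()
        )

    def locate(self, identifier):
        """Report the distributed copy and whether a central copy exists."""
        record = self.index[identifier]
        return {"original": record.full_text_url,
                "cached": identifier in self.cache}


# Placeholder archives and URLs, invented for this example.
uni = Archive("university", [
    Record("uni:1", "Knot invariants", "http://eprints.example.edu/1"),
])
central = Archive("central", [
    Record("central:7", "Quantum knot theory", "http://archive.example.org/7"),
])

provider = ServiceProvider()
provider.harvest(uni)
provider.harvest(central)
provider.cache_copy("uni:1", "full text of the paper")

print(provider.search("knot"))
print(provider.locate("uni:1"))
print(provider.locate("central:7"))
```

In this model the reader searches one index regardless of where each paper was deposited, and caching is a separate, optional decision; that is the sense in which the storage question can be left open.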
Re: Central vs. Distributed Archives
On Thu, Nov 02, 2000 at 10:08:09PM +, Steve Hitchcock wrote: NCSTRL was effectively the model for OAi. Greg Kuperberg suggests that NCSTRL has not been successful.

I don't want to disparage a project as big and difficult as NCSTRL. It has had some success. It's important. But I don't think that it's nearly as successful as the arXiv. I guess I said something stronger before, that NCSTRL is not as heavily read as the math arXiv, which is much smaller than the whole arXiv system. Well possibly I'm wrong on that. But I note that the math arXiv is just as heavily read on a per-paper basis as the larger parent arXiv system.

--
Greg Kuperberg (UC Davis)
Visit the Math ArXiv Front at http://front.math.ucdavis.edu/
* All the math that's fit to e-print *
Re: Central vs. Distributed Archives
Professor Harnad, On Mon, 28 Jun 1999, Stevan Harnad wrote: On Mon, 28 Jun 1999, J.W.T.Smith wrote: My objection to the Los Alamos Archive model is that it is centralised and such a model can easily degenerate into a monopoly. A monopoly of what PRODUCT, on behalf of what PROVIDER relative to what MARKET? For Los Alamos is in the (government-supported) business of making it possible for authors to give away reports of their own scientific research to one and all for free. A monopoly in the sense that it could become 'the place' where readers look for items relevant to their subject. The non-presence of an article in a recognised subject specific archive could imply it is not relevant to the subject. More on this later. And what do you mean centralised? Los Alamos is open to one and all, reader and author alike, the world over; it is mirrored in 15 countries, cached in who knows how many other places and ways, incorporated into further Gateways such as NCSTRL and Spires, and there integrated with other archives. Anyone else can make copies of the archive too (that's part of what making the product free entails), and the authors who self-archive in it are encouraged to archive their papers elsewhere too, if they wish, including in their own institutional servers, which can then be gathered together as another backup of the central archive. You are missing the point. I am not concerned with its availability, I am concerned with the implied validation of the presence of an item in a given archive. Even if the archive is mirrored it is a mirror of somewhere and the address of that somewhere has value. If this has no value why do we need an archive at all? Why don't we all mount our papers on our University servers? There are two advantages that I can see of a subject specific archive: - It can be properly maintained (it is a true archive) - It can be a 'one stop shop' of where to look for items on a specific subject. I have no problem with the first role.
It is the second that carries the possibility of monopoly. As long as the archive is maintained by a neutral organisation (like a large University) this is OK but what if it should become privatised? Once an archive (or its mirrors) is seen as 'the place' to search for items of interest and access to that archive can be controlled it might be temping to place some restriction on access like payment of a fee (for purely reasonable reasons like getting enough money to maintain the archive). Now I know the actual quality control/validation is provided elsewhere (maybe by the 'old' journals, maybe by other players) but from the point of view of the author they may also need to be in the archive as well as have the validation/stamp of approval of an external organisation. As I have noted before, this central/distributed issue is a red herring, based in part on papyrocentric thinking (we are in reality talking about a distributed virtual library where locus has little meaning) You seem to contradict yourself here. If 'locus' (I don't mean physical position) has no meaning why do we need a Physics archive, or a Biomed archive, or any other subject archive? Why can't we either have one universal archive which simply stores and serves on request (at no cost and forever) any item sent to it, or no archive at all with items being stored on a user site or a University site or a commercial site (or all three or some other option/permutation)? Stop thinking in terms of a reader-end product, with competition among access-blockers, and think instead in terms of a platform for author-end freebies, with collaboration among access-providers, and things will come into better focus. This is the refereed journal literature, not trade books or magazines. You are preaching to the converted. I have been aware the trade model is wrong for academic publishing for many years. There have been proposals to replace this model going back to the 1920s or before. Nothing new here. 
Summary: It is possible to escape the problems of the 'trade model' of current academic publishing without running headlong into the possibly equally constraining model of a monopolistic central archive. Yes. Change the vocabulary. Why don't you drop the word 'journal' then? Why not use 'validator' or some other word that indicates the role and doesn't carry over connotations from the old papyrocentric model? John Smith, University of Kent at Canterbury, UK.
Re: Central vs. Distributed Archives
On Tue, 29 Jun 1999, J.W.T.Smith wrote: A monopoly in the sense that it could become 'the place' where readers look for items relevant to their subject. The non-presence of an article in a recognised subject specific archive could imply it is not relevant to the subject. More on this later. Papyrocentric thinking. We live in the era of metadata tagging and search engines that trawl it all. I am not concerned with its availability, I am concerned with the implied validation of the presence of an item in a given archive. Don't be. The validator is the journal, as it always was. The Archive is only the free cosmic bookshelf in the Sky... Even if the archive is mirrored it is a mirror of somewhere and the address of that somewhere has value. If this has no value why to we need an archive at all? Why don't we all mount our papers on our University servers? We should! That was the gist of my 1994 Subversive Proposal: http://www.arl.org/sc/subversive/ But there are currently still interoperability problems with institutional servers, so the colossal success of Los Alamos has shown that we will reach the optimal and inevitable faster by taking both routes, the centralised and the distributed one: http://xxx.lanl.gov/cgi-bin/show_monthly_submissions There are two advantages that I can see of a subject specific archive: - It can be properly maintained (it is a true archive) - It can be a 'one stop shop' of where to look for items on a specific subject. I have no problem with the first role. It is the second that carries the possibility of monopoly. As long as the archive is maintained by a neutral organisation (like a large University) this is OK but what if it should become privatised? EVERYTHING runs the risk of being privatized: Universities, Los Alamos, NIH. 
Fighting against the privatization-frenzy in whose grip the entire planet seems to be at the moment is a worthy enough mission, but it is completely irrelevant to the centralization/monopoly red herring that I believe you are preoccupied with -- for the simple reason that the menace of privatization is completely nonspecific, and afflicts ALL options, in principle. In practice, I would not worry too much about a hostile take-over of NIH by the private sector in the near future, nor about NSF tossing the Los Alamos Archive to the Trade Winds. Besides, one of the STRENGTHS of centralization is that the authors that have put their precious eggs in the collective basket and the users who forage them tend to monitor them zealously day and night, and are likely to squawk vociferously if they sense any threat: Taubes, Gary. E-mail withdrawal prompts spasm. (temporary shut-down of Los Alamos Laboratory e-print archives succeeds in raising funds) Science v262, n5131 (Oct 8, 1993):173 (2 pages). ABSTRACT: Paul Ginsparg shut down the e-print archives of Los Alamos National Laboratory, the physicists' pre-publication bulletin board for a few days. The closure incited users to petition the Department of Energy and National Science Foundation for funds and secured official funding from Los Alamos. Once an archive (or its mirrors) is seen as 'the place' to search for items of interest and access to that archive can be controlled it might be tempting to place some restriction on access like payment of a fee (for purely reasonable reasons like getting enough money to maintain the archive). A lot of other networked services are likely to get a price tag before the tiny refereed literature archive is likely to: It is the flea on the tail of the dog, and we will all be best served if it is given a free ride. Again, this worry is papyrocentric and misplaced.
Now I know the actual quality control/validation is provided elsewhere (maybe by the 'old' journals, maybe by other players) but from the point of view of the author they may also need to be in the archive as well as have the validation/stamp of approval of an external organisation. This sentence was a bit difficult to decode, but from what I can make of it, one entity (the established journals -- why on earth not?) can continue to do the quality controlling and certification-tagging, and another (new, virtual) one, the Archive, can provide free access to the texts. What is the problem? As I have noted before, this central/distributed issue is a red herring, based in part on papyrocentric thinking (we are in reality talking about a distributed virtual library where locus has little meaning) You seem to contradict yourself here. If 'locus' (I don't mean physical position) has no meaning why do we need a Physics archive, or a Biomed archive, or any other subject archive? Why can't we either have one universal archive which simply stores and serves on request (at no cost and forever) any item sent to it, or no archive at all with items being stored on a user site or a University site or a commercial site (or all three or some
Re: Central vs. Distributed Archives
On Tue, 29 Jun 1999, J.W.T.Smith wrote: I don't see what is 'papyrocentric' about... the idea of an item gaining some kudos by being in a certain archive... A similar situation occurs when a journal gains kudos from being indexed in a specific online biblographic database. No paper involved here. Forget about indexing in databases. (If the primary journal publishers need to think about what their new niche will be in the online world of free, full-text self-archiving by authors, the secondaries and tertiaries will unfortunately have more serious worries!) The kudos comes from (P) the prestige (peer-review rigour, quality, impact factor) of the journal that accepts the paper and (I) the impact that it makes on research, in the form of further work citing and building upon it. The potential impact will be made incomparably greater by free online access for one and all. Where is the papyrocentric thinking? In the thought that the paper's locus on the Web is the source of the kudos (as the source of a paper paper's kudos was the paper journal in which it appeared). The accepting journal's imprimatur will shrink to a quality control metadata tag, like a brand-name; the locus (virtual or real) of the bytes will be of no consequence whatsoever. Papyrocentric too is the idea that there is something to compete for in being the locus of a paper. Nothing to sell, nothing to compete for. The subject specific archive seems an unnecessary complication. Applying Occam's razor it seems we can chop it off and the system can run happily without it. The Los Alamos Archive has demonstrated that (at least in Physics), the centralized end of the candle managed to free the literature before the distributed end did. Occam says: Hedge your bets and do both: Deposit in your local server AND the global one. Harnad, S. (1998) On-Line Journals and Financial Fire-Walls. Nature 395: 127-128. 
http://www.ecs.soton.ac.uk/~harnad/nature.html All authors should continue to entrust their work to the paper journals of their choice. But if, in addition, they were to publicly archive their pre-refereeing preprints and then their post-refereeing reprints on-line on their Home Servers, for free for all, then the de facto practises of the reader community would take care of the rest (irrespective of their reservations about bed/bath/beach reading); library serial cancellations, the collapse of the paper cardhouse, publisher perestroika, and a free for all, e-only serial corpus financed by author-end page charges would soon follow suit. A centralised variant of this subversion scenario, http://xxx.lanl.gov, has already passed the point of no return in Physics and some allied disciplines in the form of Paul Ginsparg's (1994, 1996) U.S. NSF- (National Science Foundation) and DOE- (Department of Energy) supported Physics Eprint Archive at Los Alamos National Laboratory; as history will confirm, he single-handedly set the world Learned Community on its inexorable course toward the optimal and the inevitable in August 1991. js Why don't you drop the word 'journal' then? Why not use 'validator' or js some other word that indicates the role and doesn't carry over js connotations from the old papyrocentric model? sh Suit yourself. But I think Physical Review Letters will continue to sh prefer to call itself by its current familiar and trusted brand name -- sh and why on earth shouldn't it? I'm not saying we shouldn't have Physical Review Letters (or any other title) just that in the new model we should stop calling it a 'journal'. Suit yourself. Maybe we should stop calling the contents articles too. But what's the point? The problem with the word 'journal' is that it carries connotations from the papyrocentric world. For example - the idea that an item can only be in one 'journal'. And a good connotation too! 
We have already gone round this one before: Referees are a scarce and overworked resource. There is no justification for asking anyone to referee an already-refereed, already-accepted paper yet again, for acceptance yet again, elsewhere. See the two reasonable sources of kudos above: (P) is acceptance by a peer-reviewed Journal; (I) (and more important) is acceptance by one's peers through the paper's impact on their reading, research and citations. No more need for infinite rounds of peer reviewing and re-reviewing. Otherwise it's like going back to school for more and more exams instead of getting on with it! We haven't the time or the manpower for such an orgy of endless assessment (even in the UK!). This does not need to be the case in a net-based model. Your descriptions of your model seem to contain a papyrocentric influence since there still seems to be a close relationship between an item and the 'journal' that validates it. There is nothing papyrocentric about quality control and certification. Even eggs go through that
Re: Central vs. Distributed Archives
Professor Harnad, On Tue, 29 Jun 1999, Stevan Harnad wrote: On Tue, 29 Jun 1999, J.W.T.Smith wrote: A monopoly in the sense that it could become 'the place' where readers look for items relevant to their subject. The non-presence of an article in a recognised subject specific archive could imply it is not relevant to the subject. More on this later. Papyrocentric thinking. We live in the era of metadata tagging and search engines that trawl it all. I don't see what is 'papyrocentric' about this since the idea of an item gaining some kudos by being in a certain archive has no necessary connection to the paper world. A similar situation occurs when a journal gains kudos from being indexed in a specific online bibliographic database. No paper involved here. Once an archive (or its mirrors) is seen as 'the place' to search for items of interest and access to that archive can be controlled it might be tempting to place some restriction on access like payment of a fee (for purely reasonable reasons like getting enough money to maintain the archive). A lot of other networked services are likely to get a price tag before the tiny refereed literature archive is likely to: It is the flea on the tail of the dog, and we will all be best served if it is given a free ride. Again, this worry is papyrocentric and misplaced. Again I don't see why this is 'papyrocentric'. It may be paranoid but it is not 'papyrocentric' :-) . Now I know the actual quality control/validation is provided elsewhere (maybe by the 'old' journals, maybe by other players) but from the point of view of the author they may also need to be in the archive as well as have the validation/stamp of approval of an external organisation. This sentence was a bit difficult to decode, but from what I can make of it, one entity (the established journals -- why on earth not?)
can continue to do the quality controlling and certification-tagging, and another (new, virtual) one, the Archive, can provide free access to the texts. What is the problem? The subject specific archive seems an unnecessary complication. Applying Occam's razor it seems we can chop it off and the system can run happily without it. Why don't you drop the word 'journal' then? Why not use 'validator' or some other word that indicates the role and doesn't carry over connotations from the old papyrocentric model? Suit yourself. But I think Physical Review Letters will continue to prefer to call itself by its current familiar and trusted brand name -- and why on earth shouldn't it? I'm not saying we shouldn't have Physical Review Letters (or any other title) just that in the new model we should stop calling it a 'journal'. The problem with the word 'journal' is that it carries connotations from the papyrocentric world. For example - the idea that an item can only be in one 'journal'. This does not need to be the case in a net-based model. Your descriptions of your model seem to contain a papyrocentric influence since there still seems to be a close relationship between an item and the 'journal' that validates it. There is no reason why an item could not be validated by more than one validator - especially if it crosses current subject boundaries. John Smith, University of Kent at Canterbury, UK.
Central vs. Distributed Archives
On Mon, 28 Jun 1999, J.W.T.Smith wrote:

This entire debate seems to have become hung up on whether or not the Los Alamos Archive model is applicable to e-publishing or e-archiving in other subject areas (especially biomed). This has obscured the fact it is perfectly possible to believe, as I do, that the Los Alamos Archive model is not the way to go for many subjects yet also believe in a model where the role of current journals is reduced to that of quality control only. My objection to the Los Alamos Archive model is that it is centralised and such a model can easily degenerate into a monopoly.

A monopoly of what PRODUCT, on behalf of what PROVIDER relative to what MARKET? For Los Alamos is in the (government-supported) business of making it possible for authors to give away reports of their own scientific research to one and all for free.

And what do you mean centralised? Los Alamos is open to one and all, reader and author alike, the world over; it is mirrored in 15 countries, cached in who knows how many other places and ways, incorporated into further Gateways such as NCSTRL and Spires, and there integrated with other archives. Anyone else can make copies of the archive too (that's part of what making the product free entails), and the authors who self-archive in it are encouraged to archive their papers elsewhere too, if they wish, including in their own institutional servers, which can then be gathered together as another backup of the central archive.
http://xxx.soton.ac.uk/servers.html
http://ncstrl.cs.cornell.edu/
http://www.slac.stanford.edu/spires/about_spireshep.html

As I have noted before, this central/distributed issue is a red herring, based in part on papyrocentric thinking (we are in reality talking about a distributed virtual library where locus has little meaning) and in part on proprietary thinking, based on the reader-end, access-blockage trade model (whereas we are talking about a self-archiving facility in which authors distribute their own products for free). This has all been discussed in: http://amsci-forum.amsci.org/archives/september-forum.html See: HTTP://AMSCI-FORUM.AMSCI.ORG/scripts/wa.exe?A1=ind99L=september-forumF=lfO=TH=0D=0T=1#5

You asserted in a recent note (27 June) that there was no intention that any archive become a 'mega-journal'. However if it becomes the place where academics in a given subject expect to find relevant articles it will have become just that and it will become *necessary* for authors to place their work there.

Nothing of the sort! The journal is the quality controller and certifier. There will continue to be the full spectrum and hierarchy of journals, varying in quality and impact factor, each with its own distinctive brand name. In the virtual archive, this will be designated by tags, so you can restrict your search engine to the refereed literature appearing in, say, American Physical Society journals only, if you wish. An Author Archive is hence, as I said, not a Mega-Journal: It is an archive, in which the entire refereed journal literature (as well as the unrefereed preprint literature) is available for free for all. Now who is monopolizing what for whom?

Although I have long argued, e.g., http://www.ukc.ac.uk/library/papers/jwts/d-journal.htm for the separation of the quality control role of the traditional journal from the publication role I have always advocated a 'distributed' model over a 'centralised' model for 'publication/archiving'.
This at least escapes the possibility of a monopoly by the operators of the central archive. It also echoes the argument in Stuart Weibel's earlier note (11 June) about the redundancy inherent in the multiple copies of books/journals in the current paper library model. That model may be inefficient (too many duplicates are kept) but its robustness is clear.

Redundancy is a non-problem; we know all about backups, mirrors, distributedness, and even distributed coding. It is a waste of time to keep dwelling on these solved problems. Moreover, they have nothing to do with the monopoly issue, which is likewise a red herring. Stop thinking in terms of a reader-end product, with competition among access-blockers, and think instead in terms of a platform for author-end freebies, with collaboration among access-providers, and things will come into better focus. This is the refereed journal literature, not trade books or magazines.

we should take from past publishing models that which is clearly of value like peer review (and maybe distributed archiving?) but discard that which is clearly constraining (due probably to some feature of the underlying medium of the old model) like the linking of quality control and distribution.

Correct, but then what is all this needless fuss about centralisation and monopoly?

Summary: It is possible to escape the problems of the 'trade model' of current academic publishing without running headlong into the possibly equally