Re: Cliff Lynch on Institutional Archives
On Tue, 18 Mar 2003, Christopher Gutteridge wrote:

> we are planning a University-wide eprints archive. I am concerned that some physicists will want to place their items in both the university eprints service AND the arXiv physics archive. They may be required to use the university service, but want to use arXiv as it is the primary source for their discipline. This is a duplication of effort and a potential irritation.

This is a very minor technical problem (the interoperability of multiple OAI Archives containing the same paper) and part of another, slightly less minor problem, namely, version-control, within and across OAI Archives (the coordination of multiple versions and revisions of the same paper, within the same or different OAI archives), plus the optimization of cross-archive OAI search services:
http://www.openarchives.org/service/listproviders.html

I recommend that this be discussed with the pertinent experts in oai-tech or oai-general. It is not a general archiving or open-access matter, and can only confuse researchers (needlessly). For them, self-archiving is the optimal thing to do, institutionally in the first instance, but also in a central disciplinary archive if/when they wish; and they should not worry any further about it. (What is needed, urgently, today, is universal self-archiving, and not trivial worries about whether to do it here or there or both: OAI-interoperability makes this into a non-issue from the self-archiver's point of view, and merely a technical feature to sort out, from the OAI-developers' point of view.)

> Ultimately, of course, I'd hope that disciplinary archives will be replaced with subject-specific OAI service providers harvesting from the institutional archives. But there is going to be a very long transition period in which the solution evolves from our experience.

A very long transition period from what to what?
Right now, most OAI Archives, whether institutional or disciplinary, are either (1) non-existent, or (2) near-empty! The transition we are striving for is from empty to full archives (and let us hope it will not be too long!), not from disciplinary to institutional archives!

What Chris has in mind is only one, exceptional, special case, namely, the Physics ArXiv, a disciplinary archive (but the *only* one) which has been, since 1991, well on the road to getting filled in certain subareas of physics (200,000+ papers) (although even this archive is still a decade from completeness at its present linear growth rate: http://arxiv.org/show_monthly_submissions see slide 10 of http://www.ecs.soton.ac.uk/~harnad/Temp/tim-arch.htm )

Chris is imagining that if/when the institutions of those physicists who are already self-archiving in ArXiv adopt an institutional self-archiving policy like the one in Chris's own department -- http://www.ecs.soton.ac.uk/~lac/archpol.html -- then some of those physicists may wonder why/whether they should self-archive twice!

(A tempest in a teapot! The real challenge is getting all the *other* disciplines to self-archive in the first place. Don't worry about those physicists who are already ahead of the game. They are not the problem!)

> What I'm asking is; has anyone given consideration to ways of smoothing over this duplication of effort? Possibly some negotiated automated process for institutional archives uploading to the subject archive, or at least assisting the author in the process.

No need! First, because the duplification of effort is so minimal (the centrally self-archiving physicists being such an infinitesimal subset of all that needs to be self-archived -- namely, 2,000,000 articles per year, across disciplines, not just 200,000 across 10 years, in one discipline!). And second, because the technical problem (of duplicate self-archiving) is so soluble, in so many obvious ways!
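One such way, offered only as a minimal sketch and not as anything proposed in this thread: a cross-archive service provider could group harvested metadata records by a normalized title plus first-author surname, and treat any group that spans more than one archive as a candidate duplicate. The record layout, the archive names, the identifiers and the function names `dedup_key` and `group_duplicates` are all illustrative assumptions; none of them is part of the OAI protocol or of any archive's software.

```python
import re
import unicodedata
from collections import defaultdict


def dedup_key(title, first_author):
    """Build a crude match key from a title and the first author's surname."""
    def norm(text):
        # Strip accents, lowercase, and collapse every non-alphanumeric run.
        text = unicodedata.normalize("NFKD", text)
        text = "".join(ch for ch in text if not unicodedata.combining(ch))
        return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()

    # Assumption: authors are non-empty "Surname, Given" or "Given Surname".
    if "," in first_author:
        surname = first_author.split(",")[0]
    else:
        surname = first_author.split()[-1]
    return (norm(title), norm(surname))


def group_duplicates(records):
    """Return {key: [records]} for keys that occur in more than one archive."""
    groups = defaultdict(list)
    for rec in records:
        groups[dedup_key(rec["title"], rec["creator"])].append(rec)
    return {
        key: recs
        for key, recs in groups.items()
        if len({r["archive"] for r in recs}) > 1
    }


# Hypothetical harvested records; field names loosely follow Dublin Core.
records = [
    {"archive": "arxiv", "id": "oai:example-arxiv:1",
     "title": "Quantum Widgets: A Survey", "creator": "Doe, Jane"},
    {"archive": "soton", "id": "oai:example-soton:7",
     "title": "Quantum widgets - a survey", "creator": "Jane Doe"},
    {"archive": "soton", "id": "oai:example-soton:8",
     "title": "Another Paper", "creator": "Roe, Richard"},
]

duplicates = group_duplicates(records)
```

A key like this only catches deposits whose titles match after normalization; revised versions with changed titles would need a fuzzier comparison, which is the version-control question mentioned above.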
> This isn't the biggest issue, but it'd be good to address it before it becomes more of a problem.

It is such a small issue that it does not belong in a general discussion of open access and self-archiving for researchers. It belongs only in a technical discussion group for developers and implementers of the OAI protocol. The only issue for the research community is how to get the OAI Archives created and filled, as soon as possible; and I think it is becoming apparent that institution-based self-archiving is the most general and natural route to this goal, for the many reasons already discussed in this thread.

Stevan Harnad
Re: Cliff Lynch on Institutional Archives
On Tue, 18 Mar 2003, Thomas Krichel wrote:

> cg> What I'm asking is; has anyone given consideration to ways of smoothing over this duplication of effort? Possibly some negotiated automated process for institutional archives uploading to the subject archive, or at least assisting the author in the process.
>
> This is not a pressing concern as much as it appears, because discipline-based archives have, arXiv apart, not that much stuff.

Thomas gives exactly the correct answer to Chris!

> It is better, within an institution, to proceed department by department and listen to what the academics want (and these wants will be different in each department), rather than setting up one archive that is supposed to satisfy everybody's needs at the risk of satisfying nobody's.

Of course. Institutional self-archiving does not imply one single university archive, but an OAI-interoperable network, parametrized to suit any special needs of each discipline. (That's certainly how Chris's eprints.org software is being designed: http://software.eprints.org/ )

> it is best to listen to academics telling you what their needs are, rather than setting up procedures around a central institutional archive. The latter is what Clifford Lynch wants. I don't think that it will work.

What is needed is institutional self-archiving, distributed across its departments interoperably, but customized to the different needs of the different disciplines.

> cg> Ultimately, of course, I'd hope that disciplinary archives will be replaced [by] subject-specific OAI service providers harvesting from the institutional archives.
>
> I would put this in a different way: I'd say that there should be more interoperability between institutional archives and disciplinary aggregators. Such aggregators don't have a prime function of archiving contents but to put the archival contents into relations with personal and institutional data and document-to-document metadata such as citations.
> Rather than marking up the documents' content in the institutional archive with subject classification data, it should be marked up with aggregator data... In the longer run, we need an extension to the OAI protocol to support this on a larger scale.

No problem. This is certainly something the OAI developers can address. But it has nothing to do with what Chris was worrying about (duplicate self-archiving in disciplinary and institutional archives); and it seems to agree about the primacy of institution-based archiving (but distributed across, and adapted to, the institution's departments and disciplines).

> Faculty should be given the choice [between disciplinary and institutional self-archiving]. They should not be required to do either one. arXiv have been doing a tremendous job at archiving. You are not going to replace them. But arXiv really only covers a small set of disciplines well.

This seems to contradict what was said before! It would be impossible to implement an effective, systematic institutional self-archiving policy if it were optional whether researchers self-archive in their institutional archive or in a central disciplinary archive (even though OAI-interoperability makes the two alternatives completely equivalent from an open-access point of view). Let me count the ways:

(1) Institutions can mandate self-archiving, disciplines cannot.

(2) Most disciplines do not have disciplinary OAI Archives at all.

(3) All institutions have (just about) all disciplines.

(4) There are many other potential uses for institutional research archives (apart from open access).

(5) OAI-interoperability guarantees that institutional and disciplinary self-archiving are equivalent from the open-access point of view, but aggregating institutional packages out of distributed disciplinary OAI archives is harder (though it is not clear how much harder) than aggregating disciplinary packages out of distributed institutional OAI archives.
(6) But it is not the equivalence or ease of aggregation that is relevant at this point (with most archives non-existent or near-empty) but what is the most promising and natural way to reach universal open access. (Return to (1) above.)

http://www.ecs.soton.ac.uk/~harnad/Temp/Ariadne-RAE.htm
http://paracite.eprints.org/cgi-bin/rae_front.cgi

Stevan Harnad
Re: Cliff Lynch on Institutional Archives
On Tue, 18 Mar 2003, Philip Hunter wrote:

> sh> Right now, most OAI Archives, whether institutional or disciplinary, are either (1) non-existent, or (2) near-empty! The transition we are striving for is from empty to full archives (and let us hope it will not be too long!), not from disciplinary to institutional archives!
>
> I have to agree with this assessment. It is rather puzzling that take-up of the technology is (relatively) so low, and the archives are so empty

It's not quite *that* slow!
http://www.ecs.soton.ac.uk/~harnad/Temp/tim-arch.htm

> unless organisational issues (such as, for example, the absence of an intelligible publishing process for the multiple submission of eprints to archives) are more important in the field than is being recognised in this discussion.

I couldn't follow this! What is an intelligible publishing process? Eprints = pre-refereeing preprints + published, refereed postprints.
http://www.eprints.org/self-faq/#What-is-Eprint

Self-archiving is not self-publishing; it is merely a means of providing open access to one's own preprints and postprints.

And what is multiple submission of eprints to archives? Eprints are not submitted to archives. (They are merely deposited in archives.) Preprints are submitted to *journals* (for peer review), and if/when accepted, the refereed postprints are published by those journals. Preprints and postprints are deposited (self-archived) in Eprint Archives.

The rate of archive-creation and filling is increasing, but it needs to be accelerated substantially, and as soon as possible. Systematic institutional self-archiving policies will help accomplish this once institutions realize the direct causal connection between maximizing research access and maximizing research impact.
http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving.ppt
http://www.ecs.soton.ac.uk/~harnad/Temp/unto-others.html
http://www.ecs.soton.ac.uk/~harnad/Temp/Ariadne-RAE.htm
http://www.ecs.soton.ac.uk/~lac/archpol.html
http://paracite.eprints.org/cgi-bin/rae_front.cgi

Stevan Harnad
Re: Cliff Lynch on Institutional Archives
Stevan Harnad writes

> Thomas gives exactly the correct answer to Chris!

I didn't know this was a quiz :-)

> What is needed is institutional self-archiving, distributed across its departments interoperably, but customized to the different needs of the different disciplines.

That is a tall order.

> (1) Institutions can mandate self-archiving, disciplines cannot.

Cliff imagines that they can, but in practice, it will be tough. You cannot put a KGB officer in every academic's office!

> (2) Most disciplines do not have disciplinary OAI Archives at all.

Sure, but all have some ways to communicate informally, and many have innovative channels. Sure, many of them stay small, but there is no technical obstacle to a meaningful aggregation.

> (4) There are many other potential uses for institutional research archives (apart from open access).

I agree. If I ran an institution's archive, I would back up all the web sites each year. In 20 years' time, you would get a fascinating picture of the development of the institution.

> (5) OAI-interoperability guarantees that institutional and disciplinary self-archiving are equivalent from the open-access point of view, but aggregating institutional packages out of distributed disciplinary OAI archives is harder (though it is not clear how much harder) than aggregating disciplinary packages out of distributed institutional OAI archives.

No, it is easier to construct feature-rich datasets out of disciplinary archives, because some of them will be prepared with the specifics of an aggregator in mind.

With greetings from Minsk, Belarus,

Thomas Krichel
http://openlib.org/home/krichel
RePEc:per:1965-06-05:thomas_krichel
Re: Cliff Lynch on Institutional Archives
On Tue, 18 Mar 2003, Thomas Krichel wrote:

> sh> (1) Institutions can mandate self-archiving, disciplines cannot.
>
> Cliff [Lynch] imagines that they can, but in practice, it will be tough. You cannot put a KGB officer in every academic's office!

You're on the wrong track. Self-archiving can and will be mandated by researchers' institutions for *exactly* the same reasons, and by the same methods, as publishing-or-perishing is mandated by institutions. No KGB, just the simple carrot/stick career consequences of research and research impact. Once the direct causal connection between access and impact is shown and known -- e.g., http://www.nature.com/nature/debates/e-access/Articles/lawrence.html -- everyone will find it as natural that research institutions should reward their researchers for maximizing the impact of their publishing (by self-archiving it) as for maximizing the publishing itself.
http://www.eprints.org/self-faq/#institution-facilitate-filling

> sh> (2) Most disciplines do not have disciplinary OAI Archives at all.
>
> Sure, but all have some ways to communicate informally, and many have innovative channels. Sure, many of them stay small, but there is no technical obstacle to a meaningful aggregation.

Here is the point on which Thomas and I part ways (profoundly). I agree completely that where papers have not yet been self-archived in OAI-compliant Archives (whether institutional or disciplinary) it is highly desirable to find, link, metadata-enhance or harvest any discoverable online papers that already exist on arbitrary websites webwide. This is the invaluable service Thomas's RePEc (Research Papers in Economics) is performing for over 86,000 non-OAI papers that would otherwise be very difficult to find and use:
http://repec.org/

But the objective of OAI-compliant institutional self-archiving (and a systematic policy mandating it) is to get away as soon as possible from having to resort to these makeshift solutions for arbitrary web content.
(Nor is any of this relevant to what I said, which is that most disciplines do not have disciplinary OAI Archives at all, and disciplines are in no position to mandate self-archiving, whereas institutions are.)

http://www.ecs.soton.ac.uk/~harnad/Temp/Ariadne-RAE.htm
http://www.ecs.soton.ac.uk/~harnad/Temp/tim-arch.htm
http://paracite.eprints.org/cgi-bin/rae_front.cgi

> sh> (5) OAI-interoperability guarantees that institutional and disciplinary self-archiving are equivalent from the open-access point of view, but aggregating institutional packages out of distributed disciplinary OAI archives is harder (though it is not clear how much harder) than aggregating disciplinary packages out of distributed institutional OAI archives.
>
> No, it is easier to construct feature-rich datasets out of disciplinary archives, because some of them will be prepared with the specifics of an aggregator in mind.

I regret I couldn't follow the logic of this at all. First, there are almost no disciplinary OAI archives. Second, makeshift measures with arbitrary web content are exactly that: makeshift, interim measures. Third, from the fact that some arbitrary content may happen to have some desirable specific features, nothing whatsoever follows. And fourth, whatever are the specific features desired, they can be systematically included (and mandated) in the institutional OAI archives (parametrized to fit each discipline).

Aggregation is not the objective: Interoperable content is; and (mandated) institutional OAI self-archiving is the most direct, fastest and surest way to generate it.

Stevan Harnad
Re: Cliff Lynch on Institutional Archives
Stevan Harnad writes

> What is needed, urgently, today, is universal self-archiving, and not trivial worries about whether to do it here or there or both: OAI-interoperability makes this into a non-issue from the self-archiver's point of view, and merely a technical feature to sort out, from the OAI-developers' point of view.

Success here depends on selling the idea to academics, and that depends crucially on what business models are followed.

> What Chris has in mind is only one, exceptional, special case, namely, the Physics ArXiv, a disciplinary archive (but the *only* one) which is, since 1991, well on the road to getting filled in certain subareas of physics (200,000+ papers) (although even this archive is still a decade from completeness at its present linear growth rate: http://arxiv.org/show_monthly_submissions see slide 10 of http://www.ecs.soton.ac.uk/~harnad/Temp/tim-arch.htm )

There are other special cases. In fact each of the disciplines that have traditionally issued preprints and working papers, i.e. computer science, economics, mathematics and physics, has its own special case. All have their own business model. One size does not fit all.

> No need! First, because the duplification of effort is so minimal

It will not be. Especially when there is a chance of having different versions in different archives, this could be rather, if not highly, problematic.

> It is such a small issue that it does not belong in a general discussion of open access and self-archiving for researchers.

You constantly belittle technical problems, and then you wonder why the archives are staying empty or do not exist. Answer: because these technical problems have not been solved. By belittling them, you put yourself in the way of finding a solution.

With greetings from Minsk, Belarus,

Thomas Krichel
http://openlib.org/home/krichel
RePEc:per:1965-06-05:thomas_krichel
Re: Cliff Lynch on Institutional Archives
On Tue, 18 Mar 2003, Thomas Krichel wrote:

> sh> What is needed, urgently, today, is universal self-archiving, and not trivial worries about whether to do it here or there or both: OAI-interoperability makes this into a non-issue from the self-archiver's point of view, and merely a technical feature to sort out, from the OAI-developers' point of view.
>
> Success here depends on selling the idea to academics, and that depends crucially on what business models are followed.

I have no idea what business models have to do with demonstrating to academics that increasing research access increases research impact.
http://www.nature.com/nature/debates/e-access/Articles/lawrence.html

> each of the disciplines that have traditionally issued preprints and working papers, i.e. computer science, economics, mathematics and physics, has its own special case. All have their own business model. One size does not fit all.

I still can't follow. These are among the disciplines whose researchers have self-archived -- in the case of physics/maths, mostly in one disciplinary archive, in the case of computer science and economics, in arbitrary websites (and some central archives). I don't know what you mean by a business model. And the only fact that fits them all is that self-archiving maximizes research impact by maximizing research access. That is also the only relevant fact here -- other than that OAI-compliant self-archiving is far more effective and desirable than arbitrary self-archiving.

> > No need! First, because the duplification of effort is so minimal
>
> It will not be. Especially when there is a chance of having different versions in different archives, this could be rather, if not highly, problematic.

I have no idea how much of a technical problem duplicate self-archiving would cause (whether of the same paper in different archives, or different versions of the same paper in the same or different archives).
But my response is: If only that were our only remaining 'problem' then my work would be done! The real problem is getting the research community to realize that it needs to self-archive *at all* (never mind how many versions!), and why, and how. Compared to that fundamental nullplification-of-effort problem, which is the one we are still facing currently, any duplication-of-effort or balking-at-duplicating-effort problem is truly trivial.

> > It is such a small issue that it does not belong in a general discussion of open access and self-archiving for researchers.
>
> You constantly belittle technical problems, and then you wonder why the archives are staying empty or do not exist. Answer: because these technical problems have not been solved. By belittling them, you put yourself in the way of finding a solution.

I belittle trivial problems to put them in context, and to highlight the sole nontrivial problem. Double-archived papers are a trivial problem. Non-archived papers are the nontrivial problem. I am *certain* (not guessing, *certain*) that the reason the archives are not filling faster is most decidedly *not* because of any aspect of the duplicate paper problem. Most researchers don't even understand why they should self-archive *one* version of a paper, let alone worry about having to self-archive more than one.

What gets in the way of finding a solution to the nontrivial problem -- universal self-archiving -- is a pathway littered with trivial problems and nonproblems (of which duplication is merely the 23rd of at least 26 I've catalogued so far: http://www.eprints.org/self-faq/#23.Version ).

Stevan Harnad