Re: Cliff Lynch on Institutional Archives

2003-03-18 Thread Stevan Harnad
On Tue, 18 Mar 2003, Christopher Gutteridge wrote:

 we are planning a University-wide eprints archive. I am
 concerned that some physicists will want to place their items in both
 the university eprints service AND the arXiv physics archive. They may
 be required to use the university service, but want to use arXiv as it
 is the primary source for their discipline. This is a duplication of
 effort and a potential irritation.

This is a very minor technical problem (the interoperability of multiple
OAI Archives containing the same paper) and part of another, slightly
less minor problem, namely, version-control, within and across OAI
Archives (the coordination of multiple versions and revisions of the
same paper, within the same or different OAI archives), plus the
optimization of cross-archive OAI search services:
http://www.openarchives.org/service/listproviders.html
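
As a rough illustration of how a cross-archive service might spot the
same paper deposited in two OAI Archives, here is a minimal sketch. It
assumes each harvested record has already been reduced to a title, an
author list and a year; the field names, the archive label
"soton-eprints" and the sample records are all invented for the example
and are not defined by the OAI protocol. Matching on a normalized key is
only one of many possible heuristics.

```python
import re
from collections import defaultdict

def dedup_key(record):
    """Build a normalized (title, first-author surname, year) key.

    The record layout is a hypothetical simplification of harvested metadata.
    """
    # Lowercase the title and collapse every run of punctuation/whitespace.
    title = re.sub(r"[^a-z0-9]+", " ", record["title"].lower()).strip()
    # Authors are assumed to be given as "Surname, Initials".
    surname = record["authors"][0].split(",")[0].strip().lower()
    return (title, surname, record["year"])

def group_duplicates(records):
    """Group records that share a key; each group is one probable paper."""
    groups = defaultdict(list)
    for rec in records:
        groups[dedup_key(rec)].append(rec)
    return list(groups.values())

# Invented sample data: one paper deposited twice, plus an unrelated paper.
records = [
    {"archive": "arXiv", "title": "Quantum Widgets: A Survey",
     "authors": ["Smith, J."], "year": 2003},
    {"archive": "soton-eprints", "title": "Quantum widgets - a survey",
     "authors": ["Smith, Jane"], "year": 2003},
    {"archive": "soton-eprints", "title": "Another Paper",
     "authors": ["Jones, A."], "year": 2002},
]

groups = group_duplicates(records)
for group in groups:
    print(sorted(r["archive"] for r in group), group[0]["title"])
```

A real service would need fuzzier matching (retitled revisions, author
name variants), which is part of the version-control problem mentioned
above.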

I recommend that this be discussed with the pertinent experts in oai-tech
or oai-general. It is not a general archiving or open-access matter, and
can only confuse researchers (needlessly). For them, self-archiving is
the optimal thing to do, institutionally in the first instance, but also
in a central disciplinary archive if/when they wish; and they should
not worry any further about it. (What is needed, urgently, today, is
universal self-archiving, and not trivial worries about whether to do it
here or there or both: OAI-interoperability makes this into a non-issue
from the self-archiver's point of view, and merely a technical feature
to sort out, from the OAI-developers' point of view.)

 Ultimately, of course, I'd hope that disciplinary archives will be replaced
 with subject-specific OAI service providers harvesting from the institutional
 archives. But there is going to be a very long transition period in which
 the solution evolves from our experience.

A very long transition period from what to what? Right now, most OAI
Archives, whether institutional or disciplinary, are either (1)
non-existent, or (2) near-empty! The transition we are striving for is
from empty to full archives (and let us hope it will not be too long!),
not from disciplinary to institutional archives!

What Chris has in mind is only one, exceptional, special case,
namely, the Physics ArXiv, a disciplinary archive (but the *only*
one) which is, since 1991, well on the road to getting filled in
certain subareas of physics (200,000+ papers) (although even this
archive is still a decade from completeness at its present linear
growth rate: http://arxiv.org/show_monthly_submissions see slide 10 of
http://www.ecs.soton.ac.uk/~harnad/Temp/tim-arch.htm )

Chris is imagining that if/when the institutions of those physicists
who are already self-archiving in ArXiv adopt an institutional
self-archiving policy like the one in Chris's own department --
http://www.ecs.soton.ac.uk/~lac/archpol.html -- then some of those
physicists may wonder why/whether they should self-archive twice!
(A tempest in a teapot! The real challenge is getting all the *other*
disciplines to self-archive in the first place. Don't worry about those
physicists who are already ahead of the game. They are not the
problem!)

 What I'm asking is: has anyone given consideration to ways of smoothing
 over this duplication of effort? Possibly some negotiated automated process
 for institutional archives uploading to the subject archive, or at least
 assisting the author in the process.

No need! First, because the duplification of effort is so minimal (the
centrally self-archiving physicists being such an infinitesimal subset
of all that needs to be self-archived -- namely, 2,000,000 articles per
year, across disciplines, not just 200,000 across 10 years, in one
discipline!). And second, because the technical problem (of duplicate
self-archiving) is so soluble, in so many obvious ways!
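
The proportion can be worked out from the round figures quoted above
(2,000,000 articles per year across all disciplines; 200,000 papers in
ArXiv over roughly 10 years). The variable names are only illustrative,
and the result is an order-of-magnitude estimate, not a measurement.

```python
# Figures taken from the paragraph above; both are round numbers.
ARTICLES_PER_YEAR_ALL_DISCIPLINES = 2_000_000
ARXIV_PAPERS_TOTAL = 200_000
ARXIV_YEARS = 10

# Average yearly deposits in ArXiv over the period.
arxiv_per_year = ARXIV_PAPERS_TOTAL / ARXIV_YEARS

# Share of the yearly literature that could even be affected by
# double-depositing in ArXiv and an institutional archive.
share = arxiv_per_year / ARTICLES_PER_YEAR_ALL_DISCIPLINES

print(f"ArXiv deposits per year: {arxiv_per_year:,.0f}")
print(f"Share of annual output:  {share:.1%}")
```

On these figures, the centrally self-archived papers amount to about one
in a hundred of the articles that need to be self-archived each year.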

 This isn't the biggest issue, but it'd be good to address it before it
 becomes more of a problem.

It is such a small issue that it does not belong in a general discussion
of open access and self-archiving for researchers. It belongs only in a
technical discussion group for developers and implementers of the OAI
protocol. The only issue for the research community is how to get the
OAI Archives created and filled, as soon as possible; and I think it
is becoming apparent that institution-based self-archiving is the most
general and natural route to this goal, for the many reasons already
discussed in this thread.

Stevan Harnad


Re: Cliff Lynch on Institutional Archives

2003-03-18 Thread Stevan Harnad
On Tue, 18 Mar 2003, Thomas Krichel wrote:

cg What I'm asking is: has anyone given consideration to ways of smoothing
cg over this duplication of effort? Possibly some negotiated automated process
cg for institutional archives uploading to the subject archive, or at least
cg assisting the author in the process.

   This is not a pressing concern as much as it appears, because
   discipline-based archives have, arXiv apart, not that much stuff.

Thomas gives exactly the correct answer to Chris!

   It is better,
   within an institution, to proceed department by department and
   listen to what the academics want (and these wants will be
   different in each department), rather than setting up one
   archive that is supposed to satisfy everybody's needs at the
   risk of satisfying nobody's.

Of course. Institutional self-archiving does not imply one single
university archive, but an OAI-interoperable network, parametrized to
suit any special needs of each discipline. (That's certainly how Chris's
eprints.org software is being designed: http://software.eprints.org/ )

   it is best to listen to academics telling you
   what their needs are, rather than setting up procedures around
   a central institutional archive. The latter is what Clifford Lynch wants.
   I don't think that it will work.

What is needed is institutional self-archiving, distributed across its
departments interoperably, but customized to the different needs of the
different disciplines.

cg Ultimately, of course, I'd hope that disciplinary archives will be
cg replaced [by] subject-specific OAI service providers harvesting
cg from the institutional archives.

   I would put this in a different way: I'd say that there should be more
   interoperability between institutional archives and disciplinary
   aggregators. Such aggregators don't have a prime function of
   archiving contents but to put the archival contents into
   relations with personal and institutional data and
   document-to-document metadata such as citations. Rather
   than marking up the documents content in the institutional
   archive with subject classification data, it should be marked
   up with aggregator data... In the longer run, we need an extension
   to the OAI protocol to support this on a larger scale.

No problem. This is certainly something the OAI developers can address.
But it has nothing to do with what Chris was worrying about (duplicate
self-archiving in disciplinary and institutional archives); and it seems
to agree about the primacy of institution-based archiving (but
distributed across, and adapted to, the institution's departments and
disciplines).

   Faculty should be given the choice [between disciplinary and
   institutional self-archiving]. They should not be required
   to do either one. arXiv have been doing a tremendous job at
   archiving. You are not going to replace them. But arXiv really
   only covers a small set of disciplines well.

This seems to contradict what was said before! It would be impossible
to implement an effective, systematic institutional self-archiving
policy if it were optional whether researchers self-archive in their
institutional archive or in a central disciplinary archive (even though
OAI-interoperability makes the two alternatives completely equivalent
from an open-access point of view). Let me count the ways:

(1) Institutions can mandate self-archiving, disciplines cannot.

(2) Most disciplines do not have disciplinary OAI Archives at all.

(3) All institutions have (just about) all disciplines.

(4) There are many other potential uses for institutional research
archives (apart from open access).

(5) OAI-interoperability guarantees that institutional and disciplinary
self-archiving are equivalent from the open-access point of view, but
aggregating institutional packages out of distributed disciplinary
OAI archives is harder (though it is not clear how much harder) than
aggregating disciplinary packages out of distributed institutional
OAI archives.

(6) But it is not the equivalence or ease of aggregation that is relevant
at this point (with most archives non-existent or near-empty) but what
is the most promising and natural way to reach universal open access.
(Return to (1) above.)
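
Point (5) can be pictured with a minimal sketch. It assumes,
optimistically, that every harvested record already carries clean
"institution" and "subject" fields; those field names, the archive
contents and the identifiers below are invented for illustration. Under
that assumption the two aggregations are the same regrouping operation;
the open question in (5) is how often real archives supply the field
that is needed.

```python
from collections import defaultdict

def regroup(records, field):
    """Collect record identifiers into packages keyed by the given field."""
    packages = defaultdict(list)
    for rec in records:
        packages[rec[field]].append(rec["id"])
    return dict(packages)

# Invented records, as if harvested from several distributed OAI archives.
harvested = [
    {"id": "oai:example-a:1", "institution": "University A", "subject": "physics"},
    {"id": "oai:example-a:2", "institution": "University A", "subject": "economics"},
    {"id": "oai:example-b:1", "institution": "University B", "subject": "physics"},
]

# Disciplinary packages built out of institutional archives ...
by_subject = regroup(harvested, "subject")
# ... and institutional packages built out of the same records.
by_institution = regroup(harvested, "institution")

print(by_subject)
print(by_institution)
```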

http://www.ecs.soton.ac.uk/~harnad/Temp/Ariadne-RAE.htm
http://paracite.eprints.org/cgi-bin/rae_front.cgi

Stevan Harnad


Re: Cliff Lynch on Institutional Archives

2003-03-18 Thread Stevan Harnad
On Tue, 18 Mar 2003, Philip Hunter wrote:

sh Right now, most OAI
sh Archives, whether institutional or disciplinary, are either (1)
sh non-existent, or (2) near-empty! The transition we are striving for is
sh from empty to full archives (and let us hope it will not be too long!),
sh not from disciplinary to institutional archives!

 I have to agree with this assessment. It is rather puzzling that take-up of
 the technology is (relatively) so low, and the archives are so empty

It's not quite *that* slow!
http://www.ecs.soton.ac.uk/~harnad/Temp/tim-arch.htm

 unless
 organisational issues (such as, for example, the absence of an intelligible
 publishing process for the multiple submission of eprints to archives) are
 more important in the field than is being recognised in this discussion.

I couldn't follow this!

What is an intelligible publishing process?
Eprints = pre-refereeing preprints + published, refereed postprints.
http://www.eprints.org/self-faq/#What-is-Eprint
Self-archiving is not self-publishing; it is merely a means of providing
open access to one's own preprints and postprints.

And what is multiple submission of eprints to archives?
Eprints are not submitted to archives. (They are merely deposited in
archives.) Preprints are submitted to *journals* (for peer review),
and if/when accepted, the refereed postprints are published by those
journals. Preprints and postprints are deposited (self-archived) in
Eprint Archives.

The rate of archive-creation and filling is increasing, but it needs to
be accelerated substantially, and as soon as possible. Systematic
institutional self-archiving policies will help accomplish this once
institutions realize the direct causal connection between maximizing
research access and maximizing research impact.

http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving.ppt
http://www.ecs.soton.ac.uk/~harnad/Temp/unto-others.html
http://www.ecs.soton.ac.uk/~harnad/Temp/Ariadne-RAE.htm
http://www.ecs.soton.ac.uk/~lac/archpol.html
http://paracite.eprints.org/cgi-bin/rae_front.cgi

Stevan Harnad


Re: Cliff Lynch on Institutional Archives

2003-03-18 Thread Thomas Krichel
  Stevan Harnad writes

 Thomas gives exactly the correct answer to Chris!

  I didn't know this was a quiz :-)

 What is needed is institutional self-archiving, distributed across its
 departments interoperably, but customized to the different needs of the
 different disciplines.

  That is a tall order.

 (1) Institutions can mandate self-archiving, disciplines cannot.

  Cliff imagines that they can, but in practice, it will be tough.
  You cannot put a KGB officer in every academic's office!

 (2) Most disciplines do not have disciplinary OAI Archives at all.

  Sure, but all have some ways to communicate informally, and many
  have innovative channels. Sure, many of them stay small, but
  there is no technical obstacle to a meaningful aggregation.

 (4) There are many other potential uses for institutional research
 archives (apart from open access).

  I agree. If I ran an institution's archive, I would back
  up all the web sites each year. In 20 years time, you would get
  a fascinating picture of the development of the institution.

 (5) OAI-interoperability guarantees that institutional and disciplinary
 self-archiving are equivalent from the open-access point of view, but
 aggregating institutional packages out of distributed disciplinary
 OAI archives is harder (though it is not clear how much harder) than
 aggregating disciplinary packages out of distributed institutional
 OAI archives.

  no, it is easier to construct feature-rich datasets out of
  disciplinary archives, because some of them will be prepared
  with the specifics of an aggregator in mind.

  With greetings from Minsk, Belarus,


  Thomas Krichel http://openlib.org/home/krichel
 RePEc:per:1965-06-05:thomas_krichel


Re: Cliff Lynch on Institutional Archives

2003-03-18 Thread Stevan Harnad
On Tue, 18 Mar 2003, Thomas Krichel wrote:

sh (1) Institutions can mandate self-archiving, disciplines cannot.

   Cliff [Lynch] imagines that they can, but in practice, it will be tough.
   You cannot put a KGB officer in every academic's office!

You're on the wrong track. Self-archiving can and will be mandated by
researchers' institutions by and for *exactly* the same reasons and
methods as publishing-or-perishing is mandated by institutions. No
KGB, just the simple carrot/stick career consequences of research and
research impact. Once the direct causal connection between access and
impact is shown and known -- e.g.,
http://www.nature.com/nature/debates/e-access/Articles/lawrence.html --
everyone will find it as natural that research institutions should
reward their researchers for maximizing the impact of their publishing
(by self-archiving it) as to maximize the publishing itself.
http://www.eprints.org/self-faq/#institution-facilitate-filling

sh (2) Most disciplines do not have disciplinary OAI Archives at all.

   Sure, but all have some ways to communicate informally, and many
   have innovative channels. Sure, many of them stay small, but
   there is no technical obstacle to a meaningful aggregation.

Here is the point on which Thomas and I part ways (profoundly). I agree
completely that where papers have not yet been self-archived in
OAI-compliant Archives (whether institutional or disciplinary) it is
highly desirable to find, link, metadata-enhance or harvest any
discoverable online papers that already exist on arbitrary websites
webwide. This is the invaluable service Thomas's RePEc (Research
Papers in Economics) is performing for over 86,000 non-OAI papers that
would otherwise be very difficult to find and use http://repec.org/

But the objective of OAI-compliant institutional self-archiving (and
a systematic policy mandating it) is to get away as soon as possible
from having to resort to these makeshift solutions for arbitrary web
content. (Nor is any of this relevant to what I said, which is that most
disciplines do not have disciplinary OAI Archives at all, and disciplines
are in no position to mandate self-archiving, whereas institutions are.)
http://www.ecs.soton.ac.uk/~harnad/Temp/Ariadne-RAE.htm
http://www.ecs.soton.ac.uk/~harnad/Temp/tim-arch.htm
http://paracite.eprints.org/cgi-bin/rae_front.cgi

sh (5) OAI-interoperability guarantees that institutional and disciplinary
sh self-archiving are equivalent from the open-access point of view, but
sh aggregating institutional packages out of distributed disciplinary
sh OAI archives is harder (though it is not clear how much harder) than
sh aggregating disciplinary packages out of distributed institutional
sh OAI archives.

   no, it is easier to construct feature-rich datasets out of
   disciplinary archives, because some of them will be prepared
   with the specifics of an aggregator in mind.

I regret I couldn't follow the logic of this at all. First, there are
almost no disciplinary OAI archives. Second, makeshift measures with
arbitrary web content are exactly that: makeshift, interim measures.
Third, from the fact that some arbitrary content may happen to
have some desirable specific features, nothing whatsoever follows.
And fourth, whatever are the specific features desired, they can be
systematically included (and mandated) in the institutional OAI archives
(parametrized to fit each discipline).

Aggregation is not the objective: Interoperable content is; and (mandated)
institutional OAI self-archiving is the most direct, fastest and surest
way to generate it.

Stevan Harnad


Re: Cliff Lynch on Institutional Archives

2003-03-18 Thread Thomas Krichel
  Stevan Harnad writes

 What is needed, urgently, today, is universal self-archiving, and
 not trivial worries about whether to do it here or there or both:
 OAI-interoperability makes this into a non-issue from the
 self-archiver's point of view, and merely a technical feature to
 sort out, from the OAI-developers' point of view.

  Success here depends on selling the idea to academics, and that
  depends crucially on what business models are followed.

 What Chris has in mind is only one, exceptional, special case,
 namely, the Physics ArXiv, a disciplinary archive (but the *only*
 one) which is, since 1991, well on the road to getting filled in
 certain subareas of physics (200,000+ papers) (although even this
 archive is still a decade from completeness at its present linear
 growth rate: http://arxiv.org/show_monthly_submissions see slide 10 of
 http://www.ecs.soton.ac.uk/~harnad/Temp/tim-arch.htm )

  There are other special cases. In fact each of the disciplines
  that have traditionally issued preprints and working papers,
  i.e. computer science, economics, mathematics and physics
  has its own special case. All have their own business model.
  One size does not fit all.

 No need! First, because the duplification of effort is so minimal

  It will not be, especially when there is a chance to have
  different versions in different archives, this could be
  rather, if not highly, problematic.

 It is such a small issue that it does not belong in a general discussion
 of open access and self-archiving for researchers.

  You constantly belittle technical problems, and then you wonder
  why the archives are staying empty or do not exist. Answer: because
  these technical problems have not been solved. By belittling
  them, you put yourself in the way of finding a solution.

  With greetings from Minsk, Belarus,


  Thomas Krichel http://openlib.org/home/krichel
 RePEc:per:1965-06-05:thomas_krichel


Re: Cliff Lynch on Institutional Archives

2003-03-18 Thread Stevan Harnad
On Tue, 18 Mar 2003, Thomas Krichel wrote:

sh What is needed, urgently, today, is universal self-archiving, and
sh not trivial worries about whether to do it here or there or both:
sh OAI-interoperability makes this into a non-issue from the
sh self-archiver's point of view, and merely a technical feature to
sh sort out, from the OAI-developers' point of view.

   Success here depends on selling the idea to academics, and that
   depends crucially on what business models are followed.

I have no idea what business models have to do with demonstrating to
academics that increasing research access increases research impact.
http://www.nature.com/nature/debates/e-access/Articles/lawrence.html

   each of the disciplines
   that have traditionally issued preprints and working papers,
   i.e. computer science, economics, mathematics and physics
   has its own special case. All have their own business model.
   One size does not fit all.

I still can't follow. These are among the disciplines whose researchers
have self-archived -- in the case of physics/maths, mostly in one
disciplinary archive, in the case of computer science and economics,
in arbitrary websites (and some central archives). I don't know what you
mean by a business model. And the only fact that fits them all is that
self-archiving maximizes research impact by maximizing research access.
That is also the only relevant fact here -- other than that OAI-compliant
self-archiving is far more effective and desirable than arbitrary
self-archiving.

  No need! First, because the duplification of effort is so minimal

   It will not be, especially when there is a chance to have
   different versions in different archives, this could be
   rather, if not highly, problematic.

I have no idea how much of a technical problem duplicate self-archiving
would cause (whether of the same paper in different archives, or
different versions of the same paper in the same or different archives).
But my response is: If only that were our only remaining 'problem'
then my work would be done! The real problem is getting the research
community to realize that it needs to self-archive *at all* (never mind
how many versions!), and why, and how. Compared to that fundamental
nullplification-of-effort problem, which is the one we are still facing
currently, any duplication-of-effort or balking-at-duplicating-effort
problem is truly trivial.

  It is such a small issue that it does not belong in a general discussion
  of open access and self-archiving for researchers.

   You constantly belittle technical problems, and then you wonder
   why the archives are staying empty or do not exist. Answer: because
   these technical problems have not been solved. By belittling
   them, you put yourself in the way of finding a solution.

I belittle trivial problems to put them in context, and to highlight
the sole nontrivial problem. Double-archived papers are a trivial
problem. Non-archived papers are the nontrivial problem. I am *certain*
(not guessing, *certain*) that the reason the archives are not filling
faster is most decidedly *not* because of any aspect of the duplicate paper
problem. Most researchers don't even understand why they should
self-archive *one* version of a paper, let alone being concerned about
having to self-archive more than one. What gets in the way of finding
a solution to the nontrivial problem -- universal self-archiving --
is a pathway littered with trivial problems and nonproblems (of which
duplication is merely the 23rd of at least 26 I've catalogued so far:
http://www.eprints.org/self-faq/#23.Version ).

Stevan Harnad