Re: [Dspace-tech] [Dspace-general] Regarding Ranking of Repositories

2014-09-03 Thread Isidro F. Aguillo
Dear Stuart,

I do not know whether you understand the ultimate purpose of Open Access
initiatives in general and of institutional repositories in
particular, but I think you are missing the central point: the
end users should guide the design of the repository according to their
real needs.

In OA the end users are the authors of the papers, the
institutions that fund the research and host the papers, and the
librarians who manage the repository. In this scenario the software
developers' task is to fulfill the needs of the authors in the most
professional way.

Regarding authors' needs, the W3C organization and its 'cool' proposals
are arbitrary, basically because its members do not know how scholarly
communication works or what the aims and methods of OA are. They are not
stakeholders for us.

In any case, I would prefer, and would thank you for, comments on the specific
proposals rather than a general, ambiguous and unsupported global criticism.

Best regards,


Stuart Yeates stuart.yea...@vuw.ac.nz wrote:

 I'm not sure that a knee-jerk reaction to an arbitrary list of bad
 practice is a good place to start; it seems like a really bad driver
 for software development.

 Maybe we should be talking to our fellow implementers and building  
 on the work of http://www.w3.org/Provider/Style/URI.html,  
 http://www.w3.org/TR/cooluris/,  
 http://www.openarchives.org/OAI/openarchivesprotocol.html, etc. to  
 build a compilation of _best_ practice.

 Cheers
 stuart

 -Original Message-
 From: Tim Donohue [mailto:tdono...@duraspace.org]
 Sent: Wednesday, 3 September 2014 8:49 a.m.
 To: Isidro F. Aguillo; dspace-tech@lists.sourceforge.net
 Cc: Jonathan Markow; dspace-gene...@lists.sourceforge.net
 Subject: Re: [Dspace-general] [Dspace-tech] Regarding Ranking of Repositories

 Hello Isidro,

 DuraSpace (the stewarding organization behind DSpace and Fedora  
 repository software) was planning to send you a compiled list of the  
 concerns with your proposal. As you can tell from the previous email  
 thread, many of the users of DSpace have similar concerns. Rather  
 than bombard you with all of them individually (which you could see  
 from browsing the thread), we hoped to draft up a response  
 summarizing the concerns of the DSpace community.

 Below you'll find an initial draft of the summarized concerns. The  
 rule numbering below is based on the numbering at:
 http://repositories.webometrics.info/en/node/26

 --- Concerns with the Proposal from Ranking Web of Repositories

 * Rule #2 (IRs that don't use the institutional domain will be  
 excluded) would cause the exclusion of some IRs which are hosted by  
 DSpace service providers. As an example, some DSpaceDirect.org users  
 have URLs https://[something].dspacedirect.org which would cause  
 their exclusion as it is a non-institutional domain. Many other  
 DSpace hosting providers have similar non-institutional domain URLs  
 by default.

 * Rule #4 (Repositories using ports other than 80 or 8080) would  
 wrongly exclude all DSpace sites which use HTTPS (port 443). Many  
 institutions choose to run DSpace via HTTPS instead of HTTP.

 * Rule #5 (IRs that use the name of the software in the hostname  
 would be excluded) may also affect IRs which are hosted by service  
 providers (like DSpaceDirect). Again, some DSpaceDirect customers  
 have URLs which use *.dspacedirect.org (includes dspace). This  
 rule would also exclude MIT's IR which is the original DSpace (and  
 has used the same URL for the last 10+ years): http://dspace.mit.edu/

 * Rule #6 (IRs that use more than 4 directory levels for the URL  
 address of the full texts will be excluded.) may accidentally  
 exclude a large number of DSpace sites. The common download URLs for
 full text in DSpace are both at least 4 directory levels deep:

 - XMLUI: [dspace-url]/bitstream/handle/[prefix]/[id]/[filename]
 - JSPUI: [dspace-url]/bitstream/[prefix]/[id]/[sequence]/[filename]

 NOTE: prefix and id are parts of an Item's Handle  
 (http://hdl.handle.net/), which is the persistent identifier  
 assigned to the item via the Handle System. So, this is how a  
 persistent URL like
 http://hdl.handle.net/1721.1/26706 redirects to an Item in MIT's DSpace.

 * Rule #7 (IRs that use more than 3 different numeric (or useless)  
 codes in their URLs will be excluded.). It is unclear how they would  
 determine this, and what the effect may be on DSpace sites  
 worldwide. Again, looking at the common DSpace URL paths above, if a
 file had a numeric name, it may be excluded as DSpace URLs already
 include 2-3 numeric codes by default ([prefix], [id], and [sequence]
 are all numeric).

 * Rule #8 (IRs with more than 50% of the records not linking to OA  
 full text versions..). Again, unclear how they would determine this,  
 and whether the way they are doing so would accidentally exclude  
 some major DSpace sites. For example, there are major DSpace sites  
 which

Re: [Dspace-tech] [Dspace-general] Regarding Ranking of Repositories

2014-09-03 Thread Isidro F. Aguillo
Dear Kim,

Thanks for your message. My answers to your specific comments follow.


Kim Shepherd kim.sheph...@gmail.com wrote:

 Hi Isidro and lists,

 Regarding point 6 -- I see what you're saying, but it shouldn't really be
 up to the DSpace community repositories (who all use the handle prefix /
 identifier system, as I'm sure you know!) to argue why 1234/123 is better
 than thesis/physics/something, because we're not the ones proposing that
 URI segments be part of any metric used to judge the world ranking of a
 repository. It's also not as simple as you might think, particularly when
 ensuring unique URIs and persistent URIs, etc.
 I think you're saying that URIs should either look nice or be
 meaningful, or both, but I'm not sure we should rely on URIs to be too
 meaningful, especially when we have ways of including that with semantic
 markup in references, structured data in our METS/ORE feeds via OAI, etc.

This is the basic misunderstanding. The repository end user is an
author who wishes to increase the global visibility and usage of
his/her deposited papers. Looking nice is relevant because the author's
main tool for obtaining visibility and visits is to cite the papers
in his/her future work or to mention them, for example, on Wikipedia or
Twitter.

Looking nice adds informative value (author's name, publication
year, topic) that can help the reader decide whether to
follow the link. Looking nice also works as quality control: a
mistake in a last name is easier to notice than one in a series of numbers.

But far more important: by far the largest number of visitors come from
Google or other search engines. If they are not searching for a specific
title, the semantic content of the URL considerably improves your
positioning in Google.

Of course, you can ignore that, but there are already many people who
prefer to deposit in ResearchGate or Academia because the visibility of
their works is better there.


 Regarding point 5 -- I don't see that this matters either. No end user
 cares what the IR is actually called, surely? Whatever arguments you can
 make for our IRs having bad names, punishing us for preserving the
 permanence of those names and URIs we've already minted seems a bit unfair?
 The first IR I thought of when reading this was, of course,
 http://dspace.mit.edu.
 I think point 5 actually punishes EPrints repositories most unfairly, since
 eprints is an accepted name for digital manuscripts as well as the
 platform used -- I think I've even seen IRs called eprints.something.etc
 running platforms other than EPrints.

You are answering the question yourself. Imagine that in the future I decide to
use EPrints instead of DSpace, or whatever better software is
developed by then. Are you going to change the domain or not?

On the other hand, why brand an intellectual result with the name
of the tool? I write my papers with MS Word and have never made any
significant mention of this fact in any of them.

What is the problem with the following?

http://repository.university.edu/


 Numbers 6 and 7, I think I agree with Mark, but don't really have anything
 to add. I don't really understand why this would even be considered as a
 metric, let alone grounds for exclusion. What are some examples of cases
 where long URIs (or, eg. directories as fulltext hosted in IRs with their
 own dir structures, which happens) or URIs which happen to contain numbers
 result in end users or machines not being able to properly locate/use
 hosted resources?

Metrics? Who is talking about metrics? I only said Keep It Simple.

 Number 8 is probably the thing that will punish my own institution most,
 which is a pity because we have a large absolute amount of fulltext, but
 for various reasons, a lot of record-only items as well. This is probably a
 philosophical argument about defining an OA repository I guess?

We can discuss the threshold. I think 50% is reasonable, but it
could be applied only to contents whose embargo has ended (usually after
6 or 12 months).

 I hope my criticisms here don't seem too harsh - thanks for taking the time
 to listen to feedback.

 On a lighter note, I'm sort of pleased these proposals have been doing the
 rounds, as I think it might just be the thing that convinces my own
 institution to take world repository ranking off our KPIs, and
 concentrate more on qualitative value of the repository we host.

Thanks for your cooperation ...

 Cheers

 Kim (a DSpace dev/admin, and already biased against quantitative metrics in
 IRs so not exactly an objective commenter ;))


You are a perfectly objective commenter: you identify as a dev/admin, but
the repository was not intended to serve you. The 'customers' are your
authors and your institution, and THEY are strongly biased in favour of
metrics. Simply ask them.



 On 3 September 2014 07:56, Isidro F. Aguillo isidro.agui...@cchs.csic.es
 wrote:

 Dear colleagues,

 As editor of the Ranking Web of Repositories I 

Re: [Dspace-tech] [Dspace-general] Regarding Ranking of Repositories

2014-09-03 Thread Isidro F. Aguillo
Dear Tim,

Tim Donohue tdono...@duraspace.org wrote:

 Hello Isidro,

 DuraSpace (the stewarding organization behind DSpace and Fedora
 repository software) was planning to send you a compiled list of the
 concerns with your proposal. As you can tell from the previous email
 thread, many of the users of DSpace have similar concerns. Rather than
 bombard you with all of them individually (which you could see from
 browsing the thread), we hoped to draft up a response summarizing the
 concerns of the DSpace community.

Thanks a lot. That is far beyond my best expectations.


 Below you'll find an initial draft of the summarized concerns. The rule
 numbering below is based on the numbering at:
 http://repositories.webometrics.info/en/node/26

 --- Concerns with the Proposal from Ranking Web of Repositories

 * Rule #2 (IRs that don't use the institutional domain will be excluded)
 would cause the exclusion of some IRs which are hosted by DSpace service
 providers. As an example, some DSpaceDirect.org users have URLs
 https://[something].dspacedirect.org which would cause their exclusion
 as it is a non-institutional domain. Many other DSpace hosting providers
 have similar non-institutional domain URLs by default.


This is a major issue. A repository is not just another bibliographic
database; it is the archive of the academic output of the institution,
and as such it should carry the institution's brand firmly.

If the institution is very small or in a country with limited
resources, the hosting option is perfectly valid and we will make an
exception. But in these cases, is a redirection not possible?

If the problem is related to governance, we should not reward the
institution's bad practice.


 * Rule #4 (Repositories using ports other than 80 or 8080) would wrongly
 exclude all DSpace sites which use HTTPS (port 443). Many institutions
 choose to run DSpace via HTTPS instead of HTTP.

No problem with adding a few more ports to the short list.
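
To make the port question concrete, here is a minimal Python sketch of how a checker might derive the effective port of a repository URL. The allow-list and the helper names (ALLOWED_PORTS, effective_port, port_is_allowed) are assumptions for illustration only; the proposal names only ports 80 and 8080, and how the ranking actually inspects URLs is not stated in this thread.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list: 80 and 8080 come from the proposal,
# 443 is the HTTPS default raised in the DuraSpace feedback.
ALLOWED_PORTS = {80, 443, 8080}

# Default ports implied by the scheme when the URL names no port.
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url: str) -> int:
    """Return the explicit port, or the scheme default when none is given."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS[parts.scheme]


def port_is_allowed(url: str) -> bool:
    """True when the URL's effective port is on the (assumed) allow-list."""
    return effective_port(url) in ALLOWED_PORTS
```

Under these assumptions an HTTPS URL without an explicit port resolves to 443, so a rule that lists only 80 and 8080 would reject it unless 443 is added.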


 * Rule #5 (IRs that use the name of the software in the hostname would
 be excluded) may also affect IRs which are hosted by service providers
 (like DSpaceDirect). Again, some DSpaceDirect customers have URLs which
 use *.dspacedirect.org (includes dspace). This rule would also exclude
 MIT's IR which is the original DSpace (and has used the same URL for
 the last 10+ years): http://dspace.mit.edu/

This is easy. Imagine that the people at MIT decide next week that they
prefer EPrints or some other new software. A repository is a repository,
so why not a permanent name like the following?

repository.mit.edu

On the other hand, I wrote all my papers using MS Word and never cited
that fact in any of them, least of all in the authorship.
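
The rule as quoted does not say how "the name of the software in the hostname" would be detected. The Python sketch below contrasts two plausible readings, a substring match and an exact match on a hostname label. The list of software names and both function names are hypothetical.

```python
from urllib.parse import urlsplit

# Illustrative list only; the thread mentions DSpace and EPrints.
SOFTWARE_NAMES = ("dspace", "eprints")


def hostname_contains_software(url: str) -> bool:
    """Substring reading: any software name appears anywhere in the hostname."""
    host = urlsplit(url).hostname or ""
    return any(name in host for name in SOFTWARE_NAMES)


def hostname_label_is_software(url: str) -> bool:
    """Label reading: one dot-separated label equals a software name."""
    host = urlsplit(url).hostname or ""
    return any(label in SOFTWARE_NAMES for label in host.split("."))
```

With these definitions, http://dspace.mit.edu/ is flagged under both readings, a *.dspacedirect.org host is flagged only under the substring reading, and http://repository.university.edu/ is flagged under neither.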



 * Rule #6 (IRs that use more than 4 directory levels for the URL address
 of the full texts will be excluded.) may accidentally exclude a large
 number of DSpace sites. The common download URLs for full text in DSpace
 are both at least 4 directory levels deep:

 - XMLUI: [dspace-url]/bitstream/handle/[prefix]/[id]/[filename]
 - JSPUI: [dspace-url]/bitstream/[prefix]/[id]/[sequence]/[filename]

 NOTE: prefix and id are parts of an Item's Handle
 (http://hdl.handle.net/), which is the persistent identifier assigned to
 the item via the Handle System. So, this is how a persistent URL like
 http://hdl.handle.net/1721.1/26706 redirects to an Item in MIT's DSpace.


I understand the technical part, but this serves the needs of the
sysadmin of the system (= ONE person). Now, please take into account the
needs of 10,000 internal end users, the authors. Why not redirect
(aliasing)?
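
As a rough illustration of the depth question, the Python sketch below counts directory levels for the two DSpace download patterns quoted above. The counting convention (non-empty path segments, excluding the final file name) is an assumption, since the proposal does not define it. The host names and the file name paper.pdf are hypothetical; the handle 1721.1/26706 is the one cited in the thread.

```python
from urllib.parse import urlsplit


def directory_levels(url: str) -> int:
    """Count non-empty path segments, treating the last one as the file name."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    return max(len(segments) - 1, 0)


# XMLUI pattern: [dspace-url]/bitstream/handle/[prefix]/[id]/[filename]
xmlui = "http://repository.example.edu/bitstream/handle/1721.1/26706/paper.pdf"

# JSPUI pattern: [dspace-url]/bitstream/[prefix]/[id]/[sequence]/[filename]
jspui = "http://repository.example.edu/bitstream/1721.1/26706/1/paper.pdf"

# Same XMLUI pattern when [dspace-url] itself includes a context path.
xmlui_ctx = "http://repository.example.edu/xmlui/bitstream/handle/1721.1/26706/paper.pdf"
```

Under this convention both patterns sit at exactly 4 levels, and a deployment whose base URL carries one extra context path reaches 5, so whether a site passes depends on details the rule leaves open.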



 * Rule #7 (IRs that use more than 3 different numeric (or useless) codes
 in their URLs will be excluded.). It is unclear how they would determine
 this, and what the effect may be on DSpace sites worldwide. Again,
 looking at the common DSpace URL paths above, if a file had a numeric
 name, it may be excluded as DSpace URLs already include 2-3 numeric
 codes by default ([prefix],[id], and [sequence] are all numeric).

I have a personal example. Twenty years ago my email admin decided my
email account should be 'dctfa11' (borrowing your notation:
[prefix], [id], and [sequence]). After several years it was possible
to change it to 'isidro.aguillo'.

I am going to use the URLs of my papers to cite them, to market them,
and to copy them into my CV. Please help me translate /56/89/567894 into
aguillo2014b.
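
The aliasing idea can be sketched in Python as a lookup table from a readable slug to the existing numeric path, together with a counter for numeric codes in a URL. Everything here is hypothetical: the table, the function names, the host, and the rule that a code is a run of digits optionally joined by dots (so that a Handle prefix such as 1721.1 counts as one code). DSpace itself is not claimed to offer this feature.

```python
import re
from typing import List, Optional
from urllib.parse import urlsplit

# Assumed definition of a "numeric code": digits, optionally dot-separated.
NUMERIC = re.compile(r"\d+(\.\d+)*")

# Hypothetical alias table: readable slug -> existing numeric path.
ALIASES = {"aguillo2014b": "/56/89/567894"}


def numeric_codes(url: str) -> List[str]:
    """Return the path segments that consist only of numeric codes."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    return [s for s in segments if NUMERIC.fullmatch(s)]


def resolve_alias(slug: str) -> Optional[str]:
    """Map a readable slug to its numeric path, or None when unknown."""
    return ALIASES.get(slug)
```

In a real deployment the lookup would typically sit behind an HTTP redirect, so the numeric path stays the persistent identifier while the readable alias is what authors cite.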


 * Rule #8 (IRs with more than 50% of the records not linking to OA full
 text versions..). Again, unclear how they would determine this, and
 whether the way they are doing so would accidentally exclude some major
 DSpace sites. For example, there are major DSpace sites which include a
 larger number of Theses/Dissertations. These Theses/Dissertations may
 not be 100% Open Access to the world, but may be fully accessible
 to everyone on campus.

50%!!!
A 'place' with more than 50% of its full texts unavailable is NOT an
Open Access Repository.


 Another, perhaps more serious concern, is on the 

Re: [Dspace-tech] [Dspace-general] Regarding Ranking of Repositories

2014-09-03 Thread Isidro F. Aguillo
 development.
 
  Maybe we should be talking to our fellow implementers and building
  on the work of http://www.w3.org/Provider/Style/URI.html,
  http://www.w3.org/TR/cooluris/,
  http://www.openarchives.org/OAI/openarchivesprotocol.html, etc. to
  build a compilation of _best_ practice.
 
  Cheers
  stuart
 
  -Original Message-
  From: Tim Donohue [mailto:tdono...@duraspace.org]
  Sent: Wednesday, 3 September 2014 8:49 a.m.
  To: Isidro F. Aguillo; dspace-tech@lists.sourceforge.net
  Cc: Jonathan Markow; dspace-gene...@lists.sourceforge.net
  Subject: Re: [Dspace-general] [Dspace-tech] Regarding Ranking of
 Repositories
 
  Hello Isidro,
 
  DuraSpace (the stewarding organization behind DSpace and Fedora
  repository software) was planning to send you a compiled list of the
  concerns with your proposal. As you can tell from the previous email
  thread, many of the users of DSpace have similar concerns. Rather
  than bombard you with all of them individually (which you could see
  from browsing the thread), we hoped to draft up a response
  summarizing the concerns of the DSpace community.
 
  Below you'll find an initial draft of the summarized concerns. The
  rule numbering below is based on the numbering at:
  http://repositories.webometrics.info/en/node/26
 
  --- Concerns with the Proposal from Ranking Web of Repositories
 
  * Rule #2 (IRs that don't use the institutional domain will be
  excluded) would cause the exclusion of some IRs which are hosted by
  DSpace service providers. As an example, some DSpaceDirect.org users
  have URLs https://[something].dspacedirect.org which would cause
  their exclusion as it is a non-institutional domain. Many other
  DSpace hosting providers have similar non-institutional domain URLs
  by default.
 
  * Rule #4 (Repositories using ports other than 80 or 8080) would
  wrongly exclude all DSpace sites which use HTTPS (port 443). Many
  institutions choose to run DSpace via HTTPS instead of HTTP.
 
  * Rule #5 (IRs that use the name of the software in the hostname
  would be excluded) may also affect IRs which are hosted by service
  providers (like DSpaceDirect). Again, some DSpaceDirect customers
  have URLs which use *.dspacedirect.org (includes dspace). This
  rule would also exclude MIT's IR which is the original DSpace (and
  has used the same URL for the last 10+ years): http://dspace.mit.edu/
 
  * Rule #6 (IRs that use more than 4 directory levels for the URL
  address of the full texts will be excluded.) may accidentally
  exclude a large number of DSpace sites. The common download URLs for
  full text in DSpace are both at least 4 directory levels deep:
 
  - XMLUI: [dspace-url]/bitstream/handle/[prefix]/[id]/[filename]
  - JSPUI: [dspace-url]/bitstream/[prefix]/[id]/[sequence]/[filename]
 
  NOTE: prefix and id are parts of an Item's Handle
  (http://hdl.handle.net/), which is the persistent identifier
  assigned to the item via the Handle System. So, this is how a
  persistent URL like
  http://hdl.handle.net/1721.1/26706 redirects to an Item in MIT's DSpace.
 
  * Rule #7 (IRs that use more than 3 different numeric (or useless)
  codes in their URLs will be excluded.). It is unclear how they would
  determine this, and what the effect may be on DSpace sites
  worldwide. Again, looking at the common DSpace URL paths above, if a
  file had a numeric name, it may be excluded as DSpace URLs already
  include 2-3 numeric codes by default ([prefix], [id], and [sequence]
  are all numeric).
 
  * Rule #8 (IRs with more than 50% of the records not linking to OA
  full text versions..). Again, unclear how they would determine this,
  and whether the way they are doing so would accidentally exclude
  some major DSpace sites. For example, there are major DSpace sites
  which include a larger number of Theses/Dissertations. These
  Theses/Dissertations may not be 100% Open Access to the world, but
  may be fully accessible to everyone on campus.
 
  ---
 
  Another, perhaps more serious concern, is on the timeline you propose.
  You suggest a timeline of January 2015 when these newly proposed
  rules would be in place. Yet, if these rules were to go in place,
  some rules may require changes to the DSpace software itself (as I
  laid out above, some rules may not mesh well with DSpace software as
  it is, unless I'm misunderstanding the rule itself).
 
  Unfortunately, based on our DSpace open source release timelines, we
  have ONE new release (DSpace 5.0) planned between now and January
  2015.
  Even if we were able to implement some of these recommended changes
  at a software level, the vast majority (likely 80-90%) of DSpace
  instances would likely NOT be able to upgrade to the latest DSpace
  version before your January deadline (as the 5.0 release is
  scheduled for Nov/Dec).
  Therefore, as is, your January 2015 ranking may accidentally exclude
  a large number of DSpace sites from your rankings, and DSpace is
  still the most widely used

Re: [Dspace-tech] [Dspace-general] Regarding Ranking of Repositories

2014-09-03 Thread Germán Biozzoli
I think that


 * Rule #7 (IRs that use more than 3 different numeric (or useless) codes
 in their URLs will be excluded.). It is unclear how they would determine
 this, and what the effect may be on DSpace sites worldwide. Again,
 looking at the common DSpace URL paths above, if a file had a numeric
 name, it may be excluded as DSpace URLs already include 2-3 numeric
 codes by default ([prefix],[id], and [sequence] are all numeric).

I have a personal example. Twenty years ago my email admin decided my
email account should be 'dctfa11' (borrowing your notation:
[prefix], [id], and [sequence]). After several years it was possible
to change it to 'isidro.aguillo'.

I am going to use the URLs of my papers to cite them, to market them,
and to copy them into my CV. Please help me translate /56/89/567894 into
aguillo2014b.

---
leaves no room for conceptual discussion:

http://www.w3.org/2013/dwbp/wiki/URI_Design_and_Management_for_Persistence

And of course, correct URIs affect SEO, which is an inherent
responsibility of IR platforms. As a DSpace implementor I understand that
there may be no immediate solution, but to me it is undoubtedly the correct
requirement for future DSpace versions.

Regards
German




Re: [Dspace-tech] [Dspace-general] Regarding Ranking of Repositories

2014-09-03 Thread Isidro F. Aguillo
Dear colleague,

A user is a user, not just any casual reader. A non-casual reader of an
academic paper is usually another scientist (a Higgs boson text is not
for everybody), and any professional scientist (including junior ones
like PhD students) is or is going to be an author.

Best regards,


sharad sharad7...@gmail.com wrote:

 Hi,

 Most of us will agree that authors themselves (or
 funders of research/repositories) are not the sole end users of a
 repository. On the contrary, the major end users are non-authors: people
 who have not contributed to a repository and are just users of the
 repository.

 If the proposed changes are meant to give end users a better
 experience, let us not assume that the end users are only
 authors.

 Best Regards,
 Sharad


 On Thu, Sep 4, 2014 at 2:55 AM, Anton Angelo an...@mojo.org wrote:
 Hi Isidro,

 As a librarian/technologist managing an institutional repository I have to
 disagree with you on the definition of our end users. There are two ways
 to look at this, and neither ends up with the authors as end users.

 An idealistic approach would see the final readers of the items as the end
 users. They are the ones, as defined by various OA declarations, for
 whom we are doing this.

 A pragmatic approach would say the funders of the research are the end
 users, as they are demanding the output of their funding to be made OA.

 The latter group are the ones your service is most useful for, in
 determining the performance of their outputs - the more visible, the better
 vehicle for publication.

 I am beginning to think that rankings are not a very useful manner in which
 to compare IRs, but a list of platform agnostic best practice standards
 (like the orange book for security, back in the day) is the way forward.
 Though I have extensively used the service in my research on IR
 effectiveness, that was mostly because the repository I manage has a high
 ranking, and it was useful to promote it internally.  This kind of behaviour
 usually ends up in 'gaming', and is counterproductive - exactly what OA is
 trying to get away from  (h-index, impact factor, etc).

 IRs are really about getting the right output to the right person - even one
 download can be a total success.  I think in the future altmetric tools are
 probably going to be more use than a ranking service, as useful as it has
 been in the past - provided they report on the work in OA being done in the
 global south.

 aa







Re: [Dspace-tech] [Dspace-general] Regarding Ranking of Repositories

2014-09-03 Thread Isidro F. Aguillo
Dear German,

Thanks for your support. I understand that this can be difficult to
implement or may need time to develop. There is no problem with applying
the proposals later, or not applying them at all.

Best regards,

Germán Biozzoli germanbiozz...@gmail.com wrote:

 I think that

 
 * Rule #7 (IRs that use more than 3 different numeric (or useless) codes
 in their URLs will be excluded.). It is unclear how they would determine
 this, and what the effect may be on DSpace sites worldwide. Again,
 looking at the common DSpace URL paths above, if a file had a numeric
 name, it may be excluded as DSpace URLs already include 2-3 numeric
 codes by default ([prefix],[id], and [sequence] are all numeric).

 I have a personal example. Twenty years ago my email admin decided my
 email account should be 'dctfa11' (borrowing your notation:
 [prefix], [id], and [sequence]). After several years it was possible
 to change it to 'isidro.aguillo'.

 I am going to use the URLs of my papers to cite them, to market them,
 and to copy them into my CV. Please help me translate /56/89/567894 into
 aguillo2014b.

 ---
 leaves no room for conceptual discussion:

 http://www.w3.org/2013/dwbp/wiki/URI_Design_and_Management_for_Persistence

 And of course, correct URIs affect SEO, which is an inherent
 responsibility of IR platforms. As a DSpace implementor I understand that
 there may be no immediate solution, but to me it is undoubtedly the correct
 requirement for future DSpace versions.

 Regards
 German



 2014-09-03 19:53 GMT-03:00 Isidro F. Aguillo isidro.agui...@cchs.csic.es:

 Dear Tim,

 Tim Donohue tdono...@duraspace.org wrote:

  Hello Isidro,
 
  DuraSpace (the stewarding organization behind DSpace and Fedora
  repository software) was planning to send you a compiled list of the
  concerns with your proposal. As you can tell from the previous email
  thread, many of the users of DSpace have similar concerns. Rather than
  bombard you with all of them individually (which you could see from
  browsing the thread), we hoped to draft up a response summarizing the
  concerns of the DSpace community.

 Thanks a lot. That is far beyond my best expectations.


  Below you'll find an initial draft of the summarized concerns. The rule
  numbering below is based on the numbering at:
  http://repositories.webometrics.info/en/node/26
 
  --- Concerns with the Proposal from Ranking Web of Repositories
 
  * Rule #2 (IRs that don't use the institutional domain will be excluded)
  would cause the exclusion of some IRs which are hosted by DSpace service
  providers. As an example, some DSpaceDirect.org users have URLs
  https://[something].dspacedirect.org which would cause their exclusion
  as it is a non-institutional domain. Many other DSpace hosting providers
  have similar non-institutional domain URLs by default.


 Major issue. A repository is not just another bibliographic database; it is
 the archive of the academic output of the institution, and as such it
 should be firmly branded as the institution's own.

 If the institution is very small or in a country with limited
 resources, the hosting option is perfectly valid and we will make an
 exception. But in these cases, is a redirection not possible?

 If the problem is related to governance, we should not reward the
 institution's bad practice.


  * Rule #4 (Repositories using ports other than 80 or 8080) would wrongly
  exclude all DSpace sites which use HTTPS (port 443). Many institutions
  choose to run DSpace via HTTPS instead of HTTP.

 No problem adding a few more ports to the short list.


  * Rule #5 (IRs that use the name of the software in the hostname would
  be excluded) may also affect IRs which are hosted by service providers
  (like DSpaceDirect). Again, some DSpaceDirect customers have URLs which
  use *.dspacedirect.org (includes dspace). This rule would also exclude
  MIT's IR which is the original DSpace (and has used the same URL for
  the last 10+ years): http://dspace.mit.edu/

 This is easy. Imagine that the people at MIT decide next week that they
 prefer EPrints or some other new software. A repository is a repository,
 so why not a permanent name like:

 repository.mit.edu

 On the other hand, I wrote all my papers using MS Word and never cited
 that fact in any of them, least of all in the authorship.



  * Rule #6 (IRs that use more than 4 directory levels for the URL address
  of the full texts will be excluded.) may accidentally exclude a large
  number of DSpace sites. The common download URLs for full text in DSpace
  are both at least 4 directory levels deep:
 
  - XMLUI: [dspace-url]/bitstream/handle/[prefix]/[id]/[filename]
  - JSPUI: [dspace-url]/bitstream/[prefix]/[id]/[sequence]/[filename]
 
  NOTE: prefix and id are parts of an Item's Handle
  (http://hdl.handle.net/), which is the persistent identifier assigned to
  the item via the Handle System. So, this is how a persistent URL like
  http://hdl.handle.net/1721.1/26706 redirects to an Item in MIT's DSpace.


 

Re: [Dspace-tech] [Dspace-general] Regarding Ranking of Repositories

2014-09-02 Thread Hilton Gibson
Hi All

As meat for further constructive debate, I would like to submit our
rationalisations for the selection of our URL.
http://wiki.lib.sun.ac.za/index.php/SUNScholar/Guidelines/Step_2 (Step 2 -
Marketing Friendly (Vanity URL), Persistent URL and Preservable Digital
Objects)

Regards

Hilton

*Hilton Gibson*
Ubuntu Linux Systems Administrator
JS Gericke Library
Room 1025C
Stellenbosch University
Private Bag X5036
Stellenbosch
7599
South Africa

Tel: +27 21 808 4100 | Cell: +27 84 646 4758


On 2 September 2014 21:56, Isidro F. Aguillo isidro.agui...@cchs.csic.es
wrote:

 Dear colleagues,

 As editor of the Ranking Web of Repositories I published the referred
 info in order to open a debate about issues that are, in my humble
 opinion, very concerning for the future of repositories. As my email
 address is clearly stated on the webpage, I do not understand why you
 decided not to consider my position and explanations in this debate.

 I am going to answer the specific points introduced by Mark Wood and,
 of course, I am open not only to further discussions but also to modifying
 my proposals accordingly.


  From: Mark H. Wood mw...@iupui.edu
  Date: 2 September 2014 16:28
  Subject: Re: [Dspace-tech] IMPORTANT NEWS: Important Info for Future
  Editions | Ranking Web of Repositories
  To: dspace-tech@lists.sourceforge.net, General List 
  dspace-gene...@lists.sourceforge.net
 
 
  Points 4, 6 and 7 reveal a profound lack of understanding of
  hypertext and fundamental security issues, and I would not be
  surprised to learn that they ignore typical user behavior as well.
  Does anyone but a sysadmin. or developer really type in direct URLs to
  repository content?  Citations please.


 Point 4. In many academic institutions, access to ports other than the
 standard ones is forbidden for security reasons. If you use other ports, the
 contents are invisible to people accessing from other universities.

 Points 6 and 7. Explain to me why .../handle/556/78/6789 is better than
 .../thesis/physics/Wood2013b and why aliasing is not possible.

 Authors would probably cite the URLs of deposited files in their
 published papers, but with these awful, lengthy, useless addresses they
 probably prefer not to.

 One of the main reasons for depositing papers is to increase their
 visibility, but this is only possible if other authors can easily
 locate them, typically in Google, for example. Do you know the advantages
 of semantic content in URLs for improving position in Google? There are
 thousands of papers about academic SEO. For example, some of them
 state the advantages of using library instead of lib in web names.


  I would argue that we can better do without appearing in the Ranking
  Web of Repositories, whatever that is, than to give up the ability
  to protect our users' credentials.  (Point 4, which disallows HTTPS)

 Are you mixing public and private sections? You can protect your users
 without destroying visibility.

  Point 5 is just bizarre.  Why does someone think this is a problem?
  Not that I think it particularly useful to use the name of supporting
  software in naming a repository service, but how can it possibly hurt?

 The repository is probably the most important part of the
 intellectual treasure of the university and its authors. You are
 simply proposing to brand the container instead of the content.

  Are there any actual statistics to support the belief that long URLs
  in the interior of a service actually affect anyone's behavior?

 The interior is irrelevant. The contents of the repository are for the
 end users, who are not sysadmins but the institution's authors, and authors
 and readers from the rest of the world. We are talking about Open
 Access, and in my opinion the referred issues are barriers to openness.


  It sounds like there should be some discussion among the various
  parties.  Where?


 As mentioned before, I am here for further comments. Thanks for your
 cooperation.


  --
  Mark H. Wood
  Lead Technology Analyst
 
  University Library
  Indiana University - Purdue University Indianapolis
  755 W. Michigan Street
  Indianapolis, IN 46202
  317-274-0749
  www.ulib.iupui.edu
 
 


 --
 Isidro F. Aguillo, HonPhD
 Cybermetrics Lab (3C1). CCHS - CSIC
 Albasanz, 26-28. 28037 Madrid. Spain

 isidro.aguillo @ cchs.csic.es
 www. webometrics.info

 - End of forwarded message -




 

Re: [Dspace-tech] [Dspace-general] Regarding Ranking of Repositories

2014-09-02 Thread Kim Shepherd
Hi Isidro and lists,

Regarding point 6 -- I see what you're saying, but it shouldn't really be
up to the DSpace community repositories (who all use the handle prefix /
identifier system, as I'm sure you know!) to argue why 1234/123 is better
than thesis/physics/something, because we're not the ones proposing that
URI segments be part of any metric used to judge the world ranking of a
repository. It's also not as simple as you might think, particularly when
ensuring unique URIs and persistent URIs, etc.
I think you're saying that URIs should either look nice or be
meaningful, or both, but I'm not sure we should rely on URIs to be too
meaningful, especially when we have ways of including that with semantic
markup in references, structured data in our METS/ORE feeds via OAI, etc.
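
To make the uniqueness concern concrete, here is a minimal sketch in Python of what minting readable aliases of the 'aguillo2014b' kind might involve. It is not DSpace code and not anything proposed in this thread: the function name mint_alias, the dict used as a registry, and the handles 1234/100 and 1234/101 are all hypothetical, and the surname-plus-year-plus-letter scheme is only an assumption based on the examples given.

```python
import string


def mint_alias(surname, year, registry, handle):
    """Register a readable alias such as 'aguillo2014b' for a handle.

    `registry` is a plain dict (alias -> handle) standing in for whatever
    persistent store a real repository would need. An alias that is
    already taken is never reassigned, so existing links keep working.
    """
    base = f"{surname.lower()}{year}"
    # Try suffixes a..z until a free alias is found.
    for suffix in string.ascii_lowercase:
        alias = base + suffix
        if alias not in registry:
            registry[alias] = handle
            return alias
    raise ValueError(f"no free alias left for {base}")


registry = {}
first = mint_alias("Aguillo", 2014, registry, "1234/100")
second = mint_alias("Aguillo", 2014, registry, "1234/101")
print(first, second)
```

Even in this toy form the alias depends on shared state and on the order in which items were deposited, so it carries less meaning than it appears to, and it still has to map back to an identifier like the handle.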

Regarding point 5 -- I don't see that this matters either. No end user
cares what the IR is actually called, surely? Whatever arguments you can
make for our IRs having bad names, punishing us for preserving the
permanence of those names and URIs we've already minted seems a bit unfair?
The first IR I thought of when reading this was, of course,
http://dspace.mit.edu.
I think point 5 actually punishes EPrints repositories most unfairly, since
eprints is an accepted name for digital manuscripts as well as the
platform used -- I think I've even seen IRs called eprints.something.etc
running platforms other than EPrints.

Numbers 6 and 7, I think I agree with Mark, but don't really have anything
to add. I don't really understand why this would even be considered as a
metric, let alone grounds for exclusion. What are some examples of cases
where long URIs (or, e.g., directories as fulltext hosted in IRs with their
own dir structures, which happens) or URIs which happen to contain numbers
result in end users or machines not being able to properly locate/use
hosted resources?

Number 8 is probably the thing that will punish my own institution most,
which is a pity because we have a large absolute amount of fulltext, but
for various reasons, a lot of record-only items as well. This is probably a
philosophical argument about defining an OA repository, I guess?

I hope my criticisms here don't seem too harsh - thanks for taking the time
to listen to feedback.

On a lighter note, I'm sort of pleased these proposals have been doing the
rounds, as I think it might just be the thing that convinces my own
institution to take world repository ranking off our KPIs, and
concentrate more on qualitative value of the repository we host.

Cheers

Kim (a DSpace dev/admin, and already biased against quantitative metrics in
IRs so not exactly an objective commenter ;))




Re: [Dspace-tech] [Dspace-general] Regarding Ranking of Repositories

2014-09-02 Thread Stuart Yeates
I'm not sure that a knee-jerk reaction to an arbitrary list of bad practice is a 
good place to start, and it seems like a really bad driver for software development.

Maybe we should be talking to our fellow implementers and building on the work 
of http://www.w3.org/Provider/Style/URI.html, http://www.w3.org/TR/cooluris/, 
http://www.openarchives.org/OAI/openarchivesprotocol.html, etc. to build a 
compilation of _best_ practice.

Cheers
stuart

-Original Message-
From: Tim Donohue [mailto:tdono...@duraspace.org] 
Sent: Wednesday, 3 September 2014 8:49 a.m.
To: Isidro F. Aguillo; dspace-tech@lists.sourceforge.net
Cc: Jonathan Markow; dspace-gene...@lists.sourceforge.net
Subject: Re: [Dspace-general] [Dspace-tech] Regarding Ranking of Repositories

Hello Isidro,

DuraSpace (the stewarding organization behind DSpace and Fedora repository 
software) was planning to send you a compiled list of the concerns with your 
proposal. As you can tell from the previous email thread, many of the users of 
DSpace have similar concerns. Rather than bombard you with all of them 
individually (which you could see from browsing the thread), we hoped to draft 
up a response summarizing the concerns of the DSpace community.

Below you'll find an initial draft of the summarized concerns. The rule 
numbering below is based on the numbering at: 
http://repositories.webometrics.info/en/node/26

--- Concerns with the Proposal from Ranking Web of Repositories

* Rule #2 (IRs that don't use the institutional domain will be excluded) would 
cause the exclusion of some IRs which are hosted by DSpace service providers. 
As an example, some DSpaceDirect.org users have URLs 
https://[something].dspacedirect.org which would cause their exclusion as it is 
a non-institutional domain. Many other DSpace hosting providers have similar 
non-institutional domain URLs by default.

* Rule #4 (Repositories using ports other than 80 or 8080) would wrongly 
exclude all DSpace sites which use HTTPS (port 443). Many institutions choose 
to run DSpace via HTTPS instead of HTTP.

* Rule #5 (IRs that use the name of the software in the hostname would be 
excluded) may also affect IRs which are hosted by service providers (like 
DSpaceDirect). Again, some DSpaceDirect customers have URLs which use 
*.dspacedirect.org (includes dspace). This rule would also exclude MIT's IR 
which is the original DSpace (and has used the same URL for the last 10+ 
years): http://dspace.mit.edu/

* Rule #6 (IRs that use more than 4 directory levels for the URL address of the 
full texts will be excluded.) may accidentally exclude a large number of DSpace 
sites. The common download URLs for full text in DSpace are both at least 4 
directory levels deep:

- XMLUI: [dspace-url]/bitstream/handle/[prefix]/[id]/[filename]
- JSPUI: [dspace-url]/bitstream/[prefix]/[id]/[sequence]/[filename]

NOTE: prefix and id are parts of an Item's Handle (http://hdl.handle.net/), 
which is the persistent identifier assigned to the item via the Handle System. 
So, this is how a persistent URL like
http://hdl.handle.net/1721.1/26706 redirects to an Item in MIT's DSpace.

* Rule #7 (IRs that use more than 3 different numeric (or useless) codes in 
their URLs will be excluded.). It is unclear how they would determine this, and 
what the effect may be on DSpace sites worldwide. Again, looking at the common 
DSpace URL paths above, if a file had a numeric 
name, it may be excluded as DSpace URLs already include 2-3 numeric codes by 
default ([prefix],[id], and [sequence] are all numeric).

* Rule #8 (IRs with more than 50% of the records not linking to OA full text 
versions..). Again, unclear how they would determine this, and whether the way 
they are doing so would accidentally exclude some major DSpace sites. For 
example, there are major DSpace sites which include a larger number of 
Theses/Dissertations. These Theses/Dissertations may not be 100% Open Access to 
the world, but may be fully accessible to everyone on campus.

---
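
Since the proposal does not say how Rules #4, #6 and #7 would be measured, the following is only a rough Python sketch of one possible reading, useful for seeing which of the URL patterns above might be caught. The definitions are assumptions, not part of the proposal: a "directory level" is taken to be any path segment before the filename, and a "numeric code" is any segment (or filename without its extension) made only of digits and dots. The host repo.example.edu is a placeholder.

```python
from urllib.parse import urlsplit

ALLOWED_PORTS = {80, 8080}   # ports named in Rule #4 as quoted above
MAX_DIR_LEVELS = 4           # Rule #6 threshold
MAX_NUMERIC_CODES = 3        # Rule #7 threshold


def is_numeric(code):
    """True for codes such as '26706' or '1721.1' (digits and dots only)."""
    return code.replace(".", "").isdigit()


def check_url(url):
    """Return the proposed rules this URL would appear to violate."""
    parts = urlsplit(url)
    # With no explicit port, fall back to the scheme default.
    port = parts.port or (443 if parts.scheme == "https" else 80)

    segments = [s for s in parts.path.split("/") if s]
    directories = segments[:-1]
    filename = segments[-1] if segments else ""

    numeric = [s for s in directories if is_numeric(s)]
    # A file named e.g. '2014.pdf' is counted as one more numeric code.
    if is_numeric(filename.rsplit(".", 1)[0]):
        numeric.append(filename)

    violations = []
    if port not in ALLOWED_PORTS:
        violations.append("rule 4")
    if len(directories) > MAX_DIR_LEVELS:
        violations.append("rule 6")
    if len(numeric) > MAX_NUMERIC_CODES:
        violations.append("rule 7")
    return violations


print(check_url("https://repo.example.edu/bitstream/handle/1721.1/26706/thesis.pdf"))
print(check_url("http://repo.example.edu/bitstream/1721.1/26706/1/2014.pdf"))
```

Under these assumptions an HTTPS site trips Rule #4 regardless of its paths, and a JSPUI-style URL with a numeric filename trips Rule #7. A different definition of "directory level" or "numeric code" would give different results, which is the ambiguity raised above.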

Another, perhaps more serious, concern is the timeline you propose. 
You suggest a timeline of January 2015 when these newly proposed rules would be 
in place. Yet, if these rules were to go in place, some rules may require 
changes to the DSpace software itself (as I laid out above, some rules may not 
mesh well with DSpace software as it is, unless I'm misunderstanding the rule 
itself).

Unfortunately, based on our DSpace open source release timelines, we have ONE 
new release (DSpace 5.0) planned between now and January 2015. 
Even if we were able to implement some of these recommended changes at a 
software level, the vast majority (likely 80-90%) of DSpace instances would 
likely NOT be able to upgrade to the latest DSpace version before your January 
deadline (as the 5.0 release is scheduled for Nov/Dec). 
Therefore, as is, your January 2015 ranking may accidentally exclude a large 
number of DSpace sites from your rankings, and DSpace is still