Re: [Dspace-tech] [Dspace-general] Regarding Ranking of Repositories
Dear Stuart,

I do not know whether you understand the ultimate purpose of Open Access initiatives in general and of institutional repositories in particular. I think you are missing the central point: the end users should guide the design of the repository according to their real needs. In OA, the end users are the authors of the papers, the institutions that fund the research and host the papers, and the librarians who manage the repository. In this scenario the software developers' task is to fulfill the needs of the authors in the most professional way.

Regarding authors' needs, the W3C and its 'cool' proposals are arbitrary, basically because the W3C does not know how scholarly communication works, nor the aims and methods of OA. They are not stakeholders for us.

In any case, I would prefer, and would thank you for, comments on the specific proposals rather than a general, ambiguous and unsupported global criticism.

Best regards,

Stuart Yeates stuart.yea...@vuw.ac.nz wrote:

I'm not sure that knee-jerk reaction to an arbitrary list of bad practice is a good place to start and seems like a really bad driver for software development. Maybe we should be talking to our fellow implementers and building on the work of http://www.w3.org/Provider/Style/URI.html, http://www.w3.org/TR/cooluris/, http://www.openarchives.org/OAI/openarchivesprotocol.html, etc. to build a compilation of _best_ practice.

Cheers
stuart

-----Original Message-----
From: Tim Donohue [mailto:tdono...@duraspace.org]
Sent: Wednesday, 3 September 2014 8:49 a.m.
To: Isidro F. Aguillo; dspace-tech@lists.sourceforge.net
Cc: Jonathan Markow; dspace-gene...@lists.sourceforge.net
Subject: Re: [Dspace-general] [Dspace-tech] Regarding Ranking of Repositories

Hello Isidro,

DuraSpace (the stewarding organization behind DSpace and Fedora repository software) was planning to send you a compiled list of the concerns with your proposal.
As you can tell from the previous email thread, many of the users of DSpace have similar concerns. Rather than bombard you with all of them individually (which you could see from browsing the thread), we hoped to draft up a response summarizing the concerns of the DSpace community.

Below you'll find an initial draft of the summarized concerns. The rule numbering below is based on the numbering at: http://repositories.webometrics.info/en/node/26

---

Concerns with the Proposal from Ranking Web of Repositories

* Rule #2 (IRs that don't use the institutional domain will be excluded) would cause the exclusion of some IRs which are hosted by DSpace service providers. As an example, some DSpaceDirect.org users have URLs https://[something].dspacedirect.org which would cause their exclusion as it is a non-institutional domain. Many other DSpace hosting providers have similar non-institutional domain URLs by default.

* Rule #4 (Repositories using ports other than 80 or 8080) would wrongly exclude all DSpace sites which use HTTPS (port 443). Many institutions choose to run DSpace via HTTPS instead of HTTP.

* Rule #5 (IRs that use the name of the software in the hostname would be excluded) may also affect IRs which are hosted by service providers (like DSpaceDirect). Again, some DSpaceDirect customers have URLs which use *.dspacedirect.org (includes dspace). This rule would also exclude MIT's IR which is the original DSpace (and has used the same URL for the last 10+ years): http://dspace.mit.edu/

* Rule #6 (IRs that use more than 4 directory levels for the URL address of the full texts will be excluded.) may accidentally exclude a large number of DSpace sites. The common download URLs for full text in DSpace are both at least 4 directory levels deep:
  - XMLUI: [dspace-url]/bitstream/handle/[prefix]/[id]/[filename]
  - JSPUI: [dspace-url]/bitstream/[prefix]/[id]/[sequence]/[filename]
  NOTE: prefix and id are parts of an Item's Handle (http://hdl.handle.net/), which is the persistent identifier assigned to the item via the Handle System. So, this is how a persistent URL like http://hdl.handle.net/1721.1/26706 redirects to an Item in MIT's DSpace.

* Rule #7 (IRs that use more than 3 different numeric (or useless) codes in their URLs will be excluded.). It is unclear how they would determine this, and what the effect may be on DSpace sites worldwide. Again, looking at the common DSpace URL paths above, if a file had a numeric name, it may be excluded as DSpace URLs already include 2-3 numeric codes by default ([prefix],[id], and [sequence] are all numeric).

* Rule #8 (IRs with more than 50% of the records not linking to OA full text versions..). Again, unclear how they would determine this, and whether the way they are doing so would accidentally exclude some major DSpace sites. For example, there are major DSpace sites which
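As a rough illustration of the Rule #6 and Rule #7 concerns, the sketch below counts directory levels and purely numeric path segments for the two DSpace download patterns. The thread itself notes that it is unclear how the ranking would measure either rule, so the counting method here is only one possible reading. The host repository.example.edu, the file name thesis.pdf, the sequence number 1 and the function name path_profile are assumptions made for the example; only the handle 1721.1/26706 comes from the message.

```python
from urllib.parse import urlsplit


def path_profile(url: str) -> dict:
    """Count directory levels and purely numeric segments in a URL path.

    This is one guess at how Rules #6 and #7 might be measured,
    not the ranking's actual method.
    """
    segments = [s for s in urlsplit(url).path.split("/") if s]
    directories = segments[:-1]  # everything before the file name
    # Treat dotted handle prefixes such as "1721.1" as numeric codes too.
    numeric = [s for s in segments if s.replace(".", "").isdigit()]
    return {"directory_levels": len(directories), "numeric_codes": len(numeric)}


# Hypothetical URLs that follow the XMLUI and JSPUI patterns.
xmlui = "http://repository.example.edu/bitstream/handle/1721.1/26706/thesis.pdf"
jspui = "http://repository.example.edu/bitstream/1721.1/26706/1/thesis.pdf"

print(path_profile(xmlui))  # {'directory_levels': 4, 'numeric_codes': 2}
print(path_profile(jspui))  # {'directory_levels': 4, 'numeric_codes': 3}
```

Under this reading both patterns sit exactly at 4 directory levels before any servlet context path is added, and a numerically named file would push the JSPUI pattern to 4 numeric codes.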
Re: [Dspace-tech] [Dspace-general] Regarding Ranking of Repositories
Dear Kim,

Thanks for your message. I answer your specific comments.

Kim Shepherd kim.sheph...@gmail.com wrote:

Hi Isidro and lists,

Regarding point 6 -- I see what you're saying, but it shouldn't really be up to the DSpace community repositories (who all use the handle prefix / identifier system, as I'm sure you know!) to argue why 1234/123 is better than thesis/physics/something, because we're not the ones proposing that URI segments be part of any metric used to judge the world ranking of a repository. It's also not as simple as you might think, particularly when ensuring unique URIs and persistent URIs, etc. I think you're saying that URIs should either look nice or be meaningful, or both, but I'm not sure we should rely on URIs to be too meaningful, especially when we have ways of including that with semantic markup in references, structured data in our METS/ORE feeds via OAI, etc.

This is the basic misunderstanding. The repository end user is an author who wishes to increase the global visibility and usage of his/her deposited papers. Looking nice is relevant because the author's main tool for obtaining visibility and visits is to cite the papers in his/her future papers, or to mention them, for example, in Wikipedia or on Twitter. Looking nice adds informative value (author's name, publication year, topic) that can help the reader decide whether to follow the link. Looking nice also works as quality control: a mistake in a last name is easier to notice than one in a series of numbers. But far more important: by far the largest number of visitors come from Google or other search engines. If you are not searching for a specific title, then the semantic content of the URL considerably improves your positioning in Google. Of course, you can ignore that, but there are already many people who prefer to deposit in ResearchGate or Academia because the visibility of their works is better there.

Regarding point 5 -- I don't see that this matters either.
No end user cares what the IR is actually called, surely? Whatever arguments you can make for our IRs having bad names, punishing us for preserving the permanence of those names and URIs we've already minted seems a bit unfair? The first IR I thought of when reading this was, of course, http://dspace.mit.edu. I think point 5 actually punishes EPrints repositories most unfairly, since eprints is an accepted name for digital manuscripts as well as the platform used -- I think I've even seen IRs called eprints.something.etc running platforms other than EPrints.

You are answering the question. Imagine that in the future I decide to use eprints instead of dspace, or some other, better platform developed later. Then, are you going to change the domain or not? On the other hand, why brand an intellectual result with the name of the tool? I write my papers with MS Word and have never made any significant mention of this fact in any of them. What is the problem with http://repository.university.edu/ ?

Numbers 6 and 7, I think I agree with Mark, but don't really have anything to add. I don't really understand why this would even be considered as a metric, let alone grounds for exclusion. What are some examples of cases where long URIs (or, eg. directories as fulltext hosted in IRs with their own dir structures, which happens) or URIs which happen to contain numbers result in end users or machines not being able to properly locate/use hosted resources?

Metrics? Who is talking about metrics? I only said Keep It Simple.

Number 8 is probably the thing that will punish my own institution most, which is a pity because we have a large absolute amount of fulltext, but for various reasons, a lot of record only items as well. This is probably a philosophical argument about defining an OA repository I guess?

We can discuss the threshold. I think 50% is reasonable, but it could be applied only to contents after the embargo ends (usually 6 or 12 months).
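To make the threshold discussion concrete, here is a minimal sketch of how an open-access share could be computed when records still under embargo are left out of the count. Nothing in the thread says how the ranking would actually compute this, so the Record fields and the function open_access_share are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Record:
    """Hypothetical repository record; field names are assumptions."""
    has_open_full_text: bool
    embargo_ends: Optional[date] = None  # None means no embargo


def open_access_share(records: List[Record], today: date) -> Optional[float]:
    """Share of records with open full text, ignoring records still under embargo."""
    eligible = [
        r for r in records
        if r.embargo_ends is None or r.embargo_ends <= today
    ]
    if not eligible:
        return None  # nothing to measure yet
    return sum(r.has_open_full_text for r in eligible) / len(eligible)


records = [
    Record(has_open_full_text=True),
    Record(has_open_full_text=False),  # record-only item
    Record(has_open_full_text=False, embargo_ends=date(2015, 6, 1)),  # still embargoed
    Record(has_open_full_text=True),
]
share = open_access_share(records, today=date(2014, 9, 4))
print(f"{share:.0%} of eligible records have open full text")
```

In this made-up data set the embargoed record is skipped, so the share is 2 out of 3 rather than 2 out of 4. Whether record-only items should count against a repository at all is the separate question raised in the thread.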
I hope my criticisms here don't seem too harsh - thanks for taking the time to listen to feedback. On a lighter note, I'm sort of pleased these proposals have been doing the rounds, as I think it might just be the thing that convinces my own institution to take world repository ranking off our KPIs, and concentrate more on qualitative value of the repository we host.

Thanks for your cooperation ...

Cheers
Kim (a DSpace dev/admin, and already biased against quantitative metrics in IRs so not exactly an objective commenter ;))

You are a perfectly objective commenter, as you identify as dev/admin, but the repository was not intended to serve you. The 'customers' are your authors and your institution, and THEY are strongly biased to support metrics. Simply ask them.

On 3 September 2014 07:56, Isidro F. Aguillo isidro.agui...@cchs.csic.es wrote:

Dear colleagues, As editor of the Ranking Web of Repositories I
Re: [Dspace-tech] [Dspace-general] Regarding Ranking of Repositories
Dear Tim,

Tim Donohue tdono...@duraspace.org wrote:

Hello Isidro,

DuraSpace (the stewarding organization behind DSpace and Fedora repository software) was planning to send you a compiled list of the concerns with your proposal. As you can tell from the previous email thread, many of the users of DSpace have similar concerns. Rather than bombard you with all of them individually (which you could see from browsing the thread), we hoped to draft up a response summarizing the concerns of the DSpace community.

Thanks a lot. That is far beyond my best expectations.

Below you'll find an initial draft of the summarized concerns. The rule numbering below is based on the numbering at: http://repositories.webometrics.info/en/node/26

---

Concerns with the Proposal from Ranking Web of Repositories

* Rule #2 (IRs that don't use the institutional domain will be excluded) would cause the exclusion of some IRs which are hosted by DSpace service providers. As an example, some DSpaceDirect.org users have URLs https://[something].dspacedirect.org which would cause their exclusion as it is a non-institutional domain. Many other DSpace hosting providers have similar non-institutional domain URLs by default.

Major issue. A repository is not just another bibliographic database; it is the archive of the academic output of the institution, and as such it should carry the institution's brand. If the institution is very small or in a country with limited resources, the hosting option is perfectly valid and we will make an exception. But in these cases, is a redirection not possible? If the problem is related to governance, we should not reward the institution's bad practice.

* Rule #4 (Repositories using ports other than 80 or 8080) would wrongly exclude all DSpace sites which use HTTPS (port 443). Many institutions choose to run DSpace via HTTPS instead of HTTP.
No problem adding a few more ports to the short list.

* Rule #5 (IRs that use the name of the software in the hostname would be excluded) may also affect IRs which are hosted by service providers (like DSpaceDirect). Again, some DSpaceDirect customers have URLs which use *.dspacedirect.org (includes dspace). This rule would also exclude MIT's IR which is the original DSpace (and has used the same URL for the last 10+ years): http://dspace.mit.edu/

This is easy. Imagine that the people at MIT decide next week that they prefer eprints or some other new software. A repository is a repository, so why not a permanent name like repository.mit.edu? On the other hand, I wrote all my papers using MS Word and never cited that fact in any of them, least of all in the author list.

* Rule #6 (IRs that use more than 4 directory levels for the URL address of the full texts will be excluded.) may accidentally exclude a large number of DSpace sites. The common download URLs for full text in DSpace are both at least 4 directory levels deep:
  - XMLUI: [dspace-url]/bitstream/handle/[prefix]/[id]/[filename]
  - JSPUI: [dspace-url]/bitstream/[prefix]/[id]/[sequence]/[filename]
  NOTE: prefix and id are parts of an Item's Handle (http://hdl.handle.net/), which is the persistent identifier assigned to the item via the Handle System. So, this is how a persistent URL like http://hdl.handle.net/1721.1/26706 redirects to an Item in MIT's DSpace.

I understand the technical part, but this is for the needs of the sysadmin of the system (= ONE person). Now, please take into account the needs of 10,000 internal end users, the authors. Why not redirect (aliasing?)?

* Rule #7 (IRs that use more than 3 different numeric (or useless) codes in their URLs will be excluded.). It is unclear how they would determine this, and what the effect may be on DSpace sites worldwide.
Again, looking at the common DSpace URL paths above, if a file had a numeric name, it may be excluded as DSpace URLs already include 2-3 numeric codes by default ([prefix],[id], and [sequence] are all numeric).

I have a personal example. 20 years ago my email admin decided my email account should be 'dctfa11', to borrow your notation ([prefix],[id], and [sequence]). After several years it was possible to change it to 'isidro.aguillo'. I am going to use the URLs of my papers to cite them, to market them, and to copy into my CV. Please, help me translate /56/89/567894 into aguillo2014b.

* Rule #8 (IRs with more than 50% of the records not linking to OA full text versions..). Again, unclear how they would determine this, and whether the way they are doing so would accidentally exclude some major DSpace sites. For example, there are major DSpace sites which include a larger number of Theses/Dissertations. These Theses/Dissertations may not be 100% Open Access to the world, but may be fully accessible to everyone on campus.

50%!!! A 'place' with more than 50% of its full texts unavailable is NOT an Open Access Repository.

Another, perhaps more serious concern, is on the
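The aliasing idea raised in the replies above can be pictured as a lookup table that maps a readable citation key onto the existing numeric path, while the numeric path stays the stable target. The sketch below is only an assumption about what such a layer might look like. It is not a DSpace feature described anywhere in this thread, and the names ALIASES and resolve_alias are hypothetical.

```python
from typing import Optional, Tuple

# Hypothetical alias table: readable key -> existing numeric path.
# The pair below reuses the example given in the message above.
ALIASES = {
    "aguillo2014b": "/56/89/567894",
}


def resolve_alias(path: str) -> Tuple[int, Optional[str]]:
    """Return an HTTP-style (status, location) pair for a requested alias path."""
    key = path.strip("/").lower()
    target = ALIASES.get(key)
    if target is None:
        return 404, None
    # A permanent redirect keeps the numeric path as the canonical address.
    return 301, target


print(resolve_alias("/aguillo2014b"))  # (301, '/56/89/567894')
print(resolve_alias("/unknown-key"))   # (404, None)
```

A table like this leaves open the concerns raised elsewhere in the thread, namely who guarantees that the readable keys stay unique and persistent.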
Re: [Dspace-tech] [Dspace-general] Regarding Ranking of Repositories
development. Maybe we should be talking to our fellow implementers and building on the work of http://www.w3.org/Provider/Style/URI.html, http://www.w3.org/TR/cooluris/, http://www.openarchives.org/OAI/openarchivesprotocol.html, etc. to build a compilation of _best_ practice. Cheers stuart -Original Message- From: Tim Donohue [mailto:tdono...@duraspace.org] Sent: Wednesday, 3 September 2014 8:49 a.m. To: Isidro F. Aguillo; dspace-tech@lists.sourceforge.net Cc: Jonathan Markow; dspace-gene...@lists.sourceforge.net Subject: Re: [Dspace-general] [Dspace-tech] Regarding Ranking of Repositories Hello Isidro, DuraSpace (the stewarding organization behind DSpace and Fedora repository software) was planning to send you a compiled list of the concerns with your proposal. As you can tell from the previous email thread, many of the users of DSpace have similar concerns. Rather than bombard you with all of them individually (which you could see from browsing the thread), we hoped to draft up a response summarizing the concerns of the DSpace community. Below you'll find an initial draft of the summarized concerns. The rule numbering below is based on the numbering at: http://repositories.webometrics.info/en/node/26 --- Concerns with the Proposal from Ranking Web of Repositories * Rule #2 (IRs that don't use the institutional domain will be excluded) would cause the exclusion of some IRs which are hosted by DSpace service providers. As an example, some DSpaceDirect.org users have URLs https://[something].dspacedirect.org which would cause their exclusion as it is a non-institutional domain. Many other DSpace hosting providers have similar non-institutional domain URLs by default. * Rule #4 (Repositories using ports other than 80 or 8080) would wrongly exclude all DSpace sites which use HTTPS (port 443). Many institutions choose to run DSpace via HTTPS instead of HTTP. 
* Rule #5 (IRs that use the name of the software in the hostname would be excluded) may also affect IRs which are hosted by service providers (like DSpaceDirect). Again, some DSpaceDirect customers have URLs which use *.dspacedirect.org (includes dspace). This rule would also exclude MIT's IR which is the original DSpace (and has used the same URL for the last 10+ years): http://dspace.mit.edu/

* Rule #6 (IRs that use more than 4 directory levels for the URL address of the full texts will be excluded.) may accidentally exclude a large number of DSpace sites. The common download URLs for full text in DSpace are both at least 4 directory levels deep:
  - XMLUI: [dspace-url]/bitstream/handle/[prefix]/[id]/[filename]
  - JSPUI: [dspace-url]/bitstream/[prefix]/[id]/[sequence]/[filename]
  NOTE: prefix and id are parts of an Item's Handle (http://hdl.handle.net/), which is the persistent identifier assigned to the item via the Handle System. So, this is how a persistent URL like http://hdl.handle.net/1721.1/26706 redirects to an Item in MIT's DSpace.

* Rule #7 (IRs that use more than 3 different numeric (or useless) codes in their URLs will be excluded.). It is unclear how they would determine this, and what the effect may be on DSpace sites worldwide. Again, looking at the common DSpace URL paths above, if a file had a numeric name, it may be excluded as DSpace URLs already include 2-3 numeric codes by default ([prefix],[id], and [sequence] are all numeric).

* Rule #8 (IRs with more than 50% of the records not linking to OA full text versions..). Again, unclear how they would determine this, and whether the way they are doing so would accidentally exclude some major DSpace sites. For example, there are major DSpace sites which include a larger number of Theses/Dissertations. These Theses/Dissertations may not be 100% Open Access to the world, but may be fully accessible to everyone on campus.

---

Another, perhaps more serious concern, is on the timeline you propose.
You suggest a timeline of January 2015 when these newly proposed rules would be in place. Yet, if these rules were to go in place, some rules may require changes to the DSpace software itself (as I laid out above, some rules may not mesh well with DSpace software as it is, unless I'm misunderstanding the rule itself). Unfortunately, based on our DSpace open source release timelines, we have ONE new release (DSpace 5.0) planned between now and January 2015. Even if we were able to implement some of these recommended changes at a software level, the vast majority (likely 80-90%) of DSpace instances would likely NOT be able to upgrade to the latest DSpace version before your January deadline (as the 5.0 release is scheduled for Nov/Dec). Therefore, as is, your January 2015 ranking may accidentally exclude a large number of DSpace sites from your rankings, and DSpace is still the most widely used
Re: [Dspace-tech] [Dspace-general] Regarding Ranking of Repositories
I think that this point:

* Rule #7 (IRs that use more than 3 different numeric (or useless) codes in their URLs will be excluded.). It is unclear how they would determine this, and what the effect may be on DSpace sites worldwide. Again, looking at the common DSpace URL paths above, if a file had a numeric name, it may be excluded as DSpace URLs already include 2-3 numeric codes by default ([prefix],[id], and [sequence] are all numeric).

I have a personal example. 20 years ago my email admin decided my email account should be 'dctfa11', to borrow your notation ([prefix],[id], and [sequence]). After several years it was possible to change it to 'isidro.aguillo'. I am going to use the URLs of my papers to cite them, to market them, and to copy into my CV. Please, help me translate /56/89/567894 into aguillo2014b.

---

admits no possible conceptual dispute: http://www.w3.org/2013/dwbp/wiki/URI_Design_and_Management_for_Persistence

And of course, correct URIs have effects on SEO, which is an inherent responsibility of IR platforms. As a DSpace implementer I understand that there may be no immediate solution, but to me it is undoubtedly the correct requirement for future DSpace versions.

Regards
German

2014-09-03 19:53 GMT-03:00 Isidro F. Aguillo isidro.agui...@cchs.csic.es:

Dear Tim,

Tim Donohue tdono...@duraspace.org wrote:

Hello Isidro,

DuraSpace (the stewarding organization behind DSpace and Fedora repository software) was planning to send you a compiled list of the concerns with your proposal. As you can tell from the previous email thread, many of the users of DSpace have similar concerns. Rather than bombard you with all of them individually (which you could see from browsing the thread), we hoped to draft up a response summarizing the concerns of the DSpace community.

Thanks a lot. That is far beyond my best expectations.

Below you'll find an initial draft of the summarized concerns.
Re: [Dspace-tech] [Dspace-general] Regarding Ranking of Repositories
Dear colleague,

A user is a user, not any casual reader. A non-casual reader of an academic paper is usually another scientist (a Higgs boson text is not for everybody), and any professional scientist (including junior ones like PhD students) is or is going to be an author.

Best regards,

sharad sharad7...@gmail.com wrote:

Hi,

Most of us will agree that authors themselves (or the funders of the research/repository) are not the sole end users of a repository. On the contrary, the major end users are non-authors: people who have not contributed to a repository and are just users of it. If whatever changes are proposed are for a better end-user experience, let us not assume that the end users are only authors.

Best Regards,
Sharad

On Thu, Sep 4, 2014 at 2:55 AM, Anton Angelo an...@mojo.org wrote:

Hi Isidro,

As a librarian/technologist managing an institutional repository I have to disagree with you on the definition of our end users. There are two ways to look at this, and neither ends up with the authors as end users.

An idealistic approach would see the final readers of the items as the end users. They are the ones, as defined by various OA declarations, for whom we are doing this.

A pragmatic approach would say the funders of the research are the end users, as they are demanding the output of their funding to be made OA. The latter group are the ones your service is most useful for, in determining the performance of their outputs - the more visible, the better vehicle for publication.

I am beginning to think that rankings are not a very useful manner in which to compare IRs, but a list of platform agnostic best practice standards (like the orange book for security, back in the day) is the way forward. Though I have extensively used the service in my research on IR effectiveness, that was mostly because the repository I manage has a high ranking, and it was useful to promote it internally.
This kind of behaviour usually ends up in 'gaming', and is counterproductive - exactly what OA is trying to get away from (h-index, impact factor, etc). IRs are really about getting the right output to the right person - even one download can be a total success. I think in the future altmetric tools are probably going to be more use than a ranking service, as useful as it has been in the past - provided they report on the work in OA being done in the global south.

aa

On 4 September 2014 09:12, Isidro F. Aguillo isidro.agui...@cchs.csic.es wrote:

Dear Stuart,

I do not know whether you understand the ultimate purpose of Open Access initiatives in general and of institutional repositories in particular. I think you are missing the central point: the end users should guide the design of the repository according to their real needs. In OA, the end users are the authors of the papers, the institutions that fund the research and host the papers, and the librarians who manage the repository. In this scenario the software developers' task is to fulfill the needs of the authors in the most professional way.

Regarding authors' needs, the W3C and its 'cool' proposals are arbitrary, basically because the W3C does not know how scholarly communication works, nor the aims and methods of OA. They are not stakeholders for us.

In any case, I would prefer, and would thank you for, comments on the specific proposals rather than a general, ambiguous and unsupported global criticism.

Best regards,

Stuart Yeates stuart.yea...@vuw.ac.nz wrote:

I'm not sure that knee-jerk reaction to an arbitrary list of bad practice is a good place to start and seems like a really bad driver for software development. Maybe we should be talking to our fellow implementers and building on the work of http://www.w3.org/Provider/Style/URI.html, http://www.w3.org/TR/cooluris/, http://www.openarchives.org/OAI/openarchivesprotocol.html, etc. to build a compilation of _best_ practice.
Re: [Dspace-tech] [Dspace-general] Regarding Ranking of Repositories
Dear German,

Thanks for your support. I understand that this can be difficult to implement or may need time to develop. There is no problem with applying the proposals later, or not applying them at all.

Best regards,

Germán Biozzoli germanbiozz...@gmail.com wrote:

I think that this point:

* Rule #7 (IRs that use more than 3 different numeric (or useless) codes in their URLs will be excluded.). It is unclear how they would determine this, and what the effect may be on DSpace sites worldwide. Again, looking at the common DSpace URL paths above, if a file had a numeric name, it may be excluded as DSpace URLs already include 2-3 numeric codes by default ([prefix],[id], and [sequence] are all numeric).

I have a personal example. 20 years ago my email admin decided my email account should be 'dctfa11', to borrow your notation ([prefix],[id], and [sequence]). After several years it was possible to change it to 'isidro.aguillo'. I am going to use the URLs of my papers to cite them, to market them, and to copy into my CV. Please, help me translate /56/89/567894 into aguillo2014b.

---

admits no possible conceptual dispute: http://www.w3.org/2013/dwbp/wiki/URI_Design_and_Management_for_Persistence

And of course, correct URIs have effects on SEO, which is an inherent responsibility of IR platforms. As a DSpace implementer I understand that there may be no immediate solution, but to me it is undoubtedly the correct requirement for future DSpace versions.

Regards
German

2014-09-03 19:53 GMT-03:00 Isidro F. Aguillo isidro.agui...@cchs.csic.es:

Dear Tim,

Tim Donohue tdono...@duraspace.org wrote:

Hello Isidro,

DuraSpace (the stewarding organization behind DSpace and Fedora repository software) was planning to send you a compiled list of the concerns with your proposal. As you can tell from the previous email thread, many of the users of DSpace have similar concerns.
Rather than bombard you with all of them individually (which you could see from browsing the thread), we hoped to draft up a response summarizing the concerns of the DSpace community. Thanks a lot. That is far beyond my better expectations. Below you'll find an initial draft of the summarized concerns. The rule numbering below is based on the numbering at: http://repositories.webometrics.info/en/node/26 --- Concerns with the Proposal from Ranking Web of Repositories * Rule #2 (IRs that don't use the institutional domain will be excluded) would cause the exclusion of some IRs which are hosted by DSpace service providers. As an example, some DSpaceDirect.org users have URLs https://[something].dspacedirect.org which would cause their exclusion as it is a non-institutional domain. Many other DSpace hosting providers have similar non-institutional domain URLs by default. Major issue. Repository is not another bibliographic database, it is the archive of the academic output of the institution. And as such it should be iron brand. If .. the institution is very small or in a country with limited resources the hosting option is perfectly valid and we will do an exception. But in these cases, Is not a redirection possible? If .. the problem is related to governance we should not reward the institution bad practice. * Rule #4 (Repositories using ports other than 80 or 8080) would wrongly exclude all DSpace sites which use HTTPS (port 443). Many institutions choose to run DSpace via HTTPS instead of HTTP. No problem adding a few more ports to the short list * Rule #5 (IRs that use the name of the software in the hostname would be excluded) may also affect IRs which are hosted by service providers (like DSpaceDirect). Again, some DSpaceDirect customers have URLs which use *.dspacedirect.org (includes dspace). This rule would also exclude MIT's IR which is the original DSpace (and has used the same URL for the last 10+ years): http://dspace.mit.edu/ This is easy. 
Imagine that the people at MIT decide next week that they prefer eprints or some other new software. A repository is a repository, so why not a final name like repository.mit.edu? On the other hand, I wrote all my papers using MS Word and never cited that fact in any of them, least of all in the authorship.

* Rule #6 (IRs that use more than 4 directory levels for the URL address of the full texts will be excluded.) may accidentally exclude a large number of DSpace sites. The common download URLs for full text in DSpace are both at least 4 directory levels deep:
- XMLUI: [dspace-url]/bitstream/handle/[prefix]/[id]/[filename]
- JSPUI: [dspace-url]/bitstream/[prefix]/[id]/[sequence]/[filename]
NOTE: prefix and id are parts of an Item's Handle (http://hdl.handle.net/), which is the persistent identifier assigned to the item via the Handle System. So, this is how a persistent URL like http://hdl.handle.net/1721.1/26706 redirects to an Item in MIT's DSpace.
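To make the Rule #6 concern concrete, here is a minimal Python sketch that counts directory levels for the two DSpace download patterns quoted above. The proposal does not say how levels would actually be counted, so the counting rule here (every path segment except the final file name) is an assumption, and the host repo.example.edu and file name paper.pdf are hypothetical.

```python
from urllib.parse import urlsplit


def directory_levels(url: str) -> int:
    """Count path segments that precede the file name (assumed counting rule)."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    # The last segment is treated as the file name, not as a directory.
    return max(len(segments) - 1, 0)


# Hypothetical URLs following the XMLUI and JSPUI patterns from the thread.
XMLUI = "http://repo.example.edu/bitstream/handle/1721.1/26706/paper.pdf"
JSPUI = "http://repo.example.edu/bitstream/1721.1/26706/1/paper.pdf"

for label, url in (("XMLUI", XMLUI), ("JSPUI", JSPUI)):
    print(label, directory_levels(url))
```

Under this counting both patterns come out at 4 levels, which is right at the proposed limit. A site that serves DSpace under an extra context path would exceed it, which is the worry raised in the summary.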
Re: [Dspace-tech] [Dspace-general] Regarding Ranking of Repositories
Hi All

As meat for further constructive debate, I would like to submit our rationalisations for the selection of our URL.

http://wiki.lib.sun.ac.za/index.php/SUNScholar/Guidelines/Step_2 (Step 2 - Marketing Friendly (Vanity URL), Persistent URL and Preservable Digital Objects)

Regards
Hilton

*Hilton Gibson* Ubuntu Linux Systems Administrator JS Gericke Library Room 1025C Stellenbosch University Private Bag X5036 Stellenbosch 7599 South Africa Tel: +27 21 808 4100 | Cell: +27 84 646 4758

On 2 September 2014 21:56, Isidro F. Aguillo isidro.agui...@cchs.csic.es wrote:

Dear colleagues,

As editor of the Ranking Web of Repositories I published the referred info in order to open debate about issues that are, in my humble opinion, very concerning for the future of repositories. As my email address is clearly stated on the webpage, I do not understand why you decided not to consider my position and explanations in this debate.

I am going to answer the specific points introduced by Mark Wood and, of course, I am open not only to further discussion but also to modifying my proposals accordingly.

From: Mark H. Wood mw...@iupui.edu Date: 2 September 2014 16:28 Subject: Re: [Dspace-tech] IMPORTANT NEWS: Important Info for Future Editions | Ranking Web of Repositories To: dspace-tech@lists.sourceforge.net, General List dspace-gene...@lists.sourceforge.net

Points 4, 6 and 7 reveal a profound lack of understanding of hypertext and fundamental security issues, and I would not be surprised to learn that they ignore typical user behavior as well. Does anyone but a sysadmin. or developer really type in direct URLs to repository content? Citations please.

Point 4. In many academic institutions, access to ports other than the standard ones is forbidden for security reasons. If you use other ones, the contents are invisible to people accessing them from other universities.

Points 6 and 7. Explain to me why .../handle/556/78/6789 is better than .../thesis/physics/Wood2013b and why aliasing is not possible.
Authors will probably cite the URLs of deposited files in their published papers, but with these awful, lengthy, useless addresses they will probably prefer not to do so. One of the main reasons for depositing papers is to increase their visibility, but this is only possible if other authors can easily locate them. Typically, for example, in Google. Do you know the advantages of URL semantic content for improving position in Google? There are thousands of papers about academic SEO. For example, there are ones stating the advantages of using library instead of lib in web names.

I would argue that we can better do without appearing in the Ranking Web of Repositories, whatever that is, than to give up the ability to protect our users' credentials. (Point 4, which disallows HTTPS)

Are you mixing public and private sections? You can protect your users without destroying visibility.

Point 5 is just bizarre. Why does someone think this is a problem? Not that I think it particularly useful to use the name of supporting software in naming a repository service, but how can it possibly hurt?

The repository is probably the most important part of the intellectual treasure of the university and its authors. You are simply proposing to brand the container instead of the content.

Are there any actual statistics to support the belief that long URLs in the interior of a service actually affect anyone's behavior?

The interior is irrelevant. The contents of the repository are for end users, who are not sysadmins but the institution's authors and the authors and readers from the rest of the world. We are talking about Open Access, and in my opinion the referred issues are barriers to openness.

It sounds like there should be some discussion among the various parties.

Where? As mentioned before, I am here for further comments.

Thanks for your cooperation.

-- Mark H. Wood Lead Technology Analyst University Library Indiana University - Purdue University Indianapolis 755 W.
Michigan Street Indianapolis, IN 46202 317-274-0749 www.ulib.iupui.edu

-- Isidro F. Aguillo, HonPhD Cybermetrics Lab (3C1). CCHS - CSIC Albasanz, 26-28. 28037 Madrid. Spain isidro.aguillo @ cchs.csic.es www.webometrics.info

- End of forwarded message -
Re: [Dspace-tech] [Dspace-general] Regarding Ranking of Repositories
Hi Isidro and lists,

Regarding point 6 -- I see what you're saying, but it shouldn't really be up to the DSpace community repositories (who all use the handle prefix / identifier system, as I'm sure you know!) to argue why 1234/123 is better than thesis/physics/something, because we're not the ones proposing that URI segments be part of any metric used to judge the world ranking of a repository. It's also not as simple as you might think, particularly when ensuring unique URIs and persistent URIs, etc. I think you're saying that URIs should either look nice or be meaningful, or both, but I'm not sure we should rely on URIs to be too meaningful, especially when we have ways of including that with semantic markup in references, structured data in our METS/ORE feeds via OAI, etc.

Regarding point 5 -- I don't see that this matters either. No end user cares what the IR is actually called, surely? Whatever arguments you can make for our IRs having bad names, punishing us for preserving the permanence of those names and URIs we've already minted seems a bit unfair? The first IR I thought of when reading this was, of course, http://dspace.mit.edu. I think point 5 actually punishes EPrints repositories most unfairly, since eprints is an accepted name for digital manuscripts as well as the platform used -- I think I've even seen IRs called eprints.something.etc running platforms other than EPrints.

Numbers 6 and 7, I think I agree with Mark, but don't really have anything to add. I don't really understand why this would even be considered as a metric, let alone grounds for exclusion. What are some examples of cases where long URIs (or, eg. directories as fulltext hosted in IRs with their own dir structures, which happens) or URIs which happen to contain numbers result in end users or machines not being able to properly locate/use hosted resources?
Number 8 is probably the thing that will punish my own institution most, which is a pity because we have a large absolute amount of fulltext, but for various reasons, a lot of record only items as well. This is probably a philosophical argument about defining an OA repository I guess?

I hope my criticisms here don't seem too harsh - thanks for taking the time to listen to feedback. On a lighter note, I'm sort of pleased these proposals have been doing the rounds, as I think it might just be the thing that convinces my own institution to take world repository ranking off our KPIs, and concentrate more on qualitative value of the repository we host.

Cheers
Kim (a DSpace dev/admin, and already biased against quantitative metrics in IRs so not exactly an objective commenter ;))

On 3 September 2014 07:56, Isidro F. Aguillo isidro.agui...@cchs.csic.es wrote:

Dear colleagues,

As editor of the Ranking Web of Repositories I published the referred info in order to open debate about issues that are, in my humble opinion, very concerning for the future of repositories. As my email address is clearly stated on the webpage, I do not understand why you decided not to consider my position and explanations in this debate.

I am going to answer the specific points introduced by Mark Wood and, of course, I am open not only to further discussion but also to modifying my proposals accordingly.

From: Mark H. Wood mw...@iupui.edu Date: 2 September 2014 16:28 Subject: Re: [Dspace-tech] IMPORTANT NEWS: Important Info for Future Editions | Ranking Web of Repositories To: dspace-tech@lists.sourceforge.net, General List dspace-gene...@lists.sourceforge.net

Points 4, 6 and 7 reveal a profound lack of understanding of hypertext and fundamental security issues, and I would not be surprised to learn that they ignore typical user behavior as well. Does anyone but a sysadmin. or developer really type in direct URLs to repository content? Citations please.

Point 4.
In many academic institutions, access to ports other than the standard ones is forbidden for security reasons. If you use other ones, the contents are invisible to people accessing them from other universities.

Points 6 and 7. Explain to me why .../handle/556/78/6789 is better than .../thesis/physics/Wood2013b and why aliasing is not possible. Authors will probably cite the URLs of deposited files in their published papers, but with these awful, lengthy, useless addresses they will probably prefer not to do so. One of the main reasons for depositing papers is to increase their visibility, but this is only possible if other authors can easily locate them. Typically, for example, in Google. Do you know the advantages of URL semantic content for improving position in Google? There are thousands of papers about academic SEO. For example, there are ones stating the advantages of using library instead of lib in web names.

I would argue that we can better do without appearing in the Ranking Web of Repositories, whatever that is, than to give up the ability to protect our users' credentials. (Point 4,
Re: [Dspace-tech] [Dspace-general] Regarding Ranking of Repositories
I'm not sure that knee-jerk reaction to an arbitrary list of bad practice is a good place to start and seems like a really bad driver for software development.

Maybe we should be talking to our fellow implementers and building on the work of http://www.w3.org/Provider/Style/URI.html, http://www.w3.org/TR/cooluris/, http://www.openarchives.org/OAI/openarchivesprotocol.html, etc. to build a compilation of _best_ practice.

Cheers
stuart

-Original Message- From: Tim Donohue [mailto:tdono...@duraspace.org] Sent: Wednesday, 3 September 2014 8:49 a.m. To: Isidro F. Aguillo; dspace-tech@lists.sourceforge.net Cc: Jonathan Markow; dspace-gene...@lists.sourceforge.net Subject: Re: [Dspace-general] [Dspace-tech] Regarding Ranking of Repositories

Hello Isidro,

DuraSpace (the stewarding organization behind DSpace and Fedora repository software) was planning to send you a compiled list of the concerns with your proposal. As you can tell from the previous email thread, many of the users of DSpace have similar concerns. Rather than bombard you with all of them individually (which you could see from browsing the thread), we hoped to draft up a response summarizing the concerns of the DSpace community.

Below you'll find an initial draft of the summarized concerns. The rule numbering below is based on the numbering at: http://repositories.webometrics.info/en/node/26

--- Concerns with the Proposal from Ranking Web of Repositories

* Rule #2 (IRs that don't use the institutional domain will be excluded) would cause the exclusion of some IRs which are hosted by DSpace service providers. As an example, some DSpaceDirect.org users have URLs https://[something].dspacedirect.org which would cause their exclusion as it is a non-institutional domain. Many other DSpace hosting providers have similar non-institutional domain URLs by default.

* Rule #4 (Repositories using ports other than 80 or 8080) would wrongly exclude all DSpace sites which use HTTPS (port 443).
Many institutions choose to run DSpace via HTTPS instead of HTTP.

* Rule #5 (IRs that use the name of the software in the hostname would be excluded) may also affect IRs which are hosted by service providers (like DSpaceDirect). Again, some DSpaceDirect customers have URLs which use *.dspacedirect.org (includes dspace). This rule would also exclude MIT's IR which is the original DSpace (and has used the same URL for the last 10+ years): http://dspace.mit.edu/

* Rule #6 (IRs that use more than 4 directory levels for the URL address of the full texts will be excluded.) may accidentally exclude a large number of DSpace sites. The common download URLs for full text in DSpace are both at least 4 directory levels deep:
- XMLUI: [dspace-url]/bitstream/handle/[prefix]/[id]/[filename]
- JSPUI: [dspace-url]/bitstream/[prefix]/[id]/[sequence]/[filename]
NOTE: prefix and id are parts of an Item's Handle (http://hdl.handle.net/), which is the persistent identifier assigned to the item via the Handle System. So, this is how a persistent URL like http://hdl.handle.net/1721.1/26706 redirects to an Item in MIT's DSpace.

* Rule #7 (IRs that use more than 3 different numeric (or useless) codes in their URLs will be excluded.). It is unclear how they would determine this, and what the effect may be on DSpace sites worldwide. Again, looking at the common DSpace URL paths above, if a file had a numeric name, it may be excluded as DSpace URLs already include 2-3 numeric codes by default ([prefix],[id], and [sequence] are all numeric).

* Rule #8 (IRs with more than 50% of the records not linking to OA full text versions..). Again, unclear how they would determine this, and whether the way they are doing so would accidentally exclude some major DSpace sites. For example, there are major DSpace sites which include a larger number of Theses/Dissertations. These Theses/Dissertations may not be 100% Open Access to the world, but may be fully accessible to everyone on campus.
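The Rule #7 concern can be illustrated with a short Python sketch. The proposal does not define what counts as a "numeric code", so the test used here (a path segment made only of digits and dots, with the file extension ignored) is purely an assumption, and the host repo.example.edu and both file names are hypothetical.

```python
import re
from urllib.parse import urlsplit

# Assumed definition of a "numeric code": digits, optionally dot-separated.
NUMERIC = re.compile(r"\d+(\.\d+)*")


def numeric_codes(url: str) -> int:
    """Count path segments that look purely numeric (assumed definition)."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    if segments:
        # Ignore the extension of the final segment, so "2014.pdf" counts as "2014".
        segments[-1] = segments[-1].rsplit(".", 1)[0]
    return sum(1 for s in segments if NUMERIC.fullmatch(s))


# JSPUI-style URLs: [prefix], [id] and [sequence] are numeric by default.
named = "http://repo.example.edu/bitstream/1721.1/26706/1/paper.pdf"
numbered = "http://repo.example.edu/bitstream/1721.1/26706/1/2014.pdf"

print(numeric_codes(named), numeric_codes(numbered))
```

With this definition the first URL has 3 numeric codes and the second has 4, so a numeric file name alone would push a default JSPUI download link past the proposed threshold of 3.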
--- Another, perhaps more serious concern, is on the timeline you propose. You suggest a timeline of January 2015 when these newly proposed rules would be in place. Yet, if these rules were to go in place, some rules may require changes to the DSpace software itself (as I laid out above, some rules may not mesh well with DSpace software as it is, unless I'm misunderstanding the rule itself). Unfortunately, based on our DSpace open source release timelines, we have ONE new release (DSpace 5.0) planned between now and January 2015. Even if we were able to implement some of these recommended changes at a software level, the vast majority (likely 80-90%) of DSpace instances would likely NOT be able to upgrade to the latest DSpace version before your January deadline (as the 5.0 release is scheduled for Nov/Dec). Therefore, as is, your January 2015 ranking may accidentally exclude a large number of DSpace sites from your rankings, and DSpace is still