Tim,

Thank you for the informative post.

The non-committers have their work cut out for them and I think that's a good thing. Here at Creighton we have taken advantage of the distributed collection administration features as well as enabling self-submission to some collections. I will begin by meeting with those folks and I'll look forward to continuing this discussion with the Sponsors and the Steering Committee.

Richard Jizba
Health Sciences Library
Creighton University
(402) 280-5142
[email protected]

-----Original Message-----
From: Tim Donohue [mailto:[email protected]]
Sent: Friday, February 14, 2014 10:44 AM
To: Jeffrey A Trimble; Jizba, Richard; Pottinger, Hardy J.; [email protected]
Cc: [email protected]
Subject: Re: [Dspace-general] Search in DSpace

Hi All,

Just to add to the discussion here... these are all great questions and worth considering. If there are issues with how Discovery/Solr works or if it's not ideal, then we have options to either fix it or look towards a better solution altogether.

Apologies for what is a rather long email, but I'm trying to best explain the Committer thought processes here...

First, it's worth being aware that Discovery/Solr is actually being used both for Search and for Browse. Prior to Discovery/Solr, Search was performed via Lucene, while Browse was a custom built system which used the underlying database (Postgres).

To try to explain the "Committer" point of view/discussions that have taken place since then:

* Committers began to run into limitations of our custom browse system and Lucene based on some of the Search/Browse feature requests coming in. While some of these features could have been implemented in Lucene, we noticed that a combined Search/Browse in Solr was beginning to look more favorable.

* Committers have been struggling with the sometimes "patchwork" nature of the DSpace codebase (which happens after 10 years). DSpace is extremely powerful and extremely configurable.
But, as we have limited resources, we've come to the realization that we need to simplify and modularize the codebase little by little (while trying to ensure DSpace retains its strong niche).

* Committers were also noticing that more and more open source systems these days (Hydra, Islandora, even EPrints) are using Solr to handle both Search & Browse. Solr has begun to become a de facto "standard" in many ways, and it's used well beyond the IR community as well.

However, even within the Committers group, not all have agreed that Solr is *the best solution*, which is part of the reason why "Discovery" exists. Without getting too technical, Discovery is essentially a "generic search/browse" layer built for DSpace. Discovery itself actually could/can provide multiple plugins for multiple search engines. The Committers have actually discussed building a Lucene plugin for Discovery and even an Elastic Search plugin for Discovery.

So, in the future, it's possible DSpace Search/Browse could look like this (and individual institutions could choose which plugin they wanted):

* Discovery Search/Browse
  * Solr plugin for Discovery
  * Lucene plugin for Discovery (doesn't yet exist)
  * Elastic Search plugin for Discovery (doesn't yet exist)

So, the Committers have never decided to completely remove Lucene support forever. All that we've decided is that we need to standardize on a common Search/Browse platform (which is Discovery), because we just don't have enough development resources right now to build/maintain multiple completely separate search/browse codebases (Discovery & traditional Lucene search are entirely separate codebases, and those separate codebases still exist even in DSpace 4.x).

The reality though is that currently Discovery only supports Solr. We hope that it can support Lucene and other search tools as well, but we need to find committer resources or other volunteers to help us build those additional search plugins.

I also fully agree here that it'd be wonderful to find better ways to survey our "actual" end users to better determine their search/browse needs out of DSpace. This is something that I'd also love to see happen. Unfortunately it's not a role the Committers are able to play as we are not experts in writing/drafting/promoting user surveys. Perhaps however the upcoming DSpace Steering Committee and/or DCAT could help us in finding ways to survey the community and our users. This does sound like a discussion we could begin at the upcoming DuraSpace Sponsors Summit in March (for those in attendance).

In general we don't always have a smooth process in place to survey the community about these decisions. The Committers team sometimes has less than perfect information and we sometimes have to make difficult decisions based on our existing resources (and it's not always clear which decisions are potentially controversial to others in the community).

So, this is a great discussion to be starting, both specifically with regards to Search/Browse, and generally with regards to how best to survey our widespread community, etc.

I hope this helps the discussion. Glad to also clarify anything I've said if it's unclear.

- Tim

On 2/13/2014 8:15 PM, Jeffrey A Trimble wrote:
> On a completely end user note, we have found our Discovery service for
> our Library is not well liked by undergraduate students. The results
> are too large most of the time (10K+) and they (the users) get
> frustrated easily if they have to learn to customize the search.
>
> Our Information Literacy/Bibliographic Instruction Librarians have
> stopped teaching Discovery Layer Services as the norm. (We use EBSCO
> Discovery Service, which is a Rolls Royce!) The EDS for us not only
> searches our local loads (local databases, local electronic resources,
> online catalogs, DSPACE server) but also all of OhioLINK. It is
> really overwhelming for them.
>
> We still teach traditional Keyword Boolean as the starting point and
> move to the browse queries and then to the "pre coordinated searches"
> such as Library of Congress Subject Headings. Pre-Coordinated
> searches is a fancy name for Controlled Subject Vocabulary.
>
> It will be interesting to see how FAST headings will affect searching
> as OCLC derives them from LCSH and as ILSs begin to index them into
> browse searching and keyword/boolean searching.
>
> I think that Discovery Layers are attempting to compete with Google
> searching. And the rhetorical question or theoretical question is
> does discovery have 'deliverables' without drilling down into the
> results to get what you really came for?
>
> Professionally and personally, I do use Discovery, but I'm a trained
> professional, not a dilettante in the information seeking world.
>
> We are in a major paradigm shift that has truly only begun, and it
> will be another 15 years before the shift sees true results -- some of
> them will be tied to societal changes.
>
> My $.03 worth of thoughts.
>
> Cordially,
>
> Jeffrey Trimble
> Associate Director &
> Head of Information Services
> William F. Maag Library
> Youngstown State University
> 330.941.2483 (Office)
> [email protected]
> http://www.maag.ysu.edu
> http://digital.maag.ysu.edu
> "For he is the Kwisatz Haderach..."
>
> On 2/13/2014, 6:02 PM, "Jizba, Richard" <[email protected]> wrote:
>
>> Hardy,
>>
>> I understand that discussions about the search and browse functions
>> are technical issues. But before technical things happen, there needs
>> to be general discussion among the users: what are the advantages and
>> disadvantages of the Discovery and the traditional Search? Why have
>> some users put the money or effort into customizations? I suspect
>> that outside of the "techies" very few users even know they have options.
>>
>> It says in the manual for 3.2 that:
>>
>> "Search is an essential component of discovery in DSpace. Users'
>> expectations from a search engine are quite high, so a goal for
>> DSpace is to supply as many search features as possible."
>>
>> Have there been discussions with the non-technical user community to
>> determine what features really are important? It seems as though
>> there is a large user base for DSpace, but I suspect most of the
>> discussion is among the tech folks, not the non-tech user community.
>> (I'm not even sure how you would go about communicating with those
>> people.)
>>
>> My usage stats indicate that the interaction with our open
>> collections is coming from the web - folks accessing the bitstreams
>> directly from web search engines, not through the native DSpace
>> search. Thus, these aren't actual users of DSpace "Search". (I base
>> this on the fact that bitstream downloads often greatly exceed item
>> views.)
>>
>> What I'd like to know is:
>> What search functions do "actual" end users want and need?
>> How do we identify "actual" end users and communicate with them?
>>
>> Richard Jizba
>> Health Sciences Library
>> Creighton University
>> (402) 280-5142
>> [email protected]
>>
>>
>> -----Original Message-----
>> From: Pottinger, Hardy J. [mailto:[email protected]]
>> Sent: Thursday, February 13, 2014 3:03 PM
>> To: Jizba, Richard; [email protected]
>> Cc: [email protected]
>> Subject: Re: [Dspace-general] Search in DSpace
>>
>> Hi, I note that this discussion is taking place on DSpace-general,
>> though it's probably best suited for DSpace-tech. I say that mostly
>> because I'm about to link to technical info :-) However, since it
>> started in -general I'll leave it here.
>>
>> Richard, your existing Lucene customizations (in particular your
>> custom filter code) are very likely portable to Solr [1]. I'm not
>> promising Shangri-La, but, it's likely pretty workable.
>> I have repository managers here who were interested in implementing
>> the non-Porter stemming analyzer, enough that they asked me to work
>> towards making that option configurable for DSpace. With a bunch of
>> help from the community, we made that happen for DSpace [2]. I am
>> *sure* we can get DSpace to do what you need, no matter the specifics
>> of the search back-end. As we trundle on down the road to DSpace 5.0,
>> I hope you'll continue to help us ensure the system remains usable
>> for you and the community. Thanks!
>>
>> [1] https://wiki.apache.org/solr/SolrPlugins
>> [2] https://jira.duraspace.org/browse/DS-849
>>
>> --
>> HARDY POTTINGER <[email protected]> University of Missouri
>> Library Systems http://lso.umsystem.edu/~pottingerhj/
>> https://MOspace.umsystem.edu/
>> "And remember, also" added the Princess of Sweet Rhyme, "that many
>> places you would like to see are just off the Map and many things you
>> want to know are just out of sight or a little beyond your reach. But
>> someday you'll reach them after all, for what you learn today, for no
>> reason at all, will help you discover all the wonderful secrets of
>> tomorrow."
>>
>> --Norton Juster, The Phantom Tollbooth
>>
>> On 2/13/14 1:58 PM, "Jizba, Richard" <[email protected]> wrote:
>>
>>> Solr may build on Lucene, but it may also inhibit me from taking
>>> real advantage of Lucene. We had that problem a couple of years ago
>>> with the porter stem filter. We couldn't conduct the kind of
>>> searches we wanted because the porter stem filter stemmed our search
>>> terms -- and at the time, there wasn't an easy way to turn it off.
>>>
>>> I understand faceting, but I also know that sometimes the most
>>> effective way to search is to let people who know how to search do
>>> it in the most direct way possible. It's particularly true when they
>>> create the collections they want to search. We have some collections
>>> that are only searched by the people who make them.
>>> They are good searchers who know what they are doing.
>>>
>>> Faceting, it seems to me, is aimed at the naïve user who doesn't
>>> know anything about searching. Do such people actually search DSpace
>>> directly through the interface, or do their searches originate in
>>> Google, Bing, etc? In any case, we have some user groups with closed
>>> collections in our repository and they need the traditional search
>>> and browse functions. I just want to make sure that future DSpace
>>> developments don't adversely impact their needs. Just telling me
>>> that Solr builds on Lucene doesn't really answer the question.
>>>
>>> Richard Jizba
>>> Health Sciences Library
>>> Creighton University
>>> (402) 280-5142
>>> [email protected]
>>>
>>> -----Original Message-----
>>> From: [email protected] [mailto:[email protected]] On Behalf Of helix84
>>> Sent: Thursday, February 13, 2014 11:23 AM
>>> To: Jizba, Richard
>>> Cc: [email protected]
>>> Subject: Re: [Dspace-general] Search in DSpace
>>>
>>> Hi Richard,
>>> just a short reply.
>>>
>>> Are you aware that Solr (Discovery in DSpace uses Solr) builds on Lucene?
>>> They even support the same syntax with some minor differences and
>>> even that is configurable. The issue is not that Lucene is worse
>>> than Solr or anything, it's just that Solr brings many features that
>>> aren't in pure Lucene. The reason why we dislike keeping both is
>>> that there's a significant development, maintenance and support
>>> burden for DSpace committers to keep both. Count with me: two search
>>> backends times two UIs (plus other interfaces like REST API in the
>>> works) makes four wildly different systems to work with. DSpace is
>>> not just one platform, it's a collection of platforms. If we converge
>>> upon a single search platform (I don't see this happening with UIs),
>>> we'll have more time to put towards improving DSpace and adding new
>>> features thanks to not doing double the amount of work.
>>> This will make DSpace better in the long term.
>>>
>>> From what you said, it seems to me that everything you have should
>>> also be possible to do in Discovery. I do understand that changing your
>>> highly customized implementation from Lucene to Solr is a lot of work.
>>> But it has very tangible advantages.
>>>
>>>
>>> Regards,
>>> ~~helix84
>>>
>>> Compulsory reading: DSpace Mailing List Etiquette
>>> https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette
>>> ------------------------------------------------------------------------------
>>> Android apps run on BlackBerry 10
>>> Introducing the new BlackBerry 10.2.1 Runtime for Android apps.
>>> Now with support for Jelly Bean, Bluetooth, Mapview and more.
>>> Get your Android app in front of a whole new audience. Start now.
>>> http://pubads.g.doubleclick.net/gampad/clk?id=124407151&iu=/4140/ostg.clktrk
>>> _______________________________________________
>>> Dspace-general mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/dspace-general
