Tim,

Thank you for the informative post.

The non-committers have their work cut out for them and I think that's a good 
thing. Here at Creighton we have taken advantage of the distributed collection 
administration features as well as enabling self-submission to some 
collections. I will begin by meeting with those folks, and I look forward to 
continuing this discussion with the Sponsors and the Steering Committee.


Richard Jizba
Health Sciences Library
Creighton University
(402) 280-5142
[email protected]

-----Original Message-----
From: Tim Donohue [mailto:[email protected]] 
Sent: Friday, February 14, 2014 10:44 AM
To: Jeffrey A Trimble; Jizba, Richard; Pottinger, Hardy J.; [email protected]
Cc: [email protected]
Subject: Re: [Dspace-general] Search in DSpace

Hi All,

Just to add to the discussion here... these are all great questions and worth 
considering. If there are issues with how Discovery/Solr works or if it's not 
ideal, then we have options to either fix it or look towards a better solution 
altogether.

Apologies for what is a rather long email, but I'm trying to best explain the 
Committer thought processes here...

First, it's worth being aware that Discovery/Solr is actually being used both 
for Search and for Browse. Prior to Discovery/Solr, Search was performed via 
Lucene, while Browse was a custom-built system which used the underlying 
database (Postgres).

To try to explain the "Committer" point of view/discussions that have taken 
place since then:
* Committers began to run into limitations of our custom browse system and 
Lucene based on some of the Search/Browse feature requests coming in. While 
some of these features could have been implemented in Lucene, we noticed that a 
combined Search/Browse in Solr was beginning to look more favorable.
* Committers have been struggling with the sometimes "patchwork" nature of the 
DSpace codebase (which happens after 10 years). DSpace is extremely powerful 
and extremely configurable. But, as we have limited resources, we've come to 
the realization that we need to simplify and modularize the codebase little by 
little (while trying to ensure DSpace retains its strong niche).
* Committers were also noticing that more and more open source systems these 
days (Hydra, Islandora, even EPrints) are using Solr to handle both Search & 
Browse. Solr has begun to become a de facto "standard" in many ways, and it's 
used well beyond the IR community as well.

However, even within the Committers group, not all have agreed that Solr is 
*the best solution*, which is part of the reason why "Discovery" exists. 
Without getting too technical, Discovery is essentially a "generic 
search/browse" layer built for DSpace. Discovery itself could provide multiple 
plugins for multiple search engines. The Committers have discussed building a 
Lucene plugin for Discovery and even an Elastic Search plugin for Discovery.

So, in the future, it's possible DSpace Search/Browse could look like this (and 
individual institutions could choose which plugin they wanted):

* Discovery Search/Browse
     * Solr plugin for Discovery
     * Lucene plugin for Discovery (doesn't yet exist)
     * Elastic Search plugin for Discovery (doesn't yet exist)
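
To make that layering a bit more concrete, here is a minimal sketch in Python. 
It is not DSpace code and not the actual Discovery API: every name in it 
(SearchBackend, InMemoryBackend, Discovery, BACKENDS, make_discovery) is 
hypothetical, and the toy in-memory backend only stands in for a real engine 
plugin. It only illustrates the idea of one generic search/browse layer with 
an interchangeable engine chosen by configuration.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class SearchBackend(ABC):
    """Hypothetical plugin interface (an assumption, not the DSpace API)."""

    @abstractmethod
    def index(self, item_id: str, text: str) -> None:
        """Make an item searchable."""

    @abstractmethod
    def search(self, query: str) -> list[str]:
        """Return the ids of matching items."""


class InMemoryBackend(SearchBackend):
    """Toy stand-in for an engine plugin such as Solr or Lucene."""

    def __init__(self) -> None:
        self._docs: dict[str, str] = {}

    def index(self, item_id: str, text: str) -> None:
        self._docs[item_id] = text.lower()

    def search(self, query: str) -> list[str]:
        # An item matches when every query term occurs in its text.
        terms = query.lower().split()
        return sorted(
            item_id
            for item_id, text in self._docs.items()
            if all(term in text for term in terms)
        )


class Discovery:
    """Generic layer: callers use this and never touch an engine directly."""

    def __init__(self, backend: SearchBackend) -> None:
        self._backend = backend

    def add_item(self, item_id: str, text: str) -> None:
        self._backend.index(item_id, text)

    def search(self, query: str) -> list[str]:
        return self._backend.search(query)


# Hypothetical registry: an institution would pick a plugin by name.
BACKENDS: dict[str, type[SearchBackend]] = {"memory": InMemoryBackend}


def make_discovery(plugin_name: str) -> Discovery:
    return Discovery(BACKENDS[plugin_name]())
```

In this sketch, supporting another engine would mean adding one more 
SearchBackend subclass and registering it, while the code that calls 
Discovery stays unchanged.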

So, the Committers have never decided to completely remove Lucene support 
forever. All that we've decided is that we need to standardize on a common 
Search/Browse platform (which is Discovery), because we just don't have enough 
development resources right now to build/maintain multiple completely separate 
search/browse codebases (Discovery & traditional Lucene search are entirely 
separate codebases, and those separate codebases still exist even in DSpace 
4.x). The reality though is that currently Discovery only supports Solr. We 
hope that it can support Lucene and other search tools as well, but we need to 
find committer resources or other volunteers to help us build those additional 
search plugins.

I also fully agree here that it'd be wonderful to find better ways to survey 
our "actual" end users to better determine their search/browse needs out of 
DSpace. This is something that I'd also love to see happen. 
Unfortunately it's not a role the Committers are able to play as we are not 
experts in writing/drafting/promoting user surveys. Perhaps however the 
upcoming DSpace Steering Committee and/or DCAT could help us in finding ways to 
survey the community and our users.

This does sound like a discussion we could begin at the upcoming DuraSpace 
Sponsors Summit in March (for those in attendance). In general we don't always 
have a smooth process in place to survey the community about these decisions. 
The Committers team sometimes has less than perfect information and we 
sometimes have to make difficult decisions based on our existing resources (and 
it's not always clear which decisions are potentially controversial to others 
in the community). So, this is a great discussion to be starting, both 
specifically with regards to Search/Browse, and generally with regards to how 
best to survey our widespread community, etc.

I hope this helps the discussion. Glad to also clarify anything I've said if 
it's unclear.

- Tim


On 2/13/2014 8:15 PM, Jeffrey A Trimble wrote:
> On a completely end user note, we have found our Discovery service for 
> our Library is not well liked by undergraduate students.  The results 
> are too large most of the time (10K+) and they (the users) get frustrated 
> easily if they have to learn to customize the search.
>
> Our Information Literacy/Bibliographic Instruction Librarians have 
> stopped teaching Discovery Layer Services as the norm.  (We use EBSCO 
> Discovery Service, which is a Rolls Royce!)  The EDS for us not only 
> searches our local loads (local databases, local electronic resources, 
> online catalogs, DSPACE server) but also all of OhioLINK.  It is 
> really overwhelming for them.
>
> We still teach traditional Keyword Boolean as the starting point and 
> move to the browse queries and then to the "pre-coordinated searches" 
> such as Library of Congress Subject Headings.  Pre-Coordinated 
> searches is a fancy name for Controlled Subject Vocabulary.
>
> It will be interesting to see how FAST headings will affect searching 
> as OCLC derives them from LCSH and as ILSs begin to index them into 
> browse searching and keyword/boolean searching.
>
> I think that Discovery Layers are attempting to compete with Google 
> searching.  And the rhetorical question or theoretical question is 
> does discovery have 'deliverables' without drilling down into the 
> results to get what you really came for?
>
> Professionally and personally, I do use Discovery, but I'm a trained 
> professional, not a dilettante in the information seeking world.
>
> We are in a major paradigm shift that has truly only begun, and it 
> will be another 15 years before the shift sees true results -- some of 
> them will be tied to societal changes.
>
> My $.03 worth of thoughts.
>
> Cordially,
>
>
> Jeffrey Trimble
> Associate Director &
> Head of Information Services
> William F.  Maag Library
> Youngstown State University
> 330.941.2483 (Office)
> [email protected]
> http://www.maag.ysu.edu
> http://digital.maag.ysu.edu
> "For he is the Kwisatz Haderach..."
>
>
>
>
> On 2/13/2014, 6:02 PM, "Jizba, Richard" <[email protected]> wrote:
>
>> Hardy,
>>
>> I understand that discussions about the search and browse functions 
>> are technical issues. But before technical things happen, there needs 
>> to be general discussion among the users: what are the advantages and 
>> disadvantages of the Discovery and the traditional Search? Why have 
>> some users put the money or effort into customizations? I suspect 
>> that outside of the "techies" very few users even know they have options.
>>
>> It says in the manual for 3.2 that:
>>
>> "Search is an essential component of discovery in DSpace. Users'
>> expectations from a search engine are quite high, so a goal for 
>> DSpace is to supply as many search features as possible."
>>
>> Have there been discussions with the non-technical user community to 
>> determine what features really are important? It seems as though 
>> there is a large user base for DSpace, but I suspect most of the 
>> discussion is among the tech folks, not the non-tech user community. 
>> (I'm not even sure how you would go about communicating with those 
>> people.)
>>
>> My usage stats indicate that the interaction with our open 
>> collections is coming from the web - folks accessing the bitstreams 
>> directly from web search engines, not through the native DSpace 
>> search. Thus, these aren't actual users of DSpace "Search". (I base 
>> this on the fact that bitstream downloads often greatly exceed item 
>> views.)
>>
>> What I'd like to know is:
>>    What search functions do "actual" end users want and need?
>>    How do we identify "actual" end users and communicate with them?
>>
>> Richard Jizba
>> Health Sciences Library
>> Creighton University
>> (402) 280-5142
>> [email protected]
>>
>>
>> -----Original Message-----
>> From: Pottinger, Hardy J. [mailto:[email protected]]
>> Sent: Thursday, February 13, 2014 3:03 PM
>> To: Jizba, Richard; [email protected]
>> Cc: [email protected]
>> Subject: Re: [Dspace-general] Search in DSpace
>>
>> Hi, I note that this discussion is taking place on DSpace-general; 
>> it's probably best suited for DSpace-tech. I say that mostly because 
>> I'm about to link to technical info :-) However, since it started in 
>> -general I'll leave it here.
>>
>> Richard, your existing Lucene customizations (in particular your 
>> custom filter code) are very likely portable to Solr [1]. I'm not 
>> promising Shangri-La, but, it's likely pretty workable. I have 
>> repository managers here who were interested in implementing the 
>> non-Porter stemming analyzer, enough that they asked me to work 
>> towards making that option configurable for DSpace. With a bunch of 
>> help from the community, we made that happen for DSpace [2]. I am 
>> *sure* we can get DSpace to do what you need, no matter the specifics 
>> of the search back-end. As we trundle on down the road to DSpace 5.0, 
>> I hope you'll continue to help us ensure the system remains usable for you 
>> and the community. Thanks!
>>
>> [1] https://wiki.apache.org/solr/SolrPlugins
>> [2] https://jira.duraspace.org/browse/DS-849
>>
>> --
>> HARDY POTTINGER <[email protected]> University of Missouri 
>> Library Systems http://lso.umsystem.edu/~pottingerhj/
>> https://MOspace.umsystem.edu/
>> "And remember, also" added the Princess of Sweet Rhyme, "that many 
>> places you would like to see are just off the Map and many things you 
>> want to know are just out of sight or a little beyond your reach. But 
>> someday you'll reach them after all, for what you learn today, for no 
>> reason at all, will help you discover all the wonderful secrets of 
>> tomorrow."
>>
>> --Norton Juster, The Phantom Tollbooth
>>
>>
>>
>>
>>
>>
>> On 2/13/14 1:58 PM, "Jizba, Richard" <[email protected]> wrote:
>>
>>> Solr may build on Lucene, but it may also inhibit me from taking 
>>> real advantage of Lucene. We had that problem a couple of years ago 
>>> with the porter stem filter. We couldn't conduct the kind of 
>>> searches we wanted because the porter stem filter stemmed our search 
>>> terms -- and at the time, there wasn't an easy way to turn it off.
>>>
>>> I understand faceting, but I also know that sometimes the most 
>>> effective way to search is to let people who know how to search do 
>>> it in the most direct way possible. It's particularly true when they 
>>> create the collections they want to search. We have some collections 
>>> that are only searched by the people who make them. They are good 
>>> searchers who know what they are doing.
>>>
>>> Faceting, it seems to me, is aimed at the naïve user who doesn't 
>>> know anything about searching. Do such people actually search DSpace 
>>> directly through the interface, or do their searches originate in 
>>> Google, Bing, etc? In any case, we have some user groups with closed 
>>> collections in our repository and they need the traditional search 
>>> and browse functions. I just want to make sure that future dspace 
>>> developments don't adversely impact their needs. Just telling me 
>>> that Solr builds on Lucene doesn't really answer the question.
>>>
>>> Richard Jizba
>>> Health Sciences Library
>>> Creighton University
>>> (402) 280-5142
>>> [email protected]
>>>
>>> -----Original Message-----
>>> From: [email protected] [mailto:[email protected]] On Behalf Of helix84
>>> Sent: Thursday, February 13, 2014 11:23 AM
>>> To: Jizba, Richard
>>> Cc: [email protected]
>>> Subject: Re: [Dspace-general] Search in DSpace
>>>
>>> Hi Richard,
>>> just a short reply.
>>>
>>> Are you aware that Solr (Discovery in DSpace uses Solr) builds on Lucene?
>>> They even support the same syntax with some minor differences and 
>>> even that is configurable. The issue is not that Lucene is worse 
>>> than Solr or anything, it's just that Solr brings many features that 
>>> aren't in pure Lucene. The reason why we dislike keeping both is 
>>> that there's a significant development, maintenance and support 
>>> burden for DSpace committers to keep both. Count with me - two search 
>>> backends times two UIs (plus other interfaces like REST API in the 
>>> works) are four wildly different systems to work with. DSpace is not 
>>> just one platform, it's a collection of platforms. If we converge 
>>> upon a single search platform (I don't see this happening with UIs), 
>>> we'll have more time to put towards improving DSpace and adding new 
>>> features thanks to not doing double the amount of work. This will make 
>>> DSpace better in the long term.
>>>
>>> From what you said, it seems to me that everything you have should 
>>> also be possible to do in Discovery. I do understand that changing your 
>>> highly customized implementation from Lucene to Solr is a lot of work.
>>> But it has very tangible advantages.
>>>
>>>
>>> Regards,
>>> ~~helix84
>>>
>>> Compulsory reading: DSpace Mailing List Etiquette 
>>> https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette
>>> ------------------------------------------------------------------------------
>>> Android apps run on BlackBerry 10
>>> Introducing the new BlackBerry 10.2.1 Runtime for Android apps.
>>> Now with support for Jelly Bean, Bluetooth, Mapview and more.
>>> Get your Android app in front of a whole new audience.  Start now.
>>> http://pubads.g.doubleclick.net/gampad/clk?id=124407151&iu=/4140/ostg.clktrk
>>> _______________________________________________
>>> Dspace-general mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/dspace-general
>>
>>
>>
>
>
>
>

