Hi All,
Just to add to the discussion here... these are all great questions and
worth considering. If there are issues with how Discovery/Solr works or
if it's not ideal, then we have options to either fix it or look towards
a better solution altogether.
Apologies for what is a rather long email, but I'm trying to best
explain the Committer thought processes here...
First, it's worth being aware that Discovery/Solr is actually being used
both for Search and for Browse. Prior to Discovery/Solr, Search was
performed via Lucene, while Browse was a custom-built system which used
the underlying database (Postgres).
To try to explain the "Committer" point of view/discussions that have
taken place since then:
* Committers began to run into limitations of our custom browse system
and Lucene based on some of the Search/Browse feature requests coming
in. While some of these features could have been implemented in Lucene,
we noticed that a combined Search/Browse in Solr was beginning to look
more favorable.
* Committers have been struggling with the sometimes "patchwork" nature
of the DSpace codebase (which happens after 10 years). DSpace is
extremely powerful and extremely configurable. But, as we have limited
resources, we've come to the realization that we need to simplify and
modularize the codebase little by little (while trying to ensure DSpace
retains its strong niche).
* Committers were also noticing that more and more open source systems
these days (Hydra, Islandora, even EPrints) are using Solr to handle
both Search & Browse. Solr has become a de facto "standard" in
many ways, and it's used well beyond the IR community as well.
However, even within the Committers group, not all have agreed that Solr
is *the best solution*, which is part of the reason why "Discovery"
exists. Without getting too technical, Discovery is essentially a
"generic search/browse" layer built for DSpace. Discovery itself
actually could/can provide multiple plugins for multiple search engines.
The Committers have actually discussed building a Lucene plugin for
Discovery and even an Elastic Search plugin for Discovery.
So, in the future, it's possible DSpace Search/Browse could look like
this (and individual institutions could choose which plugin they wanted):
* Discovery Search/Browse
* Solr plugin for Discovery
* Lucene plugin for Discovery (doesn't yet exist)
* Elastic Search plugin for Discovery (doesn't yet exist)
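To make the layering above a bit more concrete, here is a minimal sketch
of the idea in Python. It is only an illustration and not DSpace's actual
(Java) code: the names SearchBackend, InMemoryBackend and Discovery, and
the idea of selecting a plugin by a configured name, are all assumptions
made up for this example.

```python
from abc import ABC, abstractmethod


class SearchBackend(ABC):
    """Hypothetical plugin contract (not DSpace's real API)."""

    @abstractmethod
    def index(self, item_id, metadata):
        """Add or update one item in the search index."""

    @abstractmethod
    def search(self, query):
        """Return the ids of items matching a keyword query."""

    @abstractmethod
    def browse(self, field):
        """Return the distinct values of one metadata field, sorted."""


class InMemoryBackend(SearchBackend):
    """Toy stand-in for a Solr, Lucene or Elasticsearch plugin."""

    def __init__(self):
        self._items = {}

    def index(self, item_id, metadata):
        self._items[item_id] = metadata

    def search(self, query):
        needle = query.lower()
        return sorted(
            item_id
            for item_id, metadata in self._items.items()
            if any(needle in str(value).lower() for value in metadata.values())
        )

    def browse(self, field):
        return sorted(
            {metadata[field] for metadata in self._items.values() if field in metadata}
        )


class Discovery:
    """Generic search/browse layer that delegates to the configured plugin."""

    def __init__(self, plugins, configured):
        # 'plugins' maps a name to a backend class; 'configured' is the name
        # an institution would pick in its (hypothetical) configuration.
        self._backend = plugins[configured]()

    def index(self, item_id, metadata):
        self._backend.index(item_id, metadata)

    def search(self, query):
        return self._backend.search(query)

    def browse(self, field):
        return self._backend.browse(field)
```

The point of the sketch is that the rest of the application only talks to
the Discovery layer, so swapping the search engine would mean writing one
new backend class rather than maintaining a second search/browse codebase.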
So, the Committers have never decided to completely remove Lucene
support forever. All that we've decided is that we need to standardize
on a common Search/Browse platform (which is Discovery), because we just
don't have enough development resources right now to build/maintain
multiple completely separate search/browse codebases (Discovery &
traditional Lucene search are entirely separate codebases, and those
separate codebases still exist even in DSpace 4.x). The reality though
is that currently Discovery only supports Solr. We hope that it can
support Lucene and other search tools as well, but we need to find
committer resources or other volunteers to help us build those
additional search plugins.
I also fully agree here that it'd be wonderful to find better ways to
survey our "actual" end users to better determine their search/browse
needs out of DSpace. This is something that I'd also love to see happen.
Unfortunately it's not a role the Committers are able to play as we are
not experts in writing/drafting/promoting user surveys. Perhaps however
the upcoming DSpace Steering Committee and/or DCAT could help us in
finding ways to survey the community and our users.
This does sound like a discussion we could begin at the upcoming
DuraSpace Sponsors Summit in March (for those in attendance). In general
we don't always have a smooth process in place to survey the community
about these decisions. The Committers team sometimes has less than
perfect information and we sometimes have to make difficult decisions
based on our existing resources (and it's not always clear which
decisions are potentially controversial to others in the community). So,
this is a great discussion to be starting, both specifically with
regards to Search/Browse, and generally with regards to how best to
survey our widespread community, etc.
I hope this helps the discussion. Glad to also clarify anything I've
said if it's unclear.
- Tim
On 2/13/2014 8:15 PM, Jeffrey A Trimble wrote:
> On a completely end user note, we have found our Discovery service for our
> Library is not well liked by undergraduate students. The results are too
> large most of the time (10K+) and they (the user) frustrate easily if they
> have to learn to customize the search.
>
> Our Information Literacy/Bibliographic Instruction Librarians have stopped
> teaching Discovery Layer Services as the norm. (We use EBSCO Discovery
> Service which is a Rolls Royce!). The EDS for us not only searches our
> local loads (local databases, local electronic resources, online catalogs,
> DSPACE server) but also all of OhioLINK. It is really overwhelming for
> them.
>
> We still teach traditional Keyword Boolean as the starting point and move
> to the browse queries and then to the "pre-coordinated searches" such as
> Library of Congress Subject Headings. Pre-Coordinated searches is a fancy
> name for Controlled Subject Vocabulary.
>
> It will be interesting to see how FAST headings will affect searching as
> OCLC derives them from LCSH and as ILSs begin to index them into browse
> searching and keyword/boolean searching.
>
> I think that Discovery Layers are attempting to compete with Google
> searching. And the rhetorical question or theoretical question is does
> discovery have 'deliverables' without drilling down into the results to
> get what you really came for?
>
> Professionally and personally, I do use Discovery, but I'm a trained
> professional, not a dilettante in the information seeking world.
>
> We are in a major paradigm shift that has truly only begun, and it will be
> another 15 years before the shift sees true results -- some of them will be
> tied to societal changes.
>
> My $.03 worth of thoughts.
>
> Cordially,
>
>
> Jeffrey Trimble
> Associate Director &
> Head of Information Services
> William F. Maag Library
> Youngstown State University
> 330.941.2483 (Office)
> [email protected]
> http://www.maag.ysu.edu <http://www.maag.ysu.edu/>
> http://digital.maag.ysu.edu <http://digital.maag.ysu.edu/>
> "For he is the Kwisatz Haderach..."
>
>
>
>
> On 2/13/2014, 6:02 PM, "Jizba, Richard" <[email protected]> wrote:
>
>> Hardy,
>>
>> I understand that discussions about the search and browse functions are
>> technical issues. But before technical things happen, there needs to be
>> general discussion among the users: what are the advantages and
>> disadvantages of the Discovery and the traditional Search? Why have some
>> users put the money or effort into customizations? I suspect that outside
>> of the "techies" very few users even know they have options.
>>
>> It says in the manual for 3.2 that:
>>
>> "Search is an essential component of discovery in DSpace. Users'
>> expectations from a search engine are quite high, so a goal for DSpace is
>> to supply as many search features as possible."
>>
>> Have there been discussions with the non-technical user community to
>> determine what features really are important? It seems as though there is
>> a large user base for DSpace, but I suspect most of the discussion is
>> among the tech folks, not the non-tech user community. (I'm not even sure
>> how you would go about communicating with those people.)
>>
>> My usage stats indicate that the interaction with our open collections is
>> coming from the web - folks accessing the bitstreams directly from web
>> search engines, not through the native DSpace search. Thus, these aren't
>> actual users of DSpace "Search". (I base this on the fact that bitstream
>> downloads often greatly exceed item views.)
>>
>> What I'd like to know is:
>> What search functions do "actual" end users want and need?
>> How do we identify "actual" end users and communicate with them?
>>
>> Richard Jizba
>> Health Sciences Library
>> Creighton University
>> (402) 280-5142
>> [email protected]
>>
>>
>> -----Original Message-----
>> From: Pottinger, Hardy J. [mailto:[email protected]]
>> Sent: Thursday, February 13, 2014 3:03 PM
>> To: Jizba, Richard; [email protected]
>> Cc: [email protected]
>> Subject: Re: [Dspace-general] Search in DSpace
>>
>> Hi, I note that this discussion is taking place on DSpace-general, it's
>> probably best-suited for DSpace-tech. I say that mostly because I'm about
>> to link to technical info :-) However, since it started in -general I'll
>> leave it here.
>>
>> Richard, your existing Lucene customizations (in particular your custom
>> filter code) are very likely portable to Solr [1]. I'm not promising
>> Shangri-La, but, it's likely pretty workable. I have repository managers
>> here who were interested in implementing the non-Porter stemming
>> analyzer, enough that they asked me to work towards making that option
>> configurable for DSpace. With a bunch of help from the community, we made
>> that happen for DSpace [2]. I am *sure* we can get DSpace to do what you
>> need, no matter the specifics of the search back-end. As we trundle on
>> down the road to DSpace 5.0, I hope you'll continue to help us ensure the
>> system remains usable for you and the community. Thanks!
>>
>> [1] https://wiki.apache.org/solr/SolrPlugins
>> [2] https://jira.duraspace.org/browse/DS-849
>>
>> --
>> HARDY POTTINGER <[email protected]> University of Missouri Library
>> Systems http://lso.umsystem.edu/~pottingerhj/
>> https://MOspace.umsystem.edu/
>> "And remember, also" added the Princess of Sweet Rhyme, "that many
>> places you would like to see are just off the Map and many things you
>> want to know are just out of sight or a little beyond your reach. But
>> someday you'll reach them after all, for what you learn today, for no
>> reason at all, will help you discover all the wonderful secrets of
>> tomorrow."
>>
>> --Norton Juster, The Phantom Tollbooth
>>
>>
>>
>>
>>
>>
>> On 2/13/14 1:58 PM, "Jizba, Richard" <[email protected]> wrote:
>>
>>> Solr may build on Lucene, but it may also inhibit me from taking real
>>> advantage of Lucene. We had that problem a couple of years ago with the
>>> porter stem filter. We couldn't conduct the kind of searches we wanted
>>> because the porter stem filter stemmed our search terms -- and at the
>>> time, there wasn't an easy way to turn it off.
>>>
>>> I understand faceting, but I also know that sometimes the most
>>> effective way to search is to let people who know how to search do it
>>> in the most direct way possible. It's particularly true when they
>>> create the collections they want to search. We have some collections
>>> that are only searched by the people who make them. They are good
>>> searchers who know what they are doing.
>>>
>>> Faceting, it seems to me, is aimed at the naïve user who doesn't know
>>> anything about searching. Do such people actually search DSpace
>>> directly through the interface, or do their searches originate in
>>> Google, Bing, etc? In any case, we have some user groups with closed
>>> collections in our repository and they need the traditional search and
>>> browse functions. I just want to make sure that future dspace
>>> developments don't adversely impact their needs. Just telling me that
>>> Solr builds on Lucene doesn't really answer the question.
>>>
>>> Richard Jizba
>>> Health Sciences Library
>>> Creighton University
>>> (402) 280-5142
>>> [email protected]
>>>
>>> -----Original Message-----
>>> From: [email protected] [mailto:[email protected]] On Behalf Of
>>> helix84
>>> Sent: Thursday, February 13, 2014 11:23 AM
>>> To: Jizba, Richard
>>> Cc: [email protected]
>>> Subject: Re: [Dspace-general] Search in DSpace
>>>
>>> Hi Richard,
>>> just a short reply.
>>>
>>> Are you aware that Solr (Discovery in DSpace uses Solr) builds on Lucene?
>>> They even support the same syntax with some minor differences and even
>>> that is configurable. The issue is not that Lucene is worse than Solr
>>> or anything, it's just that Solr brings many features that aren't in
>>> pure Lucene. The reason why we dislike keeping both is that there's a
>>> significant development, maintenance and support burden for DSpace
>>> committers to keep both. Count with me - two search backends times two
>>> UIs (plus other interfaces like REST API in the works) are four wildly
>>> different systems to work with. DSpace is not just one platform, it's a
>>> collection of platforms. If we converge upon a single search platform
>>> (I don't see this happening with UIs), we'll have more time to put
>>> towards improving DSpace and adding new features thanks to not doing
>>> double the amount of work. This will make DSpace better in the long term.
>>>
>>> From what you said, it seems to me that everything you have should be
>>> also possible to do in Discovery. I do understand that changing your
>>> highly customized implementation from Lucene to Solr is a lot of work.
>>> But it has very tangible advantages.
>>>
>>>
>>> Regards,
>>> ~~helix84
>>>
>>> Compulsory reading: DSpace Mailing List Etiquette
>>> https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette
>>> ------------------------------------------------------------------------------
>>> Android apps run on BlackBerry 10
>>> Introducing the new BlackBerry 10.2.1 Runtime for Android apps.
>>> Now with support for Jelly Bean, Bluetooth, Mapview and more.
>>> Get your Android app in front of a whole new audience. Start now.
>>> http://pubads.g.doubleclick.net/gampad/clk?id=124407151&iu=/4140/ostg.clktrk
>>> _______________________________________________
>>> Dspace-general mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/dspace-general
>>
>>
>>
>
>
>
>