Almost all searches by almost all users, globally, are done on half a 
dozen search platforms (Google, Bing, etc). These all use extensive 
normalisation (stemming, case folding, Unicode normalisation, etc) and 
do a fabulous job of teaching their users (and by extension our 
users) that this is the way "search" is done.
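
To make the kind of normalisation I mean concrete, here is a minimal 
Python sketch. It is my own illustration, not how any particular search 
platform implements it; the function name normalise is made up, and 
stemming is left out because it needs a third-party library.

```python
import unicodedata

def normalise(text: str) -> list[str]:
    """Fold Unicode form and case, then split on whitespace (illustrative only)."""
    # NFKC collapses compatibility forms, e.g. the ligature U+FB01 becomes "fi".
    folded = unicodedata.normalize("NFKC", text)
    # casefold() is a more aggressive lower(): it also maps "ß" to "ss".
    return folded.casefold().split()
```

Note that after this step "Vitamin A" and "vitamin a" are 
indistinguishable, which is exactly the trade-off being debated.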

While I fully support using controlled vocabulary and authority control 
for terms we care about, I believe the battle to define how full-text 
search should work was lost some time ago (probably last millennium).

I suspect that the real solution to problems such as Richard Jizba's is 
an automated extraction of controlled vocabularies (Medical Subject 
Headings in this case) from the full text and then a browsing / search 
interface to them.
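
As a rough sketch of what such extraction could look like, assuming a 
simple exact-phrase lookup (real systems would be far more careful 
about synonyms and context): the three headings below are only an 
illustrative sample, and SAMPLE_HEADINGS and extract_headings are names 
I have made up for this example.

```python
import re

# Illustrative sample only; a real system would load the full vocabulary.
SAMPLE_HEADINGS = ["Vitamin A", "Health Promotion", "Education, Medical"]

def extract_headings(full_text, headings=SAMPLE_HEADINGS):
    """Return the headings that occur as whole phrases (ignoring case) in full_text."""
    found = []
    for heading in headings:
        # \b keeps "Vitamin A" from matching inside "Vitamin Amount".
        pattern = r"\b" + re.escape(heading) + r"\b"
        if re.search(pattern, full_text, flags=re.IGNORECASE):
            found.append(heading)
    return found
```

The extracted headings could then be stored as metadata and exposed 
through a browse index, leaving the full-text search to behave the way 
users already expect.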

cheers
stuart


On 03/02/11 07:36, Schumacher, John wrote:
> Hello.
>
> Opinions on this were requested.
>
> I agree completely with Richard Jizba.
>
> John
>
> John Schumacher
> Office of Library and Information Services
> SUNY System Administration
> SUNY Plaza
> Albany, NY 12246
> 518-320-1477 (Note, new number!)
> 518-320-1554 (fax)
> [email protected]
> SUNY Digital Repository
> http://dspace.sunyconnect.suny.edu/
>
>
> ==== Philosophical Discussion ====
>
> I am a little surprised that the DSpace community thinks stemming like
> that done by the Porter Stemming Algorithm is so important. I have been
> searching bibliographic databases since the early 1980s and teach
> courses to our health sciences students on search techniques. We have
> always appreciated the systems that give us the power to find exactly
> the terms and the combinations we want. Language is just too rich and
> varied for any other approach in my experience. There have been many
> times when I have needed to search for a singular form of a noun vs a
> plural form or vice versa. Using truncation and wildcard operators is
> not rocket science. Lucene has some really powerful search operators,
> but their power is basically nullified by the Stemming operation.
>
> Our DSpace instance isn't aimed primarily at a broad worldwide user
> base, but select groups of students, staff and faculty with rather
> sophisticated information needs. Besides, most of our collection can
> also be discovered through Google. Why duplicate that, when I have the
> option of also creating an alternative search environment that provides
> for sophisticated, analytical searches of scholarly, curricular and
> administrative documents?
>
> You might be surprised at how quickly the people in our Office of
> Medical Education have picked up on the nuances of how and where they
> put metadata, the need for standardized vocabulary in defining lecture
> objectives, and how quickly they figured out what was happening to their
> attempts to search for "wellness" (stemmed to "well"). (It did not
> surprise me!)
>
> I think the distributed community administration available with DSpace
> will really help our faculty and staff take seriously the data (text)
> they put into their collections. Our expertise as "consultants" and
> trainers to the staff in the Office of Medical Education has really made
> them appreciate the expertise of librarians, particularly my reference
> librarians who have very good analytical search skills. Don't sell
> people short -- they can be very sophisticated which means we need to
> provide them with powerful tools, not heavy-handed interventions (the
> Porter Algorithm).
>
> I'm planning on being at OR11 and would be happy to discuss this over a
> beer.
>
> If anybody is still with me, I would be curious if there is a
> LowerCaseFilter that would permit the retention of capital 'A's.
> Eliminating 'A's in medical research databases is a problem. Vitamin A
> is the obvious example, but there are many other occurrences of 'A' as
> an important, non-trivial term in a name.
>
> Richard Jizba
> Creighton University
>
>
> ------------------------------------------------------------------------------
> Special Offer-- Download ArcSight Logger for FREE (a $49 USD value)!
> Finally, a world-class log management solution at an even better price-free!
> Download using promo code Free_Logger_4_Dev2Dev. Offer expires
> February 28th, so secure your free ArcSight Logger TODAY!
> http://p.sf.net/sfu/arcsight-sfd2d
> _______________________________________________
> DSpace-tech mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dspace-tech
>


-- 
Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/
