Almost all searches by almost all users, globally, are done on half a dozen search platforms (Google, Bing, etc.). These all use extensive normalisation (stemming, case folding, Unicode normalisation, etc.) and are doing a fabulous job of teaching their users (and by extension our users) that this is the way "search" is done.
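As a rough illustration of what that kind of normalisation does to a query term, here is a toy sketch in Python. It is not what any of these platforms, or Lucene, actually run: the function name `normalise` and the suffix list are made up for illustration, and the suffix stripping is a crude stand-in for a real stemmer, not the Porter algorithm.

```python
import unicodedata


def normalise(term: str) -> str:
    """Toy normaliser: Unicode normalisation, case folding, crude suffix stripping."""
    # Unicode normalisation (NFKC) so that composed and decomposed forms compare
    # equal, followed by case folding so that case differences disappear.
    folded = unicodedata.normalize("NFKC", term).casefold()

    # Hypothetical suffix list, standing in for a real stemmer.
    # Strip the first matching suffix, provided at least 3 characters remain.
    for suffix in ("nesses", "ness", "es", "s"):
        if folded.endswith(suffix) and len(folded) - len(suffix) >= 3:
            return folded[: -len(suffix)]
    return folded


if __name__ == "__main__":
    for word in ("Wellness", "wells", "Vitamins", "A"):
        print(word, "->", normalise(word))
```

Even in this toy form, "Wellness" and "wells" collapse to the same index term, and a capital "A" becomes indistinguishable from the article "a", which is the kind of loss the quoted message below objects to.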
While I fully support using controlled vocabulary and authority control for terms we care about, I believe the battle to define how full-text search should work was lost some time ago (probably last millennium). I suspect that the real solution to problems such as Richard Jizba's is automated extraction of controlled vocabularies (Medical Subject Headings in this case) from the full text, and then a browsing / search interface to them.

cheers
stuart

On 03/02/11 07:36, Schumacher, John wrote:
> Hello.
>
> Opinions on this were requested.
>
> I agree completely with Richard Jizba.
>
> John
>
> John Schumacher
> Office of Library and Information Services
> SUNY System Administration
> SUNY Plaza
> Albany, NY 12246
> 518-320-1477 (Note, new number!)
> 518-320-1554 (fax)
> [email protected]
> SUNY Digital Repository
> http://dspace.sunyconnect.suny.edu/
>
>
> ==== Philosophical Discussion ====
>
> I am a little surprised that the DSpace community thinks stemming like
> that done by the Porter Stemming Algorithm is so important. I have been
> searching bibliographic databases since the early 1980s and teach
> courses to our health sciences students on search techniques. We have
> always appreciated the systems that give us the power to find exactly
> the terms and the combinations we want. Language is just too rich and
> varied for any other approach in my experience. There have been many
> times when I have needed to search for a singular form of a noun vs a
> plural form or vice versa. Using truncation and wildcard operators is
> not rocket science. Lucene has some really powerful search operators,
> but their power is basically nullified by the Stemming operation.
>
> Our DSpace instance isn't aimed primarily at a broad worldwide user
> base, but select groups of students, staff and faculty with rather
> sophisticated information needs. Besides, most of our collection can
> also be discovered through Google.
> Why duplicate that, when I have the
> option of also creating an alternative search environment that provides
> for sophisticated, analytical searches of scholarly, curricular and
> administrative documents?
>
> You might be surprised at how quickly the people in our Office of
> Medical Education have picked up on the nuances of how and where they
> put metadata, the need for standardized vocabulary in defining lecture
> objectives, and how quickly they figured out what was happening to their
> attempts to search for "wellness" (stemmed to "well"). (It did not
> surprise me!)
>
> I think the distributed community administration available with DSpace
> will really help our faculty and staff take seriously the data (text)
> they put into their collections. Our expertise as "consultants" and
> trainers to the staff in the Office of Medical Education has really made
> them appreciate the expertise of librarians, particularly my reference
> librarians who have very good analytical search skills. Don't sell
> people short -- they can be very sophisticated which means we need to
> provide them with powerful tools, not heavy-handed interventions (the
> Porter Algorithm)
>
> I'm planning on being at OR11 and would be happy to discuss this over a
> beer.
>
> If anybody is still with me, I would be curious if there is a
> LowerCaseFilter that would permit the retention of capital 'A's.
> Eliminating 'A's in medical research databases is a problem. Vitamin A
> is the obvious example, but there are many other occurrences of 'A' as
> an important, non-trivial term in a name.
>
> Richard Jizba
> Creighton University
>
> _______________________________________________
> DSpace-tech mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dspace-tech
>

--
Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/

