Hello. Opinions on this were requested.
I agree completely with Richard Jizba.

John

John Schumacher
Office of Library and Information Services
SUNY System Administration
SUNY Plaza
Albany, NY 12246
518-320-1477 (Note, new number!)
518-320-1554 (fax)
john.schumac...@suny.edu

SUNY Digital Repository
http://dspace.sunyconnect.suny.edu/

==== Philosophical Discussion ====

I am a little surprised that the DSpace community thinks stemming like that done by the Porter Stemming Algorithm is so important. I have been searching bibliographic databases since the early 1980s, and I teach courses on search techniques to our health sciences students. We have always appreciated the systems that give us the power to find exactly the terms and combinations we want. In my experience, language is just too rich and varied for any other approach. There have been many times when I have needed to search for the singular form of a noun rather than the plural, or vice versa. Using truncation and wildcard operators is not rocket science. Lucene has some really powerful search operators, but their power is basically nullified by the stemming operation.

Our DSpace instance isn't aimed primarily at a broad worldwide user base, but at select groups of students, staff and faculty with rather sophisticated information needs. Besides, most of our collection can also be discovered through Google. Why duplicate that, when I have the option of creating an alternative search environment that provides for sophisticated, analytical searches of scholarly, curricular and administrative documents?

You might be surprised at how quickly the people in our Office of Medical Education have picked up on the nuances of how and where they put metadata and on the need for standardized vocabulary in defining lecture objectives, and at how quickly they figured out what was happening to their attempts to search for "wellness" (stemmed to "well"). (It did not surprise me!)
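To make the "wellness" example concrete, here is a minimal Python sketch of the trade-off described above. Everything in it is an assumption made for illustration: `toy_stem` is a hypothetical two-rule suffix stripper, not the Porter algorithm, and the tiny in-memory index stands in for a real search engine rather than reproducing how Lucene or DSpace behave internally. It only shows how stemming at index time merges distinct terms, while an unstemmed index with a trailing wildcard leaves that choice to the searcher.

```python
from fnmatch import fnmatchcase


def toy_stem(token):
    """Hypothetical stand-in for a stemmer (NOT the Porter algorithm).

    Strips a trailing '-ness' or a plural '-s', which is enough to
    show how distinct words collapse into one index term.
    """
    if token.endswith("ness") and len(token) > 6:
        return token[:-4]
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]
    return token


def build_index(docs, stem):
    """Map each (optionally stemmed) lowercase term to the ids of documents containing it."""
    index = {}
    for doc_id, text in docs.items():
        for token in text.lower().split():
            term = toy_stem(token) if stem else token
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, pattern):
    """Return ids of documents with a term matching an exact word or a wildcard such as 'well*'."""
    hits = set()
    for term, doc_ids in index.items():
        if fnmatchcase(term, pattern):
            hits |= doc_ids
    return hits


# Made-up sample documents, for illustration only.
DOCS = {
    1: "student wellness program",
    2: "drilling a water well",
    3: "wells and aquifers",
}

stemmed = build_index(DOCS, stem=True)
exact = build_index(DOCS, stem=False)

# With stemming, the query term is stemmed too, so "wellness" matches everything about wells.
print(sorted(search(stemmed, toy_stem("wellness"))))  # [1, 2, 3]

# Without stemming, the searcher decides how broad the match should be.
print(sorted(search(exact, "wellness")))  # [1]
print(sorted(search(exact, "wells")))     # [3]
print(sorted(search(exact, "well*")))     # [1, 2, 3]
```

In the unstemmed index the broad result is still available through `well*`, but so are the narrow ones; in the stemmed index the narrow searches can no longer be expressed at all.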
I think the distributed community administration available with DSpace will really help our faculty and staff take seriously the data (text) they put into their collections. Our work as "consultants" and trainers to the staff in the Office of Medical Education has really made them appreciate the expertise of librarians, particularly my reference librarians, who have very good analytical search skills. Don't sell people short: they can be very sophisticated, which means we need to provide them with powerful tools, not heavy-handed interventions (the Porter Algorithm).

I'm planning on being at OR11 and would be happy to discuss this over a beer.

If anybody is still with me, I would be curious whether there is a LowerCaseFilter that would permit the retention of capital 'A's. Eliminating 'A's in medical research databases is a problem. Vitamin A is the obvious example, but there are many other occurrences of 'A' as an important, non-trivial term in a name.

Richard Jizba
Creighton University

_______________________________________________
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech