I'm starting to see this issue as one that compounds. If the majority of searches are for an artist's name, consider the number of ULAN entries for Da Vinci or Caravaggio. Now multiply that number by the number of ways people can misspell that name.
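As a rough illustration of the spelling half of that multiplication, here is a minimal sketch of a spelling-tolerant name lookup based on edit distance (the Levenshtein distance). Everything in it is an assumption for illustration only: the `levenshtein` and `suggest` functions, the `max_distance` threshold, and the four-name list, which merely stands in for real ULAN entries and is not taken from any Getty data or from our own search engine.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


# Hypothetical authority list; a real one would come from the collection
# database or from a vocabulary such as ULAN.
ARTIST_NAMES = [
    "Leonardo da Vinci",
    "Caravaggio",
    "Rembrandt van Rijn",
    "Georgia O'Keeffe",
]


def suggest(query: str, names: List[str] = ARTIST_NAMES, max_distance: int = 2) -> List[str]:
    """Return names whose full form or any single word is within max_distance edits."""
    q = query.casefold()
    scored = []
    for name in names:
        folded = name.casefold()
        candidates = [folded] + folded.split()
        best = min(levenshtein(q, c) for c in candidates)
        if best <= max_distance:
            scored.append((best, name))
    # Closest matches first; ties are broken alphabetically.
    return [name for _, name in sorted(scored)]
```

With this list, `suggest("Caravagio")` returns `['Caravaggio']` and `suggest("Vinchi")` returns `['Leonardo da Vinci']`. Comparing against every name variant on every query would be slow for a real authority file, so treat this as a way to reason about the problem rather than something to deploy.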
Let's put artists' names aside and consider an attribute name. An end user may not be thinking in the standardized terminology that the museum cataloguers use, so unless your search engine is utilizing something like the AAT, their terms will match with lower accuracy. Now feed misspellings into the equation. Does the algorithm also search through your user-contributed tags (folksonomies) and, if so, does it account for their misspellings? How does it handle misspelled words: using Soundex, the Levenshtein distance, or a pre-populated index of commonly misspelled words?

This doesn't even start to address ranking issues - why does a search on our website for the term "sword" rank a "Ceremonial Cup" vastly higher than an object named "Sword (Kilij)"? - or any of the other issues.

Can't someone build a better search algorithm specifically to meet the needs of the museum community? Is the Getty preparing an API, so that search engine technology can be plugged into their multilingual/conceptual lexicons? Surely, if Netflix can improve their matching algorithm, Cinematch, by 8.57% (more accuracy), then we, as a community, can rally to build a better search engine. (OK - granted, they're offering a seven-figure reward if that improvement hits 10% - but still.)

Or do we just accept that Google already did?

Chad M Petrovay
Collections Database Administrator
P: 410.547.9000 x266

-----Original Message-----
From: mcn-l-bounces at mcn.edu On Behalf Of j trant
Sent: Friday, March 28, 2008 12:07 PM
To: Museum Computer Network Listserv
Subject: Re: [MCN-L] Effects of Google's 'search within site'? Anyone else affected?

Frankie,

You've done more than a lot of people have done in looking at your search logs. When I looked at the Guggenheim's -- as a prototype for some steve data analysis -- I did a literature search and couldn't find any serious studies of museum searching [this really surprised me, btw.]
Each collection is likely to have its own patterns: in this modern art museum 63% of the most common searches (searched 10 or more times) were for artists' names. Amazingly, about 25% of the searches of this collection produced no result [in an age of millions of results elsewhere, this is a real problem]. Looking at the search failures:
- 36% were caused by spelling errors, so "did you mean..." would really help.
- 50% of the unsuccessful artist name searches were caused by spelling errors.

The paper, and a blog post with more detail, are at
http://conference.archimuse.com/blog/jtrant/searching_museum_collections_on_line_what_do_peo

Since search is a favourite navigation mechanism we really do need to pay more attention to it, both on and now off museum sites.

/jt

At 3:46 PM +0000 3/28/08, frankie roberto wrote:
>> The most likely impact for us is in upcoming modifications to our own
>> search.
>
>...
>
>> Do we give up, and acknowledge
>> that doing search in any way different from Google is a) now competing more
>> directly with them, and b) probably just getting more confusing for most
>> visitors; or do we focus on these (probably fewer and fewer) visitors who
>> come to our search expecting it to work just the way it should, not the way
>> that's easiest?
>
>Hmm, very interesting point.
>
>I'm sure I'm not alone in saying that we spend far far less time
>looking at our search interface than we ought to. Our site search is
>powered by a Google Mini, and other than providing thumbnails for the object
>pages returned, it's pretty much working in its out-of-the-box
>configuration. We haven't invested any time in editorially 'promoting'
>results for certain search terms, for instance, or in setting up
>synonyms.
>
>In fact, this discussion has prompted me to do a quick report of the
>most popular search terms, which are:
>
>1. games - 1,1012
>2. grain strain (old game) - 498
>3. jobs - 488
>4. wroughton (object storage site) - 474
>5. search - 447 (amusingly, this is the default search text, so
>represents people pressing search without typing anything)
>6. launchball - 280
>7. opening times - 252 (shockingly, this doesn't return anything
>hugely useful, and so 11% try refining their search)
>8. bbc micro - 202 (in the news recently, but only returns press releases)
>9. builder - 192 (no idea what this is about)
>10. energy - 150 (presumably teachers looking for energy microsite)
>
>This data is for the last month, and was gathered by the excellent
>'site search' function which can be set up in Google Analytics (which
>allows you to monitor search terms, regardless of which search
>technology you use).
>
>Generally, site search seems to be hugely neglected by website
>owners (mea culpa), which is presumably why people are turning to
>Google more and more.
>
>Frankie
>_______________________________________________
>You are currently subscribed to mcn-l, the listserv of the Museum
>Computer Network (http://www.mcn.edu)
>
>To post to this list, send messages to: mcn-l at mcn.edu
>
>To unsubscribe or change mcn-l delivery options visit:
>http://toronto.mediatrope.com/mailman/listinfo/mcn-l

--
__________
J. Trant                        jtrant at archimuse.com
Partner & Principal Consultant  phone: +1 416 691 2516
Archives & Museum Informatics   fax: +1 416 352 6025
158 Lee Ave, Toronto
Ontario M4E 2P3 Canada          http://www.archimuse.com
__________
