I'm starting to see this issue as one that compounds. If the majority of
searches are for an artist's name, consider the number of ULAN entries
for Da Vinci or Caravaggio. Now multiply that number by the number of
ways people manage to spell that name.

Let's put artists' names aside and consider an attribute name. An
end-user may not be thinking in the standardized terminology that the
museum cataloguers are using, so unless your search engine is utilizing
something like the AAT, their terms will match less accurately. Now feed
misspellings into the equation. Does the algorithm also search through
your user-contributed tags (folksonomies) and, if so, does it account
for their misspellings? How does it detect misspelled words: using
Soundex, the Levenshtein distance, or a pre-populated index
of commonly misspelled words?
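
To make the Levenshtein option concrete, here is a minimal sketch in
Python. It is an illustration only, not how any particular museum search
engine works: the function names, the ARTISTS list and the max_distance
threshold of 2 are all assumptions made up for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Count the single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def suggest(query, vocabulary, max_distance=2):
    """Return vocabulary terms within max_distance edits, closest first.
    max_distance=2 is an arbitrary illustrative threshold."""
    q = query.lower()
    scored = [(levenshtein(q, term.lower()), term) for term in vocabulary]
    return [term for dist, term in sorted(scored) if dist <= max_distance]


# Hypothetical sample vocabulary; a real one might come from ULAN or the AAT.
ARTISTS = ["Caravaggio", "Leonardo da Vinci", "Canaletto"]

print(suggest("Carravagio", ARTISTS))
```

The same function could be run against user-contributed tags as well as
the controlled vocabulary, though comparing every query against every
term gets slow for large term lists.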

This doesn't even start to address ranking issues - why does a search on
our website for the term "sword" rank a "Ceremonial Cup" vastly higher
than an object named "Sword (Kilij)" - or any of the other issues.

Can't someone build a better search algorithm specifically to meet the
needs of the museum community? Is the Getty preparing an API, so that
search engine technology can be plugged into their
multilingual/conceptual lexicons?

Surely, if Netflix can improve the accuracy of their matching algorithm,
Cinematch, by 8.57%, then we, as a community, can rally to build a
better search engine. (OK - granted, they're offering a seven-figure
reward if that improvement hits 10% - but still.)

Or do we just accept that Google already did?

Chad M Petrovay
Collections Database Administrator
P: 410.547.9000 x266


-----Original Message-----
From: mcn-l-bounces at mcn.edu [mailto:[email protected]] On Behalf Of
j trant
Sent: Friday, March 28, 2008 12:07 PM
To: Museum Computer Network Listserv
Subject: Re: [MCN-L] Effects of Google's 'search within site'? Anyone
else affected?


Frankie,

You've done more than a lot of people have done in looking at your
search logs. When i looked at the Guggenheim's -- as a prototype
for some steve data analysis -- i did a literature search and
couldn't find any serious studies of museum searching [this really
surprised me, btw.]

Each collection is likely to have its own patterns: in this modern 
art museum 63% of the most common searches (searched 10 or more 
times) were for artists' names.

Amazingly about 25% of the searches of this collection produced  no 
result [in an age of millions of results elsewhere, this is a real 
problem].

Looking at the search failures:
        - 36% were caused by spelling errors, so "did you mean..." 
would really help.
        - 50% of  the unsuccessful artist name searches were caused 
by spelling errors.
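
As a rough illustration of the "did you mean..." idea, the sketch below
uses get_close_matches from Python's standard difflib module to propose
known artist names when a search returns nothing. The did_you_mean
helper, the sample name list and the 0.75 cutoff are assumptions for
the example, not part of the study described above.

```python
import difflib


def did_you_mean(query, known_names, limit=3, cutoff=0.75):
    """Suggest up to `limit` known names that resemble a failed query.

    cutoff is a similarity ratio between 0 and 1; 0.75 is an arbitrary
    illustrative value that would need tuning against real search logs.
    """
    # Compare in lower case, but return the names as originally catalogued.
    by_lowercase = {name.lower(): name for name in known_names}
    matches = difflib.get_close_matches(
        query.lower(), list(by_lowercase), n=limit, cutoff=cutoff
    )
    return [by_lowercase[m] for m in matches]


# Hypothetical artist index for a modern art collection.
NAMES = ["Kandinsky", "Klee", "Mondrian"]

print(did_you_mean("Kandinski", NAMES))
```

A helper like this would only be called for searches that produced no
result, so that it never gets in the way of queries that already work.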

The paper, and a blog post with more detail, are at
http://conference.archimuse.com/blog/jtrant/searching_museum_collections_on_line_what_do_peo

Since search is a favourite navigation mechanism we really do need to 
pay more attention to it, both on and now off museum sites.

/jt

At 3:46 PM +0000 3/28/08, frankie roberto wrote:
>>   The most likely impact for us is in upcoming modifications to our
>>   own search.
>
>...
>
>>   Do we give up, and acknowledge
>>   that doing search in any way different from Google is a) now
>>   competing more directly with them, and b) probably just getting
>>   more confusing for most visitors; or do we focus on these
>>   (probably fewer and fewer) visitors who come to our search
>>   expecting it to work just the way it should, not the way
>>   that's easiest?
>
>Hmm, very interesting point.
>
>I'm sure I'm not alone in saying that we spend far far less time
>looking at our search interface than we ought to. Our site search is
>powered by a Google Mini, and other than providing thumbnails for the
>object pages returned, it's pretty much working in its out-of-the-box
>configuration. We haven't invested any time in editorially 'promoting'
>results for certain search terms, for instance, or in setting up
>synonyms.
>
>In fact, this discussion has prompted me to do a quick report of the
>most popular search terms, which are:
>
>1. games - 1,1012
>2. grain strain (old game) - 498
>3. jobs - 488
>4. wroughton (object storage site) - 474
>5. search - 447 (amusingly, this is the default search text, so
>represents people pressing search without typing anything)
>6. launchball - 280
>7. opening times - 252 (shockingly, this doesn't return anything
>hugely useful, and so 11% try refining their search)
>8. bbc micro - 202 (in the news recently, but only returns press
>releases)
>9. builder - 192 (no idea what this is about)
>10. energy - 150 (presumably teachers looking for energy microsite)
>
>This data is for the last month, and was gathered by the excellent
>'site search' function which can be set up in Google Analytics (which
>allows you to monitor search terms, regardless of which search
>technology you use).
>
>Generally, site search seems to be hugely neglected by website
>owners (mea culpa), which is presumably why people are turning to
>Google more and more.
>
>Frankie
>_______________________________________________
>You are currently subscribed to mcn-l, the listserv of the Museum 
>Computer Network (http://www.mcn.edu)
>
>To post to this list, send messages to: mcn-l at mcn.edu
>
>To unsubscribe or change mcn-l delivery options visit:
>http://toronto.mediatrope.com/mailman/listinfo/mcn-l


-- 
__________
J. Trant                                jtrant at archimuse.com
Partner & Principal Consultant          phone: +1 416 691 2516
Archives & Museum Informatics           fax: +1 416 352 6025
158 Lee Ave, Toronto
Ontario M4E 2P3 Canada          http://www.archimuse.com
__________