A bunch of us are using Solr/Lucene for discovery over library bibliographic records, which is based on the basic tf*idf weighting algorithm with a bunch of tweaks. So all of us doing that, and finding it pretty successful, are probably surprised to hear that this approach won't work on library data. :)

Jonathan

On 2/15/2011 4:13 PM, Dave Caroline wrote:
I wrote my own search engine for my system and thought long and hard
about relevancy; in the end I went for none and display results alphabetically.

Dave Caroline

On Tue, Feb 15, 2011 at 8:32 PM, Till Kinstler<[email protected]>  wrote:
There has been a lively discussion about relevance ranking for library
resources in discovery interfaces in recent years. In articles, blog
posts and presentations on this topic, possible ranking factors are
discussed again and again beyond well-known term-statistics-based methods
like the vector space retrieval model with tf*idf weighting (often after
claiming that term-statistics-based approaches wouldn't work on library data,
of course without proving it).

Usually the following possible factors are mentioned:
- popularity (often after stressing Google's success with PageRank),
measured in several ways like holding quantities, circulation
statistics, clicks in catalogues, explicit user ratings, number of
citations, ...
- freshness: rank newer items higher (OK, we have that in many old-school
Boolean OPACs as "sort by date", but not in combination with
other ranking factors like term statistics)
- availability
- contextual/individual factors, e.g. if (user.status == student)
boost(textbooks); if (user.faculty == economics) boost(Karl Marx); if
(season == christmas) boost(gingerbread recipes); ...
- ...
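The factors above can be combined with a term-statistics base score in many ways. As a minimal sketch (the field names, weights, and damping functions here are my own assumptions, not anything a particular system is known to use), one common pattern is to multiply the engine's tf*idf score by gentle, bounded boosts so no single factor can swamp term relevance:

```python
import math

def combined_score(tfidf, holdings=0, age_days=0, available=True,
                   w_pop=0.2, w_fresh=0.3, avail_boost=1.1):
    """Multiply a term-statistics base score by bounded boost factors.

    tfidf     -- base relevance score from the engine (e.g. Solr)
    holdings  -- popularity proxy (e.g. number of holding libraries);
                 a hypothetical stand-in for circulation/clicks/ratings
    age_days  -- days since publication (freshness)
    available -- whether the item is immediately accessible
    """
    # Log-damped popularity: each doubling of holdings adds roughly
    # the same amount, so blockbusters don't dominate entirely.
    pop = 1.0 + w_pop * math.log1p(holdings)
    # Freshness decays smoothly with age (soft knee around one year).
    fresh = 1.0 + w_fresh / (1.0 + age_days / 365.0)
    # Small constant boost for immediately available items.
    avail = avail_boost if available else 1.0
    return tfidf * pop * fresh * avail
```

In Solr terms this corresponds roughly to multiplicative function queries (`boost=...` with `log()` and `recip()` over numeric fields); the point of the damping is that popularity and freshness reorder otherwise-similar hits rather than override term relevance.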

I tried to find examples where such factors beyond term statistics are
used to rank search results in libraryland. But I can hardly find any, only
lots of theoretical discussions about the pros and cons of all
thinkable factors, going on since the 1980s. I mean, all of this is doable
with search engines like Solr today. But it seems it is hardly
implemented anywhere in real systems (beyond simple cases; for example,
we slightly boost hits in collections a user has immediate online access
to, but we never asked users whether they like it or even notice).
WorldCat seems to do a little: they, of course, boost
resources with local holdings in WorldCat Local, and they use language
preferences (the Accept-Language HTTP header) to boost titles in users'
preferred languages. There might be more in WorldCat's ranking, but
not much seems to be published on that, it seems?
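The Accept-Language idea is easy to prototype. As a hedged sketch (this is my own illustration of the general technique, not how WorldCat actually does it), the header's comma-separated language tags and their q-values can be parsed and turned into per-language boosts, highest preference first:

```python
def parse_accept_language(header):
    """Parse an Accept-Language header into (language, q) pairs,
    sorted by descending quality value.

    Example header: "de-DE,de;q=0.9,en;q=0.7"
    """
    prefs = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        if ";q=" in part:
            lang, q = part.split(";q=", 1)
            try:
                q = float(q)
            except ValueError:
                q = 1.0  # malformed q-value: treat as full preference
        else:
            lang, q = part, 1.0  # no q-value means q=1.0 per RFC
        prefs.append((lang.strip().lower(), q))
    prefs.sort(key=lambda p: p[1], reverse=True)
    return prefs

def language_boosts(header, max_boost=0.5):
    """Map parsed preferences to a hypothetical per-language boost,
    scaling a maximum boost by each language's q-value."""
    return {lang: 1.0 + max_boost * q
            for lang, q in parse_accept_language(header)}
```

The boosts could then feed a query-time boost clause (in Solr, e.g. a `bq` on the record's language field); the `max_boost` of 0.5 is an arbitrary illustrative value.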

So, if you have implemented something beyond term-statistics-based ranking,
speak up and show it. I am very interested in real-world implementations
and experiences (user feedback, user studies, etc.).

Thanks,
Till
