A bunch of us are using Solr/Lucene for discovery over library bibliographic records, which is based on the basic tf*idf weighting algorithm with a bunch of tweaks. So all of us doing that, and finding it pretty successful, are probably surprised to hear that this approach won't work on library data. :)

Jonathan

On 2/15/2011 4:13 PM, Dave Caroline wrote:
I wrote my own search engine for my system and thought long and hard
about relevancy; in the end I went for none and display results alphabetically.

Dave Caroline

On Tue, Feb 15, 2011 at 8:32 PM, Till Kinstler<[email protected]>  wrote:
There has been a lively discussion about relevance ranking for library
resources in discovery interfaces in recent years. In articles, blog
posts and presentations on this topic, possible ranking factors are
discussed again and again beyond well-known term-statistics-based methods
like the vector space retrieval model with tf*idf weighting (often after
claiming that term-statistics-based approaches wouldn't work on library data,
of course without proving it).

Usually the following possible factors are mentioned:
- popularity (often after stressing Google's success with PageRank),
measured in several ways like holding quantities, circulation
statistics, clicks in catalogues, explicit user ratings, number of
citations, ...
- freshness: rank newer items higher (OK, we have that in many old-school
Boolean OPACs as "sort by date", but not in combination with
other ranking factors like term statistics)
- availability
- contextual/individual factors, e.g. if (user.status == student)
boost(textbooks); if (user.faculty == economics) boost(Karl Marx); if
(season == christmas) boost(gingerbread recipes); ...
- ...
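The factors above can be combined with a term-statistics base score in many ways. As a minimal sketch (the field names, weights, and damping functions here are my own assumptions, not anything a particular system is known to use), one common pattern is to multiply the engine's tf*idf score by gentle, bounded boosts so no single factor can swamp term relevance:

```python
import math

def combined_score(tfidf, holdings=0, age_days=0, available=True,
                   w_pop=0.2, w_fresh=0.3, avail_boost=1.1):
    """Multiply a term-statistics base score by bounded boost factors.

    tfidf     -- base relevance score from the engine (e.g. Solr)
    holdings  -- popularity proxy (e.g. number of holding libraries);
                 a hypothetical stand-in for circulation/clicks/ratings
    age_days  -- days since publication (freshness)
    available -- whether the item is immediately accessible
    """
    # Log-damped popularity: each doubling of holdings adds roughly
    # the same amount, so blockbusters don't dominate entirely.
    pop = 1.0 + w_pop * math.log1p(holdings)
    # Freshness decays smoothly with age (soft knee around one year).
    fresh = 1.0 + w_fresh / (1.0 + age_days / 365.0)
    # Small constant boost for immediately available items.
    avail = avail_boost if available else 1.0
    return tfidf * pop * fresh * avail
```

In Solr terms this corresponds roughly to multiplicative function queries (`boost=...` with `log()` and `recip()` over numeric fields); the point of the damping is that popularity and freshness reorder otherwise-similar hits rather than override term relevance.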

I tried to find examples where such factors beyond term statistics are
used to rank search results in libraryland. But I can hardly find any, only
lots of theoretical discussions about the pros and cons of all
thinkable factors, going on since the 1980s. I mean, all of this is doable
with search engines like Solr today. But it seems it is hardly
implemented anywhere in real systems (beyond simple cases; for example,
we slightly boost hits in collections a user has immediate online access
to, but we never asked users whether they like it or even notice).
WorldCat seems to do a little: they, of course, boost
resources with local holdings in WorldCat Local, and they use language
preferences (the Accept-Language HTTP header) to boost titles in users'
preferred languages. There might be more in WorldCat's ranking, but
not much seems to be published on that, it seems?
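The Accept-Language idea is easy to prototype. As a hedged sketch (this is my own illustration of the general technique, not how WorldCat actually does it), the header's comma-separated language tags and their q-values can be parsed and turned into per-language boosts, highest preference first:

```python
def parse_accept_language(header):
    """Parse an Accept-Language header into (language, q) pairs,
    sorted by descending quality value.

    Example header: "de-DE,de;q=0.9,en;q=0.7"
    """
    prefs = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        if ";q=" in part:
            lang, q = part.split(";q=", 1)
            try:
                q = float(q)
            except ValueError:
                q = 1.0  # malformed q-value: treat as full preference
        else:
            lang, q = part, 1.0  # no q-value means q=1.0 per RFC
        prefs.append((lang.strip().lower(), q))
    prefs.sort(key=lambda p: p[1], reverse=True)
    return prefs

def language_boosts(header, max_boost=0.5):
    """Map parsed preferences to a hypothetical per-language boost,
    scaling a maximum boost by each language's q-value."""
    return {lang: 1.0 + max_boost * q
            for lang, q in parse_accept_language(header)}
```

The boosts could then feed a query-time boost clause (in Solr, e.g. a `bq` on the record's language field); the `max_boost` of 0.5 is an arbitrary illustrative value.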

So, if you have implemented something beyond term-statistics-based ranking,
speak up and show it. I am very interested in real-world implementations
and experiences (user feedback, user studies, etc.).

Thanks,
Till
