Quoting Keith Jenkins <k...@cornell.edu>:


> The frequency of an LCSH term within the LC catalog could also be
> useful for ranking, although I'm not sure if such data would be
> readily available.

A couple of things. First, what we have at id.loc.gov is NOT LCSH, but a copy of the LC subject authority file. The entries in this file form the basis for subject headings, most of which add "facets" (subdivisions) to the authority entry when the heading is formed. One could do a left-anchored match of the authority entries against the actual headings in the catalog, and that might provide some interesting statistics.
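A minimal sketch of that left-anchored match, assuming the catalog headings are available as display strings with "--" between the main heading and each subdivision (e.g. "Cheese--France"). The function name and sample data are hypothetical. Matching on whole "--"-delimited prefixes means "Cheese" counts "Cheese--France" but not "Cheesecake", and authority entries that already carry a subdivision still match.

```python
from collections import Counter


def left_anchored_counts(authority_terms, headings, sep="--"):
    """Count catalog headings that begin with each authority entry.

    Assumes headings are strings with `sep` between the main heading and
    its subdivisions. A heading matches an entry when the entry equals one
    of the heading's whole-part prefixes.
    """
    terms = set(authority_terms)
    counts = Counter({term: 0 for term in terms})  # keep zero counts too
    for heading in headings:
        parts = [p.strip() for p in heading.split(sep)]
        # Try "Cheese", then "Cheese--France", and so on.
        for i in range(1, len(parts) + 1):
            prefix = sep.join(parts[:i])
            if prefix in terms:
                counts[prefix] += 1
    return counts


if __name__ == "__main__":
    headings = ["Cheese--France", "Cheese", "Cheesecake", "Cheese -- History"]
    print(left_anchored_counts(["Cheese", "Cheesecake", "Butter"], headings))
```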

Second, Edward Betts of the Open Library project did some casual data gathering on subjects and posted his "top 1000" subject headings (not subject authorities):
http://edwardbetts.com/ol/top_1000_subjects
The OL has decided to break up the subject headings into their subfields, and somewhere there are pages that show, for a given subfield, the other subfields it most often appears with. One example is here:
http://home.us.archive.org/~edward/related/Cheese.html
I think that something like this will be incorporated into the next version of OL, which will be heavily navigation-oriented rather than search-oriented.
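A page like the Cheese one can be approximated by splitting each heading into its parts and counting which parts turn up alongside a given one. This is a guess for illustration, not the Open Library code: it assumes "--"-separated heading strings rather than MARC subfield codes, and the function name is hypothetical.

```python
from collections import Counter


def related_subfields(headings, target, sep="--"):
    """Rank the subfields that co-occur with `target`, most frequent first.

    Assumes each heading is a string with `sep` between its parts; each
    co-occurring part is counted at most once per heading.
    """
    related = Counter()
    for heading in headings:
        parts = [p.strip() for p in heading.split(sep)]
        if target in parts:
            # dict.fromkeys dedupes while keeping a stable order.
            related.update(p for p in dict.fromkeys(parts) if p != target)
    return related.most_common()


if __name__ == "__main__":
    headings = ["Cheese--France", "Cheese--History",
                "Cheese--France--Recipes", "Wine--France"]
    print(related_subfields(headings, "Cheese"))
```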

kc
P.S. Anyone who wants to play with the data can grab the OL data export:

http://openlibrary.org/dev/docs/jsondump

Unfortunately, it includes both LC and non-LC subjects (mainly BISAC from Amazon).


> Another possibility would be a simple count of broader terms +
> narrower terms + related terms or something like that, although
> PageRank would probably be better, since even some "important" terms
> might have a relatively small number of immediately-adjacent links.
>
> Keith
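Keith's two ranking ideas can be compared on a toy graph. The sketch below is illustrative only: it assumes the broader/narrower/related links have already been extracted as term pairs, that each link points from the narrower term to its broader term (a related-term link would be entered in both directions), and it uses the customary PageRank damping factor of 0.85. None of these choices come from the thread; the function names and link data are hypothetical.

```python
from collections import defaultdict


def link_counts(links):
    """Simple measure: number of BT/NT/RT links touching each term."""
    counts = defaultdict(int)
    for source, target in set(links):
        counts[source] += 1
        counts[target] += 1
    return dict(counts)


def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank over directed links (source -> target)."""
    out = defaultdict(set)
    nodes = set()
    for source, target in links:
        out[source].add(target)
        nodes.update((source, target))
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {t: 1.0 / n for t in nodes}
    for _ in range(iterations):
        # Rank held by terms with no outgoing links is spread evenly.
        dangling = sum(rank[t] for t in nodes if not out[t])
        new = {t: (1.0 - damping + damping * dangling) / n for t in nodes}
        for t in nodes:
            if out[t]:
                share = damping * rank[t] / len(out[t])
                for u in out[t]:
                    new[u] += share
        rank = new
    return rank


if __name__ == "__main__":
    # Hypothetical narrower -> broader links.
    links = [("Cheddar cheese", "Cheese"), ("Goat cheese", "Cheese"),
             ("Brie", "Cheese"), ("Feta", "Cheese"),
             ("Cheese", "Dairy products"),
             ("Kefir", "Fermented foods"), ("Kimchi", "Fermented foods"),
             ("Sauerkraut", "Fermented foods")]
    print(link_counts(links))
    print(sorted(pagerank(links).items(), key=lambda kv: -kv[1])[:3])
```

With these made-up links, "Dairy products" has only one link but the highest PageRank, because that single link comes from a well-linked term, which is the effect Keith describes.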


--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet
