not quite the same idea, but close!
===
http://www.cnnfn.com/digitaljam/newsbytes/130617.html
WWW Conference - IBM Search Engine "Trawls" Web
May 13, 1999: 3:37 p.m. ET
TORONTO, ONTARIO, CANADA (NB) -- By Grant Buckler,
Newsbytes. If three separate World Wide Web pages all contain
hyperlinks to the same three other sites, something is going on. That
principle underlies Clever, a search technology developed by IBM
[NYSE:IBM] researchers that might offer a better way of locating
information on the Web. Clever brought the research team - Ravi
Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, and Andrew
Tomkins - the distinction of best paper at this year's World Wide
Web Conference, taking place here this week.
Entitled "Trawling the Web for Emerging Cyber Communities," the
paper describes an approach that uses patterns of
interconnections among Web sites to discover communities that
one might not expect. It can also locate information in ways a
conventional search engine would not.
In an interview with Newsbytes, Raghavan said a good deal of
research went into choosing the number three as the critical point.
One page pointing to two other sites usually does not indicate any
important connection among the three sites, he said. Even two
pointing to two is not significant enough. But when three sites all
point to the same three other sites, and the links are not what Raghavan
called nepotistic - that is, explained by an obvious connection among
the pages, such as their being part of the same organization's Web
site - it is about 95 percent certain that the sites have something
meaningful in common.
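The three-by-three rule described above can be sketched in a few lines of
Python. This is only an illustration, not IBM's implementation: the
function names, the brute-force search, the example URLs, and the choice
to treat any link within the same host as "nepotistic" are all assumptions
made for the sketch.

```python
from itertools import combinations
from urllib.parse import urlparse


def host(url):
    """Return the host part of a URL, used as a crude 'same site' test."""
    return urlparse(url).netloc.lower()


def find_cores(links, i=3, j=3):
    """Return (pointing_pages, targets) pairs where at least i pages
    all link to the same j targets.

    links maps a page URL to the set of URLs that page links to.
    """
    # Assumption: a link is "nepotistic" if source and target share a host.
    cleaned = {
        page: {t for t in targets if host(t) != host(page)}
        for page, targets in links.items()
    }
    # Invert the graph: for each target, record which pages point to it.
    inbound = {}
    for page, targets in cleaned.items():
        for t in targets:
            inbound.setdefault(t, set()).add(page)

    cores = []
    # Brute force over every group of j targets; fine for a toy graph only.
    for group in combinations(sorted(inbound), j):
        common = set.intersection(*(inbound[t] for t in group))
        if len(common) >= i:
            cores.append((sorted(common), list(group)))
    return cores


# Hypothetical link data for illustration.
example_links = {
    "http://fan-a.example/": {"http://x.example/", "http://y.example/",
                              "http://z.example/"},
    "http://fan-b.example/": {"http://x.example/", "http://y.example/",
                              "http://z.example/", "http://w.example/"},
    "http://fan-c.example/": {"http://x.example/", "http://y.example/",
                              "http://z.example/"},
    "http://x.example/": {"http://x.example/about"},  # same-host link, ignored
}
example_cores = find_cores(example_links)
```

On this toy data the only core found is the three "fan" pages pointing to
x, y, and z. A search over the real Web would need something far more
efficient than trying every combination of targets.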
One use of this principle could be in discovering unexpected
communities on the Web. That could have applications to
marketing, among other things, Raghavan said.
Another application is finding sites that meet certain criteria but
that might not be easy to find with the conventional text-searching
approach. An example might be an analyst who wants information
about mainframe computers. IBM is one of the major
manufacturers of mainframe computers, and yet, Raghavan said, if
you look at IBM's Web site you will not find the word "mainframe."
Because the word is out of fashion, vendors use others such as
"server." So a conventional search on the word "mainframe" might
miss a lot of the most useful material.
However, the Web has many of what Raghavan called "hub
sites," often created by people interested in a particular topic.
Such a site dealing with mainframes might point to IBM's site as
well as those of Fujitsu and Hitachi. A Clever search would pick up
on the fact that sites pointing to one of these companies' sites also
tend to point to the others, and would discern a community. Given
a term such as "mainframe" or the names IBM and Hitachi, say, it
would find mainframe-related sites including the vendors' home
pages, even though those pages don't actually contain the word
"mainframe."
Raghavan said IBM sees Clever not as a general-purpose
consumer search engine but as a tool for more sophisticated
searchers. It is currently in pilot use within IBM, and the company
will move quickly to bring it to market, possibly by working with one
or more marketing partners, he said.
More information on Clever is available through
Raghavan's Web page at
http://www.almaden.ibm.com/cs/k53/clever.html .
===