not quite the same idea, but close!

===
http://www.cnnfn.com/digitaljam/newsbytes/130617.html


WWW Conference - IBM Search Engine "Trawls" Web  
 May 13, 1999: 3:37 p.m. ET

TORONTO, ONTARIO, CANADA (NB) -- By Grant Buckler, 
Newsbytes. If three separate World Wide Web pages contain 
hyperlinks to three other sites, something is going on. That 
principle underlies Clever, search technology developed by IBM 
[NYSE:IBM] researchers that might offer a better way of locating 
information on the Web. Clever brought the research team - Ravi 
Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, and Andrew 
Tomkins - the distinction of best paper at this year's World Wide 
Web Conference, taking place here this week.  

Entitled "Trawling the Web for Emerging Cyber Communities," the 
paper describes an approach that uses patterns of 
interconnections among Web sites to discover communities that 
one might not expect. It can also locate information in ways a 
conventional search engine would not.  

In an interview with Newsbytes, Raghavan said a good deal of 
research went into choosing the number three as the critical point. 
One page pointing to two other sites usually does not indicate any 
important connection among the three sites, he said. Even two 
pointing to two is not significant enough. But when three sites all
point to three other sites, and the links are not what Raghavan
called "nepotistic" (links explained by an obvious relationship among
the pages, such as their being part of the same organization's Web
site), it is about 95 percent certain that the sites have something
meaningful in common.
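The article does not say how Clever finds these patterns, so what follows is only a minimal sketch under stated assumptions. It runs a brute-force search for three "fan" pages that all link to the same three "center" pages. It assumes that links between pages on the same host are the "nepotistic" ones, and it rejects any group whose fans or centers share a host. The helper names (`host`, `drop_nepotistic`, `find_cores`) and the link graph are hypothetical. This is not the IBM team's method, and it does not reproduce the 95 percent figure.

```python
from itertools import combinations
from urllib.parse import urlparse


def host(url):
    """Return the lower-cased host name of a URL."""
    return urlparse(url).netloc.lower()


def drop_nepotistic(links):
    """Keep only links whose source and target are on different hosts.

    Assumption: "same host" is a simple stand-in for Raghavan's
    "nepotistic" links.
    """
    return {src: {dst for dst in dsts if host(dst) != host(src)}
            for src, dsts in links.items()}


def find_cores(links, i=3, j=3):
    """Brute-force search for i pages (fans) that all link to the same
    j pages (centers), after nepotistic links are removed."""
    clean = drop_nepotistic(links)
    # A page needs at least j remaining out-links to be a fan.
    fans = sorted(p for p, dsts in clean.items() if len(dsts) >= j)
    cores = set()
    for fan_set in combinations(fans, i):
        # Fans on one host are likely one organization; skip them.
        if len({host(f) for f in fan_set}) < i:
            continue
        common = set.intersection(*(clean[f] for f in fan_set))
        for centers in combinations(sorted(common), j):
            if len({host(c) for c in centers}) == j:
                cores.add((fan_set, centers))
    return cores


# Hypothetical link graph: page URL -> set of URLs it links to.
LINKS = {
    "http://a.example/mainframes.html": {
        "http://www.ibm.com/", "http://www.hitachi.com/",
        "http://www.fujitsu.com/"},
    "http://b.example/links.html": {
        "http://www.ibm.com/", "http://www.hitachi.com/",
        "http://www.fujitsu.com/", "http://b.example/about.html"},
    "http://c.example/hub.html": {
        "http://www.ibm.com/", "http://www.hitachi.com/",
        "http://www.fujitsu.com/"},
    "http://d.example/page.html": {
        "http://www.ibm.com/", "http://d.example/other.html"},
}

for fans, centers in sorted(find_cores(LINKS)):
    print(fans, "->", centers)
```

Checking every triple of pages does not scale to a Web-sized graph. The sketch is meant only to make the "three pointing to three" rule concrete.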

One use of this principle could be in discovering unexpected 
communities on the Web. That could have applications to 
marketing, among other things, Raghavan said.  

Another application is finding sites that meet certain criteria but 
that might not be easy to find with the conventional text-searching 
approach. An example might be an analyst who wants information 
about mainframe computers. IBM is one of the major 
manufacturers of mainframe computers, and yet, Raghavan said, if 
you look at IBM's Web site you will not find the word "mainframe." 
Because the word is out of fashion, vendors use others such as 
"server." So a conventional search on the word "mainframe" might 
miss a lot of the most useful material.  

However, the Web contains many sites of the kind Raghavan called "hub
sites," often created by people interested in a particular topic.
Such a site dealing with mainframes might point to IBM's site as 
well as those of Fujitsu and Hitachi. A Clever search would pick up 
on the fact that sites pointing to one of these companies' sites also 
tend to point to the others, and would discern a community. Given 
a term such as "mainframe" or the names IBM and Hitachi, say, it 
would find mainframe-related sites including the vendors' home 
pages, even though those pages don't actually contain the word 
"mainframe."  
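The article does not describe Clever's ranking method either. The following is therefore a hedged sketch of plain co-citation counting, a simpler technique substituted here for illustration. Any page that links to a seed site (for example IBM or Hitachi) is treated as a hub. Every other site such a hub links to earns one point. Sites that enough hubs co-cite with the seeds are returned, whether or not their pages contain the word "mainframe." The function name, the `min_hubs` threshold, and all URLs other than the vendors' are hypothetical.

```python
from collections import Counter


def expand_community(links, seeds, min_hubs=2):
    """Rank pages by how many hub pages co-cite them with the seed sites.

    Assumption: any page linking to at least one seed is treated as a hub.
    Returns (page, hub_count) pairs, highest count first.
    """
    seeds = set(seeds)
    scores = Counter()
    for src, dsts in links.items():
        if dsts & seeds:  # src points at a seed, so treat it as a hub
            for dst in dsts - seeds:
                scores[dst] += 1
    # Sort by count (descending), then URL, so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(page, n) for page, n in ranked if n >= min_hubs]


# Hypothetical hub pages: page URL -> set of URLs it links to.
HUB_LINKS = {
    "http://mf-fans.example/": {
        "http://www.ibm.com/", "http://www.hitachi.com/",
        "http://www.fujitsu.com/"},
    "http://big-iron.example/links": {
        "http://www.ibm.com/", "http://www.fujitsu.com/",
        "http://vendor-x.example/"},
    "http://hub3.example/": {
        "http://www.hitachi.com/", "http://www.fujitsu.com/"},
    "http://recipes.example/": {"http://cooking.example/"},
}
SEEDS = {"http://www.ibm.com/", "http://www.hitachi.com/"}

print(expand_community(HUB_LINKS, SEEDS))
# -> [('http://www.fujitsu.com/', 3)]
```

Fujitsu's site is found only because three hubs link to it alongside IBM or Hitachi. The text on its pages plays no part.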

Raghavan said IBM sees Clever not as a general-purpose 
consumer search engine but as a tool for more sophisticated 
searchers. It is currently in pilot use within IBM, and the company 
will move quickly to bring it to market, possibly by working with one 
or more marketing partners, he said.  

More information on Clever is available through Raghavan's Web page at
http://www.almaden.ibm.com/cs/k53/clever.html .

===
