http://www.buffalo.edu/news/fast-execute.cgi/article-page.html?article=72910
In War on Terrorism, New Search Engine Seeks Hidden Vulnerabilities
Release date: Friday, May 13, 2005
Contact: Ellen Goldbaum, [EMAIL PROTECTED]
Phone: 716-645-5000 ext 1415
Fax: 716-645-3765
BUFFALO, N.Y. -- As part of an effort to anticipate -- and thwart -- the
plans of potential terrorists, the Federal Aviation Administration is
supporting University at Buffalo researchers in developing a new search
engine designed to detect "hidden" information that can be gleaned from
public Web sites.
Once the technique is developed and validated, it has the potential to make
the Web searches that the public performs daily far more effective in
locating meaningful information on the Internet.
The UB team recently completed an initial prototype system, designed
explicitly to enable searches for "hidden" information within the 9/11
Commission Report.
The system permits users to find the best trail of evidence through many
documents that connects two or more apparently unrelated concepts.
Funded by the FAA, as well as by the National Science Foundation
specifically for anti-terrorism applications, the UB project is based on
Unintended Information Revelation, or UIR, a search technique designed to
uncover hidden information.
The premise of UIR is that pieces of information that appear innocent on
their own may, when linked together, inadvertently reveal highly sensitive
data.
The need for such a tool arose after 9/11 when the FAA started focusing on
information being disseminated on its Web site.
"It couldn't tell if it was possible to infer things that the FAA doesn't
want others to infer by putting together data from this page and that page
and that page," said Rohini Srihari, Ph.D., UB professor of computer science
and engineering, who is developing the new search engine with her colleagues
in the Center of Excellence in Document Analysis and Recognition in the UB
School of Engineering and Applied Sciences.
Existing search engines rank individual documents by how many times a
keyword appears in each one, she explained.
In contrast, UIR is based on the construction of concept chain graphs that
search for the best path connecting two concepts within a multitude of
documents.
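The release does not describe how concept chain graphs are built or scored. As a rough illustration only, the Python sketch below assumes that two concepts are linked whenever they appear in the same document, that a link backed by more documents is a stronger association, and that the "best path" is the cheapest one under Dijkstra's algorithm. The toy documents, the concept list and the function names build_concept_graph and best_chain are hypothetical; this is not the UB team's algorithm.

```python
import heapq
import itertools
import re
from collections import defaultdict

# Hypothetical toy corpus and concept list, for illustration only.
DOCS = {
    "d1": "Alpha works with Beta.",
    "d2": "Beta visited Gamma.",
    "d3": "Beta and Gamma met again.",
    "d4": "Alpha wrote to Delta.",
    "d5": "Delta knows Gamma.",
}
CONCEPTS = ["alpha", "beta", "gamma", "delta"]


def build_concept_graph(docs, concepts):
    """Link every pair of concepts that co-occur in a document; record the doc ids."""
    evidence = defaultdict(list)  # (concept_a, concept_b) -> supporting doc ids
    for doc_id, text in docs.items():
        lowered = text.lower()
        # Whole-word match so a concept is not found inside a longer word.
        found = [c for c in concepts if re.search(r"\b" + re.escape(c) + r"\b", lowered)]
        for a, b in itertools.combinations(found, 2):
            evidence[(a, b)].append(doc_id)
            evidence[(b, a)].append(doc_id)
    return evidence


def best_chain(evidence, source, target):
    """Dijkstra's algorithm; a link backed by more documents costs less to traverse."""
    graph = defaultdict(list)
    for (a, b), doc_ids in evidence.items():
        graph[a].append((b, 1.0 / len(doc_ids)))
    dist, prev = {source: 0.0}, {}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            break
        if d > dist[node]:
            continue  # stale heap entry; a cheaper route was already found
        for nxt, cost in graph[node]:
            if d + cost < dist.get(nxt, float("inf")):
                dist[nxt], prev[nxt] = d + cost, node
                heapq.heappush(heap, (d + cost, nxt))
    if target not in dist:
        return None  # the two concepts are not connected by any chain
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    # Evidence trail: each hop plus the documents that connect its two concepts.
    return [(a, b, evidence[(a, b)]) for a, b in zip(path, path[1:])]


print(best_chain(build_concept_graph(DOCS, CONCEPTS), "alpha", "gamma"))
```

Here the chain alpha -> beta -> gamma beats alpha -> delta -> gamma because the beta-gamma link is supported by two documents. The script prints [('alpha', 'beta', ['d1']), ('beta', 'gamma', ['d2', 'd3'])], so each hop carries the documents that serve as its evidence.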
"A concept chain graph will show you what's common between two seemingly
unconnected things," Srihari said.
UIR is designed to automatically detect the "hidden" revelation of
sensitive information.
At the same time, Srihari's NSF research is geared toward developing the
core algorithms that expose hidden paths in trails of numerous documents
that may have been generated by different individuals or organizations.
While a single Web site or document may not reveal malicious intentions, a
concept chain graph may reveal such intentions "hidden" among numerous
documents.
"With regular searches, the input is a set of key words," Srihari explained.
"The search produces a ranked list of documents, any one of which could
satisfy the query.
"UIR, on the other hand, is a composite query, not a keyword query. It is
designed to find the best path, the best chain of associations between two
or more ideas. It returns to you an evidence trail that says 'This is how
these pieces are connected.'"
To develop the method, the UB researchers used the chapters of the 9/11
Commission Report to establish concept ontologies -- lists of terms of
interest in the specific domains relevant to the researchers: aviation,
security and anti-terrorism issues.
According to Srihari, the key was coming up with a sophisticated content
representation method for processing, or mining, text.
"UIR is an example of text mining, going across documents and uncovering
things that are not apparent to the user," she said.
One search the UB researchers used to test their prototype involved
exploring the chapters of the 9/11 Commission Report for connections among
three terms they already knew to be connected: "Hamburg," "San Diego" and
"imam" (a Muslim leader).
Srihari explained that the model generated by the system from the 9/11
corpus found that terrorists Binal Shibh and Mohamed Atta shared
apartments in Hamburg, Germany; that Atta and Nawaf al Hazmi were both
hijackers involved in the 9/11 attacks; and that Hazmi found an apartment
in San Diego with the help of Anwar Aulaq, an imam at a mosque in San Diego.
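A toy version of the "Hamburg," "San Diego" and "imam" query can be sketched in a few lines of Python. The sketch makes several assumptions not stated in the release: the four "documents" are one-line paraphrases of the relations described above rather than text from the Commission Report, concepts are linked when they co-occur in a document, and a multi-term query is answered by chaining the fewest-hop path (breadth-first search) between each consecutive pair of terms. The names link_graph, shortest_hops and composite_chain are hypothetical.

```python
import itertools
import re
from collections import defaultdict, deque

# Toy "documents": paraphrases of the relations reported in this release
# (an assumption; not text taken from the 9/11 Commission Report itself).
DOCS = {
    "f1": "Binal Shibh and Atta shared apartments in Hamburg.",
    "f2": "Atta and Hazmi were hijackers involved in the 9/11 attacks.",
    "f3": "Hazmi found an apartment in San Diego with the help of Aulaq.",
    "f4": "Aulaq was an imam at a mosque.",
}
CONCEPTS = ["hamburg", "binal shibh", "atta", "hazmi", "san diego", "aulaq", "imam"]


def mentions(text, concept):
    """Whole-word, case-insensitive match, so 'atta' does not match 'attacks'."""
    return re.search(r"\b" + re.escape(concept) + r"\b", text.lower()) is not None


def link_graph(docs, concepts):
    """Map each concept to its co-occurring concepts, keeping the supporting doc ids."""
    links = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        found = [c for c in concepts if mentions(text, c)]
        for a, b in itertools.combinations(found, 2):
            links[a][b].append(doc_id)
            links[b][a].append(doc_id)
    return links


def shortest_hops(links, start, goal):
    """Breadth-first search: the chain with the fewest links between two concepts."""
    prev = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for nxt in sorted(links[node]):  # sorted for a deterministic result
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    if goal not in prev:
        return None
    path = [goal]
    while prev[path[-1]] is not None:
        path.append(prev[path[-1]])
    return path[::-1]


def composite_chain(links, terms):
    """Chain consecutive query terms together and attach evidence to every hop."""
    trail = []
    for start, goal in zip(terms, terms[1:]):
        path = shortest_hops(links, start, goal)
        if path is None:
            return None  # some pair of query terms cannot be connected
        trail += [(a, b, links[a][b]) for a, b in zip(path, path[1:])]
    return trail


for a, b, doc_ids in composite_chain(link_graph(DOCS, CONCEPTS), ["hamburg", "san diego", "imam"]):
    print(f"{a} -> {b}  (evidence: {', '.join(doc_ids)})")
```

With these inputs the sketch reports the chain hamburg -> atta -> hazmi -> san diego -> aulaq -> imam, citing f1, f2, f3, f3 and f4 as the evidence for the five hops: an evidence trail of the kind Srihari describes, though the real system's scoring is not given in the release.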
"The concept chains show you what may be of interest, but the real
intelligence here is gleaned from looking for patterns of interest," said
Srihari. "Once a pattern of interest is identified, then you can ask, 'Are
there more patterns like this?'"
A more robust prototype is expected to be delivered to the FAA for
evaluation by the end of the year.
Eventually, the UB search tool may also be used for other applications, such
as helping biomedical researchers conduct more effective investigations into
the connections between genes, proteins and disease.
Sudarshan Lamkhede, Anmol Bhasin and Wei Dai, graduate students in the UB
Department of Computer Science and Engineering, and Nick Schwartzmeyer, a
graduate student in the Department of Linguistics in the College of Arts and
Sciences, are working with Srihari on the project.
The University at Buffalo is a premier research-intensive public university,
the largest and most comprehensive campus in the State University of New
York.