http://www.independent.co.uk/news/Digital/Features/spies221199.shtml
Spies in the 'forests'
By Suelette Dreyfus
22 November 1999
THE US Department of Defense is lab-testing technology that could
make it easier to sift automatically through a vast pool of private
communications, including international telephone calls, in a
manner similar to using an Internet search engine.
The technology, called "Semantic Forests", is a software program
that analyses voice transcripts and other documents in order to
allow intelligent searching for specific topics. The software could
be used to analyse computer-transcribed telephone conversations. It
is named for its use of an electronic dictionary to make a weighted
"tree" of meanings for each word in a target document.
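The idea of a weighted "tree" of meanings can be illustrated with a
toy sketch. This is not the Department of Defense's code, which the
papers do not reproduce here. The dictionary entries, the function
names (meaning_tree, document_topics) and the depth and decay
parameters are all assumptions made for illustration: each word is
expanded through the words in its dictionary definition, and terms
further from the original word count for less.

```python
from collections import defaultdict

# Hypothetical toy dictionary: each headword maps to the words in its
# definition. A real system would use a full electronic dictionary.
TOY_DICTIONARY = {
    "sanctions": ["trade", "penalty"],
    "oil": ["fuel", "trade"],
    "penalty": ["punishment"],
}


def meaning_tree(word, depth=2, weight=1.0, decay=0.5):
    """Return {term: weight} for a word and the terms reached through
    its definitions. Weights shrink by `decay` at each level (assumed)."""
    weights = defaultdict(float)
    weights[word] += weight
    if depth > 0:
        for term in TOY_DICTIONARY.get(word, []):
            subtree = meaning_tree(term, depth - 1, weight * decay, decay)
            for sub_term, sub_weight in subtree.items():
                weights[sub_term] += sub_weight
    return dict(weights)


def document_topics(text, top_n=3):
    """Sum the meaning trees of every word in a document and return the
    highest-weighted terms, ties broken alphabetically."""
    totals = defaultdict(float)
    for word in text.lower().split():
        for term, term_weight in meaning_tree(word).items():
            totals[term] += term_weight
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]
```

In this sketch, "trade" appears in the definitions of both
"sanctions" and "oil", so a document containing both words gives it
as much weight as the words themselves, even though "trade" never
appears in the text.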
Two US Department of Defense academic papers, published as part of
the Text Retrieval Conference (TREC) in 1997 and 1998, provide the
first evidence that the US government has actually built a working
prototype of this technology and is testing it. The papers reveal
that the US military had been honing Semantic Forests over at least
two years, from 1996 to 1998, to make it more effective at
siphoning off useful information.
According to the 1998 paper, the software was originally developed
to "work with imperfect speech recogniser transcripts". The US
Department of Defense declined to comment on the matter.
In a series of lab tests, the software sifted through large pools
of documents, including transcripts of speech and data from
Internet discussion groups. In one set of tests, scientists
increased the average precision rate for finding relevant documents
per query from 19 per cent to 27 per cent in just one year, from
1997 to 1998.
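For readers unfamiliar with the measure, the sketch below shows how
an average precision figure can be calculated. It assumes the
figures quoted refer to the non-interpolated average precision
commonly reported in TREC evaluations, averaged over all queries.
The function names and the document identifiers are illustrative
only and do not come from the papers.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision for one query: the mean of the precision
    values at each rank where a relevant document is retrieved,
    divided by the total number of relevant documents."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """Mean of the per-query average precision over a list of
    (ranked_ids, relevant_ids) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

On this measure, a system scores higher when it ranks relevant
documents nearer the top of its results, and relevant documents it
never retrieves pull the score down.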
It appears that Semantic Forests is intelligent enough to handle
questions given in plain English. One of the sample questions used
to test the software was, "What have the effects of the UN
sanctions against Iraq been on the Iraqi people, the Iraqi economy,
or world oil prices?"
The US National Security Agency is also closely associated with
Semantic Forests. One of the authors of Semantic Forests, Patrick
Schone, was also one of the inventors of an NSA-patented system for
eavesdropping on international phone calls, which is similar to
Semantic Forests.
The NSA applied for the patent, No 5,937,422, seven months before
the first Semantic Forests paper was delivered at TREC. However, the
patent only became public after winning US Patent Office approval
in August this year.
The NSA is believed to conduct large-scale, automatic eavesdropping
on some types of written international communications such as
e-mail, according to a May 1999 interim report commissioned by the
European Parliament's Scientific and Technical Options Assessment
(STOA) panel.
Glyn Ford MEP, who instigated the STOA's investigation, said he was
concerned that the US was testing technology that might be used to
eavesdrop on international telephone calls. "It appears the NSA has
abilities over and above what has been indicated to us to date," he
said.
There was "strong circumstantial evidence" that the NSA had been
engaged in economic espionage on occasion, passing intercepted
information on to American companies to give them a competitive
advantage, he said. While he was happy for intelligence agencies to
spy on terrorists, he said that the NSA's "blanket approach" to
monitoring telephone calls and e-mails was "a serious breach of
privacy rights".
Cryptographer Julian Assange, who moderates the online Australian
discussion forum AUCRYPTO, discovered the Department of Defense
papers while investigating NSA capabilities. "This is not some
theoretical
exercise. The US has actually built and lab tested this technology,
which is clearly aimed at telephone calls. You don't make a wheel
like this unless you have something to put it on," he said.
US Congressman Bob Barr, who previously served with the CIA, said:
"This report underscores the need to update oversight procedures
and legal standards designed in the 1970s and not updated since, in
light of the revolutionary technological changes of the past two
decades. A perfected system to intercept voice communications and
allow government agencies to precisely pinpoint conversational
topics of interest would create a truly awesome potential for
privacy-invading abuses."
The outspoken Georgia Republican has been a driving force behind
proposed legislation to force the NSA and CIA to report the legal
standards that they use while conducting signals intelligence
activities, including electronic surveillance. The legislation has
passed both houses of Congress and is awaiting signature by
President Clinton.
Dr Brian Gladman, the former director of Strategic Electronic
Communications at the Ministry of Defence, said the NSA would
always like to find better ways to filter "voice traffic" -
international phone conversations - automatically for
information. "The NSA's problem is finding needles in haystacks,
and any technology that can chuck out hay without chucking out
needles is of value to them," he said.
"Automation is essential. It is likely the success rate will be
low, but this may not be an issue. It is better to deploy something
that will allow 10 per cent of the interesting traffic to be found,
than doing nothing and finding nothing."
Dr Gladman speculated that the NSA was not using the new technology
on international telephone calls at the moment, but was doing
trials on it "to see if it is worth deploying".
The two Semantic Forests academic papers came from the speech
research branch of the US Department of Defense at Fort Meade,
Maryland - the location of the headquarters of the NSA. When the
1998 paper was downloaded from the TREC conference Internet site,
the name of the file was listed as "nsa-rev.pdf".
Bruce Schneier, the author of Applied Cryptography, claims that,
paired with other types of spying technology, this software could
have a significant impact on people's privacy. "This technology can
be combined with voice-recognition technology to automatically find
certain conversations by a particular person or ethnic group," he
said.