By Suelette Dreyfus
Special Correspondent
CyberWire Dispatch
"Semantic Forests" doesn't mean much to the average person. But if
you say it in concert with the words "automatic voice telephone
interception" and "U.S. National Security Agency" to a computational
linguist, you might just witness the physical manifestations of the word
"fear."
Words are funny things, often so imprecise. Two people can have a
telephone conversation about sex, without ever mentioning the word.
And when the artist formerly known as Prince sang a song about
"cream," he wasn't talking about a dairy product.
All this linguistic imprecision has largely protected our voice
conversations from the prying ears of governments. Until now.
Or, more particularly, it protected us until April 15, 1997 - the
date the NSA lodged a secret patent application at the US Patent
Office. Of course, the content of the NSA patent was not made public for
two years, since the Patent Office keeps patent applications secret until
they are approved, which in this case was August 10, 1999.
What is so worrying about patent number 5,937,422? The NSA is
believed to be the largest and by far the best-funded spy agency in the
world, a Microsoft of Spookdom. This document provides the first hard
evidence that the NSA appears to be well on its way to creating
eavesdropping software capable of listening to millions of international
telephone calls a day. Automatically.
Patents are sometimes simply ambit claims, legal handcuffs on what
often amounts to little more than theory. Not in this case. This is
real. The U.S. Department of Defense has developed the NSA's patent
ideas into a real software program, called "Semantic Forests," which it
has been lab testing for at least two years.
Two important reports to the European Parliament, in 1998 and 1999,
and Nicky Hager's 1996 book "Secret Power" reveal that the NSA
intercepts international faxes and emails. At the time, this
revelation upset a great number of people, no doubt including the
European companies which lost competitive tenders to American
corporations not long after the NSA found its post-Cold War "new
economy" calling: economic espionage.
Voice telephone calls, however, well, that is another story. Not
even the world's most technically advanced spy agency has the
ability to do massive telephone interception and automatically
massage the content looking for particular words, and presumably
topics. Or so said a comprehensive recent report to the European
Parliament.
In April 1999, a report commissioned by the Parliament's Office of
Scientific and Technological Options Assessment (STOA) concluded
that "effective voice 'wordspotting' systems do not exist" and "are
not in use".
The tricky bit there is "do not exist". Maybe these systems haven't
been deployed en masse, but it is looking increasingly like they do
actually exist, probably in some form which may be closer to the
more powerful topic spotting.
Do The Math
============
There are two new pieces of evidence to support this, and added
together, they raise some fairly explosive questions about exactly
what the NSA is doing with the millions of international phone calls it
intercepts every day in its electronic eavesdropping web commonly known
as Echelon.
First, the NSA's shiny new patent describes a method of
"automatically generating a topic description for text and sorting
text by topic." Sound like a sophisticated web search engine? That's
because it is.
This is a search engine designed to trawl through "machine
transcribed speech," in the words of the patent application. Think
computers automatically typing up words falling from human lips. Now
think of a powerful search engine trawling through those words.
Now sweat...
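To make "generating a topic description for text and sorting text by
topic" a little more concrete, here is a toy sketch. It is emphatically
not the patented method, whose details are not given here. It assumes
the crudest possible approach: label each transcript with its most
frequent non-trivial words, then group transcripts that share a top
word. The names topic_description, sort_by_topic and STOP_WORDS, and the
sample transcripts, are all made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical stop-word list (an assumption); a real system would use
# a far larger one.
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "in", "on", "is",
              "it", "we", "for", "at", "will", "be", "you", "i"}


def topic_description(text, size=3):
    """Return the `size` most frequent non-stop words as a crude topic label."""
    words = [w.strip(".,!?\"'").lower() for w in text.split()]
    counts = Counter(w for w in words if w and w not in STOP_WORDS)
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:size]]


def sort_by_topic(transcripts, size=3):
    """Group transcript names under the top word of their topic description."""
    groups = defaultdict(list)
    for name, text in transcripts.items():
        description = topic_description(text, size)
        label = description[0] if description else "(none)"
        groups[label].append(name)
    return dict(groups)


# Invented sample "machine transcribed speech".
transcripts = {
    "call_1": "The shipment arrives Friday. Move the shipment to the warehouse.",
    "call_2": "Dinner on Friday? Dinner at eight, then dinner again Sunday.",
    "call_3": "Is the shipment late? The shipment must clear customs.",
}

for label, calls in sort_by_topic(transcripts).items():
    print(label, calls)
```

Even this naive word counting puts the two "shipment" calls in one pile
and the "dinner" call in another. A serious system would presumably
need to cope with transcription errors and with speakers who never say
the key word at all, which is where the hard research lies.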
Maybe the spy agency only wants to transcribe the BBC Radio World
News, but I don't think so. The patent contains a few more
linguistic clues about the NSA's intent - little golden Easter eggs
buried in the legal long grass. The "Background to the Invention"
section of every patent application is the place where the
intellectual property lawyers desperately try to wave away everyone
else's right to claim anything even remotely touching on the patent.
In this section, the NSA attorneys observed there has been "growing
interest" in automatically identifying topics in "unconstrained
speech."
Only a lawyer could make talking sound so painful. "Unconstrained
speech" means human conversation. Maybe it's been "unconstrained" by the
likelihood of being automatically transcribed for real time topic
searching.
Here's the part where the imprecision of words - particularly spoken
words - comes in. Machine transcribed conversations are raw, and very
hard to analyze automatically with software. Many experts thought the NSA
couldn't go driftnet fishing in the content of everyone's international
phone calls because the technology to transcribe and analyze those calls
was too young.
However, if the NSA didn't have the technology to do automatic
transcription of speech, why would it have patented a sifting method
which, by its very own words, is aimed at transcripts of human speech?
As Australian cryptographer Julian Assange, who discovered the DoD
and patent papers while investigating NSA capabilities, observed:
"Why make tires if you don't have a car? Maybe we haven't seen the
car yet, but we can infer that it exists by all the tires and
roads."
One of the top American cryptographers, Bruce Schneier, also
believes the NSA already has machine transcription capability. "One
of the Holy Grails of the NSA is the ability to automatically search
through voice traffic," Schneier said. "They would have expended
considerable effort on this capability, and this research indicates at
least some of it has been fruitful."
Second, two Department of Defense academic papers show the U.S.
developed a real software program, called "Semantic Forests," to
implement the patented method.
Published as part of the Text REtrieval Conference (TREC) in 1997
and 1998, the Semantic Forests papers show the program has one main
purpose: "performing retrieval on the output of automatic
speech-to-text (speech recognition) systems." In other words, the
U.S. built this software *specifically* to sift through
computer-transcribed human speech.
If that doesn't send a chill down your spine, read on.
The DoD's second prime purpose for Semantic Forests was to "explore
rapid prototyping" of this information retrieval system. That
statement was written in 1997.
There's also an unambiguous link between Semantic Forests and the
NSA patent: it's human, and its name is Patrick Schone.
Schone appears on the NSA patent documents as an inventor and on the
Semantic Forests papers as an author, and he works at Ft. Meade,
NSA's headquarters.
Specifically, he works in the DoD's "Speech Research Branch," which
just happens to be located at, you guessed it, Ft. Meade.
Very Clever Fish
================
The NSA and the DoD refused to comment on the patent and Semantic
Forests respectively. Not surprising, really, but no matter, since the
Semantic Forests papers speak for themselves. The papers reveal a software
program which, while somewhat raw a year ago, was advancing quickly in
its ability to fish relevant data out of various document pools,
including those based on speech.
For example, in one set of tests, the scientists increased the
average precision rate for finding relevant documents per query from 19%
to 27% in just one year, from 1997 to 1998. Tests in 1998 on another set
of documents, the "Spoken Document Retrieval" pool, were turning up
similar stats of around 20-23%. The team also discovered that a
little hand-fiddling in the software reaped large rewards.
According to the 1998 TREC paper: "When we supplemented the topic
lists for all the queries (by hand) to contain additional words from the
relevant documents, our average precision at the number of relevant
documents went from 28% to 50%."
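For readers wondering what those percentages measure, here is a minimal
sketch of two standard information-retrieval scores. It assumes that
"average precision" and "precision at the number of relevant documents"
in the papers correspond to the usual textbook definitions (the latter
is often called R-precision); the papers' exact scoring procedure is
not reproduced here. The document IDs and the ranking are invented.

```python
def precision_at_r(ranked, relevant):
    """Precision over the top R results, where R = number of relevant docs."""
    r = len(relevant)
    if r == 0:
        return 0.0
    hits = sum(1 for doc in ranked[:r] if doc in relevant)
    return hits / r


def average_precision(ranked, relevant):
    """Mean of the precision values taken at each rank where a relevant doc appears.

    Relevant documents that are never retrieved contribute zero.
    """
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


# Invented example: the system's ranked output for one query, and the
# set of documents a human judged relevant to that query.
ranked = ["d3", "d7", "d1", "d9", "d4", "d2"]
relevant = {"d3", "d1", "d2", "d8"}

print(f"precision at R:    {precision_at_r(ranked, relevant):.0%}")
print(f"average precision: {average_precision(ranked, relevant):.0%}")
```

In this made-up case two of the top four results are relevant, so
precision at R is 50%. Both scores reward a system for putting the
right documents near the top of its list, which is exactly what you
want if you are sifting a mountain of transcripts for a handful of
interesting calls.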
The truth is that Schone and his colleagues have created a truly
clever invention. They have done some impressive research. What a
shame all this creativity and laborious testing is going to be used
for such dark, Orwellian purposes.
Let's work on the mental image of that dark landscape. The NSA sucks down
phone calls, emails - all sorts of communications - to its satellite bases.
Its computers sift through the data looking for information which might
interest the U.S. or, if the Americans happen to be feeling generous that
day, their allies.
Now, whenever NSA agents want to find out about you, they pull up a
slew of details about you on their database. And not just the
run-of-the-mill gumshoe detective stuff like your social security
number and address, but the telephone number of every person you call
regularly, and everything you have said when making those calls to
1-900-Lick-Me from your hotel room on those stopovers in Cleveland.
And here's the real scary stuff:
The NSA likely already has a file on many of us. It's not a
traditional manila file with your name typed neatly on the front.
It's the ability to reference you, or anyone who matches your
patterns of behavior and contacts, in the NSA's databases. Now, or
in the near future, this file may not just include who you are, but
what you *say*.
British Member of the European Parliament Glyn Ford is one of the
few politicians around who is truly concerned with the individual's
right to privacy. A driving force behind the European Parliament's
STOA panel's two-year investigation into electronic communications,
Ford is worried that the NSA possesses technologies that are
"potentially very dangerous" to privacy, and yet there are no controls
over its activities.
The Australian Aboriginal activist and lawyer Noel Pearson once said
that the British gave three great things to the world: tea, cricket and
common law. If unchecked, the NSA and its sister spy agencies in the
UK/USA agreement may use this technology to lead an assault on the most
important of those gifts, and the common law tenet "innocent until proven
guilty" may be the first casualty.
How ironic: one Blair wrote '1984' as fiction, and another is
helping to make it fact.
= = = = = = = = = = = = = = = =
An Australian-American writer, Suelette Dreyfus was educated in the
UK and US, studied at Oxford University and Columbia University in
New York, where she won the prestigious Teichmann Prize for
excellence and originality in writing. She is the author of
Underground, the first book about Australian computer hacking,
available at
= = = = = = = = = = = = = = = = =
EDITOR'S NOTE: CyberWire Dispatch, with an Internet circulation
estimated at more than 600,000, is now developing plans for a
once-a-week e-mail publication. Every week, one of five well-known
investigative reporters will file for CWD. If you think your company or
organization would be interested in more information about establishing
a sponsorship relationship with CyberWire Dispatch, please contact Lewis
Z. Koch at [EMAIL PROTECTED]
===================