[Via... http://www.egroups.com/group/Communist-Internet ]
----- Original Message -----
From: Francisco Javier Bernal <[EMAIL PROTECTED]>
To: STOP NATO: NO PASARAN! <[EMAIL PROTECTED]>
Sent: Sunday, June 10, 2001 3:26 PM
Subject: Re: Is this for real: "What are those words that trigger Echelon?"
[WWW.STOPNATO.ORG.UK]


STOP NATO: NO PASARAN! - HTTP://WWW.STOPNATO.ORG.UK


ECHELON Technology

The NSA has been patenting, and publishing, technology that is
relevant to ECHELON.

ECHELON is a code word for an automated global interception system
operated by the intelligence agencies of the U.S., the UK, Canada,
Australia and New Zealand. (The NSA takes the lead.) According to
reports, it is capable of intercepting and processing many types of
transmissions throughout the globe.

Over the past few months, the U.S. House of Representatives has been
investigating ECHELON. As part of these investigations, the House
Select Committee on Intelligence requested documents from the NSA
regarding its operating standards for intelligence systems like
ECHELON that may intercept communications of Americans. To everyone's
surprise, NSA officials invoked attorney-client privilege and refused
to disclose the documents. EPIC has taken the NSA to court.

I've seen estimates that ECHELON intercepts as many as 3 billion
communications every day, including phone calls, e-mail messages,
Internet downloads, satellite transmissions, and so on. The system
gathers all of these transmissions indiscriminately, then sorts and
distills the information through artificial intelligence programs.
Some sources have claimed that ECHELON sifts through 90% of the
Internet's traffic.

How does it do it? Read U.S. Patent 5,937,422, "Automatically
generating a topic description for text and searching and sorting
text by topic using the same," assigned to the NSA. Read two papers
titled "Text Retrieval via Semantic Forests," written by NSA
employees.

Semantic Forests, patented by the NSA (the patent does not use the
name), was developed for topic labeling and for retrieving information
"on the output of automatic speech-to-text (speech recognition)
systems." It is described as a functional software program.

The researchers tested this program on numerous pools of data, and
improved the test results from one year to the next. All this
occurred in the window between when the NSA applied for the patent,
more than two years ago, and when the patent was granted this year.

One of the major technological barriers to implementing ECHELON is
the lack of automatic search tools for voice communications. Computers
need to "think" like humans when analyzing the often imperfect
computer transcriptions of voice conversations.

The patent claims that the NSA has solved this problem. First, a
computer automatically assigns a label, or topic description, to raw
data. This system is far more sophisticated than previous systems
because it labels data based on meaning, not on keywords.

Second, the patent includes an optional pre-processing step which
cleans up text, much of which the agency appears to expect will come
from human conversations. This pre-processing will remove what the
patent calls "stutter phrases." These phrases "frequently occurs
[sic] in text based on speech." The pre-processing step will also
remove "obvious stop words" such as the article "the."
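
To make the pre-processing step concrete, here is a rough sketch in
Python. It is not the patent's method. It assumes a "stutter phrase" is
a word or short phrase repeated immediately, and the stop-word list,
the function names and the max_len parameter are made up for
illustration (the patent's only example of a stop word quoted here is
"the").

```python
import re

# Hypothetical stop-word list; only "the" is named in the text above.
STOP_WORDS = {"the", "a", "an", "of", "to", "and"}


def remove_stutter(tokens, max_len=3):
    """Collapse immediately repeated phrases of up to max_len tokens.

    Assumption: a "stutter phrase" is a back-to-back repetition such as
    "I want I want" or "to to".
    """
    out = []
    for tok in tokens:
        out.append(tok)
        # If the last n tokens repeat the n tokens before them, drop the copy.
        for n in range(1, max_len + 1):
            if len(out) >= 2 * n and out[-n:] == out[-2 * n:-n]:
                del out[-n:]
                break
    return out


def preprocess(text):
    """Lower-case, tokenize, remove stutters, then remove stop words."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    tokens = remove_stutter(tokens)
    return [t for t in tokens if t not in STOP_WORDS]


cleaned = preprocess("I want I want to to book the the flight")
print(cleaned)
```

The stutter pass runs before the stop-word pass so that a repeated
"the the" is collapsed first and then dropped as an ordinary stop word.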

The invention is designed to sift through foreign language documents,
either in text, or "where the text may be derived from speech and
where the text may be in any language," in the words of the patent.

The papers go into more detail on the implementation of this
technology. The NSA team ran the software over several pools of
documents, some of which were text from spoken words (called SDR),
and some regular documents. They ran the tests over each pool
separately. Some of the text documents analyzed appear to include
data from "Internet discussion groups," though I can't quite
determine if these were used to train the software program, or
illustrate results.

The "30-document average precision" (in TREC usage, the share of the
top 30 retrieved documents that are relevant, averaged over the test
queries) on one test pool rose from 19% in 1997 to 27% in 1998, a gain
of eight percentage points in one year. So they're getting better.

It appears that the tests on the pool of speech-to-text documents
came in at between 20% and 23% precision (see Tables 5 and 6 of the
"Semantic Forests TREC7" paper) at the 30-document average. (In TREC
evaluations, the "30 documents" are normally the top 30 results
returned for each topic query, and the score is averaged over the
queries.)
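
For what the percentages above measure, here is a small sketch of
precision at 30 documents as it is usually computed in TREC-style
evaluations: for each query, count how many of the top 30 results are
relevant, divide by 30, then average over the queries. The function
names and the toy document IDs are invented for illustration; the
numbers are not from the NSA papers.

```python
def precision_at_k(ranked_ids, relevant_ids, k=30):
    """Fraction of the top-k ranked documents that are relevant.

    Divides by k even if fewer than k documents were returned, which is
    the usual convention for this measure.
    """
    top = ranked_ids[:k]
    hits = sum(1 for doc_id in top if doc_id in relevant_ids)
    return hits / k


def mean_precision_at_k(runs, k=30):
    """Average precision-at-k over (ranking, relevant set) pairs, one per query."""
    scores = [precision_at_k(ranked, relevant, k) for ranked, relevant in runs]
    return sum(scores) / len(scores)


# Toy data: 40 ranked results per query, made-up relevance judgments.
ranking = [f"d{i}" for i in range(40)]
query1 = (ranking, {"d0", "d3", "d5", "d29", "d35"})       # 4 hits in top 30
query2 = (ranking, {f"d{i}" for i in range(0, 16, 2)})     # 8 hits in top 30

print(mean_precision_at_k([query1, query2]))
```

With these toy judgments the two queries score 4/30 and 8/30, so the
average is 20%, in the same range as the figures quoted above.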

It's pretty clear to me that this technology can be used to support
an ECHELON-like system. I'm surprised the NSA hasn't classified this
work.

The Semantic Forest papers:
http://trec.nist.gov/pubs/trec6/papers/nsa-rev.ps.gz
http://trec.nist.gov/pubs/trec7/papers/nsa-rev.pdf.gz

The patent:
http://www.patents.ibm.com/details?&pn=US05937422__

News reports on this:
http://www.independent.co.uk/news/Digital/Features/spies151199.shtml
http://www.independent.co.uk/news/Digital/Features/spies221199.shtml

Excellent general information on ECHELON:
http://www.echelonwatch.org
http://www.bernal.co.uk
http://www.wired.com/news/print/0,1294,32586,00.html

Good article on ECHELON:
http://mediafilter.org/caq/cryptogate/

EPIC files lawsuit against NSA to get ECHELON documents released:
http://www.epic.org/open_gov/foia/nsa_suit_12_99.html
EPIC's complaint:
http://www.epic.org/open_gov/FOIA/nsa_comp.pdf
NY Times article:
http://www.nytimes.com/library/tech/99/12/cyber/articles/04spy.html

