[Via... http://www.egroups.com/group/Communist-Internet ]

----- Original Message -----
From: Francisco Javier Bernal <[EMAIL PROTECTED]>
To: STOP NATO: NO PASARAN! <[EMAIL PROTECTED]>
Sent: Sunday, June 10, 2001 3:26 PM
Subject: Re: Is this for real: "What are those words that trigger Echelon?" [WWW.STOPNATO.ORG.UK]

STOP NATO: NO PASARAN! - HTTP://WWW.STOPNATO.ORG.UK

ECHELON Technology

The NSA has been patenting, and publishing, technology that is relevant to ECHELON. ECHELON is a code word for an automated global interception system operated by the intelligence agencies of the U.S., the UK, Canada, Australia and New Zealand, with the NSA taking the lead. According to reports, it is capable of intercepting and processing many types of transmissions throughout the globe.

Over the past few months, the U.S. House of Representatives has been investigating ECHELON. As part of these investigations, the House Select Committee on Intelligence requested documents from the NSA regarding its operating standards for intelligence systems like ECHELON that may intercept communications of Americans. To everyone's surprise, NSA officials invoked attorney-client privilege and refused to disclose the documents. EPIC (the Electronic Privacy Information Center) has taken the NSA to court.

I've seen estimates that ECHELON intercepts as many as 3 billion communications every day, including phone calls, e-mail messages, Internet downloads, satellite transmissions, and so on. The system gathers all of these transmissions indiscriminately, then sorts and distills the information through artificial intelligence programs. Some sources have claimed that ECHELON sifts through 90% of the Internet's traffic.

How does it do it? Read U.S.
Patent 5,937,422, "Automatically generating a topic description for text and searching and sorting text by topic using the same," assigned to the NSA. Read the two papers titled "Text Retrieval via Semantic Forests," written by NSA employees.

Semantic Forests, patented by the NSA (the patent does not use the name), were developed for information retrieval "on the output of automatic speech-to-text (speech recognition) systems" and for topic labeling. It is described as a functional software program. The researchers tested this program on numerous pools of data and improved the test results from one year to the next. All this occurred in the window between when the NSA applied for the patent, more than two years ago, and when the patent was granted this year.

One of the major technological barriers to implementing ECHELON is the lack of automatic search tools for voice communications. Computers need to "think" like humans when analyzing the often imperfect computer transcriptions of voice conversations. The patent claims that the NSA has solved this problem.

First, a computer automatically assigns a label, or topic description, to raw data. This system is far more sophisticated than previous systems because it labels data based on meaning, not on keywords.

Second, the patent includes an optional pre-processing step that cleans up text, much of which the agency appears to expect will come from human conversations. This pre-processing will remove what the patent calls "stutter phrases." These phrases "frequently occurs [sic] in text based on speech." The pre-processing step will also remove "obvious stop words" such as the article "the."

The invention is designed to sift through foreign-language documents, either in text or "where the text may be derived from speech and where the text may be in any language," in the words of the patent. The papers go into more detail on the implementation of this technology.
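The kind of clean-up the patent's optional pre-processing step describes can be illustrated with a toy sketch. Everything in it is an assumption made for illustration, not the NSA's method: the stop-word list, the helper names `remove_stutter` and `preprocess`, and the modeling of a "stutter phrase" as a phrase of up to three words repeated back-to-back.

```python
import re

# Hypothetical stop-word list for illustration only; the patent's own
# list is not reproduced here. "uh" and "um" stand in for speech fillers.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "uh", "um"}


def remove_stutter(tokens, max_len=3):
    """Collapse back-to-back repeats of a phrase of 1..max_len words.

    Assumption: a "stutter phrase" is modeled as an immediately repeated
    phrase, e.g. ["i", "i", "think"] -> ["i", "think"].
    """
    out = []
    for tok in tokens:
        out.append(tok)
        # Keep trimming while the tail of `out` ends with the same
        # n-gram twice in a row (longest n checked first).
        trimmed = True
        while trimmed:
            trimmed = False
            for n in range(max_len, 0, -1):
                if len(out) >= 2 * n and out[-n:] == out[-2 * n:-n]:
                    del out[-n:]  # drop the repeated copy
                    trimmed = True
                    break
    return out


def preprocess(text):
    """Lowercase, tokenize, remove stutter phrases, then drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    tokens = remove_stutter(tokens)
    return [t for t in tokens if t not in STOP_WORDS]


print(preprocess("I I think the the meeting is is on Tuesday"))
# ['i', 'think', 'meeting', 'is', 'on', 'tuesday']
```

Stutter removal runs before stop-word removal so that repeated phrases containing stop words (such as "we need to we need to") are still caught. A real system would also need to avoid collapsing legitimate repeats such as "had had".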
The NSA team ran the software over several pools of documents, some consisting of text transcribed from spoken words (the spoken document retrieval, or SDR, data) and some of regular documents. They ran the tests over each pool separately. Some of the text documents analyzed appear to include data from "Internet discussion groups," though I can't quite determine whether these were used to train the software program or to illustrate results.

The "30-document average precision" (whatever that is) on one test pool rose significantly in one year, from 19% in 1997 to 27% in 1998. This shows that they're getting better. It appears that the tests on the pool of speech-to-text documents came in at between 20% and 23% precision at the 30-document average (see Tables 5 and 6 of the "Semantic Forests TREC7" paper). (A "document" in this definition can mean a topic query. In other words, 30 documents can actually mean 30 questions to the database.)

It's pretty clear to me that this technology can be used to support an ECHELON-like system. I'm surprised the NSA hasn't classified this work.

The Semantic Forests papers:
http://trec.nist.gov/pubs/trec6/papers/nsa-rev.ps.gz
http://trec.nist.gov/pubs/trec7/papers/nsa-rev.pdf.gz

The patent:
http://www.patents.ibm.com/details?&pn=US05937422__

News reports on this:
http://www.independent.co.uk/news/Digital/Features/spies151199.shtml
http://www.independent.co.uk/news/Digital/Features/spies221199.shtml

Excellent general information on ECHELON:
http://www.echelonwatch.org
http://www.bernal.co.uk
http://www.wired.com/news/print/0,1294,32586,00.html

Good article on ECHELON:
http://mediafilter.org/caq/cryptogate/

EPIC files lawsuit against NSA to get ECHELON document released:
http://www.epic.org/open_gov/foia/nsa_suit_12_99.html

EPIC's complaint:
http://www.epic.org/open_gov/FOIA/nsa_comp.pdf

NY Times article:
http://www.nytimes.com/library/tech/99/12/cyber/articles/04spy.html
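One possible reading of the "30-document average precision" figure is the standard TREC measure of precision after 30 retrieved documents (P@30), averaged over all topics in a run. This is an interpretation, not something confirmed from the papers here. Under that reading, the 30 refers to the top 30 documents returned for each query, and the average is taken over the queries. A minimal Python sketch follows; the function names, data layout and toy data are hypothetical.

```python
def precision_at_k(ranked_ids, relevant_ids, k=30):
    """Fraction of the top-k retrieved documents judged relevant.

    As in trec_eval, the divisor is always k: if fewer than k documents
    were retrieved, the missing ranks count as non-relevant.
    """
    hits = sum(1 for doc in ranked_ids[:k] if doc in relevant_ids)
    return hits / k


def mean_precision_at_k(runs, qrels, k=30):
    """Average P@k over every topic (query) in `runs`.

    runs:  topic id -> ranked list of retrieved document ids
    qrels: topic id -> set of document ids judged relevant
    """
    scores = [precision_at_k(ranked, qrels.get(topic, set()), k)
              for topic, ranked in runs.items()]
    return sum(scores) / len(scores) if scores else 0.0


# Invented toy data: topic t1 has 9 relevant documents in its top 30,
# topic t2 has 6, so P@30 is 0.30 and 0.20 and the mean is 0.25.
runs = {"t1": [f"d{i}" for i in range(30)],
        "t2": [f"e{i}" for i in range(30)]}
qrels = {"t1": {f"d{i}" for i in range(9)},
         "t2": {f"e{i}" for i in range(6)}}
print(round(mean_precision_at_k(runs, qrels), 2))  # 0.25
```

On this reading, a score of 27% would mean that, on average, about 8 of the top 30 documents returned for each query were judged relevant.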
