----- Original Message -----
From: "BlindNews Mailing List" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Monday, May 28, 2007 6:06 AM
Subject: ReCAPTCHA System Improves Internet Security and Book Searchability
> CCN Magazine, Canada
> Sunday, May 27, 2007
>
> ReCAPTCHA System Improves Internet Security and Book Searchability
>
> 2007-05-27 13:04:35
>
> A Carnegie Mellon University computer scientist is enlisting the unwitting help of thousands, if not millions, of Web users each day to eliminate a technical bottleneck that has slowed efforts to transform books, newspapers and other printed materials into digitized text that is computer searchable. Luis von Ahn, an assistant professor of computer science and recipient of a MacArthur Foundation "genius grant," says the project will also improve Web security systems used to reduce spam and make it possible for individuals to safeguard their own email addresses from spammers.
>
> Key to the new project is assigning a new, dual use to existing technology: CAPTCHAs, the distorted-letter tests found at the bottom of registration forms on Yahoo, Hotmail, PayPal, Wikipedia and hundreds of other sites worldwide. CAPTCHAs, an acronym for Completely Automated Public Turing Test to Tell Computers and Humans Apart, distinguish between legitimate human users and malevolent computer programs designed by spammers to harvest thousands of free email accounts. The tests require users to type the distorted letters they see inside a box - a task that is difficult for computers, but easy for humans.
>
> Working with a team that includes computer science professor Manuel Blum, undergraduate student Ben Maurer and research programmer Mike Crawford, von Ahn invented a new version of the tests, called reCAPTCHAs, that will help convert printed text into computer-readable letters on behalf of the Internet Archive. The San Francisco-based non-profit group administers the Open Content Alliance and is one of several large initiatives working to digitize books and other printed materials under open principles, making the text searchable by computer and capable of being reformatted for new uses.
>
> Optical character recognition (OCR) systems that automatically perform this conversion are often stumped by underlined text, scribbles and fuzzy or otherwise poorly printed letters. ReCAPTCHAs will use words from these troublesome passages to replace the artificially distorted letters and numbers typically used in CAPTCHAs.
>
> The new tests continue to distinguish between humans and machines because they use text that OCR systems have already failed to read. And because people must decipher these words to pass the reCAPTCHA test, they will help complete the expensive digitization process.
>
> "I think it's a brilliant idea - using the Internet to correct OCR mistakes," said Brewster Kahle, director of the Internet Archive. ReCAPTCHAs will speed the digitization process while also helping to improve OCR methods and perhaps extend them to additional languages, he said. "This is an example of why having open collections in the public domain is important," he added. "People are working together to build a good, open system."
>
> Von Ahn hopes to substitute his reCAPTCHAs for as many conventional CAPTCHAs as possible. "It is estimated that 60 million or more CAPTCHAs are solved each day, with each test taking about 10 seconds," he said. "That's more than 150,000 precious hours of human work that are lost each day, but that we can put to good use with reCAPTCHAs."
>
> With support from Intel Corp., von Ahn's team has devised a free, Web-based service that allows individual webmasters to install reCAPTCHAs to protect their sites. Individuals can also use the service to protect their own email addresses, or lists of addresses they post on personal Web pages. In the case of some commercial Web sites with heavy traffic, reCAPTCHA may charge a fee to pay for additional bandwidth.
>
> To make certain that people are correctly deciphering the printed text, the reCAPTCHA system will require Web site visitors to type two words, one of which the system already knows. Each unknown word will be submitted to multiple visitors. If the visitor types the known word correctly, the system has greater confidence that the unknown word is being typed correctly. If several visitors type the same answer for the unknown word, that answer will be assumed to be correct.
>
> An audio version of reCAPTCHA, which will transcribe portions of radio programs that have defied speech recognition programs, will also be available for blind Web users.
>
> http://www.ccnmag.com/news.php?id=5301
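
The "more than 150,000 hours" figure in von Ahn's quote can be checked with a few lines of arithmetic. The sketch below uses only the two numbers given in the article (60 million tests a day, about 10 seconds each). The variable names are illustrative and not taken from any reCAPTCHA code.

```python
# Both inputs are the round figures quoted in the article.
captchas_per_day = 60_000_000   # "60 million or more CAPTCHAs are solved each day"
seconds_per_test = 10           # "each test taking about 10 seconds"

# Convert total seconds per day into hours (3600 seconds per hour).
hours_per_day = captchas_per_day * seconds_per_test / 3600

print(f"{hours_per_day:,.0f} hours per day")  # prints "166,667 hours per day"
```

The result, roughly 166,667 hours, is consistent with the "more than 150,000" in the quote.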
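
The two-word check described in the article (one word the system already knows, one it does not, with agreement among several visitors) could be modelled roughly as follows. This is only a toy sketch under stated assumptions, not the actual reCAPTCHA implementation. The class name, the method names, the word id "scan-17", the threshold of three matching answers (the article says only "several") and the case-insensitive comparison are all hypothetical.

```python
from collections import Counter, defaultdict


class TwoWordChallenge:
    """Toy model of the known-word / unknown-word check (hypothetical names)."""

    def __init__(self, agreement_threshold=3):
        # Assumption: the article says "several visitors"; 3 is an arbitrary stand-in.
        self.agreement_threshold = agreement_threshold
        self.votes = defaultdict(Counter)  # unknown-word id -> tally of typed answers
        self.resolved = {}                 # unknown-word id -> accepted transcription

    @staticmethod
    def _normalize(text):
        # Assumption: answers are compared ignoring case and surrounding spaces.
        return text.strip().lower()

    def submit(self, known_answer, typed_known, unknown_id, typed_unknown):
        """Return True if the visitor passes, judged only on the known word."""
        if self._normalize(typed_known) != self._normalize(known_answer):
            # Control word wrong: fail the test and discard the other guess.
            return False
        if unknown_id not in self.resolved:
            guess = self._normalize(typed_unknown)
            self.votes[unknown_id][guess] += 1
            # Accept an answer once enough visitors have typed the same thing.
            if self.votes[unknown_id][guess] >= self.agreement_threshold:
                self.resolved[unknown_id] = guess
        return True


# Four visitors see the same unknown word; three of them agree on "morning".
challenge = TwoWordChallenge()
for typed in ["morning", "morning", "moming", "morning"]:
    challenge.submit("upon", "upon", "scan-17", typed)
print(challenge.resolved.get("scan-17"))  # prints "morning"
```

Grading the visitor only on the known word is what keeps the test usable as a CAPTCHA, while the tally on the unknown word does the transcription work.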
