http://www.wired.com/news/business/0,1367,51112-2,00.html
 
The Push for News Returns
By Kendra Mayfield
 
2:00 a.m. March 30, 2002 PST
"News is what a chap who doesn't care much about anything wants to read. And it's only news until he's read it. After that it's dead."

English novelist Evelyn Waugh's prescient statement still holds true today: Consumers only want to read what's news to them.

Since the advent of the Web, humans have tried to find new ways to use technology to sift through the barrage of online news.

First there was PointCast, the much-heralded push-technology purveyor that was supposed to dethrone the Internet browser by delivering headlines to the desktop. Other upstarts, such as Marimba, jumped on the push bandwagon early.

After the promise of push slowly fizzled, startups like Farcast used software agents to scan various news feeds and automatically send personalized news to users' e-mail accounts.

Now, in the latest attempt to automate the news, a group of Columbia University researchers has launched Newsblaster, a project that uses natural language processing techniques to summarize top headlines.

The project attempts to cut through the glut of daily headlines by fusing content from multiple online news sources into concise summaries.

"(Newsblaster) grooms information together and cuts redundancy," said Regina Barzilay, a computer science doctoral student who is working on the Newsblaster project. "It allows the users to see information much faster."

Researchers decided to launch Newsblaster shortly after Sept. 11 to track news related to the terrorist attacks and to test algorithms in a live environment, said Kathleen McKeown, the computer science professor overseeing the project.

Newsblaster's software classifies articles into six categories: United States, world, finance, entertainment, science/technology and sports.

The program extracts nouns, proper nouns and noun phrases to measure similarity between articles and determine when they cover the same event.
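The article does not describe Newsblaster's actual similarity measure, so the following is only a minimal Python sketch of the general idea: reduce each article to a bag of content words and compare the bags. It rests on two loud assumptions. A short stopword list stands in for real noun and noun-phrase extraction, which would need a part-of-speech tagger. The 0.3 cosine-similarity cutoff in the hypothetical `same_event` function is an arbitrary illustrative value, not Newsblaster's.

```python
import math
import re
from collections import Counter

# Assumption: a stopword filter stands in for real noun/noun-phrase extraction,
# which would require a part-of-speech tagger.
STOPWORDS = {
    "a", "an", "the", "and", "or", "of", "in", "on", "to", "for", "with",
    "is", "are", "was", "were", "by", "at", "from", "as", "it", "its",
    "that", "said", "has", "have",
}


def candidate_terms(text: str) -> Counter:
    """Count the lowercase words that survive the stopword filter."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def same_event(doc1: str, doc2: str, threshold: float = 0.3) -> bool:
    """Hypothetical rule: two articles cover one event if similarity >= threshold."""
    return cosine(candidate_terms(doc1), candidate_terms(doc2)) >= threshold


if __name__ == "__main__":
    a = "Earthquake strikes Tokyo, killing dozens in Tokyo suburbs"
    b = "Dozens killed as earthquake strikes Tokyo suburbs"
    c = "Stock market rallies on Wall Street tech earnings"
    print(same_event(a, b), same_event(a, c))  # prints: True False
```

A real system would compare many articles at once and group them into clusters, but the pairwise test above is the building block such clustering relies on.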

Newsblaster looks for similar themes from various sources (such as Yahoo, CNN, Reuters, The Washington Post, USA Today and Wired News). Each theme will generate one sentence in a summary. The software parses these sentences and compares them to find repeated phrases, which it cuts and pastes to form a summary of a particular news event.
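To make the phrase-matching step more concrete, here is a hedged sketch of one simple way to spot phrases repeated across sentences from different sources: count three-word sequences and keep those that appear in at least two sentences. The function name `repeated_phrases`, the trigram length and the two-sentence cutoff are all illustrative assumptions. Newsblaster parses the sentences and fuses them into new text, which this sketch does not attempt.

```python
import re
from collections import Counter


def ngrams(words: list[str], n: int) -> list[tuple[str, ...]]:
    """All runs of n consecutive words."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def repeated_phrases(sentences: list[str], n: int = 3, min_sentences: int = 2) -> set[str]:
    """Phrases of n words that occur in at least `min_sentences` different sentences."""
    counts: Counter = Counter()
    for sentence in sentences:
        words = re.findall(r"[a-z0-9']+", sentence.lower())
        # Use a set so each phrase counts once per sentence (one sentence per source).
        counts.update(set(ngrams(words, n)))
    return {" ".join(gram) for gram, c in counts.items() if c >= min_sentences}
```

Given three reports that an earthquake struck Tokyo, worded slightly differently, the phrase "earthquake struck tokyo" would surface as shared by all three. Counting each phrase once per sentence keeps a single source that repeats itself from looking like independent confirmation.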

Google recently launched a similar service to collect headlines from multiple sources. Google's News Search (beta) service uses a unique grouping technology that automatically puts related stories together in the same search result.

Unlike Google's news search, which provides clusters of related documents, Newsblaster actually culls similar content into one descriptive summary.

"While some (commercial) sites do provide a summary of a single (article), they do not summarize over multiple articles," McKeown said.

The University of Michigan is working on a similar service called NewsInEssence, which also uses natural language techniques to find and summarize multiple news articles on the Web.

A user enters the URL of a single news story from a site that NewsInEssence understands (currently BBC News, Yahoo News, CNN, MSNBC or USA Today) and sets search parameters.

NewsInEssence's search agent, called NewsTroll, searches for stories related to the same event. The agent then enters keywords into search engines of news sites and produces summaries of a subset of stories that it finds.

NewsTroll reports the number of links that it has followed, tested and retrieved. Since the system uses several levels of filtering, NewsTroll can screen out large numbers of Web pages and return results in real time.
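NewsTroll's internals are not described beyond this, but the general pattern of layered filtering can be sketched in Python. In this assumed version, the most frequent words in the seed story serve as keywords. Search results then face a cheap title check first, and only the survivors get a stricter check of the full text. The three counters loosely mirror the "followed, tested and retrieved" figures mentioned above. Every name and threshold here (`Candidate`, `troll`, `min_title_hits`, `min_body_hits`) is hypothetical, and the candidate list stands in for whatever the news sites' search engines return.

```python
import re
from collections import Counter
from dataclasses import dataclass

STOPWORDS = {"a", "an", "the", "and", "of", "in", "on", "to", "for", "with",
             "is", "was", "are", "by", "at", "from", "as"}


@dataclass
class Candidate:
    """A hypothetical search result returned by a news site's search engine."""
    url: str
    title: str
    body: str


def terms(text: str) -> list[str]:
    """Lowercase words with stopwords removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def top_keywords(text: str, k: int = 5) -> set[str]:
    """The k most frequent terms of the seed story, used as search keywords."""
    return {word for word, _ in Counter(terms(text)).most_common(k)}


def hits(keywords: set[str], text: str) -> int:
    """Number of distinct keywords that appear in the text."""
    return len(keywords & set(terms(text)))


def troll(seed: str, candidates: list[Candidate], min_title_hits: int = 1,
          min_body_hits: int = 3) -> tuple[list[Candidate], dict[str, int]]:
    """Two filtering levels: a cheap title check, then a stricter body check."""
    keywords = top_keywords(seed)
    tested = [c for c in candidates if hits(keywords, c.title) >= min_title_hits]
    retrieved = [c for c in tested if hits(keywords, c.body) >= min_body_hits]
    stats = {"followed": len(candidates), "tested": len(tested),
             "retrieved": len(retrieved)}
    return retrieved, stats
```

The order of the filters is the point of the design: the title check discards most pages before the more expensive full-text comparison runs, which is one way a system could keep response times close to real time.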

"Our system is the only one that allows users to specify which sources they find more important and adjust the summaries accordingly," said Dragomir Radev, an adviser for the NewsInEssence project.
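The article does not say how NewsInEssence applies source preferences. One plausible approach, sketched here as an assumption rather than the project's documented method, multiplies each candidate sentence's base score by a per-source weight the user supplies and keeps the top-scoring sentences. The `weighted_summary` function, its tuple format and the default weight of 1.0 are illustrative.

```python
def weighted_summary(
    sentences: list[tuple[str, str, float]],
    source_weights: dict[str, float],
    k: int = 2,
    default_weight: float = 1.0,
) -> list[str]:
    """Pick the k sentences with the highest base score times the user's source weight.

    Each sentence is a (source, text, base_score) tuple; sources the user has
    not weighted get `default_weight`.
    """
    scored = [
        (base * source_weights.get(source, default_weight), text)
        for source, text, base in sentences
    ]
    # Highest weighted score first; list.sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]
```

For example, doubling the weight for BBC News can move a BBC sentence ahead of a slightly higher-scoring CNN sentence, so the same pool of articles yields a different summary for each reader.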

"The main challenge is to scale up with hundreds of news sources and thousands of users," Radev said.

Users seem satisfied with Newsblaster's accuracy. According to a user survey last January, about 88 percent of Newsblaster's summaries were deemed acceptable.

But artificial intelligence systems like NewsInEssence and Newsblaster are far from perfect. Summaries aren't always as coherent as those written by human editors.

Newsblaster often assumes that all articles in a particular category are about the same event. Sometimes the sentences have odd punctuation and do not flow smoothly. The site is also updated only once a day, so news may appear stale.

"It does make errors, and it's not always going to be correct," McKeown said. "Even when it's acceptable, it's not always going to be ideal."

Despite these shortcomings, Columbia researchers insist that Newsblaster is still a valuable tool.

"(Newsblaster) is not intended to replace human editors," McKeown said. "Rather, it provides a complementary tool to help humans cope with the exploding quantity of information on the Web in a timely fashion. Even with errors, it is useful in this way."

"I personally don't think it will be able to substitute a human editor," Barzilay agreed. "But it will be able to provide more efficient access to what humans have written."

While some are trying to automate the news, human attempts to personalize the news are gaining popularity. Weblogs like Scripting News, MetaFilter and others provide a personal, human editorial slant that machines can't mimic.

Columbia researchers are working on making Newsblaster more efficient, so users can receive real-time updates. They are also trying to improve tools to remove unnecessary phrases and improve fluency of text.

The research team is also working on identifying inconsistencies across sources and on techniques for tracking events across days as they develop. The program will eventually be customizable and will include multilingual summaries.

But for now at least, it looks like human journalists aren't in jeopardy of being superseded by automated news.

"It's a good framework for filtering the news and making it interesting," Barzilay said. "The question is how much can we do to make it totally customizable."
