Obviously you can compute the Levenshtein distance between the two texts, but that is far too computationally expensive to scale. The goal is to find something workable in a production system. For example, a given NYT article and its printer-friendly version should be deemed the same.
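To show why this doesn't scale, here is a minimal sketch of the textbook dynamic-programming edit distance (unit cost for insert, delete, and substitute). The function name `levenshtein` is just for illustration. It fills one cell per pair of characters, so comparing two documents costs O(len(a) * len(b)) time.

```python
def levenshtein(a: str, b: str) -> int:
    """Textbook DP edit distance: insertion, deletion, substitution each cost 1.

    Time is O(len(a) * len(b)); only two rows are kept, so memory is O(min(len(a), len(b))).
    """
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the row we store as short as possible
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free when characters match)
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
```

For two 10,000-character articles that is 10^8 cell updates for a single pair, and checking every pair in a collection of N articles multiplies that by N(N-1)/2.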
-Mike
