Hi,
I have a simple problem: applying a regular expression while parsing HTML in the Nutch parser. For example, while crawling and parsing tons of web pages, I'd like to write a Nutch plugin that can match a specific pattern, annotate it, and store it. As far as I know, Nutch passes an HTMLMetaTags argument to the method HtmlParseFilter.filter(). My question is: can we also access other HTML tags, such as span? If so, which packages/classes should I look into? Thanks -- Khang
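To make the question concrete, here is a rough standalone sketch of the matching I have in mind. It assumes I can get the page as a W3C DOM tree; if I read the interface right, filter() also receives a `DocumentFragment`, but I haven't verified that this is the place to walk it. `SpanMatcher`, `findInSpans`, and `parseXhtml` are my own hypothetical names, not Nutch API, and the demo page is parsed as strict XML only so the example runs on its own.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class SpanMatcher {

    // Walk the DOM recursively and collect every regex match found in the
    // text of <span> elements. A span's getTextContent() already includes
    // the text of any nested spans, so we do not descend into a span's
    // children (that would count nested matches twice).
    public static List<String> findInSpans(Node node, Pattern pattern, List<String> out) {
        if (node.getNodeType() == Node.ELEMENT_NODE
                && "span".equalsIgnoreCase(node.getNodeName())) {
            Matcher m = pattern.matcher(node.getTextContent());
            while (m.find()) {
                out.add(m.group());
            }
            return out;
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            findInSpans(children.item(i), pattern, out);
        }
        return out;
    }

    // Demo helper only (hypothetical): parse well-formed XHTML into a DOM.
    // Real-world HTML is not XML, so inside Nutch the DOM would come from
    // the HTML parser instead.
    public static Node parseXhtml(String xhtml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        Node doc = parseXhtml(
                "<html><body><p>Call <span>555-1234</span> or "
                + "<span>555-9876</span>, not 555-0000</p></body></html>");
        Pattern phone = Pattern.compile("\\d{3}-\\d{4}");
        // Only the numbers inside <span> elements are collected.
        System.out.println(findInSpans(doc, phone, new ArrayList<>()));
    }
}
```

The annotate-and-store part would then work on the returned matches, which is the piece I don't know how to hook into Nutch.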

