Hello everybody, I'm wondering whether it is possible to extract specific metadata with an existing Nutch plugin.
Let's take an example. I want to extract some metadata from "div" or "td" tags that have specific ids in HTML pages, and name the resulting fields the way I like (this is done at parse time). Then, at index time, I would use index-metadata (a very good plugin) to add my custom metadata to the index.

From what I've seen on the wiki and from a quick look at the existing plugins, I suppose I have to code my own plugin each time I've got a new site (with a new HTML structure). I've already done that by using a node walker in a custom HtmlParseFilter, but writing the extraction code can be a little bit tedious :)

So on my side I've coded a little plugin that lets me specify XPath expressions in an XML file. But before adding more functionality, I'm wondering whether I have missed something. This work allowed me to explore some aspects of Nutch, but I don't want to reinvent the wheel if a plugin for this already exists.

Albin
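
P.S. To make the XPath idea more concrete, here is a rough, standalone sketch of the approach. It is not the plugin code itself: the class name, the field names (product_price, product_ref) and the sample page are made up for illustration, and the rules are hard-coded in a map where the plugin would read them from the XML file. It only uses the standard javax.xml.xpath API and assumes well-formed markup; a real HtmlParseFilter would instead work on the DOM that Nutch has already parsed.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical sketch: maps custom metadata names to XPath expressions.
public class XPathMetadataSketch {

    // Evaluates every rule (metadata name -> XPath) against the document and
    // returns the non-empty, trimmed text values keyed by metadata name.
    static Map<String, String> extract(Document doc, Map<String, String> rules) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Map<String, String> metadata = new LinkedHashMap<>();
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            // evaluate(String, Object) returns the string value of the first match
            String value = xpath.evaluate(rule.getValue(), doc).trim();
            if (!value.isEmpty()) {
                metadata.put(rule.getKey(), value);
            }
        }
        return metadata;
    }

    // Parses a well-formed (X)HTML string into a DOM document.
    static Document parse(String xhtml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample page with the kind of id-tagged elements described above
        String page = "<html><body>"
                + "<div id=\"price\"> 42 EUR </div>"
                + "<table><tr><td id=\"ref\">A-17</td></tr></table>"
                + "</body></html>";

        // In the plugin these rules would come from the XML configuration file
        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("product_price", "//div[@id='price']");
        rules.put("product_ref", "//td[@id='ref']");

        System.out.println(extract(parse(page), rules));
    }
}
```

With something like this, supporting a new site only means adding new name/XPath pairs instead of writing a new node walker, and the resulting names are what index-metadata would then be configured to pick up.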