To this end, would it suffice to abstract the Page and Link classes and make expanded implementations of these?
Jeremy

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Erik Hatcher
Sent: Thursday, August 18, 2005 11:45 AM
To: [email protected]
Subject: [Nutch-dev] Outlink metadata?

First a question about the current behavior... does Nutch adhere to the <a rel="nofollow"...> conventions? If so, where is that coded?

On a related note, it seems carrying metadata around on Outlink would be beneficial, not just anchor text and URL. For example, my application will crawl HTML sites with a HEAD <link> to RDF data. I'd like to, in an HtmlParseFilter, add ParseData metadata so that an indexer (a custom one currently, not the Nutch one) can get at the RDF data that has been fetched by the URL stored in the metadata. Make sense?

Would my use indicate that Outlink should carry along metadata or is there another way to achieve this (besides writing a custom HTML parser)?

Thanks,
Erik
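
To make the idea discussed above concrete, here is a minimal sketch of an outlink that carries arbitrary key/value metadata alongside its URL and anchor text. It is not the actual Nutch Outlink class or API: the class name MetadataOutlink, the with() and isNoFollow() methods, and the example URLs are all hypothetical, chosen only to show how a rel value or an RDF link type could travel with the link.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch only: this is NOT the Nutch Outlink class.
// It shows one way a link could carry metadata beyond URL and anchor text.
public class MetadataOutlink {
    private final String toUrl;
    private final String anchor;
    private final Map<String, String> metadata = new LinkedHashMap<>();

    public MetadataOutlink(String toUrl, String anchor) {
        this.toUrl = toUrl;
        this.anchor = anchor;
    }

    // Attach one metadata entry (e.g. the rel or type attribute) and return this for chaining.
    public MetadataOutlink with(String key, String value) {
        metadata.put(key, value);
        return this;
    }

    public String getToUrl() { return toUrl; }

    public String getAnchor() { return anchor; }

    // Read-only view so callers cannot modify the link's metadata behind its back.
    public Map<String, String> getMetadata() {
        return Collections.unmodifiableMap(metadata);
    }

    // rel is a whitespace-separated token list, so check each token rather than the whole string.
    public boolean isNoFollow() {
        String rel = metadata.get("rel");
        if (rel == null) {
            return false;
        }
        for (String token : rel.trim().split("\\s+")) {
            if (token.equalsIgnoreCase("nofollow")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // A HEAD <link> pointing at RDF data (assumed attribute values).
        MetadataOutlink rdf = new MetadataOutlink("http://example.com/data.rdf", "")
                .with("rel", "meta")
                .with("type", "application/rdf+xml");
        // An <a rel="nofollow"> style link that a crawler might choose to skip.
        MetadataOutlink ad = new MetadataOutlink("http://example.com/ad", "sponsor")
                .with("rel", "external nofollow");

        System.out.println(rdf.getToUrl() + " type=" + rdf.getMetadata().get("type"));
        System.out.println(ad.getToUrl() + " nofollow=" + ad.isNoFollow());
    }
}
```

With something along these lines, a parse filter could record the link's type when it sees the HEAD <link>, and an indexer could later pick out the RDF URL by checking the metadata instead of re-parsing the HTML.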
