Doug Cutting (JIRA) wrote:
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12364125 ]
My apologies for commenting here rather than in JIRA: it produces broken HTML for me, so I can't use it...
Doug Cutting commented on NUTCH-139:
------------------------------------

I think we're near agreement here. Here are the changes I think this patch still needs:

MetadataNames belongs in the protocol package, not util.
Erhm... please bear with me. I'd rather see these two classes in a separate package altogether: org.apache.nutch.metadata. Most likely they will be used elsewhere too, not just in the protocol and parse/fetch context; I'm specifically thinking of CrawlData.
We should rename ContentProperties to Metadata.
+1.
We should add an add() method to Metadata, and change set() to replace all values rather than add a new value. Protocol code which creates properties from headers should then use add().
+1
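To make the proposed add()/set() semantics concrete, here is a minimal sketch. The class is hypothetical and written only for this discussion; it is not the Metadata class from the patch, and the getValues() accessor and the use of a HashMap of lists are my assumptions. Names are compared case-sensitively here, which real HTTP header handling would probably need to relax.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a multi-valued Metadata container (not the actual Nutch class).
public class Metadata {
  // Each name maps to an ordered list of values; names are case-sensitive in this sketch.
  private final Map<String, List<String>> values = new HashMap<>();

  /** Appends a value, keeping any values already stored under this name. */
  public void add(String name, String value) {
    values.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
  }

  /** Replaces all existing values for this name with the single given value. */
  public void set(String name, String value) {
    List<String> list = new ArrayList<>();
    list.add(value);
    values.put(name, list);
  }

  /** Returns the first value for the name, or null if the name is absent. */
  public String get(String name) {
    List<String> list = values.get(name);
    return (list == null || list.isEmpty()) ? null : list.get(0);
  }

  /** Returns all values for the name (an empty array if the name is absent). */
  public String[] getValues(String name) {
    List<String> list = values.get(name);
    return list == null ? new String[0] : list.toArray(new String[0]);
  }

  public static void main(String[] args) {
    Metadata meta = new Metadata();
    // Protocol code copying repeated response headers would use add()...
    meta.add("Set-Cookie", "a=1");
    meta.add("Set-Cookie", "b=2");
    System.out.println(Arrays.toString(meta.getValues("Set-Cookie"))); // [a=1, b=2]
    // ...whereas set() discards every previous value for that name.
    meta.set("Set-Cookie", "c=3");
    System.out.println(Arrays.toString(meta.getValues("Set-Cookie"))); // [c=3]
  }
}
```

The point of the split is that repeated headers (e.g. several Set-Cookie lines) survive when protocol code uses add(), while callers that mean "this is now the value" get predictable overwrite behaviour from set().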
We could commit after simply moving MetadataNames to protocol, and leave the changes to ContentProperties for another commit, but I'd prefer it all be done together.
Either way is fine with me. Perhaps splitting this into two commits would make it easier to fix potential breakage...
-- 
Best regards,
Andrzej Bialecki <><

Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com