[EMAIL PROTECTED] wrote:
Hi All,

I think I am missing something. In the Injector reduce code we have the following:

------------------------------------------------------------------------
while (values.hasNext()) {
  CrawlDatum val = (CrawlDatum)values.next();
  if (val.getStatus() == CrawlDatum.STATUS_INJECTED) {
    injected = val;
    injected.setStatus(CrawlDatum.STATUS_DB_UNFETCHED);
  } else {
    old = val;
  }
}
CrawlDatum res = null;
if (old != null) res = old; // don't overwrite existing value
else res = injected;
------------------------------------------------------------------------

Basically, if a value was not just injected, then don't overwrite it. But I am not seeing where the input could be such that the CrawlDatum wasn't just injected and could have previous values. Is this just in case someone uses the Injector as a Reducer and not a Mapper, or am I missing how this condition can occur?
This handles an important case: when you inject URLs that already exist in the DB, you have both the old value and the newly created value under the same key. In previous versions of Injector, CrawlDatum-s for such URLs could be overwritten with new values, and you could lose the valuable metadata accumulated in the old values.
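
To make the case concrete, here is a minimal, self-contained sketch of the same merge rule. It does not use the real Nutch classes: Datum, the note field, and the numeric status values are hypothetical stand-ins for CrawlDatum and its constants, chosen only for illustration. It assumes the reduce step sees every value stored under one URL key, which may include an entry already in the DB.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class InjectorMergeSketch {
    // Hypothetical status codes; the real values live in CrawlDatum.
    static final int STATUS_DB_UNFETCHED = 1;
    static final int STATUS_DB_FETCHED = 2;
    static final int STATUS_INJECTED = 3;

    // Hypothetical stand-in for CrawlDatum: a status plus some metadata.
    static final class Datum {
        int status;
        final String note;

        Datum(int status, String note) {
            this.status = status;
            this.note = note;
        }
    }

    // Same rule as the quoted reduce code: an existing entry wins over
    // a freshly injected one, so accumulated metadata is not lost.
    static Datum reduce(Iterator<Datum> values) {
        Datum old = null;
        Datum injected = null;
        while (values.hasNext()) {
            Datum val = values.next();
            if (val.status == STATUS_INJECTED) {
                injected = val;
                injected.status = STATUS_DB_UNFETCHED;
            } else {
                old = val;
            }
        }
        return (old != null) ? old : injected;
    }

    public static void main(String[] args) {
        // URL already in the DB and injected again: two values, one key.
        List<Datum> known = Arrays.asList(
            new Datum(STATUS_DB_FETCHED, "existing"),
            new Datum(STATUS_INJECTED, "new"));
        System.out.println(reduce(known.iterator()).note); // prints "existing"

        // URL seen for the first time: only the injected value is present.
        List<Datum> fresh = Arrays.asList(new Datum(STATUS_INJECTED, "new"));
        System.out.println(reduce(fresh.iterator()).note); // prints "new"
    }
}
```

In the first call the old entry is kept untouched; in the second the injected entry is returned with its status switched to unfetched.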
-- 
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
