Hi Folks,
I was looking at the HostDbUpdateReducer code in Nutch 2.x and I
think I've found a bug in the way we output Host data.
https://github.com/apache/nutch/blob/2.x/src/java/org/apache/nutch/host/HostDbUpdateReducer.java#L87
I feel that the following code

host.getInlinks().put(new Utf8(outlink),
    new Utf8(Integer.toString(outlinkCount.getCount(outlink))));

should be changed to the following

host.getOutlinks().put(new Utf8(outlink),
    new Utf8(Integer.toString(outlinkCount.getCount(outlink))));
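
To make the effect concrete, here is a minimal, self-contained sketch. It does not use the real Nutch classes: HostSketch is a hypothetical stand-in for the generated Host record, plain Strings replace Utf8, and a plain Map of counts stands in for outlinkCount. It only illustrates what I believe happens, namely that the outlink counts end up in the inlinks map and the outlinks map stays empty.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the generated Host record; only the two
// link maps matter here, and Strings are used instead of Utf8.
class HostSketch {
    private final Map<String, String> inlinks = new HashMap<>();
    private final Map<String, String> outlinks = new HashMap<>();

    Map<String, String> getInlinks() { return inlinks; }
    Map<String, String> getOutlinks() { return outlinks; }
}

class HostLinkSketch {
    // Mirrors the current line: outlink counts are written to the inlinks map.
    static HostSketch writeAsToday(Map<String, Integer> outlinkCounts) {
        HostSketch host = new HostSketch();
        for (Map.Entry<String, Integer> e : outlinkCounts.entrySet()) {
            host.getInlinks().put(e.getKey(), Integer.toString(e.getValue()));
        }
        return host;
    }

    // Mirrors the proposed line: outlink counts are written to the outlinks map.
    static HostSketch writeAsProposed(Map<String, Integer> outlinkCounts) {
        HostSketch host = new HostSketch();
        for (Map.Entry<String, Integer> e : outlinkCounts.entrySet()) {
            host.getOutlinks().put(e.getKey(), Integer.toString(e.getValue()));
        }
        return host;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("example.org", 3);

        // With the current behaviour the outlinks map is never populated.
        System.out.println("today, outlinks:    " + writeAsToday(counts).getOutlinks());
        // With the proposed change the counts land where a reader expects them.
        System.out.println("proposed, outlinks: " + writeAsProposed(counts).getOutlinks());
    }
}
```

If the real reducer behaves like writeAsToday, anyone reading outlinks from the HostDb would see nothing, while the inlinks map would hold outlink counts.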

Is anyone actively using the HostDb and can comment?
Thank you
Lewis