[ 
https://issues.apache.org/jira/browse/NUTCH-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270641#comment-14270641
 ] 

Hudson commented on NUTCH-1907:
-------------------------------

SUCCESS: Integrated in Nutch-nutchgora #1296 (See 
[https://builds.apache.org/job/Nutch-nutchgora/1296/])
NUTCH-1907 Incorrect output of Outlinks to Hosts within HostDbUpdateReducer 
(lewismc: http://svn.apache.org/viewvc/nutch/branches/2.x/?view=rev&rev=1650446)
* /nutch/branches/2.x/CHANGES.txt
* /nutch/branches/2.x/src/java/org/apache/nutch/host/HostDbUpdateReducer.java


> Incorrect output of Outlinks to Hosts within HostDbUpdateReducer 
> -----------------------------------------------------------------
>
>                 Key: NUTCH-1907
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1907
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 2.2.1
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>             Fix For: 2.3
>
>         Attachments: NUTCH-1907.patch
>
>
> I 
> [explained|http://www.mail-archive.com/user%40nutch.apache.org/msg12917.html] 
> that I found a bug in the 2.X HostDb.
> I was looking into the code within Nutch 2.X HostDbUpdateReducer and
> 'think' I've discovered a bug in the way we output Host data.
> https://github.com/apache/nutch/blob/2.x/src/java/org/apache/nutch/host/HostDbUpdateReducer.java#L87
> I feel that the following code
> {code}
> host.getInlinks().put(new Utf8(outlink),
>     new Utf8(Integer.toString(outlinkCount.getCount(outlink))));
> {code}
> should be changed to the following
> {code}
> host.getOutlinks().put(new Utf8(outlink),
>     new Utf8(Integer.toString(outlinkCount.getCount(outlink))));
> {code}
> Notice the difference: the outlink counts are now written to the Host's 
> Outlinks instead of being written to its Inlinks again.
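
The effect of the one-line change described above can be illustrated with a minimal, self-contained sketch. It is not the Nutch source: the `Host` class here is a hypothetical stand-in holding two plain `String` maps (the real record uses Avro `Utf8` keys and values), and `populate`, `inlinkCount` and `outlinkCount` as ordinary maps are assumptions made only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class HostLinkSketch {

    /** Hypothetical stand-in for the Host record: separate inlink and outlink maps. */
    static class Host {
        private final Map<String, String> inlinks = new HashMap<>();
        private final Map<String, String> outlinks = new HashMap<>();

        Map<String, String> getInlinks() { return inlinks; }
        Map<String, String> getOutlinks() { return outlinks; }
    }

    /**
     * Copies per-link counts into the host, mirroring the corrected behaviour:
     * inlink counts go to getInlinks(), outlink counts go to getOutlinks().
     */
    static Host populate(Map<String, Integer> inlinkCount,
                         Map<String, Integer> outlinkCount) {
        Host host = new Host();
        for (Map.Entry<String, Integer> e : inlinkCount.entrySet()) {
            host.getInlinks().put(e.getKey(), Integer.toString(e.getValue()));
        }
        for (Map.Entry<String, Integer> e : outlinkCount.entrySet()) {
            // Calling host.getInlinks() here instead would reproduce the
            // reported bug: outlink counts would end up in the inlink map
            // and the outlink map would stay empty.
            host.getOutlinks().put(e.getKey(), Integer.toString(e.getValue()));
        }
        return host;
    }
}
```

With the corrected call, a host built from one inlink and one outlink ends up with one entry in each map; with the original call, both entries would land in the inlink map.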



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
