[ 
https://issues.apache.org/jira/browse/NUTCH-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16774982#comment-16774982
 ] 

Markus Jelsma commented on NUTCH-2694:
--------------------------------------

I see, I never made a patch, or I lost it.

Anyway, the attached patch changes HostDatum and HostDbReducer to accommodate
the longs. The patch also adds a change notification to CHANGES.txt and
indents the existing entry so that it is in line with the style of the
breaking change of 1.15.
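
As a rough illustration of why widening a counter is a breaking change for
stored data, here is a minimal sketch. It is not the HostDatum code. It
assumes the counters are written as fixed-width fields through java.io
DataOutput-style calls, and CounterWidth with its methods is a hypothetical
name used only for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a record with a single counter field.
public class CounterWidth {

    // Old layout (assumed): the counter is stored as a 4-byte int.
    public static byte[] writeAsInt(int count) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(count);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // New layout (assumed): the counter is stored as an 8-byte long.
    public static byte[] writeAsLong(long count) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeLong(count);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // A reader that still expects the old layout consumes only 4 bytes.
    public static int readFirstInt(byte[] record) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(record))) {
            return in.readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeAsInt(7).length);           // 4 bytes per counter
        System.out.println(writeAsLong(7L).length);         // 8 bytes per counter
        System.out.println(readFirstInt(writeAsLong(7L)));  // old reader sees 0, not 7
    }
}
```

With such a layout, a reader and a writer that disagree on the field width
lose track of where each field starts, so deleting and rebuilding the
database is the simplest upgrade path.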

> HostDB to aggregate by long instead of integer
> ----------------------------------------------
>
>                 Key: NUTCH-2694
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2694
>             Project: Nutch
>          Issue Type: Bug
>          Components: hostdb
>    Affects Versions: 1.15
>            Reporter: Markus Jelsma
>            Priority: Major
>             Fix For: 1.16
>
>         Attachments: NUTCH-2694.patch
>
>
> Last week we got Pinterest in our database. It has a neat set of sitemaps
> and a lot of entries, over 2 billion. When first making HostDatum I
> foolishly used ints instead of longs, which shows up as -1.9 billion
> records for Pinterest. I propose a simple move from int to long, with an
> upgrade note mentioning that the databases are not compatible and
> suggesting to delete any existing HostDB. Agreed?
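
The negative count in the description is what 32-bit wraparound looks like.
A minimal sketch follows, with made-up batch sizes, since the exact Pinterest
count is not given in the issue. HostCountOverflow and its methods are
hypothetical names and not part of Nutch.

```java
// Hypothetical example class; it only demonstrates the arithmetic.
public class HostCountOverflow {

    // Accumulate counts in an int: the total wraps silently past 2^31 - 1.
    public static int sumAsInt(int[] batches) {
        int total = 0;
        for (int b : batches) {
            total += b;
        }
        return total;
    }

    // Accumulate the same counts in a long: no wraparound at this scale.
    public static long sumAsLong(int[] batches) {
        long total = 0L;
        for (int b : batches) {
            total += b;
        }
        return total;
    }

    public static void main(String[] args) {
        // Illustrative figures only: two batches of 1.2 billion records.
        int[] batches = {1_200_000_000, 1_200_000_000};
        System.out.println(sumAsInt(batches));   // -1894967296
        System.out.println(sumAsLong(batches));  // 2400000000
    }
}
```

The int result is the true total minus 2^32 (2,400,000,000 - 4,294,967,296),
which is the same kind of "about -1.9 billion" figure as the one reported.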



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
