Markus Jelsma created NUTCH-2694:
------------------------------------
Summary: HostDB to aggregate by long instead of integer
Key: NUTCH-2694
URL: https://issues.apache.org/jira/browse/NUTCH-2694
Project: Nutch
Issue Type: Bug
Components: hostdb
Affects Versions: 1.15
Reporter: Markus Jelsma
Fix For: 1.16
Last week we got Pinterest in our database. It has a neat set of sitemaps and
a lot of entries, over 2 billion. When first writing HostDatum I foolishly used
ints instead of longs, which shows up as -1.9 billion records for Pinterest.
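
To illustrate the wraparound, here is a minimal, self-contained sketch. It does
not use the actual HostDatum code; the class name, method names and the batch
sizes are hypothetical and chosen only so that the total exceeds
Integer.MAX_VALUE (2,147,483,647), as a count of over 2 billion would.

```java
public class HostCountOverflowSketch {

    // Sums counts into an int, as an int-typed counter field would.
    // Java int arithmetic wraps silently once the total passes Integer.MAX_VALUE.
    static int sumAsInt(int[] batches) {
        int total = 0;
        for (int b : batches) {
            total += b;
        }
        return total;
    }

    // Same aggregation with a long accumulator; each int is widened before adding.
    static long sumAsLong(int[] batches) {
        long total = 0L;
        for (int b : batches) {
            total += b;
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical batch sizes, not real Pinterest figures.
        int[] batches = {1_200_000_000, 1_200_000_000};
        System.out.println(sumAsInt(batches));   // prints -1894967296
        System.out.println(sumAsLong(batches));  // prints 2400000000
    }
}
```

With these made-up inputs the int total comes out at roughly -1.9 billion, the
same kind of value reported above, while the long total stays correct.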
I propose a simple move from int to long, with an upgrade note mentioning that
existing databases are not compatible and suggesting that any existing HostDB
be deleted.
Agreed?