[ https://issues.apache.org/jira/browse/NUTCH-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sebastian Nagel closed NUTCH-2003.
----------------------------------
> topN is not work correctly
> --------------------------
>
> Key: NUTCH-2003
> URL: https://issues.apache.org/jira/browse/NUTCH-2003
> Project: Nutch
> Issue Type: Bug
> Affects Versions: 2.3
> Reporter: Talat Uyarer
> Priority: Minor
> Fix For: 2.5
>
>
> I want to crawl the top 1000 URLs, ordered by score, from the webpage table.
> This does not work correctly.
> When I use the topN parameter, it is divided by the number of map tasks
> (topN / maptaskcounts = maptasktopN), and every map task generates only its
> own maptasktopN URLs. Assume I have 25 map tasks and set topN to 1000; then
> maptasktopN is calculated as 40. As a result, we do not get the 1000
> highest-scored URLs overall; we get 1000 URLs made up of the 40
> highest-scored URLs from each of the 25 map tasks.
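The effect described above can be reproduced with a small, simplified model. This is only a sketch, not Nutch's generator code: the function names (per_task_top_n, global_top_n, merged_top_n), the in-memory partitions, and the sample scores are all hypothetical. per_task_top_n mimics the reported behavior by splitting topN evenly across map tasks. global_top_n shows the expected result. merged_top_n is one standard way to get the global result in a map/reduce setting: each task keeps its local top topN (not topN / tasks), and a single final step merges those lists. This works because every row in the global top N must also be in its own partition's local top N.

```python
import heapq
from typing import List, Tuple

# Hypothetical model: each map task sees one partition of (url, score) rows.
Row = Tuple[str, float]


def _score(row: Row) -> float:
    return row[1]


def per_task_top_n(partitions: List[List[Row]], top_n: int) -> List[Row]:
    """Reported behavior: topN is divided by the number of map tasks and
    each task keeps only its own highest-scored rows."""
    per_task = top_n // len(partitions)  # e.g. 1000 // 25 = 40
    selected: List[Row] = []
    for rows in partitions:
        selected.extend(heapq.nlargest(per_task, rows, key=_score))
    return selected


def global_top_n(partitions: List[List[Row]], top_n: int) -> List[Row]:
    """Expected behavior: the top_n highest scores across all rows."""
    all_rows = (row for rows in partitions for row in rows)
    return heapq.nlargest(top_n, all_rows, key=_score)


def merged_top_n(partitions: List[List[Row]], top_n: int) -> List[Row]:
    """Two-phase variant: each task keeps its local top_n, then a single
    merge step selects the overall top_n from the union of those lists."""
    local = [heapq.nlargest(top_n, rows, key=_score) for rows in partitions]
    return heapq.nlargest(top_n, (r for rows in local for r in rows), key=_score)


if __name__ == "__main__":
    # Hypothetical data: the high scores are all in the first partition.
    parts = [
        [("a", 0.9), ("b", 0.8), ("c", 0.7), ("d", 0.6)],
        [("e", 0.2), ("f", 0.1), ("g", 0.05), ("h", 0.01)],
    ]
    print([u for u, _ in per_task_top_n(parts, 4)])  # ['a', 'b', 'e', 'f']
    print([u for u, _ in global_top_n(parts, 4)])    # ['a', 'b', 'c', 'd']
    print([u for u, _ in merged_top_n(parts, 4)])    # ['a', 'b', 'c', 'd']
```

When high-scored URLs are spread unevenly across map tasks, the per-task split drops some of them ("c" and "d" here) and admits low-scored ones ("e" and "f") in their place. This matches the discrepancy reported in the issue.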
--
This message was sent by Atlassian Jira
(v8.3.4#803005)