[ https://issues.apache.org/jira/browse/NUTCH-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13217172#comment-13217172 ]
Mathijs Homminga commented on NUTCH-1289:
-----------------------------------------

Nice catch. The PartitionUrlByHost seems broken indeed.

I would suggest that we use the existing o.a.n.crawl.URLPartitioner class which has support for three URL partition modes (host, domain, IP) and which is used by the GeneratorJob too.

Pros: support for different partition modes in the Fetcher + no duplicate code.

Or is there a reason why the Fetcher has its own partition logic?

The URLPartitioner class is a Partitioner<SelectorEntry, WebPage> instead of a Partitioner<IntWritable, FetchEntry> but you can perhaps extract a method and use it from both classes, or create one URLPartitioner with two specific inner classes for the Generator and Fetcher.

> In distributed mode URL's are not partitioned
> ---------------------------------------------
>
>                 Key: NUTCH-1289
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1289
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: nutchgora
>            Reporter: Dan Rosher
>             Fix For: nutchgora
>
>         Attachments: NUTCH-1289.patch
>
>
> In distributed mode URL's are not partitioned to a specific machine which
> means the politeness policy is voided
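
To make the "extract a method" suggestion in the comment above concrete, here is a minimal sketch of what a shared helper might look like. It is not the Nutch code: the class UrlPartitionSketch, the method partitionFor, and the two-label domain heuristic are all hypothetical, and Hadoop types are left out so the snippet compiles on its own. IP mode is omitted because it would need a DNS lookup. The idea is that both a Partitioner<SelectorEntry, WebPage> and a Partitioner<IntWritable, FetchEntry> could pull the URL string out of their own key or value and delegate to one method like this.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

/** Hypothetical helper, not part of Nutch: partition logic keyed on a URL string. */
final class UrlPartitionSketch {

  /** Modes covered by this sketch; IP mode is left out because it needs DNS. */
  enum Mode { BY_HOST, BY_DOMAIN }

  private UrlPartitionSketch() {}

  /**
   * Maps a URL to a partition in [0, numPartitions) so that all URLs sharing
   * the same host (or domain) land in the same partition.
   */
  static int partitionFor(String urlString, Mode mode, int numPartitions) {
    if (numPartitions <= 0) {
      throw new IllegalArgumentException("numPartitions must be positive");
    }
    String host;
    try {
      host = new URI(urlString).getHost();
    } catch (URISyntaxException e) {
      host = null;
    }
    String key;
    if (host == null) {
      // Assumption: fall back to the whole string when no host can be parsed.
      key = urlString;
    } else {
      host = host.toLowerCase(Locale.ROOT);
      key = (mode == Mode.BY_DOMAIN) ? naiveDomain(host) : host;
    }
    // Mask the sign bit so the result is never negative.
    return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
  }

  /**
   * Assumption: treats the last two labels as the domain. This is wrong for
   * suffixes such as "co.uk"; a real implementation would consult a suffix list.
   */
  static String naiveDomain(String host) {
    String[] labels = host.split("\\.");
    int n = labels.length;
    return n <= 2 ? host : labels[n - 2] + "." + labels[n - 1];
  }
}
```

With a helper of this shape, each job-specific partitioner would only differ in how it obtains the URL string, which is one way to get the "no duplicate code" benefit mentioned above.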