Hi Sebastian,

Thank you for reviewing my email.

By "a pluggable RankingJob" I mean a job that has pluggable ranking backends for graph-based algorithms. It would be similar to the present architecture of our IndexingJob. If we add a RankingJob to the crawler workflow, we can write a dummy Scoring Filter that reads each page's already calculated score when GeneratorJob builds the fetch list. Such a job would make it possible to implement custom graph-based ranking algorithms (Hostrank, Trustrank, Linkrank, etc.).
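
To make the idea more concrete, here is a minimal sketch in Java. All names in it (RankingBackend, InlinkCountBackend, RankingJob, PrecomputedScoreFilter, sortValue) are hypothetical and are not existing Nutch or Giraph API. The inlink-counting backend is only an in-memory stand-in; a real backend would presumably hand the link graph to Giraph or another BSP system.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plugin point: one implementation per ranking algorithm.
interface RankingBackend {
    // Takes url -> outlinks and returns url -> score.
    Map<String, Double> rank(Map<String, List<String>> outlinks);
}

// Toy stand-in backend (assumption): a page's score is its number of inlinks.
class InlinkCountBackend implements RankingBackend {
    @Override
    public Map<String, Double> rank(Map<String, List<String>> outlinks) {
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, List<String>> e : outlinks.entrySet()) {
            scores.putIfAbsent(e.getKey(), 0.0);          // source page exists even with no inlinks
            for (String target : e.getValue()) {
                scores.merge(target, 1.0, Double::sum);   // one point per inlink
            }
        }
        return scores;
    }
}

// Hypothetical job: delegates the whole graph computation to the chosen backend.
class RankingJob {
    private final RankingBackend backend;

    RankingJob(RankingBackend backend) {
        this.backend = backend;
    }

    Map<String, Double> run(Map<String, List<String>> outlinks) {
        return backend.rank(outlinks);
    }
}

// Hypothetical "dummy" scoring filter: it computes nothing itself and only
// looks up the score that RankingJob stored earlier.
class PrecomputedScoreFilter {
    private final Map<String, Double> scores;

    PrecomputedScoreFilter(Map<String, Double> scores) {
        this.scores = scores;
    }

    // Sort value used when generating the fetch list; unknown pages get 0.
    double sortValue(String url, double initSort) {
        return scores.getOrDefault(url, 0.0) * initSort;
    }
}

class RankingSketch {
    public static void main(String[] args) {
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("a", Arrays.asList("b", "c"));
        graph.put("b", Arrays.asList("c"));

        Map<String, Double> scores = new RankingJob(new InlinkCountBackend()).run(graph);
        PrecomputedScoreFilter filter = new PrecomputedScoreFilter(scores);

        // "c" has two inlinks (from "a" and "b"), so it sorts first.
        System.out.println(filter.sortValue("c", 1.0));
    }
}
```

The point of the split is that the filter stays a simple, stateless lookup, while the iterative graph work lives in the job and its backend.
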
Talat

> The scoring plugin interface fits into the crawler work and data flow:
> links are fed into CrawlDb/Webtable, fetch lists are generated, etc.
> OPIC can be used because it's "online". Other link rank algorithms
> define a complex work flow with additional steps, iterations, etc.
>
>> a pluggable RankingJob
>
> "Pluggable" and "job" are somewhat in contradiction.
>
> Plugins in Nutch never define jobs, most define a simple
> interface which can be called "functional" (stateless,
> return value depends only on the function arguments).
> In addition, most plugins can be used in combination
> (e.g., OPIC + custom plugin for focused crawling).
>
> Yes, it may be worth thinking about functions and data
> structures which could be shared between ranking algorithms.
> I'm skeptical whether there will be enough similarities and overlaps
> to make ranking pluggable.
>
> But, to avoid any misunderstanding: that's not against writing
> a "RankingJob for Giraph".
>
> Sebastian
>
>
> On 05/03/2014 12:10 AM, Talat Uyarer wrote:
>> Hi all,
>>
>> A long time ago, we talked with Julien and Lewis about major needs for 2.x
>> on the mailing list.
>>
>> I know that Giraph uses only map slots as workers. At present, our
>> architecture of scoring
>> plugins doesn't permit this. Giraph and OPIC have different work types. IMHO we
>> should create a pluggable
>> RankingJob like IndexingJob for Giraph and BSP-based systems. The
>> pluggable architecture would
>> let us implement custom pagerank (hostrank, usagerank, etc.)
>> algorithms. Wdyt?
>>
>> We use different Giraph algorithms in a similar solution in our company. If
>> this makes sense for
>> everybody, then I can implement it after 2.3 is released.
>>
>> I look forward to your comments :)
>>
>> Talat
>>
>

--
Talat UYARER
Website: http://talat.uyarer.com
Twitter: http://twitter.com/talatuyarer
Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304

