Hi Talat,

> At the present our architecture of scoring plugins don't permit.

The scoring plugin interface fits into the crawler's workflow and data flow:
links are fed into CrawlDb/Webtable, fetch lists are generated, etc.
OPIC can be used because it's "online".  Other link rank algorithms
define a complex workflow of their own with additional steps, iterations, etc.
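
To illustrate what "online" means here, a minimal sketch in Java. It is
not the code of Nutch's OPIC plugin; the class and method names are
made up for illustration. The idea is only that a page's score can be
passed on to its outlinks at the moment the links are written to the
db, without a separate iterative job:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OnlineScoreSketch {

    /**
     * Hypothetical helper: split a page's score evenly over its outlinks
     * and add each share to the running score of the link target.
     * This can happen while links are being fed into the db.
     */
    static void distribute(Map<String, Float> scores, float pageScore,
                           List<String> outlinks) {
        if (outlinks.isEmpty()) {
            return; // nothing to pass on
        }
        float share = pageScore / outlinks.size();
        for (String target : outlinks) {
            scores.merge(target, share, Float::sum);
        }
    }

    public static void main(String[] args) {
        Map<String, Float> scores = new HashMap<>();
        // Two fetched pages, processed one after the other.
        distribute(scores, 1.0f,
            Arrays.asList("http://a.example/", "http://b.example/"));
        distribute(scores, 0.5f, Arrays.asList("http://b.example/"));
        System.out.println(scores);
    }
}
```

An algorithm that has to iterate over the whole link graph until it
converges does not fit into a per-page call like this one.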

> a pluggable RankingJob

"Pluggable" and "job" are somewhat in contradiction.

Plugins in Nutch never define jobs; most define a simple
interface which can be called "functional" (stateless,
return value depends only on the function arguments).
In addition, most plugins can be used in combination
(e.g., OPIC + custom plugin for focused crawling).
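
A rough sketch of what "functional" and "used in combination" mean.
The ScoreFunction interface below is a made-up stand-in and much
simpler than Nutch's actual ScoringFilter interface; both lambdas are
assumptions for illustration only:

```java
import java.util.Arrays;
import java.util.List;

public class PluginChainSketch {

    /** Hypothetical plugin: stateless, result depends only on the arguments. */
    interface ScoreFunction {
        float score(String url, float current);
    }

    /** Apply the plugins in order; each one sees the previous result. */
    static float applyAll(List<ScoreFunction> plugins, String url, float initial) {
        float s = initial;
        for (ScoreFunction p : plugins) {
            s = p.score(url, s);
        }
        return s;
    }

    public static void main(String[] args) {
        // Stand-in for a link-based score: passes the incoming score through.
        ScoreFunction linkScore = (url, s) -> s;
        // Stand-in for a focused-crawling plugin: boost on-topic URLs.
        ScoreFunction focus = (url, s) -> url.contains("nutch") ? s * 2.0f : s * 0.5f;

        List<ScoreFunction> chain = Arrays.asList(linkScore, focus);
        System.out.println(applyAll(chain, "http://nutch.apache.org/", 1.0f));
        System.out.println(applyAll(chain, "http://other.example/", 1.0f));
    }
}
```

Because no plugin keeps state or controls the job, they can be chained
freely. A job, in contrast, owns the whole work flow.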

Yes, it may be worth thinking about functions and data
structures which could be shared between ranking algorithms.
I'm skeptical whether there will be enough similarities and overlaps
to make ranking pluggable.

But, to avoid any misunderstanding: that's not against writing
a "RankingJob for Giraph".

Sebastian


On 05/03/2014 12:10 AM, Talat Uyarer wrote:
> Hi all,
> 
> A long time ago, we talked with Julien and Lewis about major needs for
> 2.x on the mail list.
> 
> I know that Giraph uses only map slots as workers. At the present our
> architecture of scoring plugins don't permit. Giraph and Opic have
> different work types. IMHO We should create a pluggable RankingJob like
> as IndexingJob for Giraph and BSP based systems. The Pluggable
> architecture can permit us for implementing custom pagerank
> (hostrank,usagerank etc.) algorithms.  Wdyt?
> 
> We use different giraph algorithms similar this solution in our
> company. If this makes sense for everybody, After 2.3 is released than
> i can implement it.
> 
> I wait your comments :)
> 
> Talat
> 
