Hi Sebastian,
Thank you for reviewing my email. By "a pluggable RankingJob" I mean a job
that has pluggable ranking backends for graph-based algorithms. This
job would be similar to our present IndexingJob architecture. If we create a
RankingJob in our crawler workflow, we can create a dummy scoring
filter that reads each page's calculated score when generating the fetch
list in GeneratorJob. The job would make it possible to implement custom
graph-based ranking algorithms (HostRank, TrustRank, LinkRank, etc.).
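
To make the idea a bit more concrete, here is a rough sketch in plain Java.
All names in it (RankingBackend, InlinkCountBackend, RankingJob,
StoredScoreFilter, sortValue) are hypothetical and are not existing Nutch or
Giraph APIs. The toy backend only counts inlinks in memory; a real backend
would launch a Giraph or other BSP computation instead.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical backend contract (an assumption, not a Nutch interface).
// It sees the whole link graph (url -> outlinks) and returns url -> score.
interface RankingBackend {
    String name();
    Map<String, Double> rank(Map<String, List<String>> outlinks);
}

// Toy backend: scores each page by its number of inlinks.
// Stands in for a real graph algorithm run on Giraph or another BSP system.
class InlinkCountBackend implements RankingBackend {
    public String name() { return "inlink-count"; }

    public Map<String, Double> rank(Map<String, List<String>> outlinks) {
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, List<String>> e : outlinks.entrySet()) {
            scores.putIfAbsent(e.getKey(), 0.0);      // pages without inlinks score 0
            for (String target : e.getValue()) {
                scores.merge(target, 1.0, Double::sum); // one point per inlink
            }
        }
        return scores;
    }
}

// Hypothetical job: runs the selected backend and keeps the resulting scores.
// In Nutch the scores would be written back to the web table, not to a map.
class RankingJob {
    private final RankingBackend backend;
    private final Map<String, Double> storedScores = new HashMap<>();

    RankingJob(RankingBackend backend) { this.backend = backend; }

    void run(Map<String, List<String>> outlinks) {
        storedScores.clear();
        storedScores.putAll(backend.rank(outlinks));
    }

    Map<String, Double> scores() { return storedScores; }
}

// "Dummy" scoring filter: computes nothing itself, it only reads the score
// the RankingJob stored, so that fetch list generation can sort by it.
class StoredScoreFilter {
    private final Map<String, Double> scores;

    StoredScoreFilter(Map<String, Double> scores) { this.scores = scores; }

    double sortValue(String url, double initSort) {
        return scores.getOrDefault(url, 0.0) * initSort;
    }
}
```

Swapping the ranking algorithm then only means passing a different
RankingBackend to the RankingJob; the scoring filter stays the same.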

Talat

> The scoring plugin interface fits into the crawler work and data flow:
> links are fed into CrawlDb/Webtable, fetch lists are generated, etc.
> OPIC can be used because it's "online".  Other link rank algorithms
> define a complex work flow with additional steps, iterations, etc.
>
>> a pluggable RankingJob
>
> "Pluggable" and "job" are somewhat in contradiction.
>
> Plugins in Nutch never define jobs, most define a simple
> interface which can be called "functional" (stateless,
> return value depends only on the function arguments).
> In addition, most plugins can be used in combination
> (e.g., OPIC + custom plugin for focused crawling).
>
> Yes, it may be worth thinking about functions and data
> structures which could be shared between ranking algorithms.
> I'm skeptical whether there will be enough similarities and overlaps
> to make ranking pluggable.
>
> But, to avoid any misunderstanding: that's not against writing
> a "RankingJob for Giraph".
>
> Sebastian
>
>
> On 05/03/2014 12:10 AM, Talat Uyarer wrote:
>> Hi all,
>>
>> A long time ago, we talked with Julien and Lewis on the mailing list
>> about the major needs for 2.x.
>>
>> I know that Giraph uses only map slots as workers. At present, our
>> scoring plugin architecture does not permit this. Giraph and OPIC work
>> in different ways. IMHO we should create a pluggable RankingJob, like
>> IndexingJob, for Giraph and BSP-based systems. The pluggable
>> architecture would let us implement custom PageRank-style algorithms
>> (HostRank, UsageRank, etc.). Wdyt?
>>
>> We use different Giraph algorithms in a similar setup at our company.
>> If this makes sense to everybody, I can implement it after 2.3 is
>> released.
>>
>> I look forward to your comments :)
>>
>> Talat
>>
>



-- 
Talat UYARER
Website: http://talat.uyarer.com
Twitter: http://twitter.com/talatuyarer
Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304