Is there any substitute for template detection? Is there an easy hack, a
ready-made plugin, or an open source project that can improve the search
results to some degree without full template detection?
Thanks.


Andrzej Bialecki wrote:
> 
> dealmaker wrote:
>> Hi,
>>   Does Nutch or any plugin provide template detection? It seems that
>> navigation and footer sections often distort the ranking of search
>> results. Is there an existing open source project or code that I can
>> integrate into Nutch to give it template detection?
>> Thanks.
> 
> There is no ready-made component in Nutch for this task. The task itself 
> is complicated and there are no ideal solutions. There are several 
> algorithms described in the literature, primarily falling into two 
> groups: page-at-a-time (usually single pass) and whole-corpus (usually 
> several passes). They work with varying degrees of success, strongly 
> dependent on the test corpus.
> 
> 
> -- 
> Best regards,
> Andrzej Bialecki     <><
>   ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Template-Detection--tp22655736p22661543.html
Sent from the Nutch - User mailing list archive at Nabble.com.
