Is there any substitute for template detection? Are there any easy hacks, ready-made plugins or open source projects that can improve search results to some degree without template detection? Thanks.
Andrzej Bialecki wrote:
>
> dealmaker wrote:
>> Hi,
>> Does Nutch or any plugin have template detection? It seems that
>> navigation and footer sections usually distort the ranking of search
>> results. Is there already an open source project or code that I can
>> integrate into Nutch to give it the ability of template detection?
>> Thanks.
>
> There is no ready-made component in Nutch for this task. The task itself
> is complicated and there are no ideal solutions. There are several
> algorithms described in the literature, primarily falling into two
> groups: page-at-a-time (usually single pass) and whole-corpus (usually
> several passes). They work with varying degrees of success, strongly
> dependent on the test corpus.
>
> --
> Best regards,
> Andrzej Bialecki <><
> Information Retrieval, Semantic Web
> Embedded Unix, System Integration
> http://www.sigram.com Contact: info at sigram dot com

--
View this message in context: http://www.nabble.com/Template-Detection--tp22655736p22661543.html
Sent from the Nutch - User mailing list archive at Nabble.com.
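As a rough illustration of the "whole-corpus, several passes" group Andrzej mentions, here is a minimal sketch of one simple technique, block-frequency filtering: the first pass counts how many pages of a site contain each text block, and the second pass drops blocks that appear on a large fraction of pages (typically navigation and footers). It assumes each page has already been reduced to plain text with one block per line, and that all pages come from the same site. The function names and the 0.5 threshold are hypothetical choices for this example, not part of Nutch or any plugin.

```python
from collections import Counter
from typing import Dict, List, Set


def split_blocks(page_text: str) -> List[str]:
    """Assumption: one text block per line; whitespace is normalized."""
    return [" ".join(line.split()) for line in page_text.splitlines() if line.strip()]


def find_template_blocks(pages: Dict[str, str], min_fraction: float = 0.5) -> Set[str]:
    """Pass 1: count in how many pages each block occurs (once per page).
    Blocks present in at least min_fraction of the pages are treated as template."""
    if len(pages) < 2:
        return set()  # a single page gives no cross-page evidence
    doc_freq: Counter = Counter()
    for text in pages.values():
        doc_freq.update(set(split_blocks(text)))
    threshold = min_fraction * len(pages)
    return {block for block, count in doc_freq.items() if count >= threshold}


def strip_template(pages: Dict[str, str], min_fraction: float = 0.5) -> Dict[str, str]:
    """Pass 2: remove the template blocks from every page before indexing."""
    template = find_template_blocks(pages, min_fraction)
    return {
        url: "\n".join(b for b in split_blocks(text) if b not in template)
        for url, text in pages.items()
    }


# Hypothetical example pages from one site.
pages = {
    "http://example.com/a": "Home | About | Contact\nApples are red.\nCopyright 2009 Example",
    "http://example.com/b": "Home | About | Contact\nBananas are yellow.\nCopyright 2009 Example",
    "http://example.com/c": "Home | About | Contact\nCherries are small.\nCopyright 2009 Example",
}
cleaned = strip_template(pages)
print(cleaned["http://example.com/a"])
```

Running it prints only the page-specific line, "Apples are red.", because the navigation and copyright lines occur on all three pages. As Andrzej notes, results depend heavily on the corpus: a real setup would at least group pages by host, and it would need enough pages per host for the frequency counts to mean anything.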