Nutch determines a page's score from the number of inbound links and the authority of the pages those links come from, using a HITS-like algorithm. If a sub-level page has more inbound links, or more important ones, it will outscore the front page, even though the front page usually has a high score. A nice solution would be to modify the weighting step, where internal and external links are turned into a score, so that it also takes the depth of the page into account; that should be relatively painless. You could also hack the segment manually after creating it (SegmentReader/SegmentWriter), either giving fake inbound links to certain pages or simply modifying their scores.
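The depth-weighting idea can be sketched roughly like this. It is a minimal, hypothetical Python illustration and not Nutch's actual Analyze code: the functions `url_depth`, `link_scores` and `depth_adjusted`, the per-link weights, and the damping factor `1 / (1 + alpha * depth)` are all assumptions made up for the example. It shows how a subpage with more weighted inbound links can outscore the root, and how a depth penalty can give the root its lead back.

```python
from urllib.parse import urlparse


def url_depth(url: str) -> int:
    """Count non-empty path segments: 'http://abc.com' -> 0, '/something.html' -> 1."""
    path = urlparse(url).path
    return len([seg for seg in path.split("/") if seg])


def link_scores(inlinks: dict[str, list[tuple[str, float]]],
                source_scores: dict[str, float]) -> dict[str, float]:
    """Hypothetical link-based score: sum each linking page's score times a
    per-link weight (e.g. lower weights for internal links than external ones)."""
    return {page: sum(source_scores.get(src, 0.0) * weight for src, weight in links)
            for page, links in inlinks.items()}


def depth_adjusted(scores: dict[str, float], alpha: float = 0.5) -> dict[str, float]:
    """Hypothetical depth penalty: divide each score by (1 + alpha * depth),
    so deeper pages need proportionally more link weight to beat the root."""
    return {page: score / (1.0 + alpha * url_depth(page))
            for page, score in scores.items()}


# Made-up example data: the subpage has one more inbound link than the root.
inlinks = {
    "http://abc.com": [("http://x.org", 1.0), ("http://y.org", 1.0)],
    "http://abc.com/something.html": [("http://x.org", 1.0), ("http://y.org", 1.0),
                                      ("http://z.org", 1.0)],
}
source_scores = {"http://x.org": 10.0, "http://y.org": 10.0, "http://z.org": 10.0}

raw = link_scores(inlinks, source_scores)       # root 20.0, subpage 30.0
adjusted = depth_adjusted(raw, alpha=1.0)       # root 20.0, subpage 15.0
print(raw)
print(adjusted)
```

With only link counts the subpage wins (30.0 vs 20.0), matching the scenario in the question; with the depth penalty at `alpha=1.0` the root wins again (20.0 vs 15.0). How strong the penalty should be is a tuning choice, not something Nutch prescribes.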
Anywho, maybe that gives you an idea or two. My knowledge of the actual ranking algorithm is pretty poor, so perhaps someone will come up with better suggestions...

Fredrik

On 7/28/05, EM <[EMAIL PROTECTED]> wrote:
> Is there a chance that the ranking algorithm in Analyze would give a higher
> value to a subpage than to the root domain page?
>
> For example:
> http://abc.com <- 34.432
> http://abc.com/something.html <- 50
>
> Is the above scenario possible, or does Nutch always rank root pages
> highest?
>
> Regards,
> EM
