I've checked: the relevant code is in DOMContentUtils. Anchors with
rel="nofollow" are discarded.
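
For illustration only, the behaviour amounts to something like the sketch below. This is not the actual DOMContentUtils code: the class and method names (NofollowSketch, isNofollow, collectLinks, parse) are made up for the example, and it assumes well-formed XHTML parsed with the plain JDK DOM parser rather than the parser Nutch uses.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch, not Nutch code: collect hrefs but skip rel="nofollow" anchors.
class NofollowSketch {

  // rel is a space-separated token list, so check each token.
  static boolean isNofollow(Element anchor) {
    for (String token : anchor.getAttribute("rel").trim().split("\\s+")) {
      if (token.equalsIgnoreCase("nofollow")) {
        return true;
      }
    }
    return false;
  }

  // Walk all <a> elements and keep only the links that may be followed.
  static List<String> collectLinks(Document doc) {
    List<String> links = new ArrayList<>();
    NodeList anchors = doc.getElementsByTagName("a");
    for (int i = 0; i < anchors.getLength(); i++) {
      Element a = (Element) anchors.item(i);
      String href = a.getAttribute("href"); // empty string when absent
      if (href.isEmpty() || isNofollow(a)) {
        continue; // this single link is dropped, the rest of the page is kept
      }
      links.add(href);
    }
    return links;
  }

  // Assumption: input is well-formed XHTML, so the JDK XML parser can read it.
  static Document parse(String xhtml) {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
    } catch (Exception e) {
      throw new IllegalArgumentException("could not parse input", e);
    }
  }

  public static void main(String[] args) {
    String page = "<html><body>"
        + "<a href=\"http://example.org/\">kept</a>"
        + "<a href=\"http://www.google.com\" rel=\"nofollow\">dropped</a>"
        + "</body></html>";
    System.out.println(collectLinks(parse(page))); // prints [http://example.org/]
  }
}
```

Unlike the robots meta tag, which switches off outlink extraction for the whole page, a check like this drops only the individual anchor.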
 
 
-----Original message-----
> From:weishenyun <[email protected]>
> Sent: Thu 16-Aug-2012 11:09
> To: [email protected]
> Subject: RE: Can Nutch process rel-tag likes rel="nofollow"?
> 
> Well, I have read the TikaParser.java code in Nutch 1.x and Nutch 2.0, and I
> can easily find source code like the following.
> 
> if (!metaTags.getNoFollow()) { // okay to follow links
>   ArrayList<Outlink> l = new ArrayList<Outlink>(); // extract outlinks
>   URL baseTag = utils.getBase(root);
>   if (LOG.isTraceEnabled()) {
>     LOG.trace("Getting links...");
>   }
>   utils.getOutlinks(baseTag != null ? baseTag : base, l, root);
>   outlinks = l.toArray(new Outlink[l.size()]);
>   if (LOG.isTraceEnabled()) {
>     LOG.trace("found " + outlinks.length + " outlinks in " + base);
>   }
> }
> 
> But I think this code only handles nofollow or noindex in robots meta
> tags, for example <meta name="robots" content="nofollow"> or <meta
> name="robots" content="noindex">. Those tags control all the links on
> that page.
> 
> My problem is a single link on a page, such as <a
> href="http://www.google.com" rel="nofollow">. In this case, will Nutch
> discard this link because of its rel="nofollow" attribute?
> Thanks, Markus.
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Can-Nutch-process-rel-tag-likes-rel-nofollow-tp4001541p4001582.html
> Sent from the Nutch - Dev mailing list archive at Nabble.com.
> 
