[ https://issues.apache.org/jira/browse/NUTCH-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian Nagel resolved NUTCH-2387.
------------------------------------
    Resolution: Cannot Reproduce

Hi [~eyeris], I've verified that the linked HTML page gets deleted if 
indexer.delete.robots.noindex is true when the indexer is called.
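
As a rough illustration of why the setting matters (a sketch, not the actual 
Nutch code): the getBoolean lookup quoted below falls back to false, so deletion 
stays off unless the property is explicitly true in the configuration the 
indexer job sees. The class name NoIndexFlagSketch and the use of 
java.util.Properties in place of the Hadoop job configuration are assumptions 
made for this example only.

```java
import java.util.Properties;

/**
 * Hypothetical stand-in, not Nutch code: mimics a boolean property lookup
 * with a default of false, like the getBoolean call quoted in the issue.
 */
public class NoIndexFlagSketch {

    // Property name taken from this issue.
    static final String INDEXER_DELETE_ROBOTS_NOINDEX = "indexer.delete.robots.noindex";

    /** Returns the flag value, falling back to false when the property is unset. */
    static boolean deleteRobotsNoIndex(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty(INDEXER_DELETE_ROBOTS_NOINDEX, "false"));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // Property unset: the default applies.
        System.out.println(deleteRobotsNoIndex(conf));

        // Property set explicitly, as it would be in a site configuration.
        conf.setProperty(INDEXER_DELETE_ROBOTS_NOINDEX, "true");
        System.out.println(deleteRobotsNoIndex(conf));
    }
}
```

In this sketch the first call returns false and the second returns true, which 
is why it matters how the property reaches the indexer.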

If the problem persists, please reopen with details on how the indexer is called. 
Thanks!

> Nutch should not index document with "noindex" meta
> ---------------------------------------------------
>
>                 Key: NUTCH-2387
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2387
>             Project: Nutch
>          Issue Type: Bug
>          Components: indexer
>    Affects Versions: 1.13
>         Environment: Linux Mint 18
>            Reporter: Eyeris Rodriguez Rueda
>            Priority: Major
>              Labels: index, meta, robots,
>             Fix For: 1.16
>
>
> I'm using Nutch 1.12 in local mode with Solr 4.10.3.
> I have found that Nutch indexes documents that carry a "noindex" 
> robots meta tag.
> I used the crawl script for a complete cycle: 
> bin/crawl -i urls/ crawl/ -2
> with this URL:
> https://humanos.uci.cu/category/humanos/comparte-tu-software/page/3/ 
> After several tests the problem persists, and approximately 200 documents with 
> this robots meta tag are indexed incorrectly.
> I have read the configure method in the IndexerMapReduce.java class. It has a 
> line that reads this property, but for some reason it does not take effect:
> this.deleteRobotsNoIndex = job.getBoolean(INDEXER_DELETE_ROBOTS_NOINDEX, false);   (line 97)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
