[ 
https://issues.apache.org/jira/browse/NUTCH-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jianyun He updated NUTCH-1416:
------------------------------

    Priority: Critical  (was: Major)
    
> Can not update the index
> ------------------------
>
>                 Key: NUTCH-1416
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1416
>             Project: Nutch
>          Issue Type: Bug
>          Components: indexer
>            Reporter: Jianyun He
>            Priority: Critical
>
> When we update the index, there is no guarantee that the indexed content is
> the latest. The reduce() method of the IndexerMapReduce class contains the
> following code:
> public void reduce(Text key, Iterator<NutchWritable> values,
>                    OutputCollector<Text, NutchDocument> output,
>                    Reporter reporter) throws IOException {
>    ……
>    } else if (value instanceof ParseData) {  
>       parseData = (ParseData)value;
>    } else if (value instanceof ParseText) { 
>       parseText = (ParseText)value;
>    }
>    ……
> }
> For example, suppose I fetched web page A 30 days ago and fetch it again now.
> The key A then corresponds to two ParseData objects (located in different
> segments). This code does not compare fetch times; it simply overwrites the
> previous value, so the final value may be the old one.
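
The comparison the report asks for can be sketched roughly as follows. This is
only an illustration, not the Nutch code or an actual patch: TimedParse and
LatestParseSelector are hypothetical stand-ins, and it assumes each parse
record can be paired with the fetch time of the segment it came from.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a parse record paired with the fetch time of its
// segment. This is NOT a Nutch class; it only models the idea in the report.
final class TimedParse {
    final String segment;   // illustrative label for the originating segment
    final long fetchTime;   // assumed to be epoch milliseconds

    TimedParse(String segment, long fetchTime) {
        this.segment = segment;
        this.fetchTime = fetchTime;
    }
}

// Hypothetical helper: keeps the most recently fetched value instead of
// letting whichever value arrives last overwrite the earlier ones.
final class LatestParseSelector {
    static TimedParse latest(Iterator<TimedParse> values) {
        TimedParse newest = null;
        while (values.hasNext()) {
            TimedParse candidate = values.next();
            // Replace only when the candidate is strictly newer.
            if (newest == null || candidate.fetchTime > newest.fetchTime) {
                newest = candidate;
            }
        }
        return newest; // null when there are no values for the key
    }
}

final class LatestParseDemo {
    public static void main(String[] args) {
        // The newer record arrives first, so a plain overwrite would keep the old one.
        List<TimedParse> values = Arrays.asList(
                new TimedParse("recent-segment", 2000L),
                new TimedParse("old-segment", 1000L));
        TimedParse picked = LatestParseSelector.latest(values.iterator());
        System.out.println(picked.segment);
    }
}
```

Because the selection depends only on the fetch time, the result no longer
depends on the order in which the values reach the reducer.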

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

