[ https://issues.apache.org/jira/browse/NUTCH-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904243#comment-13904243 ]

Sebastian Nagel commented on NUTCH-1706:
----------------------------------------

Reran the test to double-check (Ubuntu 12.04, Java 1.7.0_51, Hadoop 1.2.0, 
local mode):
* current trunk: the index job failed with an NPE at line 192 (see comment in 
NUTCH-1646)
* trunk patched with NUTCH-1706-trunk.patch (as of Jan. 16th): 
http://tika.apache.org/ is not added to the index when segment 20140217140849 
is included
* trunk patched with NUTCH-1706-trunk-v2.patch (as of today): ok, the same 
number of documents is added to the index

Regarding the fetch_retry: you're right, [~markus17]! However, I would prefer 
to open separate issues for this and for possible problems when indexing 
multiple segments in one run. This issue (and NUTCH-1646) is a blocker, so to 
release 1.8 soon, let's keep the patch as minimal as possible.



> IndexerMapReduce does not remove db_redir_temp etc
> --------------------------------------------------
>
>                 Key: NUTCH-1706
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1706
>             Project: Nutch
>          Issue Type: Bug
>          Components: indexer
>    Affects Versions: 1.7
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>            Priority: Blocker
>             Fix For: 1.8
>
>         Attachments: NUTCH-1706-trunk-v2.patch, NUTCH-1706-trunk.patch, 
> nutch-1706-testdata.tgz
>
>
> The code path in IndexerMapReduce is wrong: the delete code should be 
> located after all reducer values have been gathered.
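A minimal Java sketch of the "gather all values first, then decide" pattern the issue description asks for. It deliberately avoids Nutch's real classes: the Status and Action enums, the decide() method, and the deletion rule (delete on a gone or redirect db status, or a gone fetch status) are hypothetical simplifications for illustration, not the actual IndexerMapReduce code or its patch.

```java
import java.util.List;

// Hypothetical, simplified model of what a reducer sees for one URL key.
// None of these names are Nutch's CrawlDatum API; they only illustrate the pattern.
public class DeferredDeleteSketch {

    enum Status { DB_FETCHED, DB_GONE, DB_REDIR_TEMP, DB_REDIR_PERM, FETCH_SUCCESS, FETCH_GONE }

    enum Action { INDEX, DELETE, SKIP }

    static boolean isDbStatus(Status s) {
        return s.name().startsWith("DB_");
    }

    // Gather every value for the key first; decide only after the loop ends.
    static Action decide(List<Status> values) {
        Status dbStatus = null;
        Status fetchStatus = null;
        for (Status s : values) {
            // No early return inside the loop: just remember what was seen.
            if (isDbStatus(s)) {
                dbStatus = s;
            } else {
                fetchStatus = s;
            }
        }
        // Hypothetical delete rule, evaluated once all values are known.
        if (dbStatus == Status.DB_GONE
                || dbStatus == Status.DB_REDIR_TEMP
                || dbStatus == Status.DB_REDIR_PERM
                || fetchStatus == Status.FETCH_GONE) {
            return Action.DELETE;
        }
        // Index only when both a db value and a successful fetch were seen.
        if (dbStatus != null && fetchStatus == Status.FETCH_SUCCESS) {
            return Action.INDEX;
        }
        return Action.SKIP;
    }

    public static void main(String[] args) {
        // Same values in two different orders: the decision is identical.
        System.out.println(decide(List.of(Status.FETCH_SUCCESS, Status.DB_REDIR_TEMP)));
        System.out.println(decide(List.of(Status.DB_REDIR_TEMP, Status.FETCH_SUCCESS)));
        System.out.println(decide(List.of(Status.DB_FETCHED, Status.FETCH_SUCCESS)));
    }
}
```

Because the decision is made only after the loop, for a key with one db value and one fetch value the outcome no longer depends on the order in which the reducer receives them.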



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
