[
https://issues.apache.org/jira/browse/NUTCH-2469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16278445#comment-16278445
]
Hudson commented on NUTCH-2469:
-------------------------------
SUCCESS: Integrated in Jenkins build Nutch-nutchgora #1597 (See
[https://builds.apache.org/job/Nutch-nutchgora/1597/])
NUTCH-2469 Documents not committed to Solr in Server mode - applied patch
(snagel: [https://github.com/apache/nutch/commit/cc2f4abeb7b8326acbb00f9d10b46a092bbbe9a5])
* (edit) src/java/org/apache/nutch/indexer/IndexingJob.java
> Documents not committed to Solr in Server mode
> ----------------------------------------------
>
> Key: NUTCH-2469
> URL: https://issues.apache.org/jira/browse/NUTCH-2469
> Project: Nutch
> Issue Type: Bug
> Components: indexer
> Affects Versions: 2.3.1
> Reporter: Ninaad Joshi
> Assignee: Sebastian Nagel
> Priority: Blocker
> Fix For: 2.4
>
> Attachments: NinaadJoshi.IndexingJob.java.patch
>
>
> I found a discrepancy between the execution paths used when running Nutch in
> local standalone mode and in server mode.
> In local standalone mode, when indexing finishes, the document and its fields
> are indexed and committed in Solr, so the document is returned if queried
> immediately. When the same is done in server mode, however, the document is
> indexed but not committed in Solr, so it is not returned if queried
> immediately. After Solr is restarted, the indexed document is returned by
> queries.
> I looked through IndexingJob.java to understand the cause and found the
> following:
> # There are two different entry paths, one for local standalone mode and one
> for server mode:
> ** Server mode entry point: public Map<String, Object> run(Map<String, Object> args)
> ** Standalone mode entry point:
> *** public int run(String[] args)
> *** public void index(String batchId)
> # The local standalone mode path does more work than the server mode path:
> ** The public void index(String batchId) function first calls the server
> mode path, public Map<String, Object> run(Map<String, Object> args)
> ** It then does the following additional work:
> *** Gets the IndexWriters
> *** Uses the IndexWriters to describe the active index writers
> *** Uses the IndexWriters to commit, if COMMIT_INDEX=true is specified in the
> configuration
> ** None of this additional work is done in server mode
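The two execution paths described above can be modelled in a few lines of plain Java. This is a simplified sketch, not the actual IndexingJob code: StubIndexWriters is a hypothetical stand-in for Nutch's IndexWriters that only records which methods were called, the "batch" argument key is an assumption, and the commitIndex flag stands in for the COMMIT_INDEX setting. It shows why a commit happens when the job enters through run(String[] args) but never when it enters through run(Map<String, Object> args).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Nutch's IndexWriters: it only records which
// methods were called so the two control flows can be compared.
class StubIndexWriters {
    boolean described = false;
    boolean committed = false;

    String describe() {
        described = true;
        return "StubIndexWriter";
    }

    void commit() {
        committed = true;
    }
}

// Simplified model of the pre-patch control flow as described in this issue.
class IndexingJobBefore {
    final boolean commitIndex; // stands in for the COMMIT_INDEX setting
    final StubIndexWriters writers = new StubIndexWriters();

    IndexingJobBefore(boolean commitIndex) {
        this.commitIndex = commitIndex;
    }

    // Server mode entry point: runs the indexing job and nothing else.
    Map<String, Object> run(Map<String, Object> args) {
        Map<String, Object> results = new HashMap<>();
        // ... the real indexing job would run here for args.get("batch") ...
        results.put("batch", args.get("batch"));
        return results;
    }

    // Reached only on the standalone path: runs the job, then the extra work.
    void index(String batchId) {
        Map<String, Object> args = new HashMap<>();
        args.put("batch", batchId); // "batch" is an assumed key
        run(args);
        writers.describe();
        if (commitIndex) {
            writers.commit();
        }
    }

    // Standalone (command-line) entry point: goes through index(String).
    int run(String[] args) {
        index(args[0]);
        return 0;
    }
}

class IndexingPathsBeforeDemo {
    public static void main(String[] unused) {
        Map<String, Object> args = new HashMap<>();
        args.put("batch", "batch-1");

        IndexingJobBefore server = new IndexingJobBefore(true);
        server.run(args); // server mode: no describe, no commit
        System.out.println("server committed: " + server.writers.committed);

        IndexingJobBefore standalone = new IndexingJobBefore(true);
        standalone.run(new String[] {"batch-1"}); // standalone mode
        System.out.println("standalone committed: " + standalone.writers.committed);
    }
}
```

Running IndexingPathsBeforeDemo prints "server committed: false" followed by "standalone committed: true", even though commitIndex is true in both cases.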
> I believe the execution paths for both modes should be the same, and hence
> propose to:
> # Move the additional IndexWriters work in public void index(String batchId)
> to the end of the server mode execution path, i.e. the public Map<String, Object> run(Map<String, Object> args) function
> # Call public Map<String, Object> run(Map<String, Object> args) directly from
> the standalone mode entry point, public int run(String[] args)
> # public void index(String batchId) then becomes redundant and can be safely removed.
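Under similar simplifying assumptions, the proposed restructuring might look roughly like this. RecordingIndexWriters is a hypothetical stand-in for Nutch's IndexWriters, "batch" is an assumed argument key, and commitIndex stands in for COMMIT_INDEX; the sketch is not the applied patch. The describe/commit work sits at the end of run(Map<String, Object> args), and run(String[] args) delegates to it directly, so index(String batchId) is no longer needed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Nutch's IndexWriters that records calls.
class RecordingIndexWriters {
    boolean described = false;
    boolean committed = false;

    String describe() {
        described = true;
        return "RecordingIndexWriter";
    }

    void commit() {
        committed = true;
    }
}

// Simplified model of the proposed structure: one shared execution path.
class IndexingJobAfter {
    final boolean commitIndex; // stands in for the COMMIT_INDEX setting
    final RecordingIndexWriters writers = new RecordingIndexWriters();

    IndexingJobAfter(boolean commitIndex) {
        this.commitIndex = commitIndex;
    }

    // Used directly by server mode and indirectly by run(String[]).
    Map<String, Object> run(Map<String, Object> args) {
        Map<String, Object> results = new HashMap<>();
        // ... the real indexing job would run here for args.get("batch") ...
        results.put("batch", args.get("batch"));

        // Moved here from index(String): describe, then commit if configured.
        writers.describe();
        if (commitIndex) {
            writers.commit();
        }
        return results;
    }

    // Standalone entry point now delegates straight to run(Map).
    int run(String[] args) {
        Map<String, Object> argMap = new HashMap<>();
        argMap.put("batch", args[0]); // "batch" is an assumed key
        run(argMap);
        return 0;
    }
}

class IndexingPathsAfterDemo {
    public static void main(String[] unused) {
        Map<String, Object> args = new HashMap<>();
        args.put("batch", "batch-1");

        IndexingJobAfter server = new IndexingJobAfter(true);
        server.run(args);
        System.out.println("server committed: " + server.writers.committed);

        IndexingJobAfter standalone = new IndexingJobAfter(true);
        standalone.run(new String[] {"batch-1"});
        System.out.println("standalone committed: " + standalone.writers.committed);
    }
}
```

Here IndexingPathsAfterDemo prints "server committed: true" and "standalone committed: true". Keeping the commit inside the shared run(Map<String, Object> args) means server-mode requests get the same describe/commit behaviour as the command line without any extra call.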
> I have attached the proposed patch to this issue. Please review it and
> approve.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)