[
https://issues.apache.org/jira/browse/NUTCH-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16021452#comment-16021452
]
ASF GitHub Bot commented on NUTCH-2388:
---------------------------------------
lewismc closed pull request #191: NUTCH-2388 bin/crawl indexing only webpages
of current batch instead of all
URL: https://github.com/apache/nutch/pull/191
> bin/crawl indexing only webpages containing batchID instead of all in 2.x
> -------------------------------------------------------------------------
>
> Key: NUTCH-2388
> URL: https://issues.apache.org/jira/browse/NUTCH-2388
> Project: Nutch
> Issue Type: Bug
> Components: bin
> Affects Versions: 2.3
> Reporter: Kaidul Islam
> Assignee: Kaidul Islam
> Priority: Trivial
> Fix For: 2.4
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> During each iteration, after generating, fetching, parsing and updating the
> current batch in the DB, the indexer is supposed to index the current batch
> as well. Instead, it currently indexes everything:
> {code}
> __bin_nutch index $commonOptions -D solr.server.url=$SOLRURL -all -crawlId "$CRAWL_ID"
> {code}
> It should probably look like this instead:
> {code}
> __bin_nutch index $commonOptions -D solr.server.url=$SOLRURL $batchId -crawlId "$CRAWL_ID"
> {code}
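To make the difference between the two invocations concrete, here is a minimal sketch that can be run outside of Nutch. It assumes a stub __bin_nutch that only prints its arguments (the real helper in bin/crawl runs bin/nutch), and the values of commonOptions, SOLRURL and CRAWL_ID, as well as the index_batch wrapper, are hypothetical and exist only for illustration.

```shell
#!/usr/bin/env bash
# Stub standing in for the __bin_nutch helper from bin/crawl.
# Assumption: it only prints the command instead of running bin/nutch.
__bin_nutch() {
  echo "nutch $*"
}

# Hypothetical values, for illustration only.
commonOptions="-D mapred.reduce.tasks=2"
SOLRURL="http://localhost:8983/solr/nutch"
CRAWL_ID="testcrawl"

# Hypothetical wrapper: $1 selects what to index, either a batch ID or -all.
index_batch() {
  __bin_nutch index $commonOptions -D solr.server.url=$SOLRURL "$1" -crawlId "$CRAWL_ID"
}

# Two crawl iterations, each with its own batch ID.
for i in 1 2; do
  batchId="$(date +%s)-$RANDOM"
  # Proposed behaviour: index only the pages of this iteration's batch.
  index_batch "$batchId"
done

# Behaviour described in the report: every iteration indexes everything.
index_batch -all
```

With the batch ID passed in place of -all, each iteration would hand the indexer only the pages it has just generated, fetched, parsed and updated, rather than the whole crawl.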