[jira] [Commented] (NUTCH-2184) Enable IndexingJob to function with no crawldb

ASF GitHub Bot (Jira) Thu, 09 Jan 2020 04:07:24 -0800


    [ 
https://issues.apache.org/jira/browse/NUTCH-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17011746#comment-17011746
 ]


ASF GitHub Bot commented on NUTCH-2184:
---------------------------------------

sebastian-nagel commented on issue #95: NUTCH-2184 Enable IndexingJob to 
function with no crawldb
URL: https://github.com/apache/nutch/pull/95#issuecomment-572532933
 
 
   Closed in favor of #486 
   - indexing without a CrawlDb record has already been implemented in 
NUTCH-2456/#240
   - various improvements from this PR have been integrated in #486 
   - separation of mapper and reducer classes is part of NUTCH-2375/#221
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> Enable IndexingJob to function with no crawldb
> ----------------------------------------------
>
>                 Key: NUTCH-2184
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2184
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>            Priority: Major
>             Fix For: 1.17
>
>         Attachments: NUTCH-2184.patch, NUTCH-2184v2.patch
>
>
> Sometimes when working with distributed team(s), we have found that we can 
> 'lose' data structures which are currently considered critical, e.g. 
> crawldb, linkdb and/or segments.
> In my current scenario I have a requirement to index segment data with no 
> accompanying crawldb or linkdb. 
> Absence of the latter is OK as linkdb is optional; however, crawldb is 
> currently mandatory in 
> [IndexerMapReduce|https://github.com/apache/nutch/blob/trunk/src/java/org/apache/nutch/indexer/IndexerMapReduce.java].
> This ticket should enhance the IndexerMapReduce code to support the use case 
> where you ONLY have segments and want to force an index for every record 
> present.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
