[jira] Commented: (NUTCH-907) DataStore API doesn't support multiple storage areas for multiple disjoint crawls
[ https://issues.apache.org/jira/browse/NUTCH-907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910109#action_12910109 ]

Andrzej Bialecki commented on NUTCH-907:
----------------------------------------

That's very good news - in that case I'm fine with the Gora API as it is now, we should change Nutch to make use of this functionality.

DataStore API doesn't support multiple storage areas for multiple disjoint crawls
---------------------------------------------------------------------------------

Key: NUTCH-907
URL: https://issues.apache.org/jira/browse/NUTCH-907
Project: Nutch
Issue Type: Bug
Reporter: Andrzej Bialecki
Fix For: 2.0

In Nutch 1.x it was possible to easily select a set of crawl data (crawldb, page data, linkdb, etc) by specifying a path where the data was stored. This enabled users to run several disjoint crawls with different configs, but still using the same storage medium, just under different paths.

This is not possible now because there is a 1:1 mapping between a specific DataStore instance and a set of crawl data.

In order to support this functionality the Gora API should be extended so that it can create stores (and data tables in the underlying storage) that use arbitrary prefixes to identify the particular crawl dataset. Then the Nutch API should be extended to allow passing this crawlId value to select one of possibly many existing crawl datasets.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
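The prefix idea described in NUTCH-907 could look roughly like the following. This is only an illustrative sketch: the class CrawlTableNames, the method tableName, and the "_" separator are all hypothetical and are not taken from the Gora or Nutch APIs. It shows how a crawlId value might select one of several disjoint datasets that share the same storage medium.

```java
/** Hypothetical helper, not part of Gora or Nutch: maps a crawlId to a table name. */
final class CrawlTableNames {
    private CrawlTableNames() {}

    /**
     * Returns baseName unchanged when no crawlId is given (single-crawl setup),
     * otherwise "<crawlId>_<baseName>" so that disjoint crawls get separate tables.
     * The "_" separator is an assumption made for this sketch.
     */
    static String tableName(String crawlId, String baseName) {
        if (baseName == null || baseName.isEmpty()) {
            throw new IllegalArgumentException("baseName must not be empty");
        }
        if (crawlId == null || crawlId.isEmpty()) {
            return baseName;
        }
        return crawlId + "_" + baseName;
    }

    public static void main(String[] args) {
        // Two disjoint crawls sharing one storage backend under different prefixes.
        System.out.println(tableName("news", "webpage"));
        System.out.println(tableName("blogs", "webpage"));
        // No crawlId: falls back to the plain table name.
        System.out.println(tableName(null, "webpage"));
    }
}
```

With a scheme like this, the rest of the code only needs to carry the crawlId around, much as Nutch 1.x carried a path.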
[jira] Updated: (NUTCH-880) REST API (and webapp) for Nutch
[ https://issues.apache.org/jira/browse/NUTCH-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrzej Bialecki updated NUTCH-880:
----------------------------------

Attachment: API.patch

Initial patch for discussion. This is a work in progress, so only some functionality is implemented, and even less than that is actually working ;) I would appreciate a review and comments.

REST API (and webapp) for Nutch
-------------------------------

Key: NUTCH-880
URL: https://issues.apache.org/jira/browse/NUTCH-880
Project: Nutch
Issue Type: New Feature
Affects Versions: 2.0
Reporter: Andrzej Bialecki
Assignee: Andrzej Bialecki
Attachments: API.patch

This issue is for discussing a REST-style API for accessing Nutch.

Here's an initial idea:
* I propose to use org.restlet for handling requests and returning JSON/XML/whatever responses.
* Hook up all regular tools so that they can be driven via this API. This would have to be an async API, since all Nutch operations take a long time to execute. It follows that we also need to be able to list running operations, retrieve their current status, and possibly abort/cancel/stop/suspend/resume/...? This also means that we would potentially have to create and manage many threads in a servlet - AFAIK this is frowned upon by J2EE purists...
* Package this in a webapp (that includes all deps, essentially nutch.job content), with the restlet servlet as an entry point.

Open issues:
* How to implement the reading of crawl results via this API.
* Should we manage only crawls that use a single configuration per webapp, or should we have a notion of crawl contexts (sets of crawl configs) with CRUD ops on them? This would be nice, because it would allow managing several different crawls, with different configs, in a single webapp - but it complicates the implementation a lot.
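The async requirement in NUTCH-880 (submit a long-running operation, list running operations, query status, abort) can be sketched with plain java.util.concurrent. This is not the content of API.patch and does not use org.restlet. JobManager, its State enum, the "job-N" id format, and the pool size of 2 are assumptions made for illustration. A REST layer would map its resources onto methods like these.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical job registry: tracks long-running operations by id. */
final class JobManager {
    /** RUNNING also covers jobs that are queued but not yet started. */
    enum State { RUNNING, FINISHED, FAILED, CANCELLED }

    private final ExecutorService pool = Executors.newFixedThreadPool(2);
    private final Map<String, Future<?>> jobs = new ConcurrentHashMap<>();
    private final AtomicInteger nextId = new AtomicInteger();

    /** Starts the task asynchronously and returns an id the client can poll. */
    String submit(Callable<?> task) {
        String id = "job-" + nextId.incrementAndGet();
        jobs.put(id, pool.submit(task));
        return id;
    }

    /** Reports the current state of a job without blocking on unfinished work. */
    State status(String id) {
        Future<?> f = jobs.get(id);
        if (f == null) throw new IllegalArgumentException("unknown job: " + id);
        if (f.isCancelled()) return State.CANCELLED;
        if (!f.isDone()) return State.RUNNING;
        try {
            f.get(); // already done, so this returns or throws immediately
            return State.FINISHED;
        } catch (ExecutionException e) {
            return State.FAILED;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while reading job state", e);
        }
    }

    /** Lists every known job with its state, sorted by id. */
    Map<String, State> list() {
        Map<String, State> out = new TreeMap<>();
        for (String id : jobs.keySet()) out.put(id, status(id));
        return out;
    }

    /** Requests cancellation; returns false for unknown or already finished jobs. */
    boolean abort(String id) {
        Future<?> f = jobs.get(id);
        return f != null && f.cancel(true);
    }

    void shutdown() {
        pool.shutdownNow();
    }

    public static void main(String[] args) {
        JobManager manager = new JobManager();
        // Stand-in for a long Nutch operation.
        String id = manager.submit(() -> { Thread.sleep(60_000); return null; });
        System.out.println(manager.list());
        manager.abort(id);
        System.out.println(manager.list());
        manager.shutdown();
    }
}
```

Keeping the thread pool inside one bounded executor is one way to limit the "many threads in a servlet" concern raised above. Suspend/resume is left out because Future offers no such operation.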
[jira] Created: (NUTCH-908) Infinite Loop and Null Pointer Bugs in Searching
Infinite Loop and Null Pointer Bugs in Searching
------------------------------------------------

Key: NUTCH-908
URL: https://issues.apache.org/jira/browse/NUTCH-908
Project: Nutch
Issue Type: Bug
Affects Versions: 1.1, 1.0.0
Environment: All
Reporter: Dennis Kubes
Assignee: Dennis Kubes
Fix For: 1.2

It is possible for the NutchBean to drop into an infinite loop while trying to optimize a query to re-search for more results. There are also two Null Pointer bugs in the search process. One in NutchBean where there was an incorrect loop assignment and a second in DistributedSegementsBean when a segment is null (shouldn't happen but still should be handled.) A patch is available for both.
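For readers unfamiliar with this class of bug, here is a generic sketch of a re-search loop that cannot spin forever and that tolerates null entries. It is not the NUTCH-908 patch and does not reproduce NutchBean code. BoundedReSearch, the backend function, the doubling strategy, and the maxRounds cap are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.IntFunction;

/** Hypothetical illustration of a guarded "search again for more results" loop. */
final class BoundedReSearch {
    private BoundedReSearch() {}

    /**
     * Collects up to `wanted` distinct, non-null hits. When duplicates leave too
     * few results, it asks the backend for twice as many raw hits, but stops as
     * soon as the backend is exhausted, makes no progress, or maxRounds is hit.
     */
    static List<String> search(IntFunction<List<String>> backend, int wanted, int maxRounds) {
        int rawRequested = wanted;
        int lastRawSize = -1;
        Set<String> distinct = new LinkedHashSet<>();
        for (int round = 0; round < maxRounds; round++) {
            List<String> raw = backend.apply(rawRequested);
            if (raw == null) break;                 // handle a missing result set
            distinct.clear();
            for (String hit : raw) {
                if (hit == null) continue;          // skip null entries instead of failing
                distinct.add(hit);
                if (distinct.size() == wanted) break;
            }
            if (distinct.size() >= wanted) break;   // enough results
            if (raw.size() < rawRequested) break;   // backend has nothing more to give
            if (raw.size() <= lastRawSize) break;   // no progress since the last round
            lastRawSize = raw.size();
            rawRequested *= 2;
        }
        return new ArrayList<>(distinct);
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("a", "a", null, "b", "b", "c");
        IntFunction<List<String>> backend = n -> data.subList(0, Math.min(n, data.size()));
        System.out.println(search(backend, 2, 10));
        System.out.println(search(backend, 5, 10));
    }
}
```

The two "break" guards on raw.size() are what prevent endless re-searching when the backend keeps returning the same hits.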
[jira] Updated: (NUTCH-908) Infinite Loop and Null Pointer Bugs in Searching
[ https://issues.apache.org/jira/browse/NUTCH-908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Kubes updated NUTCH-908:
------------------------------

Attachment: NUTCH-908-1-20100916.patch

Fixes infinite loop and null pointer bugs.

Infinite Loop and Null Pointer Bugs in Searching
------------------------------------------------

Key: NUTCH-908
URL: https://issues.apache.org/jira/browse/NUTCH-908
Project: Nutch
Issue Type: Bug
Affects Versions: 1.0.0, 1.1
Environment: All
Reporter: Dennis Kubes
Assignee: Dennis Kubes
Fix For: 1.2
Attachments: NUTCH-908-1-20100916.patch
Original Estimate: 4h
Remaining Estimate: 4h

It is possible for the NutchBean to drop into an infinite loop while trying to optimize a query to re-search for more results. There are also two Null Pointer bugs in the search process. One in NutchBean where there was an incorrect loop assignment and a second in DistributedSegementsBean when a segment is null (shouldn't happen but still should be handled.) A patch is available for both.
[jira] Work started: (NUTCH-908) Infinite Loop and Null Pointer Bugs in Searching
[ https://issues.apache.org/jira/browse/NUTCH-908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on NUTCH-908 started by Chris A. Mattmann.

Infinite Loop and Null Pointer Bugs in Searching
------------------------------------------------

Key: NUTCH-908
URL: https://issues.apache.org/jira/browse/NUTCH-908
Project: Nutch
Issue Type: Bug
Affects Versions: 1.0.0, 1.1
Environment: All
Reporter: Dennis Kubes
Assignee: Chris A. Mattmann
Fix For: 1.2
Attachments: NUTCH-908-1-20100916.patch
Original Estimate: 4h
Remaining Estimate: 4h

It is possible for the NutchBean to drop into an infinite loop while trying to optimize a query to re-search for more results. There are also two Null Pointer bugs in the search process. One in NutchBean where there was an incorrect loop assignment and a second in DistributedSegementsBean when a segment is null (shouldn't happen but still should be handled.) A patch is available for both.