[jira] Commented: (NUTCH-907) DataStore API doesn't support multiple storage areas for multiple disjoint crawls

2010-09-16 Thread Andrzej Bialecki (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910109#action_12910109
 ] 

Andrzej Bialecki  commented on NUTCH-907:
-

That's very good news - in that case I'm fine with the Gora API as it is now; 
we should change Nutch to make use of this functionality.

 DataStore API doesn't support multiple storage areas for multiple disjoint 
 crawls
 -

 Key: NUTCH-907
 URL: https://issues.apache.org/jira/browse/NUTCH-907
 Project: Nutch
  Issue Type: Bug
Reporter: Andrzej Bialecki 
 Fix For: 2.0


 In Nutch 1.x it was possible to easily select a set of crawl data (crawldb, 
 page data, linkdb, etc.) by specifying a path where the data was stored. This 
 enabled users to run several disjoint crawls with different configs, but 
 still using the same storage medium, just under different paths.
 This is not possible now because there is a 1:1 mapping between a specific 
 DataStore instance and a set of crawl data.
 In order to support this functionality the Gora API should be extended so 
 that it can create stores (and data tables in the underlying storage) that 
 use arbitrary prefixes to identify the particular crawl dataset. Then the 
 Nutch API should be extended to allow passing this crawlId value to select 
 one of possibly many existing crawl datasets.
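
 As a rough illustration of the prefixing idea described above, here is a
 minimal sketch. The class and method names (CrawlTableNames, tableFor) and
 the underscore separator are assumptions made for this example; they are
 not taken from the Gora or Nutch APIs.

```java
import java.util.Objects;

/** Hypothetical helper: derives a per-crawl table name from a base name. */
public final class CrawlTableNames {
    private CrawlTableNames() {}

    /**
     * Returns crawlId + "_" + baseName, or baseName unchanged when no
     * crawlId is given, so a single-crawl setup keeps its default table.
     */
    public static String tableFor(String crawlId, String baseName) {
        Objects.requireNonNull(baseName, "baseName");
        if (crawlId == null || crawlId.isEmpty()) {
            return baseName;
        }
        return crawlId + "_" + baseName;
    }

    public static void main(String[] args) {
        // Two disjoint crawls sharing one storage backend.
        System.out.println(tableFor("news", "webpage"));
        System.out.println(tableFor("blogs", "webpage"));
        // No crawlId: fall back to the unprefixed table.
        System.out.println(tableFor(null, "webpage"));
    }
}
```

 With a scheme like this, two crawls with different configs map to different
 tables in the same storage medium, much like the per-path layout of Nutch 1.x.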

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (NUTCH-880) REST API (and webapp) for Nutch

2010-09-16 Thread Andrzej Bialecki (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki  updated NUTCH-880:


Attachment: API.patch

Initial patch for discussion. This is a work in progress, so only some 
functionality is implemented, and even less than that is actually working ;)

I would appreciate a review and comments.

 REST API (and webapp) for Nutch
 ---

 Key: NUTCH-880
 URL: https://issues.apache.org/jira/browse/NUTCH-880
 Project: Nutch
  Issue Type: New Feature
Affects Versions: 2.0
Reporter: Andrzej Bialecki 
Assignee: Andrzej Bialecki 
 Attachments: API.patch


 This issue is for discussing a REST-style API for accessing Nutch.
 Here's an initial idea:
 * I propose to use org.restlet for handling requests and returning 
 JSON/XML/whatever responses.
 * hook up all regular tools so that they can be driven via this API. This 
 would have to be an async API, since all Nutch operations take a long time to 
 execute. It follows that we also need to be able to list running 
 operations, retrieve their current status, and possibly 
 abort/cancel/stop/suspend/resume/...? This also means that we would 
 potentially have to create & manage many threads in a servlet - AFAIK this is 
 frowned upon by J2EE purists...
 * package this in a webapp (that includes all deps, essentially nutch.job 
 content), with the restlet servlet as an entry point.
 Open issues:
 * how to implement the reading of crawl results via this API
 * should we manage only crawls that use a single configuration per webapp, or 
 should we have a notion of crawl contexts (sets of crawl configs) with CRUD 
 ops on them? This would be nice, because it would allow managing several 
 different crawls, with different configs, in a single webapp - but it 
 complicates the implementation a lot.
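
 To make the async requirement above more concrete, here is a minimal sketch
 of a job registry that a REST layer could sit on top of: submit a
 long-running operation, poll its status, cancel it. The JobManager class,
 its State enum and the "job-N" id format are assumptions for illustration
 only; they are not taken from API.patch or from org.restlet.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical job registry; not part of Nutch or of API.patch. */
public final class JobManager {
    /** ACTIVE covers both queued and currently executing jobs. */
    public enum State { ACTIVE, FINISHED, FAILED, CANCELLED }

    private final ExecutorService pool;
    private final Map<String, Future<?>> jobs = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong();

    public JobManager(int threads) {
        this.pool = Executors.newFixedThreadPool(threads);
    }

    /** Starts the task in the background and returns an id for later polling. */
    public String submit(Callable<?> task) {
        String id = "job-" + nextId.incrementAndGet();
        jobs.put(id, pool.submit(task));
        return id;
    }

    /** Reports the current state of a job without blocking on unfinished work. */
    public State status(String id) {
        Future<?> f = jobs.get(id);
        if (f == null) {
            throw new IllegalArgumentException("unknown job: " + id);
        }
        if (f.isCancelled()) {
            return State.CANCELLED;
        }
        if (!f.isDone()) {
            return State.ACTIVE;
        }
        try {
            f.get(); // already done, so this returns or throws immediately
            return State.FINISHED;
        } catch (ExecutionException e) {
            return State.FAILED;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while reading job state", e);
        }
    }

    /** Requests cancellation; a running task is interrupted. */
    public boolean cancel(String id) {
        Future<?> f = jobs.get(id);
        return f != null && f.cancel(true);
    }

    /** Stops the worker threads, interrupting anything still running. */
    public void shutdown() {
        pool.shutdownNow();
    }
}
```

 Suspend/resume is left out on purpose: a plain Future only supports
 cancellation, so pausing would need cooperation from the tools themselves.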




[jira] Created: (NUTCH-908) Infinite Loop and Null Pointer Bugs in Searching

2010-09-16 Thread Dennis Kubes (JIRA)
Infinite Loop and Null Pointer Bugs in Searching


 Key: NUTCH-908
 URL: https://issues.apache.org/jira/browse/NUTCH-908
 Project: Nutch
  Issue Type: Bug
Affects Versions: 1.1, 1.0.0
 Environment: All
Reporter: Dennis Kubes
Assignee: Dennis Kubes
 Fix For: 1.2


It is possible for the NutchBean to drop into an infinite loop while trying to 
optimize a query to re-search for more results.  There are also two null 
pointer bugs in the search process: one in NutchBean, caused by an 
incorrect loop assignment, and a second in DistributedSegmentBean when a 
segment is null (this shouldn't happen, but it should still be handled).  A 
patch is available for both.
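
For readers without the patch at hand, here is a generic sketch of how a
"re-search for more results" loop can be kept from spinning forever, and how
null entries can be skipped instead of dereferenced. It is not the code from
NutchBean or from the attached patch; the class name, the rawSearch callback
and the doubling strategy are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.IntFunction;

/** Hypothetical sketch of a bounded re-search loop with null handling. */
public final class BoundedReSearch {
    private BoundedReSearch() {}

    /**
     * Asks rawSearch for progressively more raw hits until 'wanted' distinct
     * hits are found, the backend is exhausted, or maxAttempts is reached.
     */
    public static List<String> search(IntFunction<List<String>> rawSearch,
                                      int wanted, int maxAttempts) {
        List<String> unique = new ArrayList<>();
        if (wanted <= 0) {
            return unique;
        }
        int rawLimit = wanted;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            List<String> raw = rawSearch.apply(rawLimit);
            if (raw == null) {
                raw = new ArrayList<>(); // treat a null result as "no hits"
            }
            unique = dedupe(raw, wanted);
            boolean satisfied = unique.size() >= wanted;
            boolean exhausted = raw.size() < rawLimit; // backend has no more hits
            if (satisfied || exhausted || rawLimit > Integer.MAX_VALUE / 2) {
                break;
            }
            rawLimit *= 2; // widen the raw search and try again
        }
        return unique;
    }

    /** Keeps first occurrences in order, skipping nulls, up to 'limit' hits. */
    private static List<String> dedupe(List<String> raw, int limit) {
        Set<String> seen = new LinkedHashSet<>();
        for (String hit : raw) {
            if (hit == null) {
                continue; // shouldn't happen, but don't fail on it
            }
            seen.add(hit);
            if (seen.size() >= limit) {
                break;
            }
        }
        return new ArrayList<>(seen);
    }
}
```

The attempt cap guarantees termination even if the backend keeps returning
nothing but duplicates, which is the kind of case where an unbounded loop
would never exit.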




[jira] Updated: (NUTCH-908) Infinite Loop and Null Pointer Bugs in Searching

2010-09-16 Thread Dennis Kubes (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Kubes updated NUTCH-908:
---

Attachment: NUTCH-908-1-20100916.patch

Fixes infinite loop and null pointer bugs.

 Infinite Loop and Null Pointer Bugs in Searching
 

 Key: NUTCH-908
 URL: https://issues.apache.org/jira/browse/NUTCH-908
 Project: Nutch
  Issue Type: Bug
Affects Versions: 1.0.0, 1.1
 Environment: All
Reporter: Dennis Kubes
Assignee: Dennis Kubes
 Fix For: 1.2

 Attachments: NUTCH-908-1-20100916.patch

   Original Estimate: 4h
  Remaining Estimate: 4h

 It is possible for the NutchBean to drop into an infinite loop while trying 
 to optimize a query to re-search for more results.  There are also two null 
 pointer bugs in the search process: one in NutchBean, caused by an 
 incorrect loop assignment, and a second in DistributedSegmentBean when a 
 segment is null (this shouldn't happen, but it should still be handled).  A 
 patch is available for both.




[jira] Work started: (NUTCH-908) Infinite Loop and Null Pointer Bugs in Searching

2010-09-16 Thread Chris A. Mattmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on NUTCH-908 started by Chris A. Mattmann.

 Infinite Loop and Null Pointer Bugs in Searching
 

 Key: NUTCH-908
 URL: https://issues.apache.org/jira/browse/NUTCH-908
 Project: Nutch
  Issue Type: Bug
Affects Versions: 1.0.0, 1.1
 Environment: All
Reporter: Dennis Kubes
Assignee: Chris A. Mattmann
 Fix For: 1.2

 Attachments: NUTCH-908-1-20100916.patch

   Original Estimate: 4h
  Remaining Estimate: 4h

 It is possible for the NutchBean to drop into an infinite loop while trying 
 to optimize a query to re-search for more results.  There are also two null 
 pointer bugs in the search process: one in NutchBean, caused by an 
 incorrect loop assignment, and a second in DistributedSegmentBean when a 
 segment is null (this shouldn't happen, but it should still be handled).  A 
 patch is available for both.
