Hi all, regarding NUTCH-251:

Judging by the votes, NUTCH-251 seems to be a fairly significant issue. Stefan has written a good plugin for the admin GUI, and I have updated it to work with Nutch 0.8 and Hadoop 0.4.

Some of the features in that patch are not appropriate for our use cases, and it requires changes to Hadoop, so I am currently working on an alternative implementation of the administration GUI. It consists of a Hadoop server (like the JobTracker) that listens for submitted jobs, a web GUI for submitting and tracking jobs from the browser, and a job runner.

The architecture of the patch is as follows:

- AdminJob, an abstract class representing a job in Nutch.
- Various classes extending AdminJob, e.g. FetchAdminJob and IndexAdminJob.
- A queue that sorts the jobs in priority order using a modified topological sort (jobs can depend on each other).
- An interface for submitting jobs.
- An RPC server that listens for job submissions.
- An extension point (basically the same as in the previous patch).
- A web server to serve the plugins' JSPs.
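One possible reading of the AdminJob abstraction and the dependency-aware queue is sketched below in plain Java. It is a minimal sketch, not the patch itself: the method names (`dependsOn`, `getPriority`, `AdminJobQueue.order`) are hypothetical, and the "modified topological sort" is assumed here to be Kahn's algorithm with a priority queue as the tie-breaker, so that among the jobs whose dependencies are already satisfied, the highest-priority one runs first.

```java
import java.util.*;

/** Hypothetical sketch of an AdminJob; not the actual class from the patch. */
abstract class AdminJob {
    private final String name;
    private final int priority;                      // higher value = preferred when several jobs are ready
    private final List<AdminJob> dependencies = new ArrayList<>();

    protected AdminJob(String name, int priority) {
        this.name = name;
        this.priority = priority;
    }

    public String getName() { return name; }
    public int getPriority() { return priority; }
    public List<AdminJob> getDependencies() { return dependencies; }

    /** Declares that this job may only run after {@code other} has run (hypothetical API). */
    public AdminJob dependsOn(AdminJob other) {
        dependencies.add(other);
        return this;
    }

    /** Subclasses such as FetchAdminJob or IndexAdminJob would do the real work here. */
    public abstract void run() throws Exception;
}

/** Kahn's topological sort, using a priority queue to pick among ready jobs (assumed behaviour). */
final class AdminJobQueue {
    static List<AdminJob> order(Collection<AdminJob> jobs) {
        Set<AdminJob> all = new HashSet<>(jobs);
        Map<AdminJob, Integer> pending = new HashMap<>();          // number of unmet dependencies
        Map<AdminJob, List<AdminJob>> dependents = new HashMap<>(); // reverse edges
        for (AdminJob job : all) {
            pending.put(job, job.getDependencies().size());
            for (AdminJob dep : job.getDependencies()) {
                if (!all.contains(dep)) {
                    throw new IllegalArgumentException(
                        job.getName() + " depends on unsubmitted job " + dep.getName());
                }
                dependents.computeIfAbsent(dep, k -> new ArrayList<>()).add(job);
            }
        }

        // Jobs with no unmet dependencies, highest priority first.
        PriorityQueue<AdminJob> ready =
            new PriorityQueue<>(Comparator.comparingInt(AdminJob::getPriority).reversed());
        for (AdminJob job : all) {
            if (pending.get(job) == 0) ready.add(job);
        }

        List<AdminJob> result = new ArrayList<>();
        while (!ready.isEmpty()) {
            AdminJob next = ready.poll();
            result.add(next);
            // Releasing "next" may make some of its dependents ready.
            for (AdminJob d : dependents.getOrDefault(next, Collections.emptyList())) {
                if (pending.merge(d, -1, Integer::sum) == 0) ready.add(d);
            }
        }
        if (result.size() != all.size()) {
            throw new IllegalStateException("dependency cycle among submitted jobs");
        }
        return result;
    }
}
```

For example, with generate, fetch and index chained by `dependsOn` plus an unrelated job of higher priority, the unrelated job may jump ahead, but index never runs before fetch. A cycle is rejected outright rather than silently leaving jobs unscheduled.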

The features will include:
- submitting jobs from code, the command line or the web interface
- tracking jobs from the command line or the web interface
- scheduling jobs
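The submit, track and schedule features could sit on top of a small service like the one below. This is a hypothetical, in-process sketch that uses only `java.util.concurrent`; it does not use Hadoop's RPC layer, and the class, enum and method names are invented for illustration. In the design described above, the RPC server and the web GUI would call something like `submit` and `status` remotely.

```java
import java.util.Map;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical job states that a web page or command-line tool could display. */
enum JobStatus { QUEUED, RUNNING, SUCCEEDED, FAILED }

/** Hypothetical in-process service for submitting, tracking and scheduling jobs. */
final class AdminJobService {
    // Single runner thread: jobs execute one at a time, in submission order.
    private final ExecutorService runner = Executors.newSingleThreadExecutor();
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final Map<Long, JobStatus> statuses = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong();

    /** Queues a job and returns an id that can later be passed to {@link #status}. */
    long submit(Runnable job) {
        long id = nextId.incrementAndGet();
        statuses.put(id, JobStatus.QUEUED);
        runner.execute(() -> {
            statuses.put(id, JobStatus.RUNNING);
            try {
                job.run();
                statuses.put(id, JobStatus.SUCCEEDED);
            } catch (RuntimeException e) {
                statuses.put(id, JobStatus.FAILED);
            }
        });
        return id;
    }

    /** Current status of a job, or null if the id is unknown. */
    JobStatus status(long id) {
        return statuses.get(id);
    }

    /** Submits the job now and then again every {@code period} (e.g. a periodic recrawl). */
    ScheduledFuture<?> schedule(Runnable job, long period, TimeUnit unit) {
        return scheduler.scheduleAtFixedRate(() -> submit(job), 0, period, unit);
    }

    /** Stops scheduling, then waits for already submitted jobs to finish. */
    void shutdown() {
        scheduler.shutdown();
        try {
            scheduler.awaitTermination(10, TimeUnit.SECONDS);
            runner.shutdown();
            runner.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            runner.shutdownNow();
        }
    }
}
```

The single-threaded runner is a deliberately conservative choice in this sketch, so that two jobs working on the same crawl data never overlap.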

I can send the code or more details if anyone is interested in testing it early, and I would appreciate any comments and suggestions. I am planning to complete the patch and submit it to Jira ASAP.

Sami Siren wrote:
Hello,

It has been a while since the previous release (0.8.1), and looking at the
great fixes done in trunk, I'd start thinking about baking a new release
soon.

Looking at the Jira roadmaps, there is 1 blocking issue (fixing the
license headers) for 0.8.2 and two other blocking issues for 0.9.0, of
which I think NUTCH-233 is safe to put in.

The top 10 voted issues are currently:

NUTCH-61    Adaptive re-fetch interval. Detecting unmodified content
NUTCH-48    "Did you mean" query enhancement/refinement feature
NUTCH-251   Administration GUI
NUTCH-289   CrawlDatum should store IP address
NUTCH-36    Chinese in Nutch
NUTCH-185   XMLParser is configurable xml parser plugin.
NUTCH-59    meta data support in webdb
NUTCH-92    DistributedSearch incorrectly scores results
NUTCH-68    A tool to generate arbitrary fetchlists
NUTCH-87    Efficient site-specific crawling for a large number of sites

Are there any opinions about issues that should go in before the next
release? (Answering yes means that you are willing to provide a patch for
it.)

--
 Sami Siren

