Hi,

Sounds pretty harmless to have that method public IMHO

Julien

On 29 October 2012 16:57, Lewis John Mcgibbney <[email protected]> wrote:

> Hi Julien,
>
> Thanks for the comments. Any additional ones regarding the accessibility
> of the getDataStoreClass?
>
> Thanks again
>
> Lewis
>
> On Mon, Oct 29, 2012 at 4:52 PM, Julien Nioche <[email protected]> wrote:
>
>> Hi Lewis
>>
>> see comments below
>>
>>> So I thought I'd take this one on tonight and see if I can resolve.
>>> Basically, my high level question is as follows...
>>> Is each line of a text file (seed file) which we attempt to inject
>>> into the webdb considered as an individual map task?
>>
>> no - each file is a map task
>>
>>> The idea is to establish a counter for the successfully injected URLs
>>> (and possibly a counter for unsuccessful ones as well) so that how many
>>> URLs are (or should be) present within the webdb can be determined
>>> after bootstrapping Nutch via the inject command.
>>
>> you get this information from the Hadoop MapReduce Admin - the number of
>> seeds is the Map input records of the first job, the number post
>> filtering and normalisation is in Map output records, and the final
>> number of URLs in the crawldb, post merging with whatever is already
>> there, is in the Reduce output records.
>>
>> Just get the values from the counters of these 2 jobs to display a user
>> friendly message in the log
>>
>> In general I would advise anyone to use the pseudo distributed mode
>> instead of the local one as you get a lot more info from the Hadoop admin
>> screen and won't have to trawl through the log files.
>>
>> HTH
>>
>> Julien
>>
>> --
>> Open Source Solutions for Text Engineering
>>
>> http://digitalpebble.blogspot.com/
>> http://www.digitalpebble.com
>> http://twitter.com/digitalpebble
>
> --
> *Lewis*

--
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
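
For what it's worth, here is a minimal sketch of the "user friendly message" idea from the quoted reply: take the three counter values (Map input records and Map output records of the first job, Reduce output records of the merge job) and turn them into one log line. The class name InjectSummary, the method summarize and the message wording are all hypothetical and not part of Nutch; in a real run the three numbers would be read from the counters of the two jobs rather than passed in by hand.

```java
import java.util.Locale;

// Hypothetical helper, not part of Nutch: formats the three counter values
// mentioned in the thread into a single log-friendly summary line.
public class InjectSummary {

    // seedsRead          = Map input records of the first job
    // keptAfterFiltering = Map output records of the first job
    // crawlDbTotal       = Reduce output records of the merge job
    public static String summarize(long seedsRead, long keptAfterFiltering, long crawlDbTotal) {
        // URLs dropped by filtering and normalisation in the first job
        long rejected = seedsRead - keptAfterFiltering;
        return String.format(Locale.ROOT,
            "Injector: %d seed URLs read, %d kept after filtering and normalisation "
                + "(%d rejected), %d URLs in the crawldb after merging",
            seedsRead, keptAfterFiltering, rejected, crawlDbTotal);
    }

    public static void main(String[] args) {
        // Made-up example values, only to show the shape of the message.
        System.out.println(summarize(1000L, 950L, 12000L));
    }
}
```

The rejected count is simply the difference between the two map counters, so it needs no extra counter of its own.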

