Hi,

Sounds pretty harmless to have that method public IMHO

Julien

On 29 October 2012 16:57, Lewis John Mcgibbney <[email protected]> wrote:

> Hi Julien,
>
> Thanks for the comments. Any additional ones regarding the accessibility
> of the getDataStoreClass?
>
> Thanks again
>
> Lewis
>
>
> On Mon, Oct 29, 2012 at 4:52 PM, Julien Nioche <
> [email protected]> wrote:
>
>> Hi Lewis
>>
>> see comments below
>>
>>>
>>> So I thought I'd take this one on tonight and see if I can resolve.
>>> Basically, my high level question is as follows...
>>> Is each line of a text file (seed file) which we attempt to inject
>>> into the webdb considered as an individual map task?
>>>
>>
>> no - each file is a map task
>>
>>
>>> The idea is to establish a counter for the successfully injected URLs
>>> (and possibly a counter for unsuccessful ones as well) so that the
>>> number of URLs that are (or should be) present within the webdb can be
>>> determined after bootstrapping Nutch via the inject command.
>>>
>>
>> You get this information from the Hadoop MapReduce admin: the number of
>> seeds is the Map input records of the first job, and the number post
>> filtering and normalisation is in its Map output records. The final
>> number of URLs in the crawldb, post merging with whatever it already
>> holds, is in the Reduce output records.
>>
>> Just get the values from the counters of these 2 jobs to display a
>> user-friendly message in the log.
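
A minimal sketch of the user-friendly log message suggested above, in Java
(the language Nutch is written in). The class and method names are
hypothetical and not part of Nutch. The three arguments stand in for the
Map input records, Map output records and Reduce output records counters;
they are passed in as plain numbers so the example runs without Hadoop.
How the values are read from the finished jobs is left out, since that
depends on the Hadoop API version in use.

```java
import java.util.Locale;

// Hypothetical helper, not part of Nutch: turns three counter values
// into a single readable line for the injector log.
class InjectSummary {

    /**
     * @param seedsRead      Map input records of the first job (seed lines read)
     * @param afterFiltering Map output records of the first job
     *                       (URLs left after filtering and normalisation)
     * @param crawldbTotal   Reduce output records (URLs in the crawldb
     *                       after merging)
     */
    static String summarize(long seedsRead, long afterFiltering, long crawldbTotal) {
        // URLs dropped by the filters or the normaliser
        long rejected = seedsRead - afterFiltering;
        return String.format(Locale.ROOT,
            "Injector: %d seed URLs read, %d kept after filtering and "
                + "normalisation (%d rejected), %d URLs in crawldb after merging",
            seedsRead, afterFiltering, rejected, crawldbTotal);
    }

    public static void main(String[] args) {
        // Made-up numbers, for illustration only
        System.out.println(summarize(10L, 8L, 20L));
    }
}
```

With the made-up numbers above, the message reads: "Injector: 10 seed URLs
read, 8 kept after filtering and normalisation (2 rejected), 20 URLs in
crawldb after merging".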
>>
>> In general I would advise anyone to use the pseudo-distributed mode
>> instead of the local one, as you get a lot more info from the Hadoop admin
>> screen and won't have to trawl through the log files.
>>
>> HTH
>>
>> Julien
>>
>>
>> --
>> Open Source Solutions for Text Engineering
>>
>> http://digitalpebble.blogspot.com/
>> http://www.digitalpebble.com
>> http://twitter.com/digitalpebble
>>
>>
>
>
> --
> *Lewis*
>
>


-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
