Hi,

short status of testing from my side:

- successfully ran a small test crawl in local mode
  (only inject plus a few generate-fetch-parse-update cycles)

- crawling in distributed mode (on a Hadoop cluster) fails,
  the generator does not generate fetch lists:
    17/09/14 13:56:09 WARN crawl.Generator: Generator: 0 records selected for fetching, exiting ...

  I've retried the generator with the current master: the problem is definitely
  related to the current NUTCH-2375 branch/PR. Afaics, it is caused by
  configuration variables that are not set properly; changes are requested.


Best,
Sebastian



On 09/11/2017 08:06 AM, Omkar Reddy wrote:
> Hi,
> 
> Kenneth, thank you for your appreciation. Please participate in the code review.
> As Lewis said, the more eyes we get on this the better.
> 
> Sebastian, please find the pull request here [0]. The code is stable, with
> "ant clean runtime test" passing successfully. This is my first experience
> submitting a Java patch at this scale. Please feel free to provide any suggestions.
> 
> Everyone is welcome to test this code and review it.
> 
> Thanks,
> Omkar
> 
> [0] https://github.com/apache/nutch/pull/221 
> 
> On 11 September 2017 at 00:03, kenneth mcfarland <[email protected]> wrote:
> 
>     Nice work Omkar, thumbs up from a fellow student.
> 
>     On Sep 10, 2017 10:37 AM, "Omkar Reddy" <[email protected]> wrote:
> 
> 
>         Hi Sebastian,
> 
>         While squashing the pull request, there was some mistake and the commits were
>         deleted. I will send a new pull request and keep you posted in this thread.
> 
>         Thanks,
>         ~Omkar
> 
>         > On 10-Sep-2017, at 11:01 PM, Sebastian Nagel <[email protected]> wrote:
>         >
>         > Hi,
>         >
>         > thanks, Omkar for your work!
>         >
>         > Just wanted to start testing, but looks like the pull request is lost.
>         >
>         > Thanks,
>         > Sebastian
>         >
>         >> On 09/06/2017 10:57 PM, lewis john mcgibbney wrote:
>         >> Hi user@ and dev@,
>         >>
>         >> As part of the Nutch Google Summer of Code effort this year, Omkar Reddy and I
>         >> have been working persistently throughout the summer months on the Hadoop
>         >> MapReduce API upgrade, i.e. NUTCH-2375: Upgrade the code base from
>         >> org.apache.hadoop.mapred to org.apache.hadoop.mapreduce [0].
>         >> We believe we are now at a stage where this code is stable and should be opened
>         >> for widespread community review. It is a large patch, so the more eyes we can
>         >> get on this the better. Upgrading MapReduce API usage in Nutch is long overdue,
>         >> so this will be a significant addition to the Nutch project.
>         >>
>         >> The proposed pull request can be found at [1]. Please report any outcomes back
>         >> to the issue tracker at [0].
>         >>
>         >> Thank you
>         >> Lewis
>         >>
>         >> N.B. Please note that the official version of Apache Hadoop supported by the
>         >> Nutch master branch at this time is 2.7.2.
>         >>
>         >> [0] https://issues.apache.org/jira/projects/NUTCH/issues/NUTCH-2375
>         >> [1] https://github.com/apache/nutch/pull/188
>         >>
>         >> --
>         >> http://home.apache.org/~lewismc/
>         >> @hectorMcSpector
>         >> http://www.linkedin.com/in/lmcgibbney
>         >
> 
> 
