Hi,
a short status report on testing from my side:
- successfully ran a small test crawl in local mode
(only inject plus a few generate-fetch-parse-update cycles)
- crawling in distributed mode (on a Hadoop cluster) fails;
the generator does not generate any fetch lists:
17/09/14 13:56:09 WARN crawl.Generator: Generator: 0 records selected for fetching, exiting ...
I've retried the generator with the current master: the problem is definitely
related to the current NUTCH-2375 branch/PR. As far as I can see, it is caused
by configuration variables that are not set properly.
Changes are requested.
Best,
Sebastian
On 09/11/2017 08:06 AM, Omkar Reddy wrote:
> Hi,
>
> Kenneth, thank you for your appreciation. Please participate in the code
> review. As Lewis said, the more eyes we get on this the better.
>
> Sebastian, please find the pull request here [0]. The code is stable, with
> "ant clean runtime test" passing successfully. This is my first experience
> submitting a Java patch at this scale. Please feel free to provide any
> suggestions.
>
> Everyone is welcome to test this code and review it.
>
> Thanks,
> Omkar
>
> [0] https://github.com/apache/nutch/pull/221
>
> On 11 September 2017 at 00:03, kenneth mcfarland <[email protected]> wrote:
>
> Nice work Omkar, thumbs up from a fellow student.
>
> On Sep 10, 2017 10:37 AM, "Omkar Reddy" <[email protected]> wrote:
>
>
> Hi Sebastian,
>
> While squashing the pull request there was some mistake and the commits
> were deleted. I will send a new pull request and keep you posted in this
> thread.
>
> Thanks,
> ~Omkar
>
> > On 10-Sep-2017, at 11:01 PM, Sebastian Nagel <[email protected]> wrote:
> >
> > Hi,
> >
> > thanks, Omkar for your work!
> >
> > Just wanted to start testing, but it looks like the pull request is lost.
> >
> > Thanks,
> > Sebastian
> >
> >> On 09/06/2017 10:57 PM, lewis john mcgibbney wrote:
> >> Hi user@ and dev@,
> >>
> >> As part of the Nutch Google Summer of Code effort this year, Omkar Reddy
> >> and I have been working persistently throughout the summer months on the
> >> Hadoop MapReduce API upgrade, i.e. NUTCH-2375: Upgrade the code base from
> >> org.apache.hadoop.mapred to org.apache.hadoop.mapreduce [0].
> >> We believe we are now at a stage where this code is stable and should be
> >> opened for widespread community review. It is a large patch, so the more
> >> eyes we can get on this the better. Upgrading MapReduce API usage in
> >> Nutch is long overdue, so this will be a significant addition to the
> >> Nutch project.
> >>
> >> The proposed pull request can be found at [1]. Please report any
> >> outcomes back to the issue tracker at [1].
> >>
> >> Thank you
> >> Lewis
> >>
> >> N.B. Please note that the official version of Apache Hadoop supported
> >> by the Nutch master branch at this time is 2.7.2.
> >>
> >> [0] https://issues.apache.org/jira/projects/NUTCH/issues/NUTCH-2375
> >> [1] https://github.com/apache/nutch/pull/188
> >>
> >> --
> >> http://home.apache.org/~lewismc/
> >> @hectorMcSpector
> >> http://www.linkedin.com/in/lmcgibbney
> >
>
>