On Wed, Apr 7, 2010 at 20:32, Andrzej Bialecki a...@getopt.org wrote:
> On 2010-04-07 18:54, Doğacan Güney wrote:
>> Hey everyone,
>> On Tue, Apr 6, 2010 at 20:23, Andrzej Bialecki a...@getopt.org wrote:
>>> On 2010-04-06 15:43, Julien Nioche wrote:
>>>> Hi guys,
>>>> I gather that we'll jump straight to 2.0 [...]
On Wed, Apr 7, 2010 at 21:19, MilleBii mille...@gmail.com wrote:
Just a question:
will the new HBase implementation allow more sophisticated crawling
strategies than the current score-based one?
To give you a few examples of what I'd like to do:
define a different crawling frequency for [...]

Not sure what you mean by a "pig script", but I'd like to be able to make
a multi-criteria selection of URLs for fetching...
The scoring method forces a kind of mono-dimensional approach, which is
not really easy to deal with.
The regex filters are good, but they assume you want to select URLs on
data [...]
On Thu, Apr 8, 2010 at 21:11, MilleBii mille...@gmail.com wrote:
> Not sure what you mean by a "pig script", but I'd like to be able to
> make a multi-criteria selection of URLs for fetching...

I mean a query language like http://hadoop.apache.org/pig/ - if we
expose the data correctly, then you should be [...]
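The idea behind exposing the crawl data to a query language is that a multi-criteria fetch selection becomes a single filter expression instead of a custom scoring plugin. As a rough illustration only - the `url`/`score`/`fetch_time`/`interval` fields below are hypothetical, not Nutch's actual schema - the same kind of selection sketched in Python:

```python
# Hypothetical crawl-table rows; field names are illustrative only.
NOW = 1_270_000_000  # fixed "current time" for the example, in epoch seconds
DAY = 86400

rows = [
    {"url": "http://example.com/news",  "score": 0.9, "fetch_time": NOW - 8 * DAY,  "interval": 7 * DAY},
    {"url": "http://example.com/about", "score": 0.2, "fetch_time": NOW - 10 * DAY, "interval": 30 * DAY},
    {"url": "http://example.org/feed",  "score": 0.5, "fetch_time": NOW - 40 * DAY, "interval": 30 * DAY},
]

def due_for_fetch(row, now=NOW, min_score=0.4):
    """Multi-criteria selection: combine a score threshold with a
    per-row re-fetch interval, rather than ranking on score alone."""
    return row["score"] >= min_score and now - row["fetch_time"] >= row["interval"]

selected = [r["url"] for r in rows if due_for_fetch(r)]
```

In a Pig-style query the same filter would be a one-line `FILTER ... BY` clause over the exposed table; the point is that adding a criterion means editing the query, not the crawler.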
Hi,
I'm not sure what the status of the nutchbase is - it has missed a lot of
fixes and changes in trunk since it was last touched ...

Yes, maybe we should start the 2.0 branch from 1.1 instead.
Dogacan - what do you think?
BTW, I see there is now a 2.0 label under JIRA - thanks to whoever [...]
On 04/07/2010 07:54 PM, Doğacan Güney wrote:
> Hey everyone, [...]
Forgot to say that, at Hadoop, it is the convention that big issues like
the ones under discussion come with a design document, so that a solid
design is agreed upon for the work. We can apply the same pattern at Nutch.
On 2010-04-07 19:24, Enis Söztutar wrote:
Also, the goal of the crawler-commons project is to provide APIs and
implementations of the stuff that every open source crawler project
needs, like robots handling, URL filtering and URL normalization, URL
state management, perhaps [...]
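The "robots handling" piece is the easiest of these to picture. As a sketch only - crawler-commons itself is a Java project, so Python's stdlib parser here merely stands in for the idea of a shared, reusable robots.txt component:

```python
from urllib.robotparser import RobotFileParser

# A tiny robots.txt; the rules and bot name are illustrative.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Every crawler ends up needing exactly this check before fetching a URL.
allowed = parser.can_fetch("mybot", "http://example.com/index.html")
blocked = parser.can_fetch("mybot", "http://example.com/private/data.html")
```

The argument for a commons library is that this logic (plus URL filtering/normalization) is currently re-implemented, with subtle differences, in every open source crawler.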
Just a question:
will the new HBase implementation allow more sophisticated crawling
strategies than the current score-based one?
To give you a few examples of what I'd like to do:
define a different crawling frequency for different sets of URLs, say
weekly for some URLs, monthly or more for others.
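A per-set frequency policy like the one asked for could be as small as a mapping from URL patterns to re-fetch intervals. A minimal sketch, assuming hypothetical patterns and values - nothing here is an actual Nutch configuration:

```python
import re

DAY = 86400  # seconds

# Hypothetical per-set re-fetch intervals, checked in order.
INTERVALS = [
    (re.compile(r"^https?://[^/]+/news/"), 7 * DAY),      # weekly
    (re.compile(r"^https?://[^/]+/archive/"), 90 * DAY),  # quarterly
]
DEFAULT_INTERVAL = 30 * DAY  # monthly fallback for everything else

def fetch_interval(url: str) -> int:
    """Return the re-fetch interval for the first matching URL set."""
    for pattern, interval in INTERVALS:
        if pattern.search(url):
            return interval
    return DEFAULT_INTERVAL
```

With the crawl data in HBase, such an interval could be stored per row and consulted at generate time, instead of being folded into a single score.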
On 2010-04-06 15:43, Julien Nioche wrote:
Hi guys,
I gather that we'll jump straight to 2.0 after 1.1 and that 2.0 will be
based on what is currently referred to as NutchBase. Shall we create a
branch for 2.0 in the Nutch SVN repository and have a label accordingly
for JIRA so that we can [...]