+1 from me
Thought I had already done it, sorry.
J.
On 14 June 2010 16:30, Mattmann, Chris A (388J)
chris.a.mattm...@jpl.nasa.gov wrote:
Hey Nutch PMC’ers:
*nudge*
We currently have 2 PMC binding +1's on this VOTE:
Chris Mattmann
Doğacan Güney
Would be great to wrap up the 1.1
Dogacan has produced a patch for svn nutchbase that brings it to the level
of github. See https://issues.apache.org/jira/browse/NUTCH-650
The patch has been marked as 'licensed for inclusion in ASF work' and works
fine.
Any objections to this patch being committed?
Thanks Dogacan for producing [...], deletion of old plugins, etc...
Thanks
J.
On 29 June 2010 21:27, Dennis Kubes ku...@apache.org wrote:
+1 on this
On 06/29/2010 08:57 AM, Julien Nioche wrote:
(This question is mostly for Dogacan and Enis, but I encourage anyone familiar
with the code to join the threads with [Nutchbase]; the sooner the better
;) ).
I'm looking at src/gora/webpage.avsc and WebPage.java and friends...
presumably the Java code was autogenerated from the avsc using Gora? If
On 2 July 2010 12:22, Andrzej Bialecki a...@getopt.org wrote:
On 2010-07-02 12:42, Julien Nioche wrote:
Hi guys,
You've probably seen that there has been some progress on 2.0 lately. We've
updated the nutchbase svn branch with the latest developments done on
Dogacan's Github, i.e. using
Hi Cesar,
This can definitely be done using a custom parse plugin and an indexing
plugin. We did something like this some time ago to classify adult pages
using our text classification API (
http://code.google.com/p/textclassification/) which is based on SVM.
Out of interest, what categories are
Hi Ken,
Thank you for your comments and analysis. We should probably modify the
HTMLHandler so that it does not discard a frameset because of the bodylevel
being equal to 0. I suggested earlier on the Tika list having a mechanism
for specifying a custom handler via the Context, that would give
Daniel,
Your message is not relevant for this mailing list. If you have questions
about the TC API use http://groups.google.com/group/digitalpebble instead.
Thanks
On 8 July 2010 01:56, dgimenes dran...@gmail.com wrote:
Julien,
I'm in Luan's project too.
I'd like to know if you have
BUILD SUCCESSFUL
Total time: 24 minutes 31 seconds
Publishing Javadoc
Archiving artifacts
ERROR: No artifacts found that match the file pattern
trunk/build/*.tar.gz. Configuration error?
ERROR: 'trunk/build/*.tar.gz' doesn't match anything: 'trunk' exists but
not 'trunk/build/*.tar.gz'
Now that you mention upgrade solutions from 1.x to 2.0, I suggest that we open
a JIRA to discuss this. IMHO we probably don't want to keep the 'old' code in
src/java when we merge, but we could have the code for the conversion utilities
and the Nutch 1.x jars in the contrib/ directory.
Thanks for your comments Chris.
However, we still need to address the issue raised by Dogacan, i.e. shall we
provide tools to convert from 1.x structures to 2.0 and, if so, how shall we
organise it. Again, some things have been removed from NutchBase for the sake
of clarity but since they are
Before doing so, let's:
1. tag current trunk as
http://svn.apache.org/repos/asf/nutch/branches/branch-1.3 (EOL'ed: won't be
worked on, but nice to save). This way someone doesn't have to remember the
Nutchbase rev # before the Nutchbase branch lands in the trunk.
Then we can:
2. svn
On 23 July 2010 10:20, Julien Nioche lists.digitalpeb...@gmail.com wrote:
) and request Hudson Zones karma from @infra.
I’d be happy to be this guy since I do the RM’ing a lot, but it might be
nice to have someone else do it in case I get hit by a bus :)
Cheers,
Chris
On 7/26/10 10:24 PM, Julien Nioche lists.digitalpeb...@gmail.com
wrote:
does anyone have any idea
issue in JIRA and then
link your issue to the issue that you wanted to reopen. It’s just as easy
and doesn’t cause the out of sync problem.
OK, makes sense
Cheers,
Chris
On 8/9/10 7:45 AM, Julien Nioche lists.digitalpeb...@gmail.com wrote:
I reopened https://issues.apache.org/jira
It's probably more an issue with DNS resolution than with robots.txt. Even if you
respect the robots.txt instructions, you can still have N host names, or even domain
names, pointing to a single server. This can be avoided in Nutch by setting
'partition.url.mode' and 'fetcher.queue.mode' to 'byIP'.
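To illustrate why the queue key matters, here is a small Java sketch that groups URLs into fetch queues either by host name or by resolved IP address. The class, its names and the hard-coded resolver map (standing in for DNS) are hypothetical; this is not Nutch's fetcher code. In a real Nutch setup the two properties above would be overridden in conf/nutch-site.xml rather than set in code.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Illustrative sketch, not Nutch's actual fetcher code: shows why keying
 * fetch queues by IP address instead of host name keeps politeness per server.
 * The resolver map is a hypothetical stand-in for real DNS lookups.
 */
final class QueueKeySketch {

    enum Mode { BY_HOST, BY_IP }

    /** Returns the queue key for one URL under the given mode. */
    static String queueKey(String url, Mode mode, Map<String, String> resolver) {
        String host = URI.create(url).getHost().toLowerCase();
        if (mode == Mode.BY_HOST) {
            return host;
        }
        // Hosts missing from the resolver fall back to their own name as the key.
        return resolver.getOrDefault(host, host);
    }

    /** Groups URLs into queues; URLs sharing a key would be fetched politely in turn. */
    static Map<String, List<String>> group(List<String> urls, Mode mode,
                                           Map<String, String> resolver) {
        Map<String, List<String>> queues = new TreeMap<>();
        for (String url : urls) {
            queues.computeIfAbsent(queueKey(url, mode, resolver), k -> new ArrayList<>())
                  .add(url);
        }
        return queues;
    }
}
```

With three host names mapped onto two addresses, grouping by host gives three queues while grouping by IP gives two, so the two host names sharing 10.0.0.1 end up in the same queue.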
On 16
Hi David,
I haven't used the HBase backend with GORA for quite some time but, from what
I can remember, you'll need the following:
* conf/hbase-site.xml = this should correspond to your local configuration
* conf/gora-hbase-mapping.xml = see below
* conf/gora.properties = don't think there
Hi Faruk,
You can either set a lower value for the parameter http.content.limit or
modify the mapping and set
<field name="content" column="content" jdbc-type="MEDIUMBLOB"/>
which should work for MySQL.
See the discussion on http://github.com/enis/gora/issues/closed#issue/48
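The two options can be sketched in Java as follows. The truncation mirrors how nutch-default.xml describes http.content.limit (a non-negative value truncates longer content, a negative one disables truncation); the MySQL limits are the documented maximum sizes of BLOB (65,535 bytes) and MEDIUMBLOB (16,777,215 bytes). Class and method names are hypothetical, and this is not the code path Nutch or Gora actually use.

```java
import java.util.Arrays;

/**
 * Illustrative sketch with hypothetical names: mirrors the documented meaning
 * of http.content.limit and picks a MySQL BLOB type large enough for the content.
 */
final class ContentLimitSketch {
    // Maximum storage sizes in bytes of MySQL's BLOB and MEDIUMBLOB types.
    static final long MYSQL_BLOB_MAX = 65_535L;
    static final long MYSQL_MEDIUMBLOB_MAX = 16_777_215L;

    /** A non-negative limit truncates the content; a negative limit keeps it whole. */
    static byte[] applyLimit(byte[] content, int limit) {
        if (limit < 0 || content.length <= limit) {
            return content;
        }
        return Arrays.copyOf(content, limit);
    }

    /** Returns the smallest MySQL BLOB type that can hold the given number of bytes. */
    static String columnTypeFor(int length) {
        if (length <= MYSQL_BLOB_MAX) {
            return "BLOB";
        }
        if (length <= MYSQL_MEDIUMBLOB_MAX) {
            return "MEDIUMBLOB";
        }
        return "LONGBLOB";
    }
}
```

Lowering the limit drops the tail of large pages, whereas switching the column to MEDIUMBLOB keeps pages of up to roughly 16 MB intact.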
HTH
Julien
Hi guys,
I've summarized the steps to follow for having GORA+Hbase with Nutch 2.0 on
http://wiki.apache.org/nutch/GORA_HBase
Feel free to amend and improve as you see fit.
Please bear in mind that Nutch 2.0 is at a very early stage and is far from
being bug-proof, see in particular [1].
HTH
on this?
Julien
On 4 January 2011 21:44, Julien Nioche lists.digitalpeb...@gmail.com wrote:
+1 from me. Today I've committed a bunch of patches which were in 1.2 but
not in 1.3 (just one last one to do), but I haven't compared with 2.0.
Having a release based on 1.3 would be great as it would be a nice
Hi Jurgen,
Since I wrote this email - which I thought got ignored by the
Nutch developers -
Thanks for reporting the problem Jurgen, and sorry that you felt you were
being ignored. The few active developers Nutch has contribute in their
spare time, which is the reason why you did not get any
On 22 March 2011 04:15, Kirby Bohling kirby.bohl...@gmail.com wrote:
Is there some reason this is allowed to continue to build if nobody is
going to actually get it to build successfully? I am assuming this
has something to do with the Ivy resolution of the Gora library that
isn't publicly
Gabriele,
I think it is a good idea to have a script like this; however, your proposal
could be improved. It currently works only on a single machine and uses
commands such as mv, ls etc... which won't work on a pseudo-distributed or fully
distributed cluster. You should use the 'hadoop fs' commands instead.
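As an example of keeping such a script storage-agnostic, the step that picks the segment just created by generate (typically done locally with ls) only needs a directory listing, which in distributed mode can come from 'hadoop fs -ls crawl/segments'. The Java sketch below is hypothetical and assumes segment directories are named with 14-digit yyyyMMddHHmmss timestamps, so the newest one is simply the lexicographically largest name.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/**
 * Hypothetical helper: given a listing of segment directories (for example the
 * paths printed by 'hadoop fs -ls crawl/segments'), return the newest one.
 * Assumes segment names are 14-digit timestamps (yyyyMMddHHmmss).
 */
final class LatestSegmentSketch {

    /** Last path component, e.g. "20110712143510" for "crawl/segments/20110712143510". */
    static String name(String path) {
        String trimmed = path.endsWith("/") ? path.substring(0, path.length() - 1) : path;
        return trimmed.substring(trimmed.lastIndexOf('/') + 1);
    }

    /** Fixed-width timestamps sort chronologically when compared as plain strings. */
    static Optional<String> latest(List<String> segmentPaths) {
        return segmentPaths.stream()
                .filter(p -> name(p).matches("\\d{14}"))
                .max(Comparator.comparing(LatestSegmentSketch::name));
    }
}
```

Entries that do not look like timestamps (such as a stray _logs folder) are ignored, and an empty Optional signals that no segment was found.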
See http://www.slf4j.org/faq.html#IllegalAccessError
This error is caused by the static initializer of the LoggerFactory class
attempting to directly access the SINGLETON field of
org.slf4j.impl.StaticLoggerBinder. While this was allowed in SLF4J 1.5.5
and earlier, in 1.5.6 and later the
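The FAQ's remedy, as we read it, is to make sure slf4j-api and the SLF4J binding on the classpath have matching versions. A hypothetical Java helper that compares the version suffixes of slf4j jar names (for example the contents of a lib directory) could look like this; it assumes the usual slf4j-&lt;artifact&gt;-&lt;version&gt;.jar naming and ignores bridges such as jcl-over-slf4j.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical check: flags a lib directory that mixes SLF4J versions,
 * e.g. an older slf4j-api next to a newer binding. Jar naming is an assumption.
 */
final class Slf4jVersionCheck {
    // Matches names like "slf4j-api-1.6.1.jar" or "slf4j-log4j12-1.5.5.jar".
    private static final Pattern JAR =
            Pattern.compile("slf4j-([a-z0-9-]+?)-(\\d+(?:\\.\\d+)*)\\.jar");

    /** Maps each slf4j artifact found among the jar names to its version. */
    static Map<String, String> versions(List<String> jarNames) {
        Map<String, String> found = new TreeMap<>();
        for (String name : jarNames) {
            Matcher m = JAR.matcher(name);
            if (m.matches()) {
                found.put(m.group(1), m.group(2));
            }
        }
        return found;
    }

    /** True if every slf4j jar in the list carries the same version. */
    static boolean consistent(List<String> jarNames) {
        return versions(jarNames).values().stream().distinct().count() <= 1;
    }
}
```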
Yep. 0.1 has been released and the artifacts should be available soon
On Friday, 8 April 2011, Otis Gospodnetic ogjunk-nu...@yahoo.com wrote:
Hi,
Just curious - is the plan to wait for the GORA 0.1 release to get published
somewhere (not familiar with Ivy, so I'm not sure where things need to
Someone suggested that we use an Ant task to generate the pom from the Ivy
files. This would be a far cleaner option than having to keep this bl***d
pom.xml file in sync all the time.
On 12 April 2011 15:11, Markus Jelsma markus.jel...@openindex.io wrote:
Hi guys,
I found out that pom.xml
/makepom.html) and
remove the pom.xml from SVN? Is there anything in that pom.xml that wouldn't
be generated by makepom?
J.
On 12 April 2011 15:24, Julien Nioche lists.digitalpeb...@gmail.com wrote:
On 14 April 2011 08:55, Julien Nioche lists.digitalpeb...@gmail.com wrote:
There have been a large number of substantial changes with 1.3 (search
delegated to SOLR, separation between local and distributed runtimes)
and we'll need to reflect
Hi Chris,
Thanks for the RC.
I think we should fix the 2 issues below.
https://issues.apache.org/jira/browse/NUTCH-985 : bug with lastModifiedDate
https://issues.apache.org/jira/browse/NUTCH-983 : port SOLRJ to 3.1
I expect many users would use the latest version of SOLR so we might as well
Hi Markus
Any param overridden by the users should be in nutch-site.xml, not just
http.agent, so why make an exception for it? Moreover, that will not
necessarily prevent people from editing nutch-default.xml.
Maybe we could set nutch-default to read-only? It could be changed by the user
but this might
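The intended split can be pictured as a two-layer lookup: site values shadow the defaults, and the defaults file itself is never edited. The Java sketch below uses java.util.Properties, whose defaults table is consulted only for keys missing from the outer object. It is an analogy with hypothetical names, not how Nutch actually loads nutch-default.xml and nutch-site.xml.

```java
import java.util.Properties;

/**
 * Analogy only: site settings shadow defaults, and the defaults object is
 * never modified, much like keeping overrides in nutch-site.xml.
 */
final class LayeredConfigSketch {

    /** Returns a view in which keys from site win and everything else falls back to defaults. */
    static Properties layered(Properties defaults, Properties site) {
        // new Properties(defaults) consults 'defaults' only for keys it lacks itself.
        Properties merged = new Properties(defaults);
        merged.putAll(site);
        return merged;
    }
}
```

Because overrides live in a separate layer, the defaults object stays untouched, which is the property a read-only nutch-default.xml would enforce.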
Hi Markus
We might as well do it properly and commit in the same way as index and
clean do.
Thanks for all your excellent work BTW
Julien
On 27 April 2011 15:16, Markus Jelsma markus.jel...@openindex.io wrote:
Hi,
Title says it all. The job doesn't send a commit while index and clean do.
Hi Chris,
I don't think we have finished with the dates and update of SOLR to 3.1 yet.
I'll also try to do
NUTCH-888 (https://issues.apache.org/jira/browse/NUTCH-888) in the next
couple of days.
Thanks
Julien
On 30 April 2011 05:20, Mattmann, Chris A (388J)
chris.a.mattm...@jpl.nasa.gov wrote:
;-)
On 4 May 2011 16:26, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov
wrote:
Awww, sniff, bye parse-rss!
On May 4, 2011, at 11:20 AM, jnio...@apache.org jnio...@apache.org
wrote:
Author: jnioche
Date: Wed May 4 15:20:00 2011
New Revision: 1099483
URL:
Would need to check in the code but I think that this field is used for
storing the value of the cache-control meta tag.
Since we no longer do caching after delegating to SOLR, this is not
really useful, but it could be again in the future. Let's leave it as is for now
and document what the field
Hi
Could you please open a JIRA with a description of the problem and attach a
patch generated against the branch-1.3 with 'svn diff'?
Thanks
2011/5/9 ldk_5370 ldk_5...@163.com
hi,
I found a bug in class org.apache.nutch.protocol.http.HttpResponse:
HttpResponse cannot get all the HTML
everywhere then
format it properly in the SOLRWriter. We could of course do the latter now,
but since I have no time to do it in the short term and don't want to twist
your arm, I'll let you decide.
On Thursday 05 May 2011 15:34:56 Julien Nioche wrote:
Hi Markus,
Sorry for the late reply
Hi,
The title says it all. I'm searching for interesting use cases for my Nutch
talk in Berlin. Do you use Nutch in an interesting way or on a particularly
large scale? If you think your use case could be a good illustration of what
Nutch does, please get in touch and I'll happily include it in
? Ready for RC2 on 1.3? Got some free time tonight and in the releasing
mood :-)
Cheers,
Chris
On Apr 30, 2011, at 9:41 AM, Julien Nioche wrote:
On 21 May 2011 03:51, Mattmann, Chris A (388J)
chris.a.mattm...@jpl.nasa.gov wrote:
Hey Guys,
WDYT? Ready for RC2 on 1.3? Got some free time tonight and in the
releasing mood :-)
Cheers,
Chris
On Apr 30, 2011, at 9:41 AM, Julien Nioche wrote:
Hi Viksit,
Please check if this has already been reported on JIRA and, if not, open a
new issue (for 2.0).
Thanks
Julien
On 25 May 2011 19:02, Viksit Gaur vik.list.nu...@gmail.com wrote:
[Cross posting since this might be more relevant here.]
--
Hi all,
Trying to run nutch on Elastic
Mattmann
Markus Jelsma
Julien Nioche
Lewis John McGibbney
I'll go ahead and push the release to the mirrors and release the Maven
repo to Central and then send an ANNOUNCE.
Thanks!
Cheers,
Chris
Chris Mattmann, Ph.D
Guys,
I added a new label 1.4 on the JIRA. Shall we create a new branch 1.4 on SVN
from the existing 1.3? I agree that it is a pain to have to maintain 1.x AND
trunk in parallel but my feeling is that 2.0 needs more work before being
completely reliable and in the meantime we might want to add
http://nutch.apache.org/mailing_lists.html
- dev-unsubscr...@nutch.apache.org
On 12 June 2011 14:33, Tolga Soyata tolgasoy...@gmail.com wrote:
Please remove me from the mailing list
--
Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
Hi,
Please open a new issue on https://issues.apache.org/jira/browse/NUTCH
Thanks
Julien
On 13 June 2011 04:20, Yavinty yavi...@gmail.com wrote:
Hello,
I have a bug-fix for Nutch 1.3 (solrdedup throwing
NullPointerException), where do I submit it?
Thanks.
Guys,
I've created a new branch for 1.4 at
https://svn.apache.org/repos/asf/nutch/branches/branch-1.4
Thanks
Jul
On 10 June 2011 12:11, Markus Jelsma markus.jel...@openindex.io wrote:
Hi,
[...]
Yes indeed. I see that Gora is still in incubation and I have not been
using trunk for some time as it has been broken due to Gora dependencies? I
think this suggestion is the only sensible way to continue. As I have not
been using trunk, what is the current situation with this?
Hi Lewis,
Currently the roadmap, slightly dated in places, can be found here [1]. I
was wondering if we could give it an overhaul/update as it would give a
more robust overview of where trunk is going. Most of the points you make
are still in development, however some have been achieved and
Hi Lewis,
As I am back home, I propose to rebuild the site to point the current
tutorial link to the new 1.3 tutorial on the wiki. I would also like to
formally make my first commit by adding my name to the list of committers
before I progress with other bits and pieces.
Good idea!
See
http://nutch.apache.org/mailing_lists.html
Hey,
Please delete my e-mail address from your mailing list or whatever. I
receive more than 50 e-mails every day.
Bye
--
Marcel Schubert
Trainee
TU Clausthal, Computing Centre
E-Mail: schub...@rz.tu-clausthal.de
Hi Matthew,
This is usually achieved by writing a script containing the individual Nutch
commands (as opposed to calling 'nutch crawl') and indexing at the end of a
generate-fetch-parse-update-linkdb sequence. You don't need any plugins for
that.
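A dry-run sketch of such a script: the Java method below only assembles the bin/nutch command lines for a few generate/fetch/parse/updatedb rounds, followed by a single invertlinks and solrindex step over all the new segments. The command names follow the Nutch 1.3 bin/nutch script as far as we know; the crawl directory, topN value, Solr URL and the $SEGMENTn placeholders (each standing for the segment created by that round's generate) are assumptions, and the argument order should be checked against the usage message each command prints.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical dry-run helper: lists the bin/nutch commands for an incremental
 * crawl made of several generate/fetch/parse/updatedb rounds, followed by one
 * invertlinks step and one indexing step. Nothing is executed, and inject is
 * omitted on the assumption that the crawldb already exists.
 * "$SEGMENTn" stands for the segment directory created by round n's generate
 * step; a real script has to look it up after generate finishes.
 */
final class CrawlPlanSketch {

    static List<String> plan(String crawl, int rounds, int topN, String solrUrl) {
        List<String> cmds = new ArrayList<>();
        StringBuilder segments = new StringBuilder();
        for (int i = 1; i <= rounds; i++) {
            String seg = "$SEGMENT" + i;
            cmds.add("bin/nutch generate " + crawl + "/crawldb " + crawl + "/segments -topN " + topN);
            cmds.add("bin/nutch fetch " + seg);
            cmds.add("bin/nutch parse " + seg);
            cmds.add("bin/nutch updatedb " + crawl + "/crawldb " + seg);
            segments.append(' ').append(seg);
        }
        // Link inversion and indexing run once, over all segments of this batch.
        cmds.add("bin/nutch invertlinks " + crawl + "/linkdb" + segments);
        cmds.add("bin/nutch solrindex " + solrUrl + " " + crawl + "/crawldb "
                + crawl + "/linkdb" + segments);
        return cmds;
    }
}
```

Deferring invertlinks and solrindex to the end means the indexer sees the anchors gathered across every round, at the cost of the new pages only becoming searchable once the whole batch is done.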
HTH
Julien
On 12 July 2011 13:35, Matthew Painter
%20incremental%20script
On Tue, Jul 12, 2011 at 2:15 PM, Julien Nioche
lists.digitalpeb...@gmail.com wrote:
Are you sure we don't already filter and normalize at the end of the
parse? (Not in front of the code, sorry, can't check.)
On 14 July 2011 16:37, Markus Jelsma markus.jel...@openindex.io wrote:
Hi,
If we filter and normalize hyperlinks in the parse job, we wouldn't have to
filter and
http://www.google.co.uk/search?q=nutch+mailing+list - 1st result
On 14 July 2011 16:50, Zanzico Gioele gioele.zanz...@vitecgroup.com wrote:
How can I be removed from this mailing list, please?
Thanks
Bye
Gioele
Gioele Zanzico
Senior Web Analyst
Vitec Group Imaging Staging Division
Direct
updated the db and it worked. Now I have two URLs.
Not clear. Was there only one outlink in that seed? Did the filtering work
or not?
More thoughts? :)
On Thursday 14 July 2011 18:31:07 Julien Nioche wrote:
On Thursday 14 July 2011 15:03:34 Julien Nioche wrote:
Have been thinking about this again. We could make it so that the indexer
does not necessarily require a linkDB: some people are not particularly
interested in getting the anchors. At the moment you have to have a
linkDB
Please excuse (and correct) my ignorance, but I need to clear this one up so
I understand correctly. The purpose of the mvn.template file is to let us
specify exactly who can commit a Nutch Maven pom. The pom in turn
specifies the build dirs, e.g. the source dir as well as the test dir. Then finally
make Nutch require Lucene as a dependency -- this
would provide more stable updates.
Dawid
On Mon, Jul 25, 2011 at 10:35 AM, Julien Nioche
lists.digitalpeb...@gmail.com wrote:
Hi Kirby,
Thanks for sharing this. It is definitely relevant for Nutch and I am sure
that there would be quite
-
Key: NUTCH-1071
URL: https://issues.apache.org/jira/browse/NUTCH-1071
Project: Nutch
Issue Type: Improvement
Affects Versions: 1.4
Reporter: Julien Nioche
Assignee: Julien Nioche
Markus,
I have just committed a change to CrawlDBReducer (rev 1152254);
see line 155:
- reporter.getCounter("CrawlDB status", CrawlDatum.getStatusName(old.getStatus())).increment(1);
It was using the wrong object :-(
Would you mind giving it a try?
Thanks
Julien
nope, see https://issues.apache.org/jira/browse/NUTCH-875
On 3 August 2011 01:09, Tom Davidson tdavid...@covario.com wrote:
Does the LinkAnalysisScoringFilter in Nutch 2 work?
Hi Kirby,
Grumble, Grumble. (adding dev@nutch, as that is more than likely
where this discussion really belongs)...
I am adding gora-...@incubator.apache.org as well.
It'd be really nice if folks could just follow the commands in the
nightly build, and get a build pushed out. I've pointed
Hi Tom,
I have been using Nutch 1.x for the last 9 months or so and it works well
for large-scale crawls up to around a billion pages. However, the inherent
lack of random access in HDFS really starts to become a burden on our Hadoop
cluster when going through the whole
That's great, thanks!
On 10 August 2011 14:58, lewis john mcgibbney lewis.mcgibb...@gmail.com wrote:
Hi,
Just for information purposes, I committed our DOAP, which can now be found
under trunk in SVN. I have been informed by site-dev@ that the system they
use does not support more than one DOAP
I must be missing something here but how do you plan to get the nightly
builds to compile without declaring Gora as a dependency in Ivy? Will you
put a hard copy of the jars?
The public artefacts for Gora 0.1.incubating are incorrect and, as for 0.1.1,
they have not been published yet. In a nutshell
+1 let's replace it with a shell script instead.
On 22 August 2011 21:56, Markus Jelsma markus.jel...@openindex.io wrote:
Hi,
The crawl command seems to add a lot of confusion. It hides the entire
crawl cycle logic from new users, leading to questions, lack of understanding of
basic Nutch
Julien
Is an immediate crawl-with-one-command a desired feature? Provided as Java
code or shell script?
On Tuesday 23 August 2011 10:12:57 Julien Nioche wrote:
On 22 August 2011 21:56, Markus Jelsma markus.jel...@openindex.io
wrote:
Hi Simone,
Would you mind opening a JIRA for this, attaching your patch and granting it to the
ASF? I know it is fairly small but it makes it easier to track the progress,
link to svn commits, etc...
Thanks
Julien
On 23 August 2011 07:53, Simone Frenzel psimon...@googlemail.com wrote:
--
Make sure you specify the params in runtime/deploy/conf unless you rebuild
the job file with 'ant job'
On 24 August 2011 16:09, Ferdy Galema ferdy.gal...@kalooga.com wrote:
Hi,
Compiling Nutch 1.3 with patch NUTCH-993 (newest patch) and configuring
mapreduce.job.jar.unpack.pattern and
Resending your messages every hour won't get you more answers, quite the
opposite.
On 26 August 2011 09:28, Kaiwii Ho kaiwi...@gmail.com wrote:
I'm a freshman learning about Nutch.
Here, I have several questions:
1. URLNormalizer is a kind of ExtensionPoint. But why does it implement the
Thanks Lewis, that's great!
On 16 September 2011 12:20, lewis john mcgibbney
lewis.mcgibb...@gmail.com wrote:
Branch 1.4 build is set up and 'should' be running successfully from now on.
This will also auto-update any JIRA issues which have been committed with
some Jenkins commentary.
At least
I'm happy to call for a vote on the future of Nutch 2.0 if you want. Shall we
reduce the various options described before to a single one?
Julien
On 15 September 2011 19:55, Markus Jelsma markus.jel...@openindex.io wrote:
Hi Guys,
I thought I'd chime in on this thread. My comments below:
Hi,
Following the discussions [1] on the dev-list about the future of Nutch 2.0,
I would like to call for a vote on moving Nutch 2.0 from the trunk to a
separate branch, promoting 1.4 to trunk and considering 2.0 unmaintained. The
arguments for / against can be found in the thread I mentioned.
The
Here is my vote :
+1 : Shelve 2.0 and move 1.4 to trunk
Julien
On 18 September 2011 10:21, Julien Nioche lists.digitalpeb...@gmail.com wrote:
wanted to do that in 2.0. Again, if people want to get involved and
improve it they will be able to do so.
Thanks
Julien
On Mon, Sep 19, 2011 at 1:05 AM, Julien Nioche
lists.digitalpeb...@gmail.com wrote:
Hi Folks,
Okey dok, this VOTE has passed with the following tallies:
+1 PMC
Markus Jelsma
Sami Siren
Chris Mattmann
Lewis John McGibbney
Dennis Kubes
Julien Nioche
Andrzej Bialecki
-1 PMC
Alexis de Tréglodé
-1 Community
Radim Kola
Accordingly we will move the current Nutch trunk to a new
Elisabeth,
Great. Could you attach your patch to the original issue in JIRA instead and
check the box 'Grant license to ASF for inclusion in ASF works'?
Julien
On 21 September 2011 16:47, Elisabeth Adler elisabeth.ad...@gmail.com wrote:
Hi,
Based on the suggestions/code from
+1 thanks Chris
On 22 September 2011 04:12, Mattmann, Chris A (388J)
chris.a.mattm...@jpl.nasa.gov wrote:
Guys,
If no one objects, I will execute the move Friday by 12pm PDT.
Will that work?
Cheers,
Chris
On Sep 21, 2011, at 3:09 AM, Julien Nioche wrote:
Can someone please tell me how changes to
https://svn.apache.org/repos/asf/nutch/site/ are propagated to actually
update our site? My suspicion is that the URL gets 'svn up' on
people.apache.org to publish our website, but I'd like to confirm
this.
IIRC it uses SVNpubsub
The
Thanks Chris!
On 24 September 2011 01:36, Mattmann, Chris A (388J)
chris.a.mattm...@jpl.nasa.gov wrote:
Okey dok, the news item is now published.
Let the dev'ing commence!
Cheers,
Chris
On Sep 23, 2011, at 4:57 PM, Mattmann, Chris A (388J) wrote:
Hi Folks,
Per:
We don't have moderators for the user and dev lists
On 26 September 2011 20:09, lewis john mcgibbney
lewis.mcgibb...@gmail.com wrote:
Thanks Markus,
Who is the mailing list moderator? If I can get this info before contacting
infra, it would be great.
On Mon, Sep 26, 2011 at 7:37 PM,
You are welcome. Thank you for all your work!
On 26 September 2011 18:47, lewis john mcgibbney
lewis.mcgibb...@gmail.com wrote:
Hi,
As per Julien's recent commit to include the correct Gora artefacts, I have
established a new build [1] for nutchgora branch development. We have some
issues with
would like to step down from the moderator status and have someone else
do moderation instead, because frankly I have not been doing a great job
with it. Any volunteers?
--
Sami Siren
On Tue, Sep 27, 2011 at 12:09 AM, Julien Nioche
lists.digitalpeb...@gmail.com wrote:
We don't have
+1
I have created a 1.5 version in JIRA.
Thanks
Julien
On 27 September 2011 22:01, Markus Jelsma markus.jel...@openindex.io wrote:
Hi,
There are some bad issues in 1.3 that are fixed in early 1.4 revisions.
Also, 1.4 has some nice improvements and new features. I know some would like to
:
yes +1
Thanks for bringing this up Markus.
I would like to get NUTCH-1078 sorted out ASAP. However I'll comment on
that issue separately.
On Wed, Sep 28, 2011 at 9:46 AM, Julien Nioche
lists.digitalpeb...@gmail.com wrote:
Hi,
A while back the NUTCH PMC nominated Ferdy Galema for Nutch committership
and PMC membership. The VOTE in the Nutch PMC has been tallied and I'm
happy to announce that Ferdy is now a Nutch committer.
Ferdy, feel free to say a little bit about yourself. Your account has been
created and you
Guys,
I have probably missed a discussion on this lately but I really don't
remember that we'd decided to move from ANT+IVY. We've had numerous
discussions on this in the past, all leading to the conclusion that
maintaining two systems is a bad idea.
Have I missed something?
Jul
PS: If we had
+MAVEN at all?
Jul
Thanks!
Cheers,
Chris
On 31 October 2011 15:39, Markus Jelsma markus.jel...@openindex.io
wrote:
This was the thing, wasn't it?
https://issues.apache.org/jira/browse/NUTCH-995
On Monday 31 October 2011 16:28:18 Julien Nioche wrote:
, Oct 31, 2011 at 4:38 PM, Julien Nioche
lists.digitalpeb...@gmail.com wrote:
I was under the impression that to publish Nutch artefacts to the Maven repo
we need to have a working pom.xml? Is this correct? This was all I was
referring to.
OK, sorry, I now understand.
Yes, we do. It should
Thanks Chris,
* it would be good to have the same folder name for the src and bin
versions. They are currently 'nutch-1.4' and 'apache-nutch-1.4'
* do we really need to include the KEYS file?
* bin version contains pom.xml, src version does not. Either include in
both or remove altogether
* What
I do use Eclipse for editing the code but build the jars/jobs with ANT. I
use IvyDE for managing the dependencies.
On 7 November 2011 23:23, Lewis John Mcgibbney lewis.mcgibb...@gmail.com wrote:
Hi guys,
Can anyone tell me whether they are using Nutch trunk within Eclipse?
Thanks
Lewis
About the runtime/local thing, I think we can do that for 1.5, but I am
totally +1 for it.
OK for 1.5
Thanks a lot
Julien
Let me know what you think. Thanks!
Cheers,
Chris
On Nov 7, 2011, at 7:59 AM, Julien Nioche wrote:
, Julien Nioche wrote:
Hi Chris
Thanks for the review. Would you consider the below blockers, or
would-be-nice-to-fix? If none are blockers I propose fixing them in 1.5
and pushing 1.4. Thoughts?
see below
I agree on the naming, sorry for the screw-up.
no probs. Do you
We (DigitalPebble) managed the crawl for them and wrote the custom bits
they required. The problems they mentioned were more related to EC2 than
Hadoop as such. More on
http://digitalpebble.blogspot.com/2010/09/similarpages-is-out.html
Jul
On 17 November 2011 16:57, Lewis John Mcgibbney
Hi Alexander,
Which version of OpenJDK is it? I have Nutch running on one of my servers
with
java version "1.6.0_22"
OpenJDK Runtime Environment (IcedTea6 1.10.2) (6b22-1.10.2-0ubuntu1~11.04.1)
OpenJDK 64-Bit Server VM (build 20.0-b11, mixed mode)
and I don't have any problems compiling
Julien
Guys,
any idea as to why this is not compiling anymore?
J.
On 4 January 2012 04:20, Apache Jenkins Server jenk...@builds.apache.org wrote:
See https://builds.apache.org/job/Nutch-trunk/1714/
--
[...truncated 2386 lines...]
resolve-default:
January 2012 10:52:08 Julien Nioche wrote:
Note : the latest stable is 1215090 i.e. things started to get bad when
moving to hadoop 0.22 (rev 1220786).
On 4 January 2012 16:45, Julien Nioche lists.digitalpeb...@gmail.com wrote:
The problem is not with the urlfilter package as such but with the fact
that the junit jar is removed from
).
That's news to me. Do we know what causes them to fail?
Any hints?
Hi Eddie,
Great to hear that! Just to add to what Markus said, there are also quite a
few tasks to do on the NutchGora branch if that's something you'd be
interested in. Or outside the tasks on JIRA, there is always a fair bit to
do on the Wiki e.g. how to run in distributed mode etc...
Just out
Hi Eddie,
I've also re-created the lucene index plugin as part of our plugin, as we
don't use Solr, but our own search application.
One task you could be interested in is to make the indexing backends
pluggable. See https://issues.apache.org/jira/browse/NUTCH-1047 for
details. This would