Re: [VOTE] Apache Nutch 1.1 Release Candidate #4

2010-06-14 Thread Julien Nioche
+1 from me, thought I had already done it - sorry J. On 14 June 2010 16:30, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote: Hey Nutch PMC’ers: *nudge* We currently have 2 PMC binding +1's on this VOTE: Chris Mattmann Doğacan Güney Would be great to wrap up the 1.1

Update svn nutchbase - Nutch 2.0

2010-06-29 Thread Julien Nioche
Dogacan has produced a patch for svn nutchbase that brings it to the level of github. See https://issues.apache.org/jira/browse/NUTCH-650 The patch has been marked as 'licensed for inclusion in ASF work' and works fine. Any objections to this patch being committed? Thanks Dogacan for producing

Re: Update svn nutchbase - Nutch 2.0

2010-06-30 Thread Julien Nioche
, deletion of old plugins, etc... Thanks J. On 29 June 2010 21:27, Dennis Kubes ku...@apache.org wrote: +1 on this On 06/29/2010 08:57 AM, Julien Nioche wrote: Dogacan has produced a patch for svn nutchbase that brings it to the level of github. See https://issues.apache.org/jira/browse/NUTCH

Re: [Nutchbase] WebPage class is a generated code?

2010-07-02 Thread Julien Nioche
(This question is mostly to Dogacan Enis, but I encourage anyone familiar with the code to join the threads with [Nutchbase] - the sooner the better ;) ). I'm looking at src/gora/webpage.avsc and WebPage.java friends... presumably the java code was autogenerated from avsc using Gora? If

Re: Nutch 2.0 : Design issue

2010-07-02 Thread Julien Nioche
On 2 July 2010 12:22, Andrzej Bialecki a...@getopt.org wrote: On 2010-07-02 12:42, Julien Nioche wrote: Hi guys, You've probably seen that there has been some progress on 2.0 lately. We've updated the nutchbase svn branch with the latest developments done on Dogacan's Github i.e. using

Re: Classifying pages on Nutch: plugins?

2010-07-06 Thread Julien Nioche
Hi Cesar, This can definitely be done using a custom parse plugin and an indexing plugin. We did something like this sometime ago to classify adult pages using our text classification API ( http://code.google.com/p/textclassification/) which is based on SVM. Out of interest, what categories are

Re: Parse-tika ignores too much data...

2010-07-07 Thread Julien Nioche
Hi Ken, Thank you for your comments and analysis. We should probably modify the HTMLHandler so that it does not discard a frameset because of the bodylevel being equal to 0. I suggested earlier on the Tika list having a mechanism for specifying a custom handler via the Context, that would give

Re: Classifying pages on Nutch: plugins?

2010-07-08 Thread Julien Nioche
Daniel, Your message is not relevant for this mailing list. If you have questions about the TC API use http://groups.google.com/group/digitalpebble instead. Thanks On 8 July 2010 01:56, dgimenes dran...@gmail.com wrote: Julien, I'm in Luan's project too. I'd like to know if you have

Re: Build failed in Hudson: Nutch-trunk #1202

2010-07-09 Thread Julien Nioche
BUILD SUCCESSFUL Total time: 24 minutes 31 seconds Publishing Javadoc Archiving artifacts ERROR: No artifacts found that match the file pattern trunk/build/*.tar.gz. Configuration error? ERROR: 'trunk/build/*.tar.gz' doesn't match anything: 'trunk' exists but not 'trunk/build/*.tar.gz'

Re: svn commit: r965815 - in /nutch/branches/nutchbase/src: java/org/apache/nutch/parse/ParseStatus.java java/org/apache/nutch/parse/ParseText.java test/org/apache/nutch/parse/TestParseText.java

2010-07-20 Thread Julien Nioche
Now that you mention upgrade solutions from 1.x to 2.0 I suggest that we open a JIRA to discuss this. IMHO we probably don't want to keep the 'old' code in src/java when we merge but could have the code for the conversion utilities and the Nutch 1.x jars in the contrib/ directory

Re: svn commit: r965815 - in /nutch/branches/nutchbase/src: java/org/apache/nutch/parse/ParseStatus.java java/org/apache/nutch/parse/ParseText.java test/org/apache/nutch/parse/TestParseText.java

2010-07-20 Thread Julien Nioche
Thanks for your comments Chris. However we still need to address the issue raised by Dogacan, i.e. shall we provide tools to convert from 1.x structures to 2.0 and if so how shall we organise it. Again - some things have been removed from NutchBase for the sake of clarity but since they are

Re: Nutchbase merge strategy

2010-07-23 Thread Julien Nioche
Before doing so, let's: 1. tag current trunk as http://svn.apache.org/repos/asf/nutch/branches/branch-1.3 (EOL'ed won't be worked on, but nice to save). This way someone doesn't have to remember the Nutchbase rev # before the Nutchbase branch lands in the trunk. Then we can: 2. svn

Re: Nutchbase merge strategy

2010-07-23 Thread Julien Nioche
On 23 July 2010 10:20, Julien Nioche lists.digitalpeb...@gmail.com wrote: Before doing so, let's: 1. tag current trunk as http://svn.apache.org/repos/asf/nutch/branches/branch-1.3 (EOL'ed won't be worked on, but nice to save). This way someone doesn't have to remember the Nutchbase

Re: Build failed in Hudson: Nutch-trunk #1213

2010-07-27 Thread Julien Nioche
) and request Hudson Zones karma from @infra. I’d be happy to be this guy since I do the RM’ing a lot, but it might be nice to have someone else do it in case I get hit by a bus :) Cheers, Chris On 7/26/10 10:24 PM, Julien Nioche lists.digitalpeb...@gmail.com wrote: does anyone have any idea

Re: [VOTE] Apache Nutch 1.2 Release Candidate #1

2010-08-09 Thread Julien Nioche
issue in JIRA and then link your issue to the issue that you wanted to reopen. It’s just as easy and doesn’t cause the out of sync problem. OK, makes sense Cheers, Chris On 8/9/10 7:45 AM, Julien Nioche lists.digitalpeb...@gmail.com wrote: I reopened https://issues.apache.org/jira

Re: When a crawl goes bad...

2010-08-16 Thread Julien Nioche
It's probably more an issue with DNS resolution than robots.txt. Even if you respect the robots.txt instructions you can still have N host or even domain names pointing to a single server. This can be avoided in Nutch by setting 'partition.url.mode' and 'fetcher.queue.mode' to 'byIP'. On 16
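As a rough sketch of the setting described above, the following Python snippet renders the two overrides as `<property>` entries of the kind that would go into conf/nutch-site.xml. The property names and the 'byIP' value are taken from the message; the helper name `render_overrides` and the idea of generating the XML from a script are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET

def render_overrides(overrides):
    """Return a <configuration> XML string with one <property> per override."""
    root = ET.Element("configuration")
    for name, value in overrides.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")

# Property names and value as quoted in the message above.
BY_IP = {
    "partition.url.mode": "byIP",
    "fetcher.queue.mode": "byIP",
}

print(render_overrides(BY_IP))
```

The printed fragment is unindented; in practice the two `<property>` elements would simply be added by hand to an existing nutch-site.xml.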

Re: Nutch 2.0 Help

2010-09-02 Thread Julien Nioche
Hi David, I haven't used the Hbase backend with GORA for quite some time but from what I can remember you'll need the following things : * conf/hbase-site.xml = this should correspond to your local configuration * conf/gora-hbase-mapping.xml = see below * conf/gora.properties = don't think there

Re: nutch 2.0 (trunk)

2010-09-07 Thread Julien Nioche
Hi Faruk, You can either set a lower value for the parameter http.content.limit or modify the mapping and set <field name="content" column="content" jdbc-type="MEDIUMBLOB"/> which should work for mysql. See the discussion on http://github.com/enis/gora/issues/closed#issue/48 HTH Julien -- * *Open

Re: Nutch 2.0 Help

2010-09-08 Thread Julien Nioche
Hi guys, I've summarized the steps to follow for having GORA+Hbase with Nutch 2.0 on http://wiki.apache.org/nutch/GORA_HBase Feel free to amend and improve as you see fit. Please bear in mind that Nutch 2.0 is at a very early stage and is far from being bug-proof, see in particular [1]. HTH

Backport to 1.3 (was: Release planning)

2011-01-05 Thread Julien Nioche
on this? Julien On 4 January 2011 21:44, Julien Nioche lists.digitalpeb...@gmail.com wrote: +1 from me. I've committed today a bunch of patches which were in 1.2 but not in 1.3 (just one last one to do) but haven't compared with 2.0 Having a release based on 1.3 would be great as it would be a nice

Re: Nutch Parser annoyingly faulty

2011-03-04 Thread Julien Nioche
Hi Jurgen, Since I wrote this email - which I thought got ignored by the Nutch developers - Thanks for reporting the problem Jurgen, and sorry that you felt you were being ignored. The few active developers Nutch has contribute during their spare time, the reason why you did not get any

Re: Build failed in Jenkins: Nutch-trunk #1433

2011-03-22 Thread Julien Nioche
On 22 March 2011 04:15, Kirby Bohling kirby.bohl...@gmail.com wrote: Is there some reason this is allowed to continue to build if nobody is going to actually get it to build successfully? I am assuming this has something to do with the Ivy resolution of the Gora library that isn't publicly

http://wiki.apache.org/nutch/Tutorial%20on%20incremental%20crawling

2011-03-27 Thread Julien Nioche
Gabriele, I think it is a good idea to have a script like this however your proposal could be improved. It currently works only on a single machine and uses commands such as mv, ls etc... which won't work on a pseudo or fully distributed cluster. You should use the 'hadoop fs' commands instead.
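To illustrate the suggestion above, here is a minimal Python sketch that rewrites local `mv` and `ls` command lines as their `hadoop fs` counterparts (`hadoop fs -mv`, `hadoop fs -ls`). It only builds argument lists and does not run anything; the helper name `to_hadoop_fs` and the example paths are hypothetical, and local-only flags such as `ls -l` are not translated.

```python
import shlex

# Local shell commands paired with their 'hadoop fs' counterparts.
HADOOP_FS_FLAGS = {"mv": "-mv", "ls": "-ls"}

def to_hadoop_fs(command_line):
    """Rewrite a local 'mv' or 'ls' command line as a 'hadoop fs' argument list."""
    args = shlex.split(command_line)
    if not args or args[0] not in HADOOP_FS_FLAGS:
        raise ValueError("no hadoop fs equivalent known for: %r" % command_line)
    return ["hadoop", "fs", HADOOP_FS_FLAGS[args[0]]] + args[1:]

# Hypothetical paths, for illustration only.
print(to_hadoop_fs("mv crawl/segments/old crawl/segments/new"))
print(to_hadoop_fs("ls crawl/segments"))
```

Because `hadoop fs` also works against the local filesystem, a script written this way should behave the same in local, pseudo-distributed and fully distributed mode.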

Re: All solr* commands fail in 1.3

2011-04-08 Thread Julien Nioche
See http://www.slf4j.org/faq.html#IllegalAccessError This error is caused by the static initializer of the LoggerFactory class attempting to directly access the SINGLETON field of org.slf4j.impl.StaticLoggerBinder. While this was allowed in SLF4J 1.5.5 and earlier, in 1.5.6 and later the
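Assuming, as the linked FAQ entry suggests, that the error comes from mixing an slf4j-api jar with a binding jar from a different SLF4J release, a quick way to spot the problem is to compare the versions of the SLF4J jars on the classpath. The Python sketch below does this for a list of jar file names; the function names and the sample listing are hypothetical and only meant as an illustration.

```python
import re

# Matches names such as slf4j-api-1.5.5.jar or slf4j-log4j12-1.6.1.jar.
SLF4J_JAR = re.compile(r"^(slf4j-[a-z0-9]+)-(\d+(?:\.\d+)*)\.jar$")

def slf4j_versions(jar_names):
    """Map each SLF4J artifact found in jar_names to its version string."""
    found = {}
    for name in jar_names:
        match = SLF4J_JAR.match(name)
        if match:
            found[match.group(1)] = match.group(2)
    return found

def has_version_mismatch(jar_names):
    """True when the SLF4J jars in the list do not all share one version."""
    return len(set(slf4j_versions(jar_names).values())) > 1

# Hypothetical lib directory listing, for illustration only.
jars = ["slf4j-api-1.5.5.jar", "slf4j-log4j12-1.6.1.jar", "log4j-1.2.15.jar"]
print(slf4j_versions(jars), has_version_mismatch(jars))
```

If a mismatch is reported, aligning all SLF4J jars on a single version is the usual remedy.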

Re: GORA dependency and build failures

2011-04-08 Thread Julien Nioche
Yep. 0.1 has been released and the artifacts should be available soon On Friday, 8 April 2011, Otis Gospodnetic ogjunk-nu...@yahoo.com wrote: Hi, Just curious - is the plan to wait for the GORA 0.1 release to get published somewhere (not familiar with Ivy, so I'm not sure where things need to

Re: Nutch' pom.xml

2011-04-12 Thread Julien Nioche
Someone suggested that we used an ant task to generate the pom from the Ivy files. This would be a far cleaner option than having to keep this bl***d pom.xml file in sync all the time On 12 April 2011 15:11, Markus Jelsma markus.jel...@openindex.io wrote: Hi guys, I found out that pom.xml

Re: Nutch' pom.xml

2011-04-12 Thread Julien Nioche
/makepom.html) and remove the pom.xml from SVN? Is there anything in that pom.xml that wouldn't be generated by makepom? J. On 12 April 2011 15:24, Julien Nioche lists.digitalpeb...@gmail.com wrote: Someone suggested that we used an ant task to generate the pom from the Ivy files. This would

Re: Nutch 1.3 release

2011-04-14 Thread Julien Nioche
://digitalpebble.blogspot.com/ http://www.digitalpebble.com On 14 April 2011 08:55, Julien Nioche lists.digitalpeb...@gmail.com wrote: There has been a large number of substantial changes with 1.3 (search delegated to SOLR, separation between local and distributed runtimes, ) and we'll need to reflect

Re: [VOTE] Apache Nutch 1.3 Release Candidate #1

2011-04-24 Thread Julien Nioche
Hi Chris, Thanks for the RC. I think we should fix the 2 issues below. https://issues.apache.org/jira/browse/NUTCH-985 : bug with lastModifiedDate https://issues.apache.org/jira/browse/NUTCH-983 : port SOLRJ to 3.1 I expect many users would use the latest version of SOLR so we might as well

Re: Precopy http.agent properties to nutch-site

2011-04-26 Thread Julien Nioche
Hi Markus Any param overridden by the users should be in nutch-site.xml, not just http.agent, so why make an exception for it? Moreover that will not necessarily prevent people from using nutch-default.xml Maybe we could set nutch-default to readonly? Could be changed by the user but this might
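The override relationship mentioned above can be pictured with a small Python sketch: properties from nutch-default.xml form the base, and anything set in nutch-site.xml replaces the default of the same name. The function name and the sample values are hypothetical, and the sketch ignores details of the real configuration loader such as properties marked final.

```python
def effective_config(defaults, site):
    """Merge two property maps; values from the site file win."""
    merged = dict(defaults)
    merged.update(site)
    return merged

# Hypothetical values, for illustration only.
nutch_default = {"http.agent.name": "", "http.content.limit": "65536"}
nutch_site = {"http.agent.name": "MyCrawler"}

print(effective_config(nutch_default, nutch_site))
```

This is why user changes belong in nutch-site.xml: the defaults file can then be replaced on upgrade without losing any local settings.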

Re: SolrDedup doesn't commit

2011-04-27 Thread Julien Nioche
Hi Markus We might as well do it properly and commit in the same way as index and clean do. Thanks for all your excellent work BTW Julien On 27 April 2011 15:16, Markus Jelsma markus.jel...@openindex.io wrote: Hi, Title says it all. The job doesn't send a commit while index and clean do.

Re: 1.3 RC2?

2011-04-30 Thread Julien Nioche
Hi Chris, I don't think we have finished with the dates and update of SOLR to 3.1 yet. I'll also try to do NUTCH-888 (https://issues.apache.org/jira/browse/NUTCH-888) in the next couple of days. Thanks Julien On 30 April 2011 05:20, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote:

Re: svn commit: r1099483 - in /nutch/branches/branch-1.3: ./ conf/ src/plugin/ src/plugin/parse-rss/ src/plugin/parse-tika/ src/plugin/parse-tika/sample/ src/plugin/parse-tika/src/test/org/apache/nutc

2011-05-04 Thread Julien Nioche
;-) On 4 May 2011 16:26, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote: Awww, sniff, bye parse-rss! On May 4, 2011, at 11:20 AM, jnio...@apache.org jnio...@apache.org wrote: Author: jnioche Date: Wed May 4 15:20:00 2011 New Revision: 1099483 URL:

Re: Usefulness of cache field

2011-05-08 Thread Julien Nioche
Would need to check in the code but I think that this field is used for storing the value of the meta tags cache-control. Since we don't do caching anymore since delegating to SOLR, this is not really useful but could be again in the future. Let's leave it as is for now and document what the field

Re: found a nutch bug

2011-05-09 Thread Julien Nioche
Hi Could you please open a JIRA with a description of the problem and attach a patch generated against the branch-1.3 with 'svn diff'? Thanks 2011/5/9 ldk_5370 ldk_5...@163.com hi, I found a bug about class org.apache.nutch.protocol.http.HttpResponse, HttpResponse cannot get all html

Re: Update schema to get solrdedup working again

2011-05-11 Thread Julien Nioche
everywhere then format it properly in the SOLRWriter. We could of course do the latter now, but since I have no time to do it in the short term and don't want to twist your arm I'll let you decide On Thursday 05 May 2011 15:34:56 Julien Nioche wrote: Hi Markus, Sorry for the late reply

Collecting Nutch use cases for talk @BerlinBuzzwords

2011-05-16 Thread Julien Nioche
Hi, The title says it all. I'm searching for interesting use cases for my Nutch talk at Berlin. Do you use Nutch in an interesting way or on a particularly large scale? If you think your use case could be a good illustration of what Nutch does, please get in touch and I'll happily include it in

Re: 1.3 RC2?

2011-05-21 Thread Julien Nioche
? Ready for RC2 on 1.3? Got some free time tonight and in the releasing mood :-) Cheers, Chris On Apr 30, 2011, at 9:41 AM, Julien Nioche wrote: Hi Chris, I don't think we have finished with the dates and update of SOLR to 3.1 yet. I'll also try to do NUTCH-888 in the next couple of days

Re: 1.3 RC2?

2011-05-24 Thread Julien Nioche
. Thanks Jul On 21 May 2011 03:51, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote: Hey Guys, WDYT? Ready for RC2 on 1.3? Got some free time tonight and in the releasing mood :-) Cheers, Chris On Apr 30, 2011, at 9:41 AM, Julien Nioche wrote: Hi

Re: Nutch bug - assumption of HDFS in CrawlDb.java even if using other file systems like S3

2011-05-25 Thread Julien Nioche
Viksit, Please check if this has already been reported on the JIRA and if not open a new issue (for 2.0) Thanks Julien On 25 May 2011 19:02, Viksit Gaur vik.list.nu...@gmail.com wrote: [Cross posting since this might be more relevant here.] -- Hi all, Trying to run nutch on Elastic

Re: [RESULT] [VOTE] Apache Nutch 1.3 Release Candidate #3

2011-06-08 Thread Julien Nioche
Mattmann Markus Jelsma Julien Nioche Lewis John McGibbney I'll go ahead and push the release to the mirrors and release the Maven repo to Central and then send an ANNOUNCE. Thanks! Cheers, Chris ++ Chris Mattmann, Ph.D

new branch 1.4 and possible features

2011-06-10 Thread Julien Nioche
Guys, I added a new label 1.4 on the JIRA. Shall we create a new branch 1.4 on SVN from the existing 1.3? I agree that it is a pain to have to maintain 1.x AND trunk in parallel but my feeling is that 2.0 needs more work before being completely reliable and in the meantime we might want to add

Re: Please remove me from the mailing list

2011-06-12 Thread Julien Nioche
http://nutch.apache.org/mailing_lists.html - dev-unsubscr...@nutch.apache.org On 12 June 2011 14:33, Tolga Soyata tolgasoy...@gmail.com wrote: Please remove me from the mailing list -- * *Open Source Solutions for Text Engineering http://digitalpebble.blogspot.com/

Re: Bug-fix for Nutch 1.3 with solrdedup

2011-06-13 Thread Julien Nioche
Hi, Please open a new issue on https://issues.apache.org/jira/browse/NUTCH Thanks Julien On 13 June 2011 04:20, Yavinty yavi...@gmail.com wrote: Hello, I have a bug-fix for Nutch 1.3 (solrdedup throwing NullPointerException), where do I submit it? Thanks. -- * *Open Source

Re: new branch 1.4 and possible features

2011-06-13 Thread Julien Nioche
Guys, I've created a new branch for 1.4 on https://svn.apache.org/repos/asf/nutch/branches/branch-1.4 Thanks Jul On 10 June 2011 12:11, Markus Jelsma markus.jel...@openindex.io wrote: Guys, I added a new label 1.4 on the JIRA. Shall we create a new branch 1.4 on SVN from the

Re: new branch 1.4 and possible features

2011-06-13 Thread Julien Nioche
Hi, [...] Yes indeed. I see that Gora is still in incubation and I have not been using trunk for sometime as it has been broken due to Gora dependencies? I think this suggestion is the only sensible way to continue. As I have not been using trunk, what is the current situation with this?

Re: Nutch 2.0 roadmap

2011-07-04 Thread Julien Nioche
Hi Lewis, Currently the slightly (in places) dated roadmap can be found here [1], I was wondering if we could give this an overhaul/update as it would give a more robust overview of where trunk is going. Most of the points you make are still in development, however some have been achieved and

Re: Rebuilding site

2011-07-07 Thread Julien Nioche
Hi Lewis, As I am back home I propose to rebuild the site to link the current tutorial link to the new 1.3 tutorial on the wiki. I would also like to formally make my first commit by adding my name to the list of committers before I progress with other bits and pieces. Good idea! See

Re: [Nutch Wiki] Update of NutchTutorial by JulienNioche

2011-07-12 Thread Julien Nioche
http://nutch.apache.org/mailing_lists.html Hey, please delete my E-Mail address from your mailing list or whatever. I receive more than 50 mails every day. Bye -- Marcel Schubert Auszubildener TU Clausthal E-Mail: schub...@rz.tu-clausthal.de Rechenzentrum

Re: Real-time Solr integration

2011-07-12 Thread Julien Nioche
Hi Matthew, This is usually achieved by writing a script containing the individual Nutch commands (as opposed to calling 'nutch crawl') and index at the end of a generate-fetch-parse-update-linkdb sequence. You don't need any plugins for that HTH Julien On 12 July 2011 13:35, Matthew Painter
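The approach described above can be sketched as follows. The Python snippet only builds the list of commands for one generate-fetch-parse-update-linkdb round followed by indexing; it does not execute them. The sub-command names and argument order are assumptions based on the Nutch 1.x command-line tools and should be checked against the usage output of `bin/nutch` for the version in use; the paths and Solr URL are hypothetical, and in a real script the segment name would be discovered after the generate step rather than passed in.

```python
def crawl_cycle(crawldb, linkdb, segments_dir, segment, solr_url):
    """Return one generate-fetch-parse-update-linkdb round followed by indexing."""
    nutch = "bin/nutch"
    return [
        [nutch, "generate", crawldb, segments_dir],
        [nutch, "fetch", segment],
        [nutch, "parse", segment],
        [nutch, "updatedb", crawldb, segment],
        [nutch, "invertlinks", linkdb, segment],
        # Indexing comes last, once the crawldb and linkdb are up to date.
        [nutch, "solrindex", solr_url, crawldb, linkdb, segment],
    ]

# Hypothetical locations, for illustration only.
for command in crawl_cycle(
    "crawl/crawldb",
    "crawl/linkdb",
    "crawl/segments",
    "crawl/segments/20110712000000",
    "http://localhost:8983/solr",
):
    print(" ".join(command))
```

Running the round in a loop, with indexing at the end of each iteration, pushes documents to Solr after every cycle instead of only once at the end of a full 'nutch crawl' run.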

Re: Real-time Solr integration

2011-07-14 Thread Julien Nioche
%20incremental%20script On Tue, Jul 12, 2011 at 2:15 PM, Julien Nioche lists.digitalpeb...@gmail.com wrote: Hi Matthew, This is usually achieved by writing a script containing the individual Nutch commands (as opposed to calling 'nutch crawl') and index at the end

Re: Normalize and filter hyperlinks during parse

2011-07-14 Thread Julien Nioche
Are you sure we don't already filter and normalize at the end of the parse? (not in front of code - sorry can't check) On 14 July 2011 16:37, Markus Jelsma markus.jel...@openindex.io wrote: Hi, If we filter and normalize hyperlinks in the parse job, we wouldn't have to filter and

Re: HTTPS support

2011-07-14 Thread Julien Nioche
http://www.google.co.uk/search?q=nutch+mailing+list - 1st result On 14 July 2011 16:50, Zanzico Gioele gioele.zanz...@vitecgroup.com wrote: how can i be deleted from this mailing list pls ? tks ciao gioele Gioele Zanzico Senior Web Analyst Vitec Group Imaging Staging Division Direct

Re: Normalize and filter hyperlinks during parse

2011-07-15 Thread Julien Nioche
updated the db and it worked. Now i have two urls. not clear. Was there only one outlink in that seed? Did the filtering work or not? More thoughts? :) On Thursday 14 July 2011 18:31:07 Julien Nioche wrote: Are you sure we don't already filter and normalize at the end of the parse

Re: Real-time Solr integration

2011-07-15 Thread Julien Nioche
On Thursday 14 July 2011 15:03:34 Julien Nioche wrote: Have been thinking about this again. We could make so that the indexer does not necessarily require a linkDB : some people are not particularly interested in getting the anchors. At the moment you have to have a linkDB

Re: adding details to mvn.template?

2011-07-17 Thread Julien Nioche
Please excuse (and correct) my ignorance, but I need to clear this one up so I understand correctly. The purpose the mvn.template file serves is so we can specify exactly who can commit a Nutch maven pom. The pom in turn specifies the build dirs e.g. source dir as well as test dir. Then finally

Re: Automaton improvements

2011-07-25 Thread Julien Nioche
make Nutch require Lucene as a dependency -- this would provide more stable updates. Dawid On Mon, Jul 25, 2011 at 10:35 AM, Julien Nioche lists.digitalpeb...@gmail.com wrote: Hi Kirby, Thanks for sharing this. It is definitely relevant for Nutch and I am sure that there would be quite

Re: (NUTCH-1071) Crawldb update to total counts per status

2011-07-29 Thread Julien Nioche
- Key: NUTCH-1071 URL: https://issues.apache.org/jira/browse/NUTCH-1071 Project: Nutch Issue Type: Improvement Affects Versions: 1.4 Reporter: Julien Nioche Assignee: Julien Nioche

Re: (NUTCH-1071) Crawldb update to total counts per status

2011-07-29 Thread Julien Nioche
Markus, Have just committed a change to CrawlDBReducer (rev 1152254) see line 155 - reporter.getCounter("CrawlDB status", CrawlDatum.getStatusName(*old*.getStatus())).increment(1); was using the wrong object :-( Would you mind giving it a try? Thanks Julien

Re: Nutch 2 LinkAnalysisScoringFilter

2011-08-03 Thread Julien Nioche
nope, see https://issues.apache.org/jira/browse/NUTCH-875 On 3 August 2011 01:09, Tom Davidson tdavid...@covario.com wrote: Does the LinkAnalysisScoringFilter in Nutch 2 work? -- * *Open Source Solutions for Text Engineering http://digitalpebble.blogspot.com/

Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]

2011-08-09 Thread Julien Nioche
Hi Kirby, Grumble, Grumble. (adding dev@nutch, as that is more than likely where this discussion really belongs)... am adding gora-...@incubator.apache.org as well It'd be really nice if folks could just follow the commands in the nightly build, and get a build pushed out. I've pointed

Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]

2011-08-10 Thread Julien Nioche
Hi Tom, I have been using Nutch 1.x for the last 9 months or so and it works well for large scale crawls up to around a billion pages. However, the inherent lack of random access in HDFS really starts to become a burden on our hadoop cluster when going through the whole

Re: Nutch 2.0 DOAP

2011-08-10 Thread Julien Nioche
That's great, thanks! On 10 August 2011 14:58, lewis john mcgibbney lewis.mcgibb...@gmail.com wrote: Hi, Just for information purposes, I committed our DOAP which can now be found under trunk svn. I have been informed by site-dev@ that the system they use does not support more than one doap

Re: Unreleased Gora dependencies in Nutch Trunk build

2011-08-15 Thread Julien Nioche
I must be missing something here but how do you plan to get the nightly builds to compile without declaring Gora as a dependency in Ivy? Will you put a hard copy of the jars? The public artefacts for Gora 0.1.incubating are incorrect, as for 0.1.1 they have not been published yet - in a nutshell

Re: The crawl command, keep or get rid of

2011-08-23 Thread Julien Nioche
+1 let's replace it with a shell script instead. On 22 August 2011 21:56, Markus Jelsma markus.jel...@openindex.io wrote: Hi, The crawl command seems to add a lot of confusion. It hides the entire crawl cycle logic from new users, leading to questions, lack of understanding of basic Nutch

Re: The crawl command, keep or get rid of

2011-08-23 Thread Julien Nioche
Julien Is an immediate crawl-with-one-command a desired feature? Provided as Java code or shell script? On Tuesday 23 August 2011 10:12:57 Julien Nioche wrote: +1 let's replace it with a shell script instead. On 22 August 2011 21:56, Markus Jelsma markus.jel...@openindex.io wrote: Hi

Re: Patch für httpResponse

2011-08-23 Thread Julien Nioche
Simone, Would you mind opening a JIRA for this and attach your patch + grant it to ASF? I know it is fairly small but it makes it easier to track the progress, link to svn commits, etc... Thanks Julien On 23 August 2011 07:53, Simone Frenzel psimon...@googlemail.com wrote: --

Re: how to use Nutch 1.3 as a single job jar on newer Hadoop releases

2011-08-24 Thread Julien Nioche
Make sure you specify the params in runtime/deploy/conf unless you rebuild the job file with 'ant job' On 24 August 2011 16:09, Ferdy Galema ferdy.gal...@kalooga.com wrote: Hi, Compiling Nutch 1.3 with patch NUTCH-993 (newest patch) and configuring mapreduce.job.jar.unpack.pattern and

Re: Why URLNormalizer doesn't implement the Pluggable?

2011-08-26 Thread Julien Nioche
Resending your messages every hour won't get you more answers - quite the opposite On 26 August 2011 09:28, Kaiwii Ho kaiwi...@gmail.com wrote: I'm a freshman learning about Nutch. I have several questions: 1. URLNormalizer is a kind of ExtensionPoint. But why does it implement the

Re: Jenkins build is back to normal : Nutch-branch-1.4 #7

2011-09-16 Thread Julien Nioche
Thanks Lewis, that's great! On 16 September 2011 12:20, lewis john mcgibbney lewis.mcgibb...@gmail.com wrote: Branch 1.4 build set up and 'should' be running successfully from now on. This will also auto update any JIRA issues which have been committed with some Jenkins commentary. At least

Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]

2011-09-16 Thread Julien Nioche
Am happy to call for a vote on the future of Nutch 2.0 if you want. Shall we reduce the various options described before to a single one? Julien On 15 September 2011 19:55, Markus Jelsma markus.jel...@openindex.io wrote: Hi Guys, I thought I'd chime in on this thread. My comments below:

[VOTE] Move 2.0 out of trunk

2011-09-18 Thread Julien Nioche
Hi, Following the discussions [1] on the dev-list about the future of Nutch 2.0, I would like to call for a vote on moving Nutch 2.0 from the trunk to a separate branch, promote 1.4 to trunk and consider 2.0 as unmaintained. The arguments for / against can be found in the thread I mentioned. The

Re: [VOTE] Move 2.0 out of trunk

2011-09-19 Thread Julien Nioche
Here is my vote : +1 : Shelve 2.0 and move 1.4 to trunk Julien On 18 September 2011 10:21, Julien Nioche lists.digitalpeb...@gmail.com wrote: Hi, Following the discussions [1] on the dev-list about the future of Nutch 2.0, I would like to call for a vote on moving Nutch 2.0 from the trunk

Re: [VOTE] Move 2.0 out of trunk

2011-09-19 Thread Julien Nioche
wanted to do that in 2.0. Again, if people want to get involved and improve it they will be able to do so. Thanks Julien On Mon, Sep 19, 2011 at 1:05 AM, Julien Nioche lists.digitalpeb...@gmail.com wrote: Here is my vote : +1 : Shelve 2.0 and move 1.4 to trunk Julien On 18 September

[RESULT] [VOTE] Move 2.0 out of trunk

2011-09-21 Thread Julien Nioche
Hi Folks, Okey dok, this VOTE has passed with the following tallies: +1 PMC Markus Jelsma Sami Siren Chris Mattmann Lewis John McGibbney Dennis Kubes Julien Nioche Andrzej Bialecki -1 PMC Alexis de Tréglodé -1 Community Radim Kola Accordingly we will move the current Nutch trunk to a new

Re: Extension of NUTCH-585 - blacklist whitelist plugin

2011-09-21 Thread Julien Nioche
Elisabeth, Great. Could you attach your patch to the original issue in JIRA instead and check the box : Grant license to ASF for inclusion in ASF works? Julien On 21 September 2011 16:47, Elisabeth Adler elisabeth.ad...@gmail.com wrote: Hi, Based on the suggestions/code from

Re: [RESULT] [VOTE] Move 2.0 out of trunk

2011-09-22 Thread Julien Nioche
+1 thanks Chris On 22 September 2011 04:12, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote: Guys, If no one objects, I will execute the move Friday by 12pm PDT. Will that work? Cheers, Chris On Sep 21, 2011, at 3:09 AM, Julien Nioche wrote: Hi Folks, Okey dok

Re: Nutch site documentation

2011-09-23 Thread Julien Nioche
Can someone please tell me how changes to https://svn.apache.org/repos/asf/nutch/site/ are populated to actually update our site. My suspicions are that the URL gets 'svn up' on people.apache.org to publish our website, however I wish I could confirm this. IIRC it uses SVNpubsub The

Re: [NOTICE] Nutch trunk is now 1.4-snapshot and Nutch 2.0 trunk is now the Nutch Gora branch

2011-09-24 Thread Julien Nioche
Thanks Chris! On 24 September 2011 01:36, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote: Okey dok, the news item is now published. Let the dev'ing commence! Cheers, Chris On Sep 23, 2011, at 4:57 PM, Mattmann, Chris A (388J) wrote: Hi Folks, Per:

Re: Providing a list of FAQ's with every new subscribe request

2011-09-26 Thread Julien Nioche
We don't have moderators for the user and dev lists On 26 September 2011 20:09, lewis john mcgibbney lewis.mcgibb...@gmail.com wrote: Thanks Markus, Who is mailing list moderator? If I can get this info before trying to contact infra it would be great. On Mon, Sep 26, 2011 at 7:37 PM,

Re: Nutchgora Jenkins CI builds

2011-09-26 Thread Julien Nioche
You are welcome. Thank you for all your work! On 26 September 2011 18:47, lewis john mcgibbney lewis.mcgibb...@gmail.com wrote: Hi, As per Julien's recent commit to include correct gora artefacts I have established a new build [1] for nutchgora branch development. We have some issues with

Re: Providing a list of FAQ's with every new subscribe request

2011-09-27 Thread Julien Nioche
would like to step down from the moderator status and have someone else do moderation instead, because frankly I have not been doing a great job with it. Any volunteers? -- Sami Siren On Tue, Sep 27, 2011 at 12:09 AM, Julien Nioche lists.digitalpeb...@gmail.com wrote: We don't have

Re: Prepare for 1.4 release?

2011-09-28 Thread Julien Nioche
+1 have created a 1.5 version in JIRA. Thanks Julien On 27 September 2011 22:01, Markus Jelsma markus.jel...@openindex.io wrote: Hi, There are some bad issues in 1.3 that are fixed in early 1.4 revisions. Also, 1.4 has some nice improvements and new features. I know some would like to

Re: Prepare for 1.4 release?

2011-09-29 Thread Julien Nioche
: yes +1 Thanks for bringing this up Markus. I would like to get NUTCH-1078 sorted out ASAP. However I'll comment on that issue separately. On Wed, Sep 28, 2011 at 9:46 AM, Julien Nioche lists.digitalpeb...@gmail.com wrote: +1 have created a 1.5 version in JIRA. Thanks Julien On 27

[ANNOUNCEMENT] Ferdy Galema is a Nutch committer and PMC member

2011-10-28 Thread Julien Nioche
Hi, A while back the NUTCH PMC nominated Ferdy Galema for Nutch committership and PMC membership. The VOTE tallies in Nutch PMC have occurred and I'm happy to announce that Ferdy is now a Nutch committer. Ferdy, feel free to say a little bit about yourself. Your account has been created and you

Re: Nutch Maven build

2011-10-31 Thread Julien Nioche
Guys, I have probably missed a discussion on this lately but I really don't remember that we'd decided to move from ANT+IVY. We've had numerous discussions on this in the past, all leading to the conclusion that maintaining two systems is a bad idea. Have I missed something? Jul PS: If we had

ANT+MAVEN (was: Nutch Maven build)

2011-10-31 Thread Julien Nioche
+MAVEN at all? Jul Thanks! Cheers, Chris On 31 October 2011 15:39, Markus Jelsma markus.jel...@openindex.io wrote: This was the thing, isn't it? https://issues.apache.org/jira/browse/NUTCH-995 On Monday 31 October 2011 16:28:18 Julien Nioche wrote: Guys, I have probably

Re: Nutch Maven build

2011-10-31 Thread Julien Nioche
, Oct 31, 2011 at 4:38 PM, Julien Nioche lists.digitalpeb...@gmail.com wrote: I was under the impression that to publish Nutch artefacts to maven repo we need to have a working pom.xml? Is this correct? This was all I was referring to. OK, sorry I now understand yes we do. It should

Re: [VOTE] Apache Nutch 1.4 release rc #1

2011-11-07 Thread Julien Nioche
Thanks Chris, * it would be good to have the same folder name for the src and bin versions. They are currently 'nutch-1.4' and 'apache-nutch-1.4' * do we really need to include the KEYS file? * bin version contains pom.xml, src version does not. Either include in both or remove altogether * What

Re: Using Nutch within Eclipse

2011-11-08 Thread Julien Nioche
I do use Eclipse for editing the code but build the jars/jobs with ANT. I use IVYDE for managing the dependencies. On 7 November 2011 23:23, Lewis John Mcgibbney lewis.mcgibb...@gmail.com wrote: Hi guys, Can anyone inform whether they are using Nutch trunk within Eclipse? Thanks Lewis --

Re: [VOTE] Apache Nutch 1.4 release rc #1

2011-11-08 Thread Julien Nioche
. About the runtime/local thing, I think we can do that for 1.5, but I am totally +1 for it. OK for 1.5 Thanks a lot Julien Let me know what you think. Thanks! Cheers, Chris On Nov 7, 2011, at 7:59 AM, Julien Nioche wrote: Thanks Chris, * it would be good to have the same

Re: [VOTE] Apache Nutch 1.4 release rc #1

2011-11-16 Thread Julien Nioche
, Julien Nioche wrote: Hi Chris Thanks for the review. Would you consider the below blockers, or would-be-nice-to-fix? If none are blockers I propose fixing them in 1.5 and pushing 1.4. Thoughts? see below I agree on the naming, sorry for the screw-up. no probs. Do you

Re: Fw: Lewis John McGibbney sent a message via SimilarPages – A web discovery and search add-on

2011-11-17 Thread Julien Nioche
We (DigitalPebble) managed the crawl for them and wrote the custom bits they required. The problems they mentioned were more related to EC2 than Hadoop as such. More on http://digitalpebble.blogspot.com/2010/09/similarpages-is-out.html Jul On 17 November 2011 16:57, Lewis John Mcgibbney

Re: nutch and openJDK 1.6 for fedora

2011-11-28 Thread Julien Nioche
Hi Alexander, Which version of OpenJDK is it? I have Nutch running on one of my servers with *java version 1.6.0_22 OpenJDK Runtime Environment (IcedTea6 1.10.2) (6b22-1.10.2-0ubuntu1~11.04.1) OpenJDK 64-Bit Server VM (build 20.0-b11, mixed mode)* and I don't have any problems compiling Julien

Re: Build failed in Jenkins: Nutch-trunk #1714

2012-01-04 Thread Julien Nioche
guys, any idea as to why this is not compiling anymore? J. On 4 January 2012 04:20, Apache Jenkins Server jenk...@builds.apache.org wrote: See https://builds.apache.org/job/Nutch-trunk/1714/ -- [...truncated 2386 lines...] resolve-default:

Re: Build failed in Jenkins: Nutch-trunk #1714

2012-01-04 Thread Julien Nioche
January 2012 10:52:08 Julien Nioche wrote: guys, any idea as to why this is not compiling anymore? J. On 4 January 2012 04:20, Apache Jenkins Server jenk...@builds.apache.org wrote: See https://builds.apache.org/job/Nutch-trunk/1714

Re: Build failed in Jenkins: Nutch-trunk #1714

2012-01-04 Thread Julien Nioche
Note: the latest stable is 1215090 i.e. things started to get bad when moving to hadoop 0.22 (rev 1220786). On 4 January 2012 16:45, Julien Nioche lists.digitalpeb...@gmail.com wrote: The problem is not with the urlfilter package as such but with the fact that the junit jar is removed from

Re: Build failed in Jenkins: Nutch-trunk #1714

2012-01-04 Thread Julien Nioche
). that's novelty to me - do we know what causes them to fail? Any hints? Note : the latest stable is 1215090 i.e. things started to get bad when moving to hadoop 0.22 (rev 1220786). On 4 January 2012 16:45, Julien Nioche lists.digitalpeb...@gmail.com wrote: The problem

Re: I want to volunteer some time

2012-01-17 Thread Julien Nioche
Hi Eddie, Great to hear that! Just to add to what Markus said there are also quite a few tasks to do on the NutchGora branch if that's something you'd be interested in. Or outside the tasks on JIRA, there is always a fair bit to do on the Wiki e.g. how to run in distributed mode etc... Just out

Re: I want to volunteer some time

2012-01-18 Thread Julien Nioche
Hi Eddie, * I've also re-created the lucene index plugin as part of our plugin, as we don't use Solr, but our own search application. * One task you could be interested in is to make the indexing backends pluggable. See https://issues.apache.org/jira/browse/NUTCH-1047 for details. This would
