Just for reference, I've sorted out all the problems I was having with my build environment and have updated the tutorial on our wiki. I've also commented on your issue, Kirby. Thanks for the pointer.
Lewis

On Fri, Nov 11, 2011 at 1:37 PM, Kirby Bohling <[email protected]> wrote:
> Lewis,
>
> https://issues.apache.org/jira/browse/NUTCH-1068
>
> That is the issue I filed about the patch (it isn't directly related
> to this, but it is related to some potential fixes).
>
> http://www.mail-archive.com/dev%40nutch.apache.org/msg03419.html
>
> That's the e-mail thread where I originally mentioned the
> modifications to automaton, and the patch with the backport of the
> Lucene fixes.
>
> Kirby
>
>
> On Fri, Nov 11, 2011 at 11:58 AM, Lewis John Mcgibbney
> <[email protected]> wrote:
> > Excellent Kirby, thanks for this.
> >
> > The obvious question, I guess: where does this leave us with regard to
> > the urlfilter-automaton libraries?
> >
> > For the record, can you please provide the Jira issue you filed? It would
> > be good to know where I can begin with this one.
> >
> > Thanks
> >
> > On Thu, Nov 10, 2011 at 10:18 PM, Kirby Bohling <[email protected]>
> > wrote:
> >>
> >> On Thu, Nov 10, 2011 at 6:14 PM, Lewis John Mcgibbney
> >> <[email protected]> wrote:
> >> > OK, so the required dependencies are listed below:
> >> >
> >> > - FeedParser: <dependency org="net.java.dev.rome" name="rome" rev="1.0.0" conf="*->master"/>
> >> > - AutomatonURLFilter: <dependency org="dk.brics" name="automaton" rev="???"/>
> >> > - SWFParser: <dependency org="com.google.gwt" name="gwt-incubator" rev="2.0.1"/>
> >> > - HTMLParser: <dependency org="net.sourceforge.nekohtml" name="nekohtml" rev="1.9.15"/>
> >> >
> >> > There is a really nasty hack that would replace the usual ${nutch.root}
> >> > with <include file="../../../ivy/ivy-configurations.xml"/>. It is
> >> > possible; however, this is not how I want to proceed.
> >> >
> >> > I'm also not sure where to find the dk.brics dependency.
> >>
> >> The Automaton library, to the best of my knowledge, is not available via
> >> Maven's central repo.
> >>
> >> http://www.brics.dk/automaton/ is the site where you can find it.
> >>
> >> That's the location of the actual jar:
> >> http://www.brics.dk/automaton/automaton.jar
> >>
> >> In order to get the source you have to submit an e-mail address, but
> >> it is all available under the newer BSD/MIT license.
> >>
> >> I believe all of the functionality actually used by Nutch is available in
> >> a faster form inside the Lucene Util library 4.0 (unreleased, last
> >> I knew). I believe I filed a JIRA issue about my backport of the
> >> Lucene improvements to the library at Julian's request. I have
> >> submitted the code to the author, but I'm not sure if he has
> >> integrated it. He was short on time when I submitted all of it.
> >>
> >> It is a nice library, but it isn't very friendly to third-party users (no
> >> bug tracker, no public source repo).
> >>
> >> Kirby
> >>
> >>
> >> >
> >> > Any thoughts? Jira issue?
> >> >
> >> > Thanks
> >> >
> >> > On Thu, Nov 10, 2011 at 12:39 AM, Andrzej Bialecki <[email protected]>
> >> > wrote:
> >> >>
> >> >> On 10/11/2011 04:39, Lewis John Mcgibbney wrote:
> >> >>>
> >> >>> It gets even stranger: both SWFParser and AutomatonURLFilter import
> >> >>> additional dependencies, but these are not included in their
> >> >>> plugin/ivy/ivy.xml files!
> >> >>>
> >> >>> Am I missing something here?
> >> >>
> >> >> Most likely these problems come from the initial porting of a pure Ant
> >> >> build to an Ant+Ivy build. We should determine which deps are really
> >> >> needed by these plugins, and sanitize the ivy.xml files so that they
> >> >> make sense; if the existing files can't be untangled, we can ditch
> >> >> them and come up with new, clean ones.
> >> >>
> >> >> --
> >> >> Best regards,
> >> >> Andrzej Bialecki
> >> >> http://www.sigram.com  Contact: info at sigram dot com
> >> >>
> >> >
> >> > --
> >> > Lewis
> >> >
> >
> > --
> > Lewis
> >
>

--
*Lewis*
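For reference, a cleaned-up plugin ivy.xml along the lines Andrzej suggests might look roughly like the sketch below, using the HTML parser's nekohtml dependency from the list above. This is a minimal sketch, not the committed Nutch file: it assumes the shared configurations are included through the usual ${nutch.root} property rather than the relative-path hack, and that the module name comes from the Ant project name; both are assumptions about the build layout.

```xml
<?xml version="1.0" ?>
<!-- Hypothetical plugin ivy.xml sketch (HTML parser plugin assumed).
     Module naming and the ${nutch.root} include are assumptions. -->
<ivy-module version="1.0">
  <!-- Module name taken from the Ant project name (assumed convention) -->
  <info organisation="org.apache.nutch" module="${ant.project.name}"/>

  <configurations>
    <!-- Reuse the shared configurations instead of a hard-coded ../../../ path -->
    <include file="${nutch.root}/ivy/ivy-configurations.xml"/>
  </configurations>

  <publications>
    <!-- Publish this plugin's own jar in the "master" configuration -->
    <artifact conf="master"/>
  </publications>

  <dependencies>
    <!-- Declare only what this plugin's own classes actually import -->
    <dependency org="net.sourceforge.nekohtml" name="nekohtml" rev="1.9.15"/>
  </dependencies>
</ivy-module>
```

The same pattern would presumably apply to the feed and SWF parsers. The dk.brics automaton jar is the exception: if it really is absent from Maven's central repo, as Kirby says, it cannot be resolved this way without either an extra resolver or a jar shipped with the plugin.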

