Lewis,

https://issues.apache.org/jira/browse/NUTCH-1068

That is the issue I filed about the patch (it isn't directly related
to this, but it is related to some potential fixes).

http://www.mail-archive.com/dev%40nutch.apache.org/msg03419.html

That's the e-mail thread where I originally mentioned the
modifications to automaton, and the patch with the backport of the
Lucene fixes.

Kirby


On Fri, Nov 11, 2011 at 11:58 AM, Lewis John Mcgibbney
<lewis.mcgibb...@gmail.com> wrote:
> Excellent Kirby, thanks for this.
>
> The obvious question I guess... where does this leave us with regards to the
> urlfilter-automation libraries?
>
> For the record as well, can you please provide the Jira issue you filed? It
> would be good to know where I can begin with this one.
>
> Thanks
>
> On Thu, Nov 10, 2011 at 10:18 PM, Kirby Bohling <kirby.bohl...@gmail.com>
> wrote:
>>
>> On Thu, Nov 10, 2011 at 6:14 PM, Lewis John Mcgibbney
>> <lewis.mcgibb...@gmail.com> wrote:
>> > OK so the required dependencies can be seen below
>> >
>> > - FeedParser: <dependency org="net.java.dev.rome" name="rome"
>> >   rev="1.0.0" conf="*->master"/>
>> > - URLAutomationFilter: <dependency org="dk.brics" name="automaton"
>> >   rev="???"/>
>> > - SWFParser: <dependency org="com.google.gwt" name="gwt-incubator"
>> >   rev="2.0.1"/>
>> > - HTMLParser: <dependency org="net.sourceforge.nekohtml" name="nekohtml"
>> >   rev="1.9.15"/>
>> >
>> > There is a really nasty hack available: replacing the usual ${nutch.root}
>> > reference with <include file="../../../ivy/ivy-configurations.xml"/>.
>> > However, this is not how I want to progress.
>> >
>> > I'm also not sure where to find the dk.brics dependency.
>>
>> The Automaton library, to the best of my knowledge, is not available via
>> Maven's central repo.
>>
>> http://www.brics.dk/automaton/ is the site where you can find it.
>>
>> The actual jar is located at:
>> http://www.brics.dk/automaton/automaton.jar
>>
>> In order to get the source you have to submit an e-mail address, but
>> it is all available under the newer BSD/MIT license.
>>
>> I believe all of the functionality actually used by Nutch is in a
>> faster form buried inside the Lucene Util library 4.0 (unreleased last
>> I knew).  I believe I filed a JIRA issue about my backport of the
>> Lucene improvements to the library at Julian's request.  I have
>> submitted the code to the author, but I'm not sure if he has
>> integrated it.  He was short on time when I submitted all of it.
>>
>> It is a nice library, but it isn't very 3rd party user friendly (no
>> bug tracker, no public source repo).
>>
>> Kirby
>>
>>
>> >
>> > Any thoughts? Jira issue?
>> >
>> > Thanks
>> >
>> > On Thu, Nov 10, 2011 at 12:39 AM, Andrzej Bialecki <a...@getopt.org>
>> > wrote:
>> >>
>> >> On 10/11/2011 04:39, Lewis John Mcgibbney wrote:
>> >>>
>> >>> Gets even more strange: both SWFParser and AutomationURLFilter import
>> >>> additional dependencies, however they are not included within their
>> >>> plugin/ivy/ivy.xml files!
>> >>>
>> >>> Am I missing something here?
>> >>
>> >> Most likely these problems come from the initial porting of a pure ant
>> >> build to an ant+ivy build. We should determine what deps are really
>> >> needed by these plugins, and sanitize the ivy.xml files so that they
>> >> make sense - if the existing files can't be untangled we can ditch them
>> >> and come up with new, clean ones.
>> >>
>> >> --
>> >> Best regards,
>> >> Andrzej Bialecki     <><
>> >>  ___. ___ ___ ___ _ _   __________________________________
>> >> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
>> >> ___|||__||  \|  ||  |  Embedded Unix, System Integration
>> >> http://www.sigram.com  Contact: info at sigram dot com
>> >>
>> >
>> >
>> >
>> > --
>> > Lewis
>> >
>> >
>
>
>
> --
> Lewis
>
>
