Hello,
We are currently using a heavily modified version of Nutch. The main
reason for this is that we fetch not only the URLs that the
QueueFeeder submits, but also additional resources from URLs that are
constructed during parsing. So for example let's say the QueueFeeder
On 2010-07-20 14:30, Ferdy wrote:
Hello,
We are currently using a heavily modified version of Nutch. The main
reason for this is that we fetch not only the URLs that the
QueueFeeder submits, but also additional resources from URLs that are
constructed during parsing. So for
Now that you mention upgrade solutions from 1.x to 2.0 I suggest that we
open a JIRA to discuss this. IMHO we probably don't want to keep the 'old'
code in src/java when we merge but could have the code for the conversion
utilities and the Nutch 1.x jars in the contrib/ directory
[ https://issues.apache.org/jira/browse/NUTCH-856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Julien Nioche resolved NUTCH-856.
-
Resolution: Fixed
thanks Chris for reviewing and committing TIKA-466. I will mark the issue as
Thanks for your comments Chris
However we still need to address the issue raised by Dogacan, i.e. shall we
provide tools to convert from 1.x structures to 2.0 and if so how shall we
organise it. Again - some things have been removed from NutchBase for the
sake of clarity but since they are
[ https://issues.apache.org/jira/browse/NUTCH-855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Scott Gonyea updated NUTCH-855:
---
Fix Version/s: 2.0
Description:
This plugin is designed to enhance the NUTCH-655 patch, by