Hmmm....interesting.

OK, my one comment would be: why wait? Trunk is traditionally not guaranteed
to be stable, and it seems like you guys have nutchbase *sorta* working
well enough that the time is ripe to just switch now. Then you won't further
confuse folks like me who are happy to check out the Nutch trunk in
Eclipse, but shudder when they have to manually check out multiple copies of
Nutch as branches, etc. etc.

In other words, my comment is, *let's just switch now*. Before doing so,
let's:

1. tag current trunk as
http://svn.apache.org/repos/asf/nutch/branches/branch-1.3 (EOL'ed, so it
won't be worked on, but nice to save). This way nobody has to remember the
last pre-Nutchbase rev # once the Nutchbase branch lands in the trunk.

Then we can:

2. svn remove -m "n-1 before Nutchbase lands."
https://svn.apache.org/repos/asf/nutch/trunk
3. svn copy -m "Nutchbase branch lands in trunk."
https://svn.apache.org/repos/asf/nutch/branches/nutchbase
https://svn.apache.org/repos/asf/nutch/trunk
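
For reference, steps 1-3 could be sketched as the dry run below. It only
prints the svn commands rather than running them. The step 1 command and its
commit message are my assumption (the step above names only the target URL),
and the sketch uses the https form of the repository URL throughout.

```shell
#!/bin/sh
# Dry-run sketch of steps 1-3: prints the svn commands, does not execute them.
REPO="https://svn.apache.org/repos/asf/nutch"

plan_switch() {
    # Step 1 (assumed command and message): save current trunk as branch-1.3.
    echo "svn copy -m \"Save 1.x trunk before Nutchbase lands.\" $REPO/trunk $REPO/branches/branch-1.3"
    # Step 2: remove the old trunk.
    echo "svn remove -m \"n-1 before Nutchbase lands.\" $REPO/trunk"
    # Step 3: copy the nutchbase branch into place as the new trunk.
    echo "svn copy -m \"Nutchbase branch lands in trunk.\" $REPO/branches/nutchbase $REPO/trunk"
}

plan_switch
```

Order matters here: the copy in step 1 has to be committed before the remove
in step 2, otherwise the old trunk is only reachable by revision number.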

After doing that, we should also:

4. roll a 1.2 release, which I would say is the last major 1.x release.
Andrzej and I and others have backported some pretty decent patches in the
past few weeks and it probably makes sense to make a quick release. I'll
happily be the RM for it.

So if 1-4 make sense, let's do 1, 2 and 3 today or tomorrow -- 4 can happen
over the next few weeks. WDYT?

Cheers,
Chris


On 7/21/10 2:26 PM, "Andrzej Bialecki" <[email protected]> wrote:

> Hi all,
> 
> I'd like to discuss what is the best way forward to merging the
> nutchbase code with trunk.
> 
> First some important facts:
> 
> * nutchbase is almost totally API incompatible with Nutch 1.x. While the
> main ideas remain the same, and most of the tools remain as well, their
> implementation is very different (and let me say, much cleaner) than
> that of Nutch 1.x. E.g. while nutchbase uses URLFilters and
> URLNormalizers, and IndexingFilters, etc, their method signatures have
> changed. To give you some idea how deep these changes go, let me say
> that CrawlDatum is gone now.
> 
> * for the last month or so, and I foresee for another month or so,
> Julien, Dogacan, myself and Enis have been working on bringing nutchbase
> (and Gora) as much up-to-date with trunk as possible - in fact, you
> could say we have been merging trunk to nutchbase... The original reason
> for this was that we first wanted to bring nutchbase into a working
> state and then start merging, but also another important reason was the
> one mentioned above - we didn't know how to prepare a meaningful patch
> for trunk that wouldn't replace 90+ % of the code in trunk...
> 
> So, I would like to propose an alternative strategy: we will keep
> merging from trunk to nutchbase, with proper JIRA tracking (I created a
> 'nutchbase' tag in JIRA), and once we reach a state when nutchbase
> offers roughly the same functionality as the code in trunk then we
> simply switch nutchbase with trunk.
> 
> Current status of nutchbase is that the basic tools to implement a
> crawling workflow have been ported and work correctly, and we are able
> to execute a few unit tests on an SQL backend.
> 
> Regarding backwards-compatibility with Nutch 1.x: most config files are
> unchanged, and we should probably offer some data migration tools - I'm
> not sure whether it makes sense to create a segment converter, but we
> can certainly create a CrawlDb converter.
> 
> What do you think? Any comments / suggestions / ideas?
> 
> --
> Best regards,
> Andrzej Bialecki     <><
>   ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
> 
> 


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [email protected]
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

