I have experience with Ant, Maven 1.X, and Maven 2.
Maven 1.X builds are a nightmare to maintain - we should not go there.
Fortunately, Maven 2 is a completely different animal.
When your use case is directly supported by Maven 2, it's a beautiful
thing. As Grant says, it's magic. Like Grant, I've written M2 plugins
and set up some complex builds.
But unless you pin down the versions of plugins you use (currently
possible using <pluginManagement>) and the exact version of M2 you
use (I don't think this is possible yet), people can get different
results. Maven has never been terribly stable, because it's in a
constant state of change. Maven 2.1 is the current focus of
development, so modifications to 2.0.X tend to take a long time to be
released. This has long been the Maven way: focusing on future
(backwards-incompatible) versions to the detriment of the existing versions.
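As a rough illustration of pinning a plugin version in a Maven 2 POM: plugin
versions are set under <build>, typically inside <pluginManagement>. The
plugin and version number below are illustrative assumptions only, not
something settled for this project.

```xml
<build>
  <pluginManagement>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <!-- illustrative version only; pin whichever release the build was tested with -->
        <version>2.0.2</version>
      </plugin>
    </plugins>
  </pluginManagement>
</build>
```

Without an explicit <version>, Maven 2 resolves the plugin version on its own,
which is how two people can end up building on a different base.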
If we don't use Maven, then we need to have alternative dependency
management and site building facilities, since, unlike Maven, Ant does
not provide support for these. Maybe at the start we can ignore these
two - without code, there isn't much of a site required, and the
dependencies will be fairly static (per algorithm).
I have heard good things about Ivy for dependency management, though
I've never used it. I think it leverages Maven remote repositories.
And Lucene uses Forrest to build its site. Both of these things can be
bolted on later if we start with an Ant build.
I've changed my mind about the project structure: I think it's okay to
start out with a single source tree. If it makes sense to do so later,
splitting algorithms out shouldn't be too hard.
Similarly, I think shipping a monolithic jar is okay to begin with.
Size is definitely not an issue, in the short and medium term, anyway.
Summarizing my votes:
Build system:
+0 Maven 2
+1 Ant
Project structure:
+0 Per-algorithm source tree
+1 Single source tree
Release artifact(s):
+0 Per-algorithm jar
+1 Monolithic jar
Steve
Grant Ingersoll wrote:
A couple of comments on various things that have come up (btw, I love
the participation, already!)
1. The structure fits well with Maven or ANT. Personally, I have come
full circle from ANT - Maven - ANT. I have done a lot of ANT building
and a lot of Maven building, including writing plugins/tasks, etc. ANT
is less magic at the cost of a little more upfront work (but it is easy
to set up common build functionality, etc.). Magic in your builds is not
good. Maven updates itself automatically, gets jars automatically,
etc. I know this sounds like a good thing, but it isn't, IMO.
Especially when it comes to the plugins. You have no idea whether
everyone is building on the same base. Maven does not do much to
guarantee back-compatibility, either. On the other hand, the Maven
repository is really nice. And I really like that Maven has convinced
people that using common file structures and conventions is a good thing
in project management. But neither of these things requires Maven
itself. I tend to want to minimize our 3rd party dependencies, anyway,
as much as possible. The simpler we can keep this, the better off we
will be.
2. One other good thing from an infrastructure point of view for the
sub-project structure is we can, in theory, give permission to a
committer on a single algorithm, much like the contrib modules in
Lucene. This isn't a big deal, but it could be useful, if someone is
really knowledgeable in one particular area and is only contributing in
that area. Generally, however, I would favor making someone a full
committer.
3. I do like the idea of both separate jars and a single uber-jar. This
is trivial to do in both ANT and Maven.
-Grant
On Jan 30, 2008, at 3:21 AM, Ted Dunning wrote:
And all of Colt is < 1M.
I would say that it isn't all that likely that the library will get to
more than a few megs (if that). At that size, it really doesn't matter
that there is a bit of dross along for the ride.
How many here would rather pick and choose pieces out of Rapid Miner or
Weka? Or would you rather just download the comprehensive jar and be
ready to roll?
I also think that the example of text translation vs spam
categorization is a bit of a straw man. It is much more likely that
these would be entirely independent applications that would themselves
like to download the (single) Mahout jar.
On 1/29/08 11:45 PM, "Isabel Drost" <[EMAIL PROTECTED]> wrote:
On Wednesday 30 January 2008, Steve Rowe wrote:
On 01/29/2008 at 6:44 PM, Lukas Vlcek wrote:
I would prefer to have an option not to work with whole library but
select only specific algorithms and optionally their particular
modifications.
+1
+1 I would at least like to have one downloadable jar for each algorithm
family (why would I as a user want to download the functionality for
translating texts, if all I want to do is build a better spam
classification plugin for SpamAssassin?) plus one library for the common
code like input/output filters.
Maybe we should look at other machine learning frameworks that followed
the "all in one jar" path to get a feeling for how large a project can
easily get. Please be careful with these numbers, as both projects are
trying to provide whole machine learning frameworks with GUIs for
experimentation, algorithms for evaluation and the like.
Weka: compiled 4.4M
Rapid Miner: sources 12M, compiled 4.5M (21M including all dependencies)
Isabel