Dave Reynolds wrote:
On Tue, 2011-09-06 at 13:06 +0100, Andy Seaborne wrote:
[Great set of thoughts.]
== Goals
[snip]
+ A single download zip file for using Jena as a library
+ A single jar file for using Jena as a library
+1 for goals overall and for the notion of a single jar.
Hi Dave,
just to make sure I understand your position.
Do you mean "a single jar" in addition to separate jars (one for each of the
Jena modules, for example: tdb-x.y.z.jar, arq-x.y.z.jar and maybe new jars
such as riot-x.y.z.jar, atlas-x.y.z.jar and a small jena-core-x.y.z.jar
(exact list yet to be decided))?
Or, do you mean just "a single jar" instead of separate jars (one for each
of the Jena modules)?
We (@ Talis) quite like having separate modules: it allows us to patch one
module easily, and we would like to avoid including what we do not use on
our classpath. (We have had trouble with Xerces and Saxon, just to give an
example.)
Another scenario is MapReduce. When you write MapReduce jobs you often just
want to parse N-Triples or N-Quads files, so a small jena-core.jar and riot.jar
would do. You don't want to upload a ~20MB jar just to parse N-Triples, and if
you have an iterative algorithm with a sequence of MapReduce jobs (and you
use Amazon EMR, as we sometimes do) it's a pain to upload the ~20MB each
time.
A similar argument (i.e. better modularization) is what pushed projects such
as Any23 away from Jena in favour of Sesame, for example. I can relate to that
and I understand why.
Again, having a big, monolithic Jena jar (with a dependency on Xerces)
has caused problems for people wanting to run Jena on Android, even if they
did not want to parse RDF/XML. (Not something I personally need, but why not
put Jena in a position where this would be possible and easy?)
Another argument in favor of a few smaller modules is that it's easier for
others to join in, understand a smaller code base and contribute to it.
Other projects (for example Any23 or Clerezza) might benefit as well and/or
use a tiny apache-jena-core.jar or riot.jar and (why not) even contribute to
it.
These are a few reasons why I would not be in favor of a "just a single jar"
option.
To me, "a single jar" is good and not something I am opposed to, but only
in addition to a jar for each of the modules (and we should discuss what
these modules should be).
I'd like (as in, I would put work into this) to also have OSGi support.
Suggest:
o the jena-one-jar would also be marked as an OSGi bundle (easy to do
via the maven bundle plugin)
We don't use OSGi, but if it's easy to do and it helps others, why not?
o a third top level download, jena-complete, which is the jena-one-jar
plus dependent jars (i.e. xerces, slf4j/log4j etc) packaged as a single
jar/OSGi bundle.
The latter is not *necessary* for OSGi support but the way we/I
currently work with OSGi makes it easier to have reasonably chunky
self-contained bundles than to do fine-grained dependency management at
the OSGi level.
I am not an OSGi expert, but a chunky jena-one-jar including all the
third-party dependencies is something I would call "unusual", and certainly
something we would not use. I am not sure who would find it useful.
But maybe I am wrong.
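For reference, "marked as an OSGi bundle" comes down to a handful of extra
headers in the jar's META-INF/MANIFEST.MF, which is what the maven bundle
plugin generates. Below is a minimal Java sketch, using only java.util.jar,
that builds such a manifest in memory and checks for the symbolic-name header
that OSGi R4 and later expect. The class name, symbolic name, version and
exported package are placeholder values for illustration, not an agreed Jena
configuration.

```java
import java.util.jar.Attributes;
import java.util.jar.Manifest;

public class BundleManifestSketch {

    // Builds an in-memory manifest carrying common OSGi bundle headers.
    // All header values below are assumptions made up for this sketch.
    static Manifest buildManifest() {
        Manifest manifest = new Manifest();
        Attributes main = manifest.getMainAttributes();
        // Manifest-Version must be present for the manifest to be written out.
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        main.putValue("Bundle-ManifestVersion", "2");
        main.putValue("Bundle-SymbolicName", "org.example.jena.onejar"); // hypothetical
        main.putValue("Bundle-Version", "0.0.1");                        // hypothetical
        main.putValue("Export-Package", "com.hp.hpl.jena.rdf.model");    // illustrative
        return manifest;
    }

    // Treats a manifest as a bundle manifest if it declares a symbolic name.
    static boolean looksLikeBundle(Manifest manifest) {
        String name = manifest.getMainAttributes().getValue("Bundle-SymbolicName");
        return name != null && !name.trim().isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeBundle(buildManifest())); // manifest with bundle headers
        System.out.println(looksLikeBundle(new Manifest()));  // plain, empty manifest
    }
}
```

A jar carrying these headers still works as an ordinary jar on a normal
classpath, which is why marking jena-one-jar as a bundle should cost non-OSGi
users nothing.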
This can also be used via "java -jar jena-complete.jar ..." to run any
of the command line utilities which would save some support list load on
classpath advice :)
The support cost of classpath issues does not seem to be high (maybe it's
better documentation, maybe people are learning how to use Maven, who knows).
However, a single jar is really useful to run command line utilities or
applications such as Fuseki.
So, I am not opposed to this, but I would not like to lose the opportunity
for a better modularization of Jena.
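One detail on "java -jar jena-complete.jar ...": java -jar starts the single
class named by the Main-Class attribute of the jar's manifest, so running
*any* of the command line utilities from one jar would need a small launcher
that dispatches on the first argument. A rough Java sketch of that idea
follows. The launcher class and the command names ("query", "parse") are
hypothetical, and the handlers are stubs standing in for the real tools' main
methods.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

public class JenaCompleteLauncherSketch {

    // Command name -> handler. A real launcher would delegate each entry to the
    // main(String[]) of the corresponding tool; these stubs only print a message.
    static final Map<String, Consumer<String[]>> COMMANDS = new LinkedHashMap<>();
    static {
        COMMANDS.put("query", rest -> System.out.println("would run the query tool with " + rest.length + " argument(s)"));
        COMMANDS.put("parse", rest -> System.out.println("would run the parser tool with " + rest.length + " argument(s)"));
    }

    // Returns 0 if a command ran, 1 if the command was missing or unknown.
    static int dispatch(String[] args) {
        if (args.length == 0 || !COMMANDS.containsKey(args[0])) {
            System.err.println("usage: java -jar jena-complete.jar <"
                    + String.join("|", COMMANDS.keySet()) + "> [args...]");
            return 1;
        }
        // Strip the command name and hand the remaining arguments to the tool.
        String[] rest = Arrays.copyOfRange(args, 1, args.length);
        COMMANDS.get(args[0]).accept(rest);
        return 0;
    }

    public static void main(String[] args) {
        int code = dispatch(args);
        if (code != 0) {
            System.exit(code);
        }
    }
}
```

With such a class named as Main-Class, an invocation would look like
"java -jar jena-complete.jar query ..." and no classpath would need to be
set by hand.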
I believe we can have both a "single jar" and separate modules and, from what
you wrote below, it seems we are on the same page on this.
However, I wanted to double check.
Paolo
== Possible build layout.
OK with going for maven multi-module project structure.
Given comments on the "Build experiments" thread I wonder if we want a
sub-project structure as well. A possible structure might be:
Jena sub-project
    modules
        JenaSys
        IRI
        Atlas
        RIOT
        ARQ
        TDB?
    Jena (one jar and zip) - just for building integrated artefact
    JenaComplete - just for building integrated artefact
Website sub-project
SDB sub-project
Fuseki sub-project
Eyeball sub-project
So sub projects can be independently checked out and the Jena
sub-project can then be hierarchical.
If we went the sub-project route then it might make sense to have TDB as
a separate sub-project as well since that is under more active
development. Depends partly on whether we feel TDB should be part of
jena-one-jar or separate (I noted your "maybe?" on that, have no settled
view myself).
=== Questions and notes.
1/ We currently make some attempt to deliver the test suite in the zip
so people can locally run it to check an installation. From memory, the
only thing this seems to catch is problems running the test suite, not
problems with installation. Maybe it's not worth the effort.
Not worth the effort.
3/ For RDB, I propose creating a maven module and putting the code here
with a dependency of whatever version of Jena it is at the time then
leaving it frozen. Alternatively, zip up the code and dump somewhere in
case anyone wants to port it.
Prefer former (create module then leave frozen) but OK with either.
4/ Shall we leave the documentation out of the build and just have it on
the website?
Yes, except for some minimal "getting started" documentation in the zip
that references the website for details.
5/ Jump to maven 3?
If that risks any increase in maven/m2e unpredictability then no :) but
I've no knowledge of what the specific issues might be.
Dave