[ 
https://issues.apache.org/jira/browse/BIGTOP-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13463579#comment-13463579
 ] 

Erich Schubert commented on BIGTOP-713:
---------------------------------------

bq. Hadoop 0.23.3

I was under the impression that 0.23.3 is the current Hadoop release. The 
version numbering of Hadoop is a mess. If you read the changelog for 
2.0.0-alpha, the first line identifies it as 0.23.1 - same for 2.0.1-alpha. So 
I assumed that 0.23.3 - the latest release, 2 months after 2.0.1-alpha - was 
actually the newest version, and that simply nobody had rebranded it as 
2.0.2-alpha or so. And upstream subversion uses 3.0.0 everywhere IIRC.

bq. patching

Debian would love to have 0 patches, too. However, if you want to get a bug 
fixed quickly for your users, it is often best to fix it, make a patch, and 
send that out to your users for testing and to upstream for inclusion. Debian 
changelogs are full of entries like "remove patches ..., included upstream" 
(and also of patches dropped because upstream solved the problem differently). 
In fact the compile fix I mentioned - fixed in Hadoop SVN the same way - is a 
good example of the need for patching. Hadoop Pipes won't compile otherwise, so 
with a 0 patching policy you cannot build it on current Debian, because its GCC 
is too new.

bq. conflicting library versions

Again, this is a problem that does not only affect Hadoop. In my personal 
opinion, it is a consequence of how dependencies are handled in the Java 
community. You leave it to your users (and maven) to get all the jars you need. 
If people cared more about using one system to manage the dependencies for all 
of the Java software they use, they would be more aware of such conflicts. And 
of course they also occur with binary libraries. It is common for distributions 
to take care of this, and they will also try to offer multiple versions of a 
library when these are incompatible.
And in some cases, it is easiest to patch (or recompile, for binary packages) 
some dependent software, so that only one version of a library has to be 
provided.

Debian java packaging already manages a symlink farm of the type:
xml-apis-ext.jar -> xml-apis-ext-1.4.01.jar

So packages can use "any version of xml-apis-ext.jar", for example. For 
explicit version dependencies, you would have a versioned dependency on the 
package, obviously. Most of the version dependencies are of the ">= x.y" type, 
a few are of the "< z" type (e.g. when an API changes in a new major version).

When a package is known to break API compatibility, the distributions should 
take care that both versions can be installed at the same time. For example GNU 
trove 2 and GNU trove 3 are not API compatible. Debian ships them as "trove.jar 
-> trove-2.x.y.jar" and "trove-3.jar -> trove-3.0.3.jar" symlinks. So far, the 
packages depending on trove 2 or trove 3 continue to work...
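
To make the symlink scheme concrete, here is a minimal sketch. It is not how the Debian packages actually create the links; it only mimics the result. A scratch directory stands in for /usr/share/java, empty placeholder files stand in for the real jars, and the variable names are made up for illustration.

```shell
#!/bin/sh
# Sketch only: a throwaway directory stands in for /usr/share/java.
jardir=$(mktemp -d)

# Empty placeholder files stand in for the real, versioned jars.
touch "$jardir/xml-apis-ext-1.4.01.jar" "$jardir/trove-3.0.3.jar"

# The stable names are symlinks pointing at the installed version.
ln -s xml-apis-ext-1.4.01.jar "$jardir/xml-apis-ext.jar"
ln -s trove-3.0.3.jar "$jardir/trove-3.jar"

# A consumer only ever names the stable links ...
CLASSPATH="$jardir/xml-apis-ext.jar:$jardir/trove-3.jar"

# ... and each link resolves to whatever version is currently installed.
xml_target=$(readlink "$jardir/xml-apis-ext.jar")
trove_target=$(readlink "$jardir/trove-3.jar")
echo "xml-apis-ext.jar -> $xml_target"
echo "trove-3.jar -> $trove_target"
```

Upgrading the library then only means installing the new versioned jar and repointing the link; the classpath of the consumers stays the same.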

bq. java-wrappers

I believe they allow apps to specify e.g. "java6", "java7" and the wrappers may 
choose a different java runtime than the system default.

A typical java-wrappers script looks like this:
{noformat}
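#!/bin/sh
# Load the helper functions provided by the java-wrappers package.
. /usr/lib/java-wrappers/java-wrappers.sh
# Pick a Java runtime from the acceptable ones.
find_java_runtime openjdk6 sun6
# Put the named jars on the classpath.
find_jars app batik fop
# Start the main class, passing all script arguments through.
run_java mainclass "$@"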
{noformat}
Here find_jars takes care of setting up the classpath. I haven't looked 
into the details of how you would specify a versioned requirement. With trove, 
you would use trove-3.jar. Furthermore, many jars may already pull in other 
jars via the Class-Path attribute in the manifest; this works when they are in 
the system folder, so it actually works well with the jars installed by Debian. 
Ideally, jars in Debian are packaged with such dependencies. For example, 
fop.jar specifies commons-io.jar, xercesImpl.jar, xalan2.jar etc. The above 
example could even be simplified: batik is needed by fop, so we could leave it 
out.
Debian also ships some projects split into numerous smaller jar files. Batik is 
a good example. There is batik-all containing all of batik, but there are also 
smaller jars containing e.g. only the parser, so that a project trying to 
reduce memory requirements can load just the part of batik that it needs into 
the classpath.
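
To see which jars a manifest pulls in, something like the following sketch could be used. It is not part of java-wrappers; manifest_class_path is a made-up helper name, and the sample manifest is hypothetical apart from the jar names, which are the ones mentioned for fop.jar above. The only fact it relies on is that long manifest lines are continued on the next line, which then starts with a single space.

```shell
#!/bin/sh
# Sketch: list the Class-Path entries of a jar manifest, one per line.
# A manifest line may be continued on the next line, which then starts
# with a single space, so continuations are joined before splitting.
manifest_class_path() {
    tr -d '\r' < "$1" | awk '
        /^ /            { if (in_cp) value = value substr($0, 2); next }
        /^Class-Path: / { in_cp = 1; value = substr($0, 13); next }
                        { in_cp = 0 }
        END { n = split(value, jars, " ")
              for (i = 1; i <= n; i++) print jars[i] }
    '
}

# Hypothetical sample manifest; the line break inside "xalan2.jar" is
# deliberate, to exercise the continuation handling.
sample=$(mktemp)
printf '%s\n' \
    'Manifest-Version: 1.0' \
    'Class-Path: commons-io.jar xercesImpl.jar xal' \
    ' an2.jar' \
    'Created-By: example' > "$sample"

class_path_entries=$(manifest_class_path "$sample")
echo "$class_path_entries"
```

For an installed jar, the manifest would first have to be extracted, for example with unzip -p on META-INF/MANIFEST.MF, assuming unzip is available.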

It's probably not perfect - the Debian Java team seems to be a bit 
underpowered, as so often (they certainly do not currently have the capacity to 
do Hadoop packages) - but they do seem to work on a manageable Java ecosystem. 
Often such infrastructure needs some users before it spreads across 
distributions. I don't know what Red Hat has for managing Java, maybe more, 
maybe less. The "alternatives" thing is a good example of an infrastructure 
utility adopted across distributions over time. Quoting from an internet page:

bq. Fedora's implementation of alternatives is a rewrite and extension of the 
alternatives system used in Debian.

So if java-wrappers are useful for Bigtop, it may be very manageable to have 
them adopted by the Fedora ecosystem, while bigtop-utils has not yet been 
adopted by either, I guess.
                
> use newer debhelper and source format 3.0 (quilt) for Debian and Ubuntu 
> packaging
> ---------------------------------------------------------------------------------
>
>                 Key: BIGTOP-713
>                 URL: https://issues.apache.org/jira/browse/BIGTOP-713
>             Project: Bigtop
>          Issue Type: Improvement
>          Components: Debian
>    Affects Versions: 0.5.0
>            Reporter: Erich Schubert
>            Assignee: Roman Shaposhnik
>            Priority: Minor
>
> debhelper can automate a lot of common things in debian package creation.
> The current packages use an old style of debhelper, that often is 
> unnecessarily complicated, making it harder to fix things.
> For example, current Hadoop (0.23.3) does not compile on Debian because of 
> the new GCC version. The fix is a simple "include <unistd.h>" in the 
> HadoopPipes.cc file.
> Modern Debian packaging with "quilt" has an excellent mechanism for managing 
> such patches. However, in order to use this with the current Bigtop 
> packaging, one has to 1. create debian/source/format to use "3.0 (quilt)" 2. 
> manually add quilt patching to the debian/rules targets 3. make sure the 
> .debian.tar.gz is also copied instead of the old .diff.gz
> You will be surprised how many things debhelper does well on its own with a 
> rules file consisting just of little more than the automagic:
> %:
>         dh $@
> Furthermore, "java-wrappers" is a Debian and Ubuntu package that helps with 
> setting up classpaths and choosing the JVM. It can do all of bigtop-utils and 
> more, and it is used by other Java packages. IMHO it should be preferred 
> instead.
> If the packaging were more Debian-standard, it would be a lot easier to 
> get the packages at some point accepted into Debian mainline. It may even be 
> desirable to build the various hadoop components (-common, -yarn etc.) 
> independently if they are isolated well enough upstream.
> Don't get me wrong. I think the packages are pretty good already. In 
> particular I like the split into namenode and datanode packages and the use 
> of update-alternatives, for example. I just found it rather hard to get a 
> grip of the process and to get my fixes into the package. For example, I had 
> to manually set JAVA_HOME before building, some build dependencies were 
> missing (cmake, but it probably is a new requirement), some paths have 
> changed (probably the yarn promotion to a top level project?)
> I understand that you want to have as much common code for all distributions 
> as possible, as opposed to having per-distribution packaging. However, if 
> every project uses its own specific version of java-wrappers and build 
> process, things will not really be better than if it is at least consistent 
> across the various distributions.
> But ideally, there should be very little packaging code needed anyway, and 
> most things be done by an appropriate installation process upstream.
> And seriously, /usr/lib/hadoop/lib is a **mess**. There even is a package in 
> there with a "*" in the file name. Plus, a lot of these jars are available in 
> Debian, and could be shared across packages if the packages would accept them 
> to be managed by the distribution instead of shipping their own...
> Even within the bigtop packages this leads to a totally unnecessary overlap:
> 995720 Sep 25 14:18 /usr/lib/hadoop-hdfs/lib/snappy-java-1.0.3.2.jar
> 995720 Sep 25 14:18 /usr/lib/hadoop-mapreduce/lib/snappy-java-1.0.3.2.jar
> 995720 Sep 25 14:18 /usr/lib/hadoop-yarn/lib/snappy-java-1.0.3.2.jar
> [...]
