Inline.

On Tue, Apr 28, 2015 at 8:51 AM, Andre Kelpe <[email protected]> wrote:
> Hi,
>
> I am currently learning the ins and outs of bigtop to work on the Cascading
> integration (https://issues.apache.org/jira/browse/BIGTOP-1766). I have a
> few questions around packaging in bigtop:
>
> 1) most linux distros have packaging guidelines that should be followed.
> Does bigtop follow any set of rules in particular? Is there a linting tool
> for spec files etc?
>

This is distro specific. Red Hat family distributions (RHEL, Fedora, CentOS, Amazon Linux) offer 'rpmlint'. You can install it and run it by hand. In my experience, if you build deb packages on Ubuntu, the package build runs the lintian tool automatically.

> 2) Related to 1): Does bigtop require to follow a certain directory layout?
> Our tools are currently meant to be untarred and used as is, if bigtop
> requires them to be split over the file-system, we will have to work on
> that upstream before they can be included.
>

Yes, broadly speaking we follow the Linux Standard Base (LSB). A typical package build happens in four steps. We move files around in the third step to make packages look more like LSB. Let me take you through one package as an example:

Step 1. Download source tarball

Download the source tarball from the software release site and expand it.

Step 2. do-component-build

Here, for example, see what we do for ZooKeeper:
https://github.com/apache/bigtop/blob/master/bigtop-packages/src/common/zookeeper/do-component-build
We first normalize dependency versions according to the release BOM, then kick off a build of the component's binary artifacts.

Step 3. install_<component>.sh

Again let's look at the ZK package:
https://github.com/apache/bigtop/blob/master/bigtop-packages/src/common/zookeeper/install_zookeeper.sh
Here we take the resulting tarball from the component build, expand it, and move various types of files to more LSB-like locations.

Step 4. Native packager

Finally we hand off the expanded and munged result from step 3 to the native packager.
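
To make the relocation in step 3 concrete, here is a rough sketch in Python of how paths from an expanded tarball might be mapped to LSB-like destinations. Everything in it is an assumption made for illustration: the top-level directory names, the destination prefixes, and the helper plan_install are hypothetical and are not taken from install_zookeeper.sh, which is a shell script with its own rules.

```python
from pathlib import PurePosixPath

# Hypothetical layout rules: top-level tarball directory -> install prefix.
# Illustrative assumptions only, not the rules used by any Bigtop script.
LAYOUT_RULES = {
    "conf": "/etc/{name}/conf",
    "docs": "/usr/share/doc/{name}",
    "bin": "/usr/lib/{name}/bin",
    "lib": "/usr/lib/{name}/lib",
}
# Anything not matched by a rule stays together under the component home.
DEFAULT_PREFIX = "/usr/lib/{name}"


def plan_install(name, tarball_paths):
    """Map paths relative to the expanded tarball root to install paths."""
    plan = {}
    for rel in tarball_paths:
        parts = PurePosixPath(rel).parts
        if not parts:
            continue  # skip empty entries
        top = parts[0]
        if top in LAYOUT_RULES and len(parts) > 1:
            # Drop the top-level directory and re-root under the rule's prefix.
            prefix = LAYOUT_RULES[top].format(name=name)
            dest = PurePosixPath(prefix, *parts[1:])
        else:
            dest = PurePosixPath(DEFAULT_PREFIX.format(name=name), *parts)
        plan[rel] = str(dest)
    return plan


if __name__ == "__main__":
    files = ["conf/zoo.cfg", "bin/zkServer.sh", "zookeeper.jar"]
    for src, dest in plan_install("zookeeper", files).items():
        print(src, "->", dest)
```

The sketch only computes a plan and copies nothing, so you can inspect the mapping before anything touches the filesystem. A tool that is meant to be untarred and used as is would mostly fall under the default rule and stay together in one directory.
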
For ZK, the RPM specfile used is here:
https://github.com/apache/bigtop/blob/master/bigtop-packages/src/rpm/zookeeper/SPECS/zookeeper.spec
The Debian package control files are here:
https://github.com/apache/bigtop/tree/master/bigtop-packages/src/deb/zookeeper

> 3) I noticed that the packages are build from source instead of re-using
> binary releases. Is that a strict requirement or does it just happen to be
> that way? For the Cascading integration I was planning on downloading our
> binary releases so that bigtop ship with the same bits as our SDK.
>

We typically build packages from source so we can normalize dependencies. For example, if a given Bigtop release ships with Hadoop 2.6.0 but the Cascading SDK includes 2.5.1 artifacts, this would be ugly at best and broken at worst.

> 4) What is your take on packaging standalone libraries? I noticed that most
> parts of bigtop are tools in the broader sense. Something one can invoke on
> the command line, but there is also a package for apache crunch, which is a
> library. What is the reasoning here? Would it make sense to build packages
> for libraries in the Cascading eco-system?
>
>

I'm not sure we have anything that amounts to a policy here. Crunch isn't the only case. We package the DataFu library of UDFs for Pig. We package the Phoenix SQL skin add-on for HBase. We also package Tez, which is a YARN application requiring Hadoop, and although it could be useful on its own it's meant to be picked up and used by the Hive and Pig packages. If a champion for a component shows up we will give it a look. We could absolutely build a core Cascading package and then a number of library or add-on packages, if that's how you would like to set things up as champion or maintainer of same.

> Thanks for your answers!
>
> - André
>
> --
> André Kelpe
> [email protected]
> http://concurrentinc.com
>

--
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
