I explored the "download binaries from Maven" approach for a while on Friday. Here is what I found:

1) There is a Maven plugin that should be able to help us find matching
system binaries @ https://github.com/trustin/os-maven-plugin
The protobuf-maven-plugin uses this approach to download and run the
appropriate protoc binary for your architecture according to
https://www.xolstice.org/protobuf-maven-plugin/examples/protoc-artifact.html

2) Stripped binaries from release builds look small enough to be viable to
download to run integration tests via Maven in precommit builds, at least
in non-bandwidth-constrained environments:

$ strip kudu-master
$ strip kudu-tserver
$ ls -alh
total 85M
drwxrwxr-x 2 mpercy mpercy  45 Jul  2 12:05 .
drwxrwxr-x 3 mpercy mpercy  98 Jun 29 14:56 ..
-rwxrwxr-x 1 mpercy mpercy 45M Jul  2 12:05 kudu-master
-rwxrwxr-x 1 mpercy mpercy 41M Jul  2 12:05 kudu-tserver

3) Kudu binaries contain many system dependencies related to security as
well as the c++ stdlib:

$ ldd kudu-tserver
        linux-vdso.so.1 => (0x00007ffe0c290000)
        libz.so.1 => /lib64/libz.so.1 (0x00007fde730d5000)
        libtinfo.so.5 => /lib64/libtinfo.so.5 (0x00007fde72eab000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fde72c8e000)
        libkrb5.so.3 => /lib64/libkrb5.so.3 (0x00007fde729a7000)
        libcrypto.so.10 => /lib64/libcrypto.so.10 (0x00007fde725bd000)
        libssl.so.10 => /lib64/libssl.so.10 (0x00007fde7234e000)
        libsasl2.so.3 => /lib64/libsasl2.so.3 (0x00007fde72131000)
        libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2 (0x00007fde71ee3000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007fde71cde000)
        librt.so.1 => /lib64/librt.so.1 (0x00007fde71ad6000)
        libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007fde717cd000)
        libm.so.6 => /lib64/libm.so.6 (0x00007fde714ca000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fde712b4000)
        libc.so.6 => /lib64/libc.so.6 (0x00007fde70ef3000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fde732fe000)
        libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x00007fde70cc0000)
        libcom_err.so.2 => /lib64/libcom_err.so.2 (0x00007fde70abc000)
        libkrb5support.so.0 => /lib64/libkrb5support.so.0 (0x00007fde708ad000)
        libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x00007fde706a8000)
        libresolv.so.2 => /lib64/libresolv.so.2 (0x00007fde7048e000)
        libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00007fde70257000)
        libselinux.so.1 => /lib64/libselinux.so.1 (0x00007fde7002f000)
        libfreebl3.so => /lib64/libfreebl3.so (0x00007fde6fe2c000)
        libpcre.so.1 => /lib64/libpcre.so.1 (0x00007fde6fbca000)

So it's not viable to simply have a linux-x86_64 binary and a darwin-x86_64
binary like protoc does, or even just ubuntu & redhat. We'll likely need a
separate binary for every major OS version, i.e. RHEL 6, RHEL 7, trusty,
xenial, bionic. I think people running non-LTS builds of Ubuntu, or SUSE or
something, would be out of luck.

One potential option would be to offer a completely static build that is
for testing only and with no intent to ever fix security vulnerabilities. I
would have two concerns about that, though: 1) someone could take those
binaries and run them for non-testing purposes, and 2) I'm not sure how
easy it would be to generate a fully static build, since I don't think the
distributions provide static libs for security components in order to
discourage people from doing this.

Mike

On Sat, Jun 30, 2018 at 4:31 AM Tim Robertson <[email protected]> wrote:

> > What do you mean by that?
>
> Sorry, poor phrasing - currently the Beam project has the build path with
> unit tests (no Docker there) and the project IT environment which can use
> Docker.
> A binary only approach could potentially be managed without adding a
> dependency on Docker - but has other issues summarised below.
>
> > For Kudu-internal testing I think we could stick to running "kudu
> > minicluster
>
> Yes.
>
> > ... external use cases, we could switch that to "docker run
> > kudu:minicluster:1.7.0"
>
> I think this makes good sense.
>
> In summary:
>
> 1) Fake a Kudu master in Java - difficult unless simplified, not
> representative if simplified, code maintenance issue
> 2) Mocking the Kudu client - verbose unless only covering simple scenarios
> 3) Use mini cluster with binaries - portability challenge of binaries,
> need to script caching the binaries / use of some repository, unfamiliar
> build tasks with binary handling (unless built to work with something like
> maven), could possibly see linking problems
> 4) Docker - predictable, adds a dependency, existing Kudu images not
> "managed" at the moment
>
> For Beam I think I will put most effort into IT which can use Docker or an
> existing cluster and then mock a Java KuduClient for some basic sanity
> tests for the build path.
>
> On Docker:
> - to get current versions [e.g. 1] working I found I had to edit
> /etc/hosts. I think the mini cluster version with the FakeDNS might avoid
> that?
> - Kudu docs currently encourage the Cloudera Quickstart VM over Docker
> [2,3]
>
> Do you think the Kudu project could provide an image allowing "docker run
> kudu:minicluster:1.x.x" as part of the release cycle?
>
> Thanks again,
> Tim
>
> [1] https://github.com/MartinWeindel/kudu-docker
> [2] https://kudu.apache.org/docs/quickstart.html#quickstart_vm
> [3] https://github.com/cloudera/kudu-examples/wiki/Docker-based-tutorial
>
> On Sat, Jun 30, 2018 at 2:22 AM, Todd Lipcon <[email protected]> wrote:
>
> > On Fri, Jun 29, 2018 at 1:23 PM, Tim Robertson <[email protected]>
> > wrote:
> >
> > > Thanks Mike, Todd - I greatly appreciate the inputs.
> > >
> > > > How many platforms would need to be supported for it to be viable
> > > > for Beam?
> > >
> > > The minimal for it to be considered would probably(!) be ubuntu,
> > > centos, osx. Incidentally it was actually the protobuf approach that
> > > made me consider this.
> > >
> > > > What about depending on a docker container that runs the kudu
> > > > minicluster in "host" networking mode?
> > >
> > > I've also pondered this a little but like Attila raises it puts a lot
> > > of burden for other project developers. Mmmm...
> > >
> >
> > What do you mean by that? For Kudu-internal testing I think we could
> > stick to running "kudu minicluster" as is. For external use cases, we
> > could switch that to "docker run kudu:minicluster:1.7.0" or whatever,
> > and it would auto-download from dockerhub as necessary, right?
> >
> > > Ismaël (Beam PMC) has suggested I stick to mocking given the
> > > complexity of the things I'm exploring.
> > >
> > > As another idea:
> > > I briefly pondered writing a "FakeKudu Java server" - data held in
> > > memory, no partitioning, protobuf messaging, handling table metadata,
> > > checking schemas on write, predicate and projected columns for scan,
> > > faking kerberos (if possible). It didn't seem particularly difficult
> > > to do but I fear a maintenance burden for a small audience.
> > >
> >
> > Yea, I think that would be quite a maintenance burden, especially as new
> > features are added over time. I suppose in many cases you could omit
> > things or stub things out, but then the behavior will begin to differ
> > and it won't really be that clear that your tests actually are
> > representative.
> >
> > > Could utilities in Kudu that help folk test Java clients be of
> > > interest to others? - e.g. preconfigured mock objects for various
> > > scenarios. If so, I'd be happy to discuss options and offer PRs in
> > > Kudu.
> > >
> > > Thanks,
> > > Tim
> > >
> > > On Fri, Jun 29, 2018 at 9:34 PM, Todd Lipcon <[email protected]>
> > > wrote:
> > >
> > > > On Fri, Jun 29, 2018 at 12:31 PM, Mike Percy <[email protected]>
> > > > wrote:
> > > >
> > > > > This is something I've been thinking about and toying with and
> > > > > I'd like to see if we can't get binaries available via Maven for
> > > > > at least one platform (say, RHEL 7). Similar to how protobuf does
> > > > > it.
> > > > >
> > > >
> > > > What about depending on a docker container that runs the kudu
> > > > minicluster in "host" networking mode? eg
> > > > https://github.com/MartinWeindel/kudu-docker is one possibility
> > > >
> > > > > How many platforms would need to be supported for it to be viable
> > > > > for Beam?
> > > > >
> > > > > Thanks,
> > > > > Mike
> > > > >
> > > > > On Fri, Jun 29, 2018 at 10:01 AM Tim <[email protected]> wrote:
> > > > >
> > > > > > Thanks Attila
> > > > > >
> > > > > > That’s great feedback and helpful for me to reference as
> > > > > > guidance.
> > > > > >
> > > > > > By “Kudu installation” I was referring to the possibility that
> > > > > > an install might set config etc, beyond just having the binary.
> > > > > > I got it running on CentOS similar to how you outline now.
> > > > > >
> > > > > > I too believe mocking makes most sense, especially as we have
> > > > > > the IT running as well, but was asked to explore this further.
> > > > > > It’s useful to know you’d agree.
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > Tim
> > > > > >
> > > > > > > On 29 Jun 2018, at 17:33, Attila Bukor <[email protected]>
> > > > > > > wrote:
> > > > > > >
> > > > > > > Hi Tim,
> > > > > > >
> > > > > > > I’m not sure what you mean by relying on actual installations.
> > > > > > > If you have the kudu, kudu-master and kudu-tserver binaries
> > > > > > > at the same location and they can be executed,
> > > > > > > MiniKuduCluster can be used (“binDir” property should be set
> > > > > > > to the directory containing the Kudu binaries). You should
> > > > > > > also look into BaseKuduTest as that will set up the
> > > > > > > MiniKuduCluster for you and you don’t have to do it manually.
> > > > > > >
> > > > > > > Extracting the Kudu binaries from an rpm should probably
> > > > > > > work, but that binds you to CDH as currently Cloudera is the
> > > > > > > only one that ships Kudu binaries and MacOS builds are not
> > > > > > > available anywhere afaik. Also, 1.4.0 is around a year old,
> > > > > > > you might want to use this repository instead (from CDH 5.13
> > > > > > > Kudu is part of the CDH):
> > > > > > > http://archive.cloudera.com/cdh5/redhat/7/x86_64/cdh/5/RPMS/x86_64/kudu-1.7.0+cdh5.15.0+0-1.cdh5.15.0.p0.52.el7.x86_64.rpm
> > > > > > >
> > > > > > > As a general suggestion, I would recommend mocking Kudu for
> > > > > > > unit tests (that’s what a unit test is for after all) and
> > > > > > > create separate integration tests that actually use Kudu that
> > > > > > > can be skipped where Kudu is not available. Of course the CI
> > > > > > > should be set up to be able to provide all necessary
> > > > > > > integrations for the tests, but a developer wouldn’t have to
> > > > > > > set up Kudu, or use Docker to run the tests if their change
> > > > > > > doesn’t affect the Kudu integration.
> > > > > > >
> > > > > > > Attila
> > > > > > >
> > > > > > >> On 2018. Jun 29., at 16:42, Tim Robertson
> > > > > > >> <[email protected]> wrote:
> > > > > > >>
> > > > > > >> Hi folks,
> > > > > > >>
> > > > > > >> I've written Java KuduIO for Apache Beam with integration
> > > > > > >> tests making use of Kudu in Docker. It is yet to be
> > > > > > >> committed on Apache Beam.
> > > > > > >>
> > > > > > >> Rather than mocking Kudu client for unit tests I'd like to
> > > > > > >> explore use of the MiniKuduCluster which "Depends on
> > > > > > >> precompiled kudu, kudu-master, and kudu-tserver binaries".
> > > > > > >>
> > > > > > >> I'd need unit tests to run on the main linux distros and
> > > > > > >> OS X.
> > > > > > >>
> > > > > > >> For the linux distros, would an approach where I extract the
> > > > > > >> binaries from the packages [1] work please? Or does the
> > > > > > >> MiniKuduCluster rely on actual installations? I am pretty
> > > > > > >> weak on C builds and linked libraries etc (Java guy, sorry).
> > > > > > >>
> > > > > > >> For CentOS I'm exploring this for example:
> > > > > > >> rpm2cpio ./kudu-1.4.0+cdh5.12.2+0-1.cdh5.12.2.p0.8.el7.x86_64.rpm | cpio -idmv
> > > > > > >>
> > > > > > >> I haven't explored OS X options yet.
> > > > > > >>
> > > > > > >> Any advice here would greatly be appreciated to save me
> > > > > > >> going down a dead end.
> > > > > > >>
> > > > > > >> Many thanks,
> > > > > > >> Tim
> > > > > > >>
> > > > > > >> [1] http://kudu.apache.org/docs/installation.html#install_packages
> > > >
> > > > --
> > > > Todd Lipcon
> > > > Software Engineer, Cloudera
> >
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera
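
A footnote on the suggestion quoted above, that integration tests be skipped where Kudu is not available and that MiniKuduCluster needs the kudu, kudu-master and kudu-tserver binaries executable in one directory. A rough sketch of such a pre-check could look like the following. The class name KuduBinaryCheck is made up for illustration, and reading "binDir" as a JVM system property is an assumption on my part, not something confirmed in this thread:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration; not part of the Kudu client API. */
final class KuduBinaryCheck {
    // The three binary names mentioned in the thread.
    static final List<String> REQUIRED =
            Arrays.asList("kudu", "kudu-master", "kudu-tserver");

    /** True only if every required binary is an executable regular file in binDir. */
    static boolean allPresent(Path binDir) {
        for (String name : REQUIRED) {
            Path candidate = binDir.resolve(name);
            if (!Files.isRegularFile(candidate) || !Files.isExecutable(candidate)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Assumption: the directory is passed as -DbinDir=/path/to/bin.
        Path binDir = Paths.get(System.getProperty("binDir", "."));
        System.out.println(allPresent(binDir)
                ? "Kudu binaries found: integration tests can run"
                : "Kudu binaries missing: skip integration tests");
    }
}
```

A test harness could call allPresent() from a setup method and mark the Kudu integration tests as skipped when it returns false. Note that this only checks that the files exist and are executable. It says nothing about whether the shared libraries they link against are present on the machine.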

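
One more footnote, on point 1 at the top of this mail: the os-maven-plugin approach boils down to deriving an OS/architecture classifier and using it to pick an artifact. Below is a simplified sketch of that idea. The PlatformClassifier class and its normalization rules are my own illustration and are not the plugin's actual code, although the "osx" and "x86_64" spellings follow the classifiers that protoc artifacts use:

```java
import java.util.Locale;

/** Hypothetical, simplified classifier logic; not os-maven-plugin's implementation. */
final class PlatformClassifier {
    /** Maps a raw os.name value to a short OS token. */
    static String normalizeOs(String osName) {
        String v = osName.toLowerCase(Locale.ROOT);
        if (v.startsWith("linux")) return "linux";
        if (v.startsWith("mac") || v.startsWith("osx") || v.startsWith("darwin")) return "osx";
        return "unknown";
    }

    /** Maps a raw os.arch value to a short architecture token. */
    static String normalizeArch(String osArch) {
        String v = osArch.toLowerCase(Locale.ROOT);
        if (v.equals("x86_64") || v.equals("amd64")) return "x86_64";
        if (v.equals("aarch64") || v.equals("arm64")) return "aarch_64";
        return "unknown";
    }

    /** Joins both tokens, e.g. "linux-x86_64". */
    static String classifier(String osName, String osArch) {
        return normalizeOs(osName) + "-" + normalizeArch(osArch);
    }

    public static void main(String[] args) {
        // Prints the classifier for the JVM this runs on.
        System.out.println(classifier(
                System.getProperty("os.name"), System.getProperty("os.arch")));
    }
}
```

A classifier like this carries no distro or OS-version information: RHEL 6, RHEL 7 and xenial would all come out as the same linux-x86_64 value. That is exactly why the ldd output above makes a single Linux binary look non-viable for Kudu.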