Re: [DISCUSS] Any interest in separate client/server tarballs

Josh Elser Fri, 05 Jan 2018 13:55:29 -0800

I think it would depend on how much other "stuff" has to come in to support the *Clusters. I assumed it would be a bit, but, if it's not, I have no objections to a single jar.


On 1/5/18 4:38 PM, Michael Wall wrote:

Yeah, I was thinking more like your second paragraph.  Thinking I would use
the proposed client jar to develop against the MiniAccumuloCluster
(typically the StandaloneMiniAccumuloCluster for me) and then deploy that
code to run against a real cluster.  Would like to flesh that use case out a
little more.  Do you think it has to be another jar on top of the client
jar?


On Fri, Jan 5, 2018 at 4:31 PM Josh Elser <[email protected]> wrote:

MAC, in its common state, is probably not something we'd want to include
in this proposed tarball. The reasoning being that MAC (and related
classes) aren't something that people would need on your "Hadoop
Cluster" to talk to Accumulo. It's something that can just be obtained
via Maven.

However, if you're more referring to MAC as the generic
"AccumuloCluster" interface (an attempt to make running tests against
MAC and a real Accumulo cluster transparent --
StandaloneAccumuloCluster), then I could see some JAR that we'd include
which would contain the necessary classes (on top of
accumulo-client.jar) for users to run code seamlessly against a
traditional MAC or the StandaloneAccumuloCluster.

On 1/5/18 4:22 PM, Michael Wall wrote:

I like the idea of a client jar that has fewer dependencies.  Josh, where
are you thinking the MiniAccumuloCluster fits in here?

On Fri, Jan 5, 2018 at 3:57 PM Christopher <[email protected]> wrote:

On Fri, Jan 5, 2018 at 10:30 AM Keith Turner <[email protected]> wrote:

On Thu, Jan 4, 2018 at 7:43 PM, Christopher <[email protected]> wrote:

tl;dr : I would prefer not to add another tarball as part of our
"official"

I am not opposed to replacing the current single tarball with client
and server tarballs.   What I find appealing about this is if the
client tarball has fewer deps.

However I think a lot of thought should be put into the scripts if
this is done.  For example the client tar and server tar should
probably not both have accumulo commands that do different things.

Agreed on Keith's point about the scripts and it requiring some
consideration.

releases, but I'd be in favor of blog instructions, a script, or a build
profile, which users could read/execute/activate to create a client-centric
package.

I've long believed that supporting different downstream packaging scenarios
should be prioritized over upstream binary packaging. I have argued in


This "downstream" packaging could be done within the Apache Accumulo
project also.  Like accumulo-docker.  Creating other packaging
projects within Accumulo is something to consider.

+1; When I say "downstream", it's a role, not an entity. The point is that
it's a distinct activity. accumulo-docker is a perfect example of a
"downstream packaging" project maintained by the upstream community. I find
it frustrating sometimes when supporting users that they can't tell the
difference between what is "Accumulo" and what is "this specific
packaging/configuration/deployment of Accumulo", because we don't make
those lines clear. I think we can draw these lines a bit more clearly.

favor of removing our current tarball entirely, while supporting efforts to

Apache Accumulo needs some sort of tarball that makes it easy to run
the code on a cluster, otherwise how can we test Accumulo on a cluster
for releases?

A binary tarball may be the best for this, but it's little more than the
jars in Maven Central and a few text files. It could be trivially replaced
with a simple script and manifest; it could also be replaced with an RPM, a
docker image, or any number of things. A tarball is just one type of
packaging for Accumulo's binaries.

In any case, I wasn't talking about removing the ability to produce a
binary tarball from source. Only removing it from our release artifacts and
downloads. It is not a popular opinion, but I still think it's reasonable,
with both pros and cons.

enable downstream packaging by modularizing the server code, supporting a
client-API jar (future work), and decoupling code from launch scripts. I
think we should continue to do these kinds of improvements to support
different packaging scenarios downstream, but I'd prefer to avoid
additional "official" binary releases.


I agree, I think if the Accumulo Java code made fewer assumptions about
its runtime env it would result in code that is easier to maintain and
package for different environments.

In Fluo we have recently done a lot of work in order to support
Docker, Mesos, and Kubernetes.  This work has really cleaned up the
core Fluo code making it easier to run in any environment.

I suspect pulling the Accumulo tarball into a separate git repo and
out of the main repo may help highlight some of the assumptions
Accumulo Java code makes about the environment.

This is basically what the assemble module is now. It's why I moved the bin
and conf directories into it, and have made its dependencies optional so
they wouldn't be resolved transitively, and why I made the assembly plugin
gather up the libs instead of the dependency plugin which used to drop them
in a lib directory at the root of the source checkout. This module is the
"downstream packaging" for the current "all-in-one" binary tarball package.

I think these clean up issues are related to what Josh is suggesting,
but are not prerequisites.  So it makes sense to discuss them at this
point, but I don't think they should block work on two tarballs if
that seems like a good idea.

Agreed. That discussion can be deferred. Much depends on how it is to be
split up.


Rather than provide additional packages, I'd prefer to work with downstream
to make the source more "packagable" to suit the needs of these downstream
vendor/community packagers. One way we can do that here is by either
documenting what would be needed in a client-centric package, or by
providing a script or build profile to create it from source, so that your
$dayjob or any other downstream packager doesn't have to figure that out
from scratch.

On Thu, Jan 4, 2018 at 7:17 PM Josh Elser <[email protected]> wrote:

Hi,

$dayjob presented me with a request to break up the current tarball into
two: one suitable for "users" and another for the Accumulo services. The
ultimate goal is to make upgrade scenarios a bit easier by having client
and server centric packaging.

The "client" tarball would be something suitable for most users
providing the ability to do things like:

* Launch a java app against Accumulo
* Launch a MapReduce job against Accumulo
* Launch the Accumulo shell

Essentially, the client tarball is just a pared down version of our
"current" tarball and the server-tarball is likely equivalent to our
"current" tarball (given that we have little code which would be
considered client-only).

Obviously, there are many ways to go about this. If there is buy-in from
other folks, adding some new assembly descriptors and making it a part
of the Maven build (perhaps, optionally generated) would be the easiest
in terms of maintenance. However, I don't want to push for that if it's
just going to be ignored by folks. I'll be creating something to support
this one way or another.

Any thoughts/opinions? Would this have any value to other folks?

- Josh
