Oh, I like that idea, Bill & Sean.

Package: org.apache.accumulo.cluster
Public API: org.apache.accumulo.cluster.AccumuloCluster
MAC: org.apache.accumulo.cluster.mini.MiniAccumuloCluster (implements AccumuloCluster, allows for backwards compat)
Yarn: org.apache.accumulo.cluster.yarn
Docker: ...
Mesos: ...

etc etc etc.

One question in my mind: do we keep the Maven module 'accumulo-minicluster'? I would imagine that striking the 'mini' portion in 1.6 would create some confusion. Would it be worth the indirection to rename accumulo-minicluster to accumulo-cluster and then create a new accumulo-minicluster module that depends on accumulo-cluster (but contains no code itself), so that 1.4 and 1.5 poms generally keep working with just a version bump? I'm not sure if Maven would be happy with that or do what I think it "should".

On 3/28/14, 6:26 AM, Bill Havanki wrote:
I've been watching the conversation on the side, but I wanted to mention
that it seems the focus isn't so much on "mini" clusters anymore. You're
thinking of programmatic cluster management, whether one node or many. The
idea of a basic cluster management interface, with MAC as an
implementation, is promising. A package name of just "cluster" could work.

Carry on :)

Bill H


On Fri, Mar 28, 2014 at 12:39 AM, Sean Busbey <[email protected]> wrote:

If you decide to go the mapred/mapreduce way, you could go with the package
name "mini".

alternatively, we can do a multi-stage change out

1)  1.6.x:  introduce TestAccumuloCluster interface, @deprecate
MiniAccumuloCluster class and make it implement TestAccumuloCluster

2) 1.6 + major: change MiniAccumuloCluster to an interface that extends
TestAccumuloCluster, @deprecate TestAccumuloCluster

3) 1.6 + 2 major: remove TestAccumuloCluster

Or just go with TestAccumuloCluster as the interface, have
MiniAccumuloCluster as the local pseudo distributed implementation, and
then call your new one something like YarnAccumuloCluster.

In that case we could use the deprecation cycle to move the MAC class out
of the public api.
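
A minimal Java sketch of what stage 1 above might look like. The method names (start, stop, isRunning) and the in-memory bodies are assumptions for illustration only; the real MiniAccumuloCluster has a larger surface, and YarnAccumuloCluster does not exist yet.

```java
// Hypothetical stage-1 sketch: a new interface that callers program against.
interface TestAccumuloCluster {
  void start();
  void stop();
  boolean isRunning();
}

/**
 * Stand-in for the existing class, now implementing the new interface.
 *
 * @deprecated program against TestAccumuloCluster instead.
 */
@Deprecated
class MiniAccumuloCluster implements TestAccumuloCluster {
  private boolean running;

  @Override public void start() { running = true; }   // real code would launch local processes
  @Override public void stop() { running = false; }
  @Override public boolean isRunning() { return running; }
}

// Hypothetical second implementation; a real one would talk to YARN.
class YarnAccumuloCluster implements TestAccumuloCluster {
  private boolean running;

  @Override public void start() { running = true; }
  @Override public void stop() { running = false; }
  @Override public boolean isRunning() { return running; }
}
```

Existing callers that construct MiniAccumuloCluster directly keep compiling (with a deprecation warning), while new code only needs the interface type.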


On Thu, Mar 27, 2014 at 6:48 PM, Josh Elser <[email protected]> wrote:

Thoughts on if this would be an acceptable change for 1.6.0 to alleviate
future cruft?

Suggestions on the new package and/or class name would be greatly
appreciated over "NewMiniAccumuloC*".


On 3/26/14, 3:37 PM, Josh Elser wrote:

Those who are interested: check out
https://github.com/joshelser/accumulo/commit/9f63cf32559ab514a69ff2c6b02acef9c9cbb4e8


tl;dr I could create some real interfaces for the cluster and config,
which are "hidden" under the covers by the 1.4 and 1.5
MiniAccumuloCluster and MiniAccumuloConfig classes. This de-couples the
default implementation, gives us the ability to hide "implementation
details" if wanted, and moves us towards some factory methods instead of
calling a class directly.
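
A rough Java sketch of the factory-method idea described above. It is not the code in the linked commit; the names Cluster, ClusterConfig, Clusters and newLocalCluster are all made up for illustration, and the "local" implementation here only flips a flag.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical public interfaces for the config and the cluster.
interface ClusterConfig {
  int getNumTservers();
  Map<String, String> getSiteConfig();
}

interface Cluster {
  ClusterConfig getConfig();
  void start();
  void stop();
  boolean isRunning();
}

// Hypothetical factory: callers never name the implementation class.
final class Clusters {
  private Clusters() {}

  static Cluster newLocalCluster(int numTservers, Map<String, String> siteConfig) {
    // Defensive copy so later changes to the caller's map do not leak in.
    Map<String, String> copy = Collections.unmodifiableMap(new HashMap<>(siteConfig));
    return new LocalCluster(new SimpleConfig(numTservers, copy));
  }

  // Implementation details stay private to the factory.
  private static final class SimpleConfig implements ClusterConfig {
    private final int numTservers;
    private final Map<String, String> siteConfig;

    SimpleConfig(int numTservers, Map<String, String> siteConfig) {
      this.numTservers = numTservers;
      this.siteConfig = siteConfig;
    }

    @Override public int getNumTservers() { return numTservers; }
    @Override public Map<String, String> getSiteConfig() { return siteConfig; }
  }

  private static final class LocalCluster implements Cluster {
    private final ClusterConfig config;
    private boolean running;

    LocalCluster(ClusterConfig config) { this.config = config; }

    @Override public ClusterConfig getConfig() { return config; }
    @Override public void start() { running = true; }   // placeholder for real process startup
    @Override public void stop() { running = false; }
    @Override public boolean isRunning() { return running; }
  }
}
```

With this shape, swapping in a different implementation means adding another factory method rather than changing the code that uses the returned Cluster.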

Thoughts?

On 3/26/14, 1:21 PM, Josh Elser wrote:

Yes, very much experimental at this point.

What I'm most concerned about is having reasonable hooks up front, not
trying to make an implementation for inclusion in 1.6.0.

Regarding additions, the implementation already contains most things I
would want to expose. I haven't come up with anything that would be
generally returned through the "API" rather than through this proposed
implementation (e.g. YARN connection information).

On 3/26/14, 11:57 AM, Keith Turner wrote:

What you are trying to do sounds interesting. It also sounds experimental
and in the early stages. Is there anything specific you think should be
done for 1.6.0 w/ regards to MAC API?


On Wed, Mar 26, 2014 at 2:26 PM, Josh Elser <[email protected]> wrote:

  On 3/26/14, 11:13 AM, Keith Turner wrote:

  On Wed, Mar 26, 2014 at 2:05 PM, Josh Elser <[email protected]> wrote:

   On 3/26/14, 10:57 AM, Keith Turner wrote:


   Can you give an example of what you are thinking of? I don't understand
your viewpoint either.


  Sure. One limitation of MAC, in general as a testing harness, is that it
doesn't adequately exercise multi-node implementations. You can run
multiple tservers, but they are all on the same host, which limits the
validity of a "robust" test. Addressing this is my immediate goal.

Multi-node deployments are possible using something like Mesos or Yarn.
Given that there is already functioning support to deploy Accumulo on
Yarn, that is what I targeted.

My goal is to be able to run all of our AbstractMacIT implementations
against "real" hardware without changing a single line of test code (ok,
maybe a line or two to do injection of the MAC implementation). The point
is, I believe there could be a huge testing gain from being able to write
tests which leverage Yarn, have the same programmatic configuration API
from MAC, and provide near "real" Accumulo semantics.


  Ok, so you want MAC to be an interface so that you can provide a
completely different implementation?


  Correct. Some things would serve well in a common abstract base (e.g.
numTservers, siteXml configuration), but all the nonsense about creating
directory structures and managing Processes is implementation specific.

Perhaps I could create a new interface that the current implementation
implements which still provides the same semantics from 1.4 and 1.5. Let
me see if I can mock up what I'm thinking -- that will probably be easier
than me trying to write it out.
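
A small Java sketch of the split described above: shared settings (number of tservers, site configuration) live in an abstract base, while starting and stopping is left to each implementation. The class names and the "launched" bookkeeping are hypothetical; a real local implementation would create directories and fork Processes where the comments indicate.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical common base: holds only what every implementation shares.
abstract class AbstractClusterSketch {
  private final int numTservers;
  private final Map<String, String> siteConfig;

  protected AbstractClusterSketch(int numTservers, Map<String, String> siteConfig) {
    if (numTservers < 1) {
      throw new IllegalArgumentException("need at least one tserver");
    }
    this.numTservers = numTservers;
    this.siteConfig = Collections.unmodifiableMap(new HashMap<>(siteConfig));
  }

  public int getNumTservers() { return numTservers; }
  public Map<String, String> getSiteConfig() { return siteConfig; }

  // Implementation specific: local processes, YARN containers, etc.
  public abstract void start();
  public abstract void stop();
}

// Hypothetical local implementation; it only records names instead of forking.
class LocalProcessClusterSketch extends AbstractClusterSketch {
  private final List<String> launched = new ArrayList<>();

  LocalProcessClusterSketch(int numTservers, Map<String, String> siteConfig) {
    super(numTservers, siteConfig);
  }

  @Override public void start() {
    // A real implementation would create directories and start a Process per tserver here.
    for (int i = 0; i < getNumTservers(); i++) {
      launched.add("tserver-" + i);
    }
  }

  @Override public void stop() {
    launched.clear();
  }

  List<String> launched() { return new ArrayList<>(launched); }
}
```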






