Re: Accumulo 1.7 and Data Center Replication

Josh Elser Thu, 26 Jun 2014 19:49:25 -0700

Hi Joe,

I'm the guy to ask if you'd like more information about the replication feature. You already found the parent ticket, so that has a bunch of technical "what's been done".

At a high level, replication was implemented as a framework in Accumulo to copy data that was written to a table to another "location". The provided initial implementation is to replicate the data as-is to another Accumulo table (usually in some other Accumulo instance). You'll also find a new page in the monitor and some basic administration tools in the code via Instance#replicationOperations.

I've published a recent version of the user manual [1] which goes into some more detail on the feature, as well as how to configure it.

You can also check the replication component on JIRA [2] to see what I have lined up. Automatically replicating bulk-loaded files will be a bit of work. There are some other minor things that could be improved. We can delve into the more technical implementation difficulties if you'd like.

I've written a basic test to evaluate equivalence by generating a Merkle tree for two tables. This has been promising so far, but it is currently living in my GitHub [3]. I need to figure out where/how best to include it in Apache.
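To illustrate the Merkle-tree idea mentioned above, here is a minimal sketch in Python. It is not the code from [3] and does not use any Accumulo API: tables are stood in for by plain lists of (key, value) byte pairs, and the function names (leaf_hash, merkle_root, tables_equivalent), the choice of SHA-256, and the rule of promoting an unpaired node unchanged are all assumptions made for the example.

```python
import hashlib


def leaf_hash(key: bytes, value: bytes) -> bytes:
    """Hash one key/value entry (hypothetical encoding, for illustration)."""
    h = hashlib.sha256()
    # Length-prefix the key so (b"ab", b"c") and (b"a", b"bc") hash differently.
    h.update(len(key).to_bytes(4, "big"))
    h.update(key)
    h.update(value)
    return h.digest()


def merkle_root(entries) -> bytes:
    """Build a Merkle tree over the sorted entries and return its root hash."""
    level = [leaf_hash(k, v) for k, v in sorted(entries)]
    if not level:
        # Assumed convention: an empty table hashes to SHA-256 of nothing.
        return hashlib.sha256(b"").digest()
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                # Combine a pair of sibling hashes into their parent.
                next_level.append(hashlib.sha256(level[i] + level[i + 1]).digest())
            else:
                # An unpaired node is promoted unchanged to the next level.
                next_level.append(level[i])
        level = next_level
    return level[0]


def tables_equivalent(table_a, table_b) -> bool:
    """Two tables are treated as equivalent when their root hashes match."""
    return merkle_root(table_a) == merkle_root(table_b)
```

Comparing only the roots gives a yes/no answer. A fuller comparison would keep the intermediate levels so that differing subtrees can be narrowed down to the ranges of keys that disagree.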

Finally, having resources to do a larger-scale test would be great, and testing failure conditions over multiple nodes is probably the biggest area that needs to be tested more. I can simulate this on a small scale, but I don't have the resources to do an appropriate larger test with injected failure.

If you have something specific you'd like to help out with, I'd be happy to work with you.

<employer-hat>This feature will also be included in the next version of Accumulo shipped in HDP</employer-hat>


- Josh

[1] http://people.apache.org/~elserj/accumulo_user_manual.html#_replication

[2] https://issues.apache.org/jira/issues/?jql=project%20%3D%20ACCUMULO%20AND%20component%20%3D%20replication%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC%2C%20key%20DESC

[3] https://github.com/joshelser/merkle

On 6/26/14, 8:50 PM, Joe Stein wrote:

Hi, I was hoping to get some more info around the 1.7 release: what are
the to-dos and plans around it?

Is there any help that is needed from a contribution perspective in
any way? Testing? Documentation? Pending coding or such?

We are going to be rolling trunk into two of our lab environments
specifically for https://issues.apache.org/jira/browse/ACCUMULO-378 as it
is a requirement for one of my projects at Bloomberg for Accumulo to have
data center replication before we go live. This work is going to be over
the next month(s) with lots of cycles dedicated to Accumulo 1.7 in the next
few sprints.

Also, I wanted to reach out if folks are looking for full time, contract or
even side work with Accumulo. We have projects right now going on and are
looking for more hands on keyboards.

Anyways, thanks for all the great work!!!! I am looking forward to more
continued success with the system, more integrations and to be able to
become more active in the community.

/*******************************************
  Joe Stein
  Founder, Principal Consultant
  Big Data Open Source Security LLC
  http://www.stealth.ly
  Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
********************************************/
