Some more information on the subject. A few of us got together to co-work today and had an informal discussion on our individual interests for 1.7. Summary incoming:

*Monitor re-write*
- I was pushing this one. I think the monitor still has merit, despite others' desire to just integrate with external systems
  - I have some code in place, but it still needs more work.
- Is a unified/stable "metrics" API necessary for integration w/ external tools? (or is JMX enough?)
  - An API would probably be a more usable interface than JMX
- Such an API should be stateless (no log aggregation nor statistics over time)
  - Monitor still has uses for standalone/small deployments
- If still being used, MVC approach would ease testing and addition of new data and views
  - Not necessary to hold up 1.7.0 from happening

*Revisit performance*
- Eric mentioned that he wants to spend some time running some Accumulo benchmarks, specifically YCSB.
  - Lots of related topics were mentioned that might be relevant
    * Other HDFS block cache implementations (HBase has lots of nice benchmarks, could learn from them)
    * A WIP patch for metadata updates has some promise (ACCUMULO-2889)
    * Collapse iterator stack (ACCUMULO-3079)
    * Possible improvements to Scanner for single-batch cases (reduce a few RPCs to one RPC)
  - Actual changes made likely to be found via investigation
  - Changing default conf values where relevant also mentioned

*Distributed Tracing*
- Billie has been spending some time working w/ some people on replacing Cloudtrace with HTrace
  - Mentioned that HTrace shares a remarkable amount of similarity with our existing tracing library
  - Upstream efforts in Hadoop-3 to integrate HTrace into DN/NN calls
- Some consideration was given to replacing traceserver with Zipkin; however, that is not required for the first implementation

*Decouple MiniAccumuloCluster from ITs*
- Another one I've started working on
- ITs are really great, we have a lot of them for really good cases
- Running them against a real instance is infeasible right now
- Would be good to express as many as possible in terms of only using Instance+Connector
- Christopher mentioned possible benefit outside of tests to using the accumulo-maven-plugin as the "shim" between a real instance and a MiniAccumuloCluster
- Some tests are written explicitly for MAC and must be ignored or run against a MAC when a real instance is available.
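
To make the "ignore MAC-only tests when a real instance is available" idea concrete, here is a rough sketch. It deliberately uses no Accumulo classes: `ClusterHarness`, `MiniClusterHarness`, `RealClusterHarness`, `IntegrationTestCase`, `TestSelector`, and the test names are all hypothetical names invented for illustration, not existing API or anything already in the code base. It only shows the selection logic, assuming each test declares whether it needs MAC-specific control.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical abstraction (not Accumulo API): what a test needs to know
// about the cluster it is running against.
interface ClusterHarness {
    String instanceName();
    String zooKeepers();
    boolean isMiniCluster(); // true when backed by a MiniAccumuloCluster
}

// Stand-in for a harness backed by a MiniAccumuloCluster.
final class MiniClusterHarness implements ClusterHarness {
    public String instanceName() { return "miniInstance"; }
    public String zooKeepers() { return "localhost:2181"; }
    public boolean isMiniCluster() { return true; }
}

// Stand-in for a harness pointing at an already-running instance.
final class RealClusterHarness implements ClusterHarness {
    private final String name;
    private final String zks;

    RealClusterHarness(String name, String zks) {
        this.name = name;
        this.zks = zks;
    }

    public String instanceName() { return name; }
    public String zooKeepers() { return zks; }
    public boolean isMiniCluster() { return false; }
}

// A test declares whether it depends on MAC-only behavior.
final class IntegrationTestCase {
    final String name;
    final boolean requiresMiniCluster;

    IntegrationTestCase(String name, boolean requiresMiniCluster) {
        this.name = name;
        this.requiresMiniCluster = requiresMiniCluster;
    }
}

final class TestSelector {
    // Returns the names of the tests that can run against the given harness.
    // MAC-only tests are skipped when the harness is a real instance.
    static List<String> runnable(List<IntegrationTestCase> tests, ClusterHarness harness) {
        List<String> out = new ArrayList<>();
        for (IntegrationTestCase t : tests) {
            if (!t.requiresMiniCluster || harness.isMiniCluster()) {
                out.add(t.name);
            }
        }
        return out;
    }
}
```

In this sketch a test that only needs the connection details runs against either harness, while a test flagged `requiresMiniCluster` is skipped under `RealClusterHarness`. Running such tests against a MAC on the side instead of skipping them would be the other option mentioned above.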

*Upgrade test script*
- Keith mentioned there's some code from John McNamee that might help testing upgrade paths

*Hadoop Metrics2*
- Metrics2 is the current library in use by Hadoop
- Integration gives us a lot more flexibility, notably good integration with Ganglia provided (ACCUMULO-1817)
- No one expressed interest in working on this directly (potential to slip)

*Deprecate MockAccumulo?*
  - Talked about this for 1.6, decided against
  - It's now 1.7. Is it time?
  - Remember, deprecate != removal

There are some outstanding things we need to investigate more:
- Is improved JMX or metrics2 impl sufficient for integration with external monitoring tools? (considerations: nagios, ganglia, statsd, collectd, carbon, riemann... others?)
- BatchWriter has some weird cases around error handling. It is intended to survive failures, but that's very much not the case. Should probably be fixed around a major release, but we need to figure out how exactly to fix it (needs someone to get behind it)

If people want to continue discussion on these, let's break off individual topics into their own thread for clarity (and my sanity).

Also, anyone have a desire to be "release manager"?

- Josh

Josh Elser wrote:
Thanks, John.

I was thinking about trying to gun for January time-frame for a release.
I'd love to say before 2014 is over, but that probably just won't happen
for a major release with the holidays.

For 1.7 right now, I see the following "bigger" items (correct me where
I'm wrong):

* Replication (done)
* Upgrade rules/guarantees (proposed)
* Replace cloudtrace (in-progress)
* Rewrite monitor, include REST service (in-progress)
* Drop Hadoop 1 support (proposed)
* Decouple MiniAccumulo from ITs (in-progress)
* Other minicluster types: in-process, shim to real instance (in-progress)
* Support Hadoop metrics2 (proposed)
* A few WAL/metadata related performance improvements (in-progress)

Also, would be good to check the In-Progress state issues on JIRA. What
do people think?

John Vines wrote:
Moving this to its own thread...

On Mon, Oct 6, 2014 at 5:54 PM, Mike Drob wrote:

Related: Do we have a release timeline for 1.7?
