RE: Slack call notes

dlmarion Wed, 25 Mar 2020 11:09:20 -0700


  I couldn't make the call today, but am curious if anyone has previously 
brought up creating a FileSystem API for Accumulo so that we could use 
implementations other than Hadoop. I realize that Hadoop provides 
implementations for things other than HDFS but that doesn't necessarily mean 
that all filesystem implementations are covered.
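
To make the question concrete, a minimal sketch of such an abstraction might look like the following. Everything here is an assumption for illustration: VolumeStore and InMemoryVolumeStore are hypothetical names, not existing Accumulo or Hadoop types, and a real API would need far more (rename, sync/flush semantics for WALs, directory listing, and so on).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical storage abstraction; not an existing Accumulo or Hadoop interface.
interface VolumeStore {
  OutputStream create(String path) throws IOException;

  InputStream open(String path) throws IOException;

  boolean exists(String path) throws IOException;

  boolean delete(String path) throws IOException;
}

// Toy in-memory implementation standing in for a non-Hadoop backend.
class InMemoryVolumeStore implements VolumeStore {
  private final Map<String, byte[]> files = new ConcurrentHashMap<>();

  @Override
  public OutputStream create(String path) {
    // The file becomes visible only once the stream is closed.
    return new ByteArrayOutputStream() {
      @Override
      public void close() throws IOException {
        super.close();
        files.put(path, toByteArray());
      }
    };
  }

  @Override
  public InputStream open(String path) throws IOException {
    byte[] data = files.get(path);
    if (data == null) {
      throw new FileNotFoundException(path);
    }
    return new ByteArrayInputStream(data);
  }

  @Override
  public boolean exists(String path) {
    return files.containsKey(path);
  }

  @Override
  public boolean delete(String path) {
    return files.remove(path) != null;
  }
}
```

Code written against VolumeStore would not care whether the backing implementation wraps Hadoop or something else entirely, which is the point of the question.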

-----Original Message-----
From: Christopher <[email protected]> 
Sent: Wednesday, March 25, 2020 1:45 PM
To: accumulo-dev <[email protected]>
Subject: Slack call notes

Several committers/contributors in the community joined a call in Slack on 
Wednesday, at 1130-1230, New York (Eastern) time. Here are my notes of the 
call. Please feel free to add to them.

I shared the overall philosophy and backstory to some of the script 
improvements in 2.x to help guide current/future work on the scripts.

* bin/accumulo is inspired by old jpackage.org standards which are still in use 
in RPM macros for Java packaging in Fedora/RHEL/etc. The key idea is that 
scripts are simple... set up environment (class path, etc.), locate java, and 
exec a single process with the provided args.
* bin/accumulo-service is inspired by old SysVInit scripts for 
start/stop/restart/status of a single service
* behavior of bin/accumulo and bin/accumulo-service can be manipulated through 
launch environment
* bin/accumulo-cluster uses bin/accumulo-service, and is provided as a simple, 
out-of-the-box cluster management tool
* bin/accumulo-cluster and bin/accumulo-service are replaceable; they are 
useful for out-of-the-box, but one would expect them to be unnecessary if using 
systemd, or a vendor-provided cluster management system
* we discussed possibly moving bin/accumulo-cluster and bin/accumulo-service to 
contrib/ in the tarball, or some subdir of bin/, but it was suggested to not 
make too many disruptive changes there
* we discussed the possibility of adding a config file for bin/accumulo-cluster 
(also mentioned on
https://github.com/apache/accumulo/pull/1568)
* we discussed the need to document the intent/purpose/scope of the scripts in 
comments inside the scripts themselves
* Ed Coleman asked if it'd be good to document a systemd example; I suggested 
it might make for a good blog post (perhaps by the person who wrote the systemd 
unit files for Fluo Muchos)

Keith Turner discussed his development efforts with regard to enabling more 
controls over compactions.

* one main idea was to keep configuration/API for data separate from that for 
execution
* data is a concern for application owners, whereas execution involves system 
admins (resource contention, etc.)
* he will submit a PR for review when ready
* he also suggested another call to go over the PR
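
As an illustration only, the split between data and execution described above could be sketched as follows. This is not Keith's design or any Accumulo API; FileSelector, ExecutionSettings, and CompactionSketch are hypothetical names, and the "compaction" is a placeholder task.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Data-side concern (application owner): which files should be compacted.
// Hypothetical interface, not an Accumulo type.
interface FileSelector {
  List<String> select(List<String> files);
}

// Execution-side concern (system admin): how much concurrency is allowed.
// Hypothetical settings holder.
final class ExecutionSettings {
  final int maxThreads;

  ExecutionSettings(int maxThreads) {
    this.maxThreads = maxThreads;
  }
}

final class CompactionSketch {
  // Runs a placeholder task per selected file and returns how many completed.
  static int run(List<String> files, FileSelector selector, ExecutionSettings exec)
      throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(exec.maxThreads);
    try {
      List<Future<String>> results = new ArrayList<>();
      for (String f : selector.select(files)) {
        results.add(pool.submit(() -> f + ".compacted"));
      }
      int done = 0;
      for (Future<String> r : results) {
        r.get(); // wait for each placeholder task to finish
        done++;
      }
      return done;
    } finally {
      pool.shutdown();
    }
  }
}
```

The selector can change without touching thread counts, and an admin can tune ExecutionSettings without knowing anything about which files an application wants compacted.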

Billie Rinaldi discussed better support for Azure Data Lake Storage
Gen2 (ADLSv2).

* maintaining a fork for experimenting, and working on reliably testing issues 
involving WALs
* did not recommend using ADLSv2 with WALs, but said that we should still support it
* might need to implement a custom log closer to better support it

Mike Miller brought up the idea of eliminating more static internal state.

* ServerConfigurationFactory might be improved in this regard, with some 
additional ZK cleanup
* Other ZK cleanup might help elsewhere (such as ZooCache)
* I suggested tablet location cache might also benefit from being bound to an 
AccumuloClient lifecycle (or a dedicated opaque object that could be shared 
across AccumuloClient instances with its own user-managed lifecycle)
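
The second option in the last point (a dedicated object with its own user-managed lifecycle) could look roughly like the sketch below. All names are hypothetical: LocationCache and SketchClient are not Accumulo classes, and the lookup is a placeholder rather than a real metadata read.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical cache of tablet locations with an explicit, user-managed
// lifecycle instead of static state.
final class LocationCache implements AutoCloseable {
  private final Map<String, String> locations = new ConcurrentHashMap<>();
  private volatile boolean closed = false;

  String locate(String tabletId, Function<String, String> lookup) {
    if (closed) {
      throw new IllegalStateException("cache is closed");
    }
    return locations.computeIfAbsent(tabletId, lookup);
  }

  int size() {
    return locations.size();
  }

  @Override
  public void close() {
    closed = true;
    locations.clear();
  }
}

// Hypothetical client that borrows a cache but does not own it.
final class SketchClient implements AutoCloseable {
  private final LocationCache cache;

  SketchClient(LocationCache cache) {
    this.cache = cache;
  }

  String locate(String tabletId) {
    // Placeholder lookup; a real client would consult the metadata table.
    return cache.locate(tabletId, id -> "tserver-for-" + id);
  }

  @Override
  public void close() {
    // Releases client resources only; the shared cache is left untouched.
  }
}
```

Closing one client leaves the cache usable by others, and the cache is discarded only when its owner closes it, so nothing lingers in static fields.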

Please add anything I might have missed (or got wrong) in response to this post.
