(Forking this thread, as it's a distinct topic)

I've thought about it. The idea has driven me to try to reduce our use of Hadoop-specific code, and to isolate Hadoop-specific stuff behind some abstraction, wherever possible. Though, I'll admit, we're nowhere close to being fully decoupled from Hadoop.
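As a rough illustration of what "isolating Hadoop-specific stuff behind some abstraction" could look like, here is a minimal Java sketch. Everything in it is hypothetical: VolumeStore, LocalVolumeStore, and VolumeStoreSketch are made-up names, not existing Accumulo or Hadoop classes, and the local-disk implementation only stands in for whatever backend (Hadoop or otherwise) would sit behind the interface.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical abstraction: NOT an existing Accumulo or Hadoop type.
// Calling code would depend on this interface only.
interface VolumeStore {
  void write(String relativePath, byte[] data);

  byte[] read(String relativePath);

  boolean exists(String relativePath);
}

// Stand-in implementation backed by java.nio on the local disk.
// The assumption is that a Hadoop-backed implementation would wrap
// Hadoop's FileSystem behind the same interface, in its own jar.
final class LocalVolumeStore implements VolumeStore {
  private final Path root;

  LocalVolumeStore(Path root) {
    this.root = root;
  }

  // Convenience factory for experiments: stores files under a fresh temp dir.
  static LocalVolumeStore inTempDir() {
    try {
      return new LocalVolumeStore(Files.createTempDirectory("volume-sketch"));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  @Override
  public void write(String relativePath, byte[] data) {
    try {
      Path target = root.resolve(relativePath);
      Files.createDirectories(target.getParent());
      Files.write(target, data);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  @Override
  public byte[] read(String relativePath) {
    try {
      return Files.readAllBytes(root.resolve(relativePath));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  @Override
  public boolean exists(String relativePath) {
    return Files.exists(root.resolve(relativePath));
  }
}

public class VolumeStoreSketch {
  // Example caller: it never mentions a concrete file system.
  static String roundTrip(VolumeStore store, String path, String text) {
    store.write(path, text.getBytes(StandardCharsets.UTF_8));
    return new String(store.read(path), StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    VolumeStore store = LocalVolumeStore.inTempDir();
    System.out.println(roundTrip(store, "tables/1/example.rf", "hello"));
  }
}
```

The point of the sketch is only that roundTrip() compiles against the interface, so swapping the backing file system would not touch the caller. How implementations would actually be discovered and loaded is left open here.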
I've also been looking a lot at our VolumeManager code lately, to try to improve it a bit, and to create better abstractions for Volumes that could aid future work in this area. But I haven't been working directly on a new FileSystem API abstraction... just trying to lay some groundwork for that possibility in future. It'd be nice to get to a point where we have a Hadoop-specific implementation isolated to a jar that can be swapped out at runtime for other file system implementations, as needed. I see that as a somewhat long way off.

On Wed, Mar 25, 2020 at 2:08 PM <dlmar...@comcast.net> wrote:
>
> I couldn't make the call today, but am curious if anyone has previously
> brought up creating a FileSystem API for Accumulo so that we could use
> implementations other than Hadoop. I realize that Hadoop provides
> implementations for things other than HDFS but that doesn't necessarily mean
> that all filesystem implementations are covered.
>
> -----Original Message-----
> From: Christopher <ctubb...@apache.org>
> Sent: Wednesday, March 25, 2020 1:45 PM
> To: accumulo-dev <dev@accumulo.apache.org>
> Subject: Slack call notes
>
> Several committers/contributors in the community joined a call in Slack on
> Wednesday, at 1130-1230, New York (Eastern) time. Here are my notes of the
> call. Please feel free to add to them.
>
> I shared the overall philosophy and backstory to some of the script
> improvements in 2.x to help guide current/future work on the scripts.
>
> * bin/accumulo is inspired by old jpackage.org standards which are still in
> use in RPM macros for Java packaging in Fedora/RHEL/etc. The key idea is that
> scripts are simple... set up environment (class path, etc.), locate java, and
> exec a single process with the provided args.
> * bin/accumulo-service is inspired by old SysVInit scripts for
> start/stop/restart/status of a single service
> * behavior of bin/accumulo and bin/accumulo-service can be manipulated
> through launch environment
> * bin/accumulo-cluster uses bin/accumulo-service, and is provided as a
> simple, out-of-the-box cluster management tool
> * bin/accumulo-cluster and bin/accumulo-service are replaceable; they are
> useful out of the box, but one would expect them to be unnecessary if
> using systemd, or a vendor-provided cluster management system
> * we discussed possibly moving bin/accumulo-cluster and bin/accumulo-service
> to contrib/ in the tarball, or some subdir of bin/, but it was suggested to
> not make too many disruptive changes there
> * we discussed the possibility of adding a config file for
> bin/accumulo-cluster (also mentioned on
> https://github.com/apache/accumulo/pull/1568)
> * we discussed the need to document the intent/purpose/scope of the scripts
> in comments inside the scripts themselves
> * Ed Coleman asked if it'd be good to document a systemd example; I suggested
> it might make for a good blog post (perhaps by the person who wrote the
> systemd unit files for Fluo Muchos)
>
> Keith Turner discussed his development efforts with regard to enabling more
> controls over compactions.
>
> * one main idea was to keep configuration/API for data separate from that for
> execution
> * data concerns application owners, whereas execution involves system
> admins (resource contention, etc.)
> * he will submit a PR for review when ready
> * he also suggested another call to go over the PR
>
> Billie Rinaldi discussed better support for Azure Data Lake Storage
> Gen2 (ADLSv2).
>
> * maintaining a fork for experimenting, and working on reliably testing
> issues involving WALs
> * did not recommend using ADLSv2 with WALs, but that we should still support
> it
> * might need to implement a custom log closer to better support it
>
> Mike Miller brought up the idea of eliminating more static internal state.
>
> * ServerConfigurationFactory might be improved in this regard, with some
> additional ZK cleanup
> * Other ZK cleanup might help elsewhere (such as ZooCache)
> * I suggested tablet location cache might also benefit from being bound to an
> AccumuloClient lifecycle (or a dedicated opaque object that could be shared
> across AccumuloClient instances with its own user-managed lifecycle)
>
> Please add anything I might have missed (or got wrong) in response to this
> post.