FileSystem API (was: Slack call notes)

Christopher Wed, 25 Mar 2020 12:17:20 -0700

(Forking this thread, as it's a distinct topic)

I've thought about it. The idea has driven me to try to reduce our use
of Hadoop-specific code, and to isolate Hadoop-specific stuff behind
some abstraction, wherever possible. Though, I'll admit, we're nowhere
close to being fully decoupled from Hadoop.

I've also been looking a lot at our VolumeManager code lately, to try
to improve it a bit, and to create better abstractions for Volumes,
that could aid future work in this area.

But I haven't been working directly on a new FileSystem API
abstraction... just trying to lay some groundwork for that possibility
in the future.

It'd be nice to get to a point where we have a Hadoop-specific
implementation isolated to a jar that can be swapped out at runtime
for other file system implementations, as needed. I see that as
still a long way off.
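
To make the idea a bit more concrete, here is a rough sketch of what
such an abstraction might look like. None of these names exist in
Accumulo today: VolumeFileSystem, LocalVolumeFileSystem, and Registry
are all hypothetical, and the local implementation uses java.nio only
so the example runs without Hadoop on the class path. A Hadoop-backed
implementation would live in its own jar and implement the same
interface. The sketch registers implementations explicitly; discovering
them from a jar at runtime (for example with java.util.ServiceLoader)
is one possible option, not something shown here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class FileSystemSketch {

  /** Hypothetical abstraction; NOT an existing Accumulo interface. */
  public interface VolumeFileSystem {
    String scheme(); // URI scheme this implementation handles, e.g. "file"
    OutputStream create(URI path) throws IOException;
    InputStream open(URI path) throws IOException;
    boolean exists(URI path) throws IOException;
    boolean delete(URI path) throws IOException;
  }

  /** Illustrative implementation backed by java.nio for local files. */
  public static class LocalVolumeFileSystem implements VolumeFileSystem {
    @Override public String scheme() { return "file"; }
    @Override public OutputStream create(URI path) throws IOException {
      return Files.newOutputStream(Paths.get(path));
    }
    @Override public InputStream open(URI path) throws IOException {
      return Files.newInputStream(Paths.get(path));
    }
    @Override public boolean exists(URI path) {
      return Files.exists(Paths.get(path));
    }
    @Override public boolean delete(URI path) throws IOException {
      return Files.deleteIfExists(Paths.get(path));
    }
  }

  /** Hypothetical registry that selects an implementation by URI scheme. */
  public static class Registry {
    private final Map<String, VolumeFileSystem> byScheme = new ConcurrentHashMap<>();

    public void register(VolumeFileSystem fs) {
      byScheme.put(fs.scheme(), fs);
    }

    public VolumeFileSystem forUri(URI uri) {
      String scheme = uri.getScheme();
      VolumeFileSystem fs = scheme == null ? null : byScheme.get(scheme);
      if (fs == null) {
        throw new IllegalArgumentException("No file system registered for: " + uri);
      }
      return fs;
    }
  }

  public static void main(String[] args) throws IOException {
    Registry registry = new Registry();
    registry.register(new LocalVolumeFileSystem());

    // Callers only see the abstraction; the backing store is chosen by scheme.
    Path tmp = Files.createTempFile("volume-sketch", ".txt");
    URI uri = tmp.toUri();
    VolumeFileSystem fs = registry.forUri(uri);

    try (OutputStream out = fs.create(uri)) {
      out.write("hello".getBytes(StandardCharsets.UTF_8));
    }
    try (InputStream in = fs.open(uri)) {
      System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
    }
    fs.delete(uri);
  }
}
```

The point of routing everything through something like Registry is
that server code would never name a Hadoop class directly, so swapping
the backing implementation becomes a packaging and configuration
concern rather than a code change.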

On Wed, Mar 25, 2020 at 2:08 PM <dlmar...@comcast.net> wrote:
>
>
>   I couldn't make the call today, but am curious if anyone has previously 
> brought up creating a FileSystem API for Accumulo so that we could use 
> implementations other than Hadoop. I realize that Hadoop provides 
> implementations for things other than HDFS but that doesn't necessarily mean 
> that all filesystem implementations are covered.
>
> -----Original Message-----
> From: Christopher <ctubb...@apache.org>
> Sent: Wednesday, March 25, 2020 1:45 PM
> To: accumulo-dev <dev@accumulo.apache.org>
> Subject: Slack call notes
>
> Several committers/contributors in the community joined a call in Slack on 
> Wednesday, at 1130-1230, New York (Eastern) time. Here are my notes of the 
> call. Please feel free to add to them.
>
> I shared the overall philosophy and backstory to some of the script 
> improvements in 2.x to help guide current/future work on the scripts.
>
> * bin/accumulo is inspired by old jpackage.org standards which are still in 
> use in RPM macros for Java packaging in Fedora/RHEL/etc. The key idea is that 
> scripts are simple... set up environment (class path, etc.), locate java, and 
> exec a single process with the provided args.
> * bin/accumulo-service is inspired by old SysVInit scripts for 
> start/stop/restart/status of a single service
> * behavior of bin/accumulo and bin/accumulo-service can be manipulated 
> through launch environment
> * bin/accumulo-cluster uses bin/accumulo-service, and is provided as a 
> simple, out-of-the-box cluster management tool
> * bin/accumulo-cluster and bin/accumulo-service are replaceable; they are 
> useful for out-of-the-box, but one would expect them to be unnecessary if 
> using systemd, or a vendor-provided cluster management system
> * we discussed possibly moving bin/accumulo-cluster and bin/accumulo-service 
> to contrib/ in the tarball, or some subdir of bin/, but it was suggested to 
> not make too many disruptive changes there
> * we discussed the possibility of adding a config file for 
> bin/accumulo-cluster (also mentioned on
> https://github.com/apache/accumulo/pull/1568)
> * we discussed the need to document the intent/purpose/scope of the scripts 
> in comments inside the scripts themselves
> * Ed Coleman asked if it'd be good to document a systemd example; I suggested 
> it might make for a good blog post (perhaps by the person who wrote the 
> systemd unit files for Fluo Muchos)
>
> Keith Turner discussed his development efforts with regard to enabling more 
> controls over compactions.
>
> * one main idea was to keep configuration/API for data separate from that for 
> execution
> * data is of concern to application owners, whereas execution involves system 
> admins (resource contention, etc.)
> * he will submit a PR for review when ready
> * he also suggested another call to go over the PR
>
> Billie Rinaldi discussed better support for Azure Data Lake Storage
> Gen2 (ADLSv2).
>
> * maintaining a fork for experimenting, and working on reliably testing 
> issues involving WALs
> * did not recommend using ADLSv2 with WALs, but that we should still support 
> it
> * might need to implement a custom log closer to better support it
>
> Mike Miller brought up the idea of eliminating more static internal state.
>
> * ServerConfigurationFactory might be improved in this regard, with some 
> additional ZK cleanup
> * Other ZK cleanup might help elsewhere (such as ZooCache)
> * I suggested tablet location cache might also benefit from being bound to an 
> AccumuloClient lifecycle (or a dedicated opaque object that could be shared 
> across AccumuloClient instances with its own user-managed lifecycle)
>
> Please add anything I might have missed (or got wrong) in response to this 
> post.
>
