Re: FileSystem API (was: Slack call notes)

David Mollitor Wed, 25 Mar 2020 12:20:20 -0700

Hello,

I, too, have been thinking about this for a pet project. There is already
Apache Commons VFS that, with some investment, could probably serve all
these requirements.
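For illustration only, the swappable-jar idea in the quoted reply (Hadoop-specific code behind an abstraction, with the implementation chosen at runtime from whatever jar is on the class path) could be sketched with the JDK's `java.util.ServiceLoader`. The `VolumeFileSystem` interface, its methods, the "mem" scheme, and the in-memory implementation are all hypothetical; they are not part of Accumulo, Hadoop, or Commons VFS. A provider jar would register its implementation in a `META-INF/services/` file named after the interface's binary name.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.ServiceLoader;

public class PluggableFsSketch {

  /** Hypothetical file system contract that the rest of the code base would depend on. */
  public interface VolumeFileSystem {
    /** URI scheme this implementation handles, e.g. "hdfs" or "mem". */
    String scheme();

    void write(String path, byte[] data);

    /** Returns the file contents, or empty if the path does not exist. */
    Optional<byte[]> read(String path);

    boolean exists(String path);
  }

  /** Hypothetical in-memory implementation, standing in for a Hadoop-backed one. */
  public static final class InMemoryFileSystem implements VolumeFileSystem {
    private final Map<String, byte[]> files = new HashMap<>();

    @Override
    public String scheme() {
      return "mem";
    }

    @Override
    public void write(String path, byte[] data) {
      files.put(path, data.clone()); // defensive copy so callers can't mutate stored data
    }

    @Override
    public Optional<byte[]> read(String path) {
      return Optional.ofNullable(files.get(path)).map(byte[]::clone);
    }

    @Override
    public boolean exists(String path) {
      return files.containsKey(path);
    }
  }

  /**
   * Picks the first provider for the scheme that is registered in a
   * META-INF/services file on the class path (so dropping in a different jar
   * swaps the implementation), falling back to the in-memory one for "mem".
   */
  public static VolumeFileSystem forScheme(String scheme) {
    for (VolumeFileSystem fs : ServiceLoader.load(VolumeFileSystem.class)) {
      if (fs.scheme().equals(scheme)) {
        return fs;
      }
    }
    if ("mem".equals(scheme)) {
      return new InMemoryFileSystem();
    }
    throw new IllegalArgumentException("No VolumeFileSystem provider for scheme: " + scheme);
  }

  public static void main(String[] args) {
    VolumeFileSystem fs = forScheme("mem");
    fs.write("/accumulo/example.txt", "hello".getBytes(StandardCharsets.UTF_8));
    fs.read("/accumulo/example.txt")
        .map(bytes -> new String(bytes, StandardCharsets.UTF_8))
        .ifPresent(System.out::println); // prints "hello"
  }
}
```

ServiceLoader is just one way to do the runtime swap; Commons VFS, mentioned above, could fill a similar role with its own abstractions.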


On Wed, Mar 25, 2020, 3:16 PM Christopher <[email protected]> wrote:

> (Forking this thread, as it's a distinct topic)
>
> I've thought about it. The idea has driven me to try to reduce our use
> of Hadoop-specific code, and to isolate Hadoop-specific stuff behind
> some abstraction, wherever possible. Though, I'll admit, we're nowhere
> close to being fully decoupled from Hadoop.
>
> I've also been looking a lot at our VolumeManager code lately, to try
> to improve it a bit, and to create better abstractions for Volumes
> that could aid future work in this area.
>
> But I haven't been working directly on a new FileSystem API
> abstraction... just trying to lay some groundwork for that possibility
> in the future.
>
> It'd be nice to get to a point where we have a Hadoop-specific
> implementation isolated to a jar that can be swapped out at runtime
> for other file system implementations, as needed. I see that as still
> a long way off.
>
> On Wed, Mar 25, 2020 at 2:08 PM <[email protected]> wrote:
> >
> >
> > I couldn't make the call today, but I'm curious whether anyone has
> > previously brought up creating a FileSystem API for Accumulo so that we
> > could use implementations other than Hadoop. I realize that Hadoop
> > provides implementations for file systems other than HDFS, but that
> > doesn't necessarily mean all file system implementations are covered.
> >
> > -----Original Message-----
> > From: Christopher <[email protected]>
> > Sent: Wednesday, March 25, 2020 1:45 PM
> > To: accumulo-dev <[email protected]>
> > Subject: Slack call notes
> >
> > Several committers/contributors in the community joined a call in Slack
> > on Wednesday from 11:30 to 12:30 New York (Eastern) time. Here are my
> > notes from the call. Please feel free to add to them.
> >
> > I shared the overall philosophy and backstory behind some of the script
> > improvements in 2.x to help guide current/future work on the scripts.
> >
> > * bin/accumulo is inspired by the old jpackage.org standards, which are
> >   still used in RPM macros for Java packaging in Fedora/RHEL/etc. The key
> >   idea is that the scripts are simple: set up the environment (class path,
> >   etc.), locate java, and exec a single process with the provided args.
> > * bin/accumulo-service is inspired by the old SysVinit scripts for
> >   start/stop/restart/status of a single service
> > * the behavior of bin/accumulo and bin/accumulo-service can be
> >   manipulated through the launch environment
> > * bin/accumulo-cluster uses bin/accumulo-service, and is provided as a
> >   simple, out-of-the-box cluster management tool
> > * bin/accumulo-cluster and bin/accumulo-service are replaceable; they are
> >   useful out of the box, but one would expect them to be unnecessary when
> >   using systemd or a vendor-provided cluster management system
> > * we discussed possibly moving bin/accumulo-cluster and
> >   bin/accumulo-service to contrib/ in the tarball, or to some subdirectory
> >   of bin/, but it was suggested not to make too many disruptive changes
> >   there
> > * we discussed the possibility of adding a config file for
> >   bin/accumulo-cluster (also mentioned in
> >   https://github.com/apache/accumulo/pull/1568)
> > * we discussed the need to document the intent/purpose/scope of the
> >   scripts in comments inside the scripts themselves
> > * Ed Coleman asked whether it would be good to document a systemd example;
> >   I suggested it might make for a good blog post (perhaps by the person
> >   who wrote the systemd unit files for Fluo Muchos)
> >
> > Keith Turner discussed his development efforts toward enabling more
> > control over compactions.
> >
> > * one main idea was to keep the configuration/API for data separate from
> >   that for execution
> > * data is a concern of application owners, whereas execution involves
> >   system admins (resource contention, etc.)
> > * he will submit a PR for review when ready
> > * he also suggested another call to go over the PR
> >
> > Billie Rinaldi discussed better support for Azure Data Lake Storage
> > Gen2 (ADLSv2).
> >
> > * maintaining a fork for experimentation, and working on reliably testing
> >   issues involving WALs (write-ahead logs)
> > * did not recommend using ADLSv2 with WALs, but thought we should still
> >   support it
> > * might need to implement a custom log closer to better support it
> >
> > Mike Miller brought up the idea of eliminating more static internal
> > state.
> >
> > * ServerConfigurationFactory might be improved in this regard, with some
> >   additional ZooKeeper (ZK) cleanup
> > * other ZK cleanup might help elsewhere (such as ZooCache)
> > * I suggested the tablet location cache might also benefit from being
> >   bound to an AccumuloClient's lifecycle (or to a dedicated opaque object,
> >   with its own user-managed lifecycle, that could be shared across
> >   AccumuloClient instances)
> >
> > Please add anything I might have missed (or got wrong) in response to
> > this post.
> >
>
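As an illustration of the call-notes suggestion about binding the tablet location cache to a client lifecycle (or to a shareable opaque object with a user-managed lifecycle), here is a minimal Java sketch. `TabletLocationCache`, `Client`, and the string keys and addresses are hypothetical; this is not Accumulo's actual tablet locator or `AccumuloClient` API. The point is only that the cache lives in an instance the user creates and closes, rather than in static state.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

public class ScopedCacheSketch {

  /**
   * Hypothetical opaque cache with a user-managed lifecycle. Several clients can
   * share one instance; nothing is kept in static fields, and close() clears
   * the cached entries and rejects further use.
   */
  public static final class TabletLocationCache implements AutoCloseable {
    private final Map<String, String> locations = new ConcurrentHashMap<>();
    private final AtomicBoolean closed = new AtomicBoolean(false);

    public void put(String tabletKey, String serverAddress) {
      ensureOpen();
      locations.put(tabletKey, serverAddress);
    }

    public Optional<String> get(String tabletKey) {
      ensureOpen();
      return Optional.ofNullable(locations.get(tabletKey));
    }

    /** Number of cached entries (usable even after close, for inspection). */
    public int size() {
      return locations.size();
    }

    private void ensureOpen() {
      if (closed.get()) {
        throw new IllegalStateException("cache is closed");
      }
    }

    @Override
    public void close() {
      if (closed.compareAndSet(false, true)) {
        locations.clear();
      }
    }
  }

  /** Hypothetical client that borrows, but does not own, a shared cache. */
  public static final class Client {
    private final TabletLocationCache cache;

    public Client(TabletLocationCache cache) {
      this.cache = cache;
    }

    public String locate(String tabletKey) {
      // A real client would look the location up on a miss; here we just report it.
      return cache.get(tabletKey).orElse("<unknown>");
    }
  }

  public static void main(String[] args) {
    try (TabletLocationCache cache = new TabletLocationCache()) {
      cache.put("1;row_m", "tserver1:9997"); // example key and address only
      Client a = new Client(cache);
      Client b = new Client(cache);
      System.out.println(a.locate("1;row_m") + " " + b.locate("1;row_z"));
      // prints "tserver1:9997 <unknown>"
    }
  }
}
```

Because the cache is passed in rather than looked up from a static field, two clients can share it or each can get its own, and closing it ends its lifetime deterministically.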
