Re: FileSystem API (was: Slack call notes)

Christopher Wed, 25 Mar 2020 15:09:12 -0700

Only 705 matching lines across 280 files, if you exclude Text, though :)

grep -rP 'org[.]apache[.]hadoop(?![.]io[.]Text)' --include='*.java' * \
  | grep -v test/ | wc -l
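
For anyone wanting to reproduce both numbers, here is a minimal sketch. It assumes GNU grep with PCRE support (-P), and the helper name hadoop_refs is made up for illustration. The thread doesn't say how the file count was obtained; counting distinct files with grep -l is one plausible way to get it.

```shell
#!/bin/sh
# Hypothetical helper (not part of Accumulo): count non-test references to
# org.apache.hadoop, excluding org.apache.hadoop.io.Text, under a source tree.
# Assumes GNU grep built with PCRE support (-P).
# Usage: hadoop_refs /path/to/source/tree
hadoop_refs() {
  dir=${1:-.}
  pattern='org[.]apache[.]hadoop(?![.]io[.]Text)'
  # Matching lines whose path (or content) does not contain "test/".
  lines=$(grep -rP "$pattern" --include='*.java' "$dir" \
    | grep -v 'test/' | wc -l | tr -d ' ')
  # Distinct files with at least one match, again skipping "test/" paths.
  files=$(grep -rlP "$pattern" --include='*.java' "$dir" \
    | grep -v 'test/' | wc -l | tr -d ' ')
  echo "$lines lines across $files files"
}
```

Note that grep -v test/ drops any line containing "test/", so a match whose text happens to include that string is skipped along with the files under test directories.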


On Wed, Mar 25, 2020 at 3:34 PM Mike Miller <[email protected]> wrote:
>
> I think we have come a long way removing any external types from the API,
> for reasons other than de-coupling from Hadoop.  While we don't have many
> dependencies on the other components of Hadoop, we are still very tightly
> coupled to HDFS.
> For example, some quick grep'ing of the code shows:
> grep -r "import org.apache.hadoop" --include=*.java * | wc -l
> 1734
> Without tests it is slightly more feasible...
> grep -r "import org.apache.hadoop" --include=*.java * | grep -v "test" | wc -l
> 858
>
>
> On Wed, Mar 25, 2020 at 3:19 PM David Mollitor <[email protected]> wrote:
>
> > Hello,
> >
> > I too have been thinking about this for a pet project.  There is already
> > Apache Commons VFS that, with some investment, could probably serve all
> > these requirements.
> >
> > On Wed, Mar 25, 2020, 3:16 PM Christopher <[email protected]> wrote:
> >
> > > (Forking this thread, as it's a distinct topic)
> > >
> > > I've thought about it. The idea has driven me to try to reduce our use
> > > of Hadoop-specific code, and to isolate Hadoop-specific stuff behind
> > > some abstraction, wherever possible. Though, I'll admit, we're nowhere
> > > close to where we'd want to be to be fully decoupled from Hadoop.
> > >
> > > I've also been looking a lot at our VolumeManager code lately, to try
> > > to improve it a bit, and to create better abstractions for Volumes,
> > > that could aid future work in this area.
> > >
> > > But, I haven't directly been working on new FileSystem API
> > > abstraction... just trying to lay some groundwork for that possibility
> > > in future.
> > >
> > > It'd be nice to get to a point where we have a Hadoop-specific
> > > implementation isolated to a jar that can be swapped out at runtime
> > > for other file system implementations, as needed. I see that as a
> > > > somewhat long way off.
> > >
> > > On Wed, Mar 25, 2020 at 2:08 PM <[email protected]> wrote:
> > > >
> > > >
> > > > > I couldn't make the call today, but am curious if anyone has
> > > > > previously brought up creating a FileSystem API for Accumulo so that
> > > > > we could use implementations other than Hadoop. I realize that Hadoop
> > > > > provides implementations for things other than HDFS but that doesn't
> > > > > necessarily mean that all filesystem implementations are covered.
> > > >
> > > > -----Original Message-----
> > > > From: Christopher <[email protected]>
> > > > Sent: Wednesday, March 25, 2020 1:45 PM
> > > > To: accumulo-dev <[email protected]>
> > > > Subject: Slack call notes
> > > >
> > > > > Several committers/contributors in the community joined a call in
> > > > > Slack on Wednesday, at 1130-1230, New York (Eastern) time. Here are
> > > > > my notes of the call. Please feel free to add to them.
> > > > >
> > > > > I shared the overall philosophy and backstory to some of the script
> > > > > improvements in 2.x to help guide current/future work on the scripts.
> > > >
> > > > > * bin/accumulo is inspired by old jpackage.org standards which are
> > > > >   still in use in RPM macros for Java packaging in Fedora/RHEL/etc.
> > > > >   The key idea is that scripts are simple... set up environment
> > > > >   (class path, etc.), locate java, and exec a single process with
> > > > >   the provided args.
> > > > > * bin/accumulo-service is inspired by old SysVInit scripts for
> > > > >   start/stop/restart/status of a single service
> > > > > * behavior of bin/accumulo and bin/accumulo-service can be manipulated
> > > > >   through launch environment
> > > > > * bin/accumulo-cluster uses bin/accumulo-service, and is provided as a
> > > > >   simple, out-of-the-box cluster management tool
> > > > > * bin/accumulo-cluster and bin/accumulo-service are replaceable; they
> > > > >   are useful for out-of-the-box, but one would expect them to be
> > > > >   unnecessary if using systemd, or a vendor-provided cluster
> > > > >   management system
> > > > > * we discussed possibly moving bin/accumulo-cluster and
> > > > >   bin/accumulo-service to contrib/ in the tarball, or some subdir of
> > > > >   bin/, but it was suggested to not make too many disruptive changes
> > > > >   there
> > > > > * we discussed the possibility of adding a config file for
> > > > >   bin/accumulo-cluster (also mentioned on
> > > > >   https://github.com/apache/accumulo/pull/1568)
> > > > > * we discussed the need to document the intent/purpose/scope of the
> > > > >   scripts in comments inside the scripts themselves
> > > > > * Ed Coleman asked if it'd be good to document a systemd example; I
> > > > >   suggested it might make for a good blog post (perhaps by the person
> > > > >   who wrote the systemd unit files for Fluo Muchos)
> > > >
> > > > > Keith Turner discussed his development efforts with regard to enabling
> > > > > more controls over compactions.
> > > > >
> > > > > * one main idea was to keep configuration/API for data separate from
> > > > >   that for execution
> > > > > * data is of concern to application owners, whereas execution involves
> > > > >   system admins (resource contention, etc.)
> > > > > * he will submit a PR for review when ready
> > > > > * he also suggested another call to go over the PR
> > > >
> > > > Billie Rinaldi discussed better support for Azure Data Lake Storage
> > > > Gen2 (ADLSv2).
> > > >
> > > > > * maintaining a fork for experimenting, and working on reliably
> > > > >   testing issues involving WALs
> > > > > * did not recommend using ADLSv2 with WALs, but that we should still
> > > > >   support it
> > > > > * might need to implement a custom log closer to better support it
> > > >
> > > > > Mike Miller brought up the idea of eliminating more static internal
> > > > > state.
> > > > >
> > > > > * ServerConfigurationFactory might be improved in this regard, with
> > > > >   some additional ZK cleanup
> > > > > * Other ZK cleanup might help elsewhere (such as ZooCache)
> > > > > * I suggested tablet location cache might also benefit from being
> > > > >   bound to an AccumuloClient lifecycle (or a dedicated opaque object
> > > > >   that could be shared across AccumuloClient instances with its own
> > > > >   user-managed lifecycle)
> > > >
> > > > > Please add anything I might have missed (or got wrong) in response to
> > > > > this post.
> > > >
> > >
> >
