Hello Team,

One shortcoming of Apache Accumulo and Apache HBase, as I understand it,
is that both rely on HDFS for replicated WAL management. As a result, HDFS
is a requirement even when deploying to the cloud. I believe Google has
developed consensus-enabled WAL management, so that three instances can be
stood up without any external dependencies (other than storage for the
RFiles/HFiles).

I'd be interested to hear your thoughts on this.

On Fri, Oct 25, 2019 at 1:46 PM Mike Miller wrote:

> Hi Tushar,
>
> The closest thing we have is the set of performance tests in
> accumulo-testing, which is probably the best place for this.
> https://github.com/apache/accumulo-testing#performance-test
> The instructions for setting up the scripts are in the README. Only a
> limited number of tests have been written, though, and they used to be
> integration tests that were moved out of the main test package.
>
> org.apache.accumulo.testing.performance.tests.DurabilityWriteSpeedPT
> org.apache.accumulo.testing.performance.tests.YieldingScanExecutorPT
> org.apache.accumulo.testing.performance.tests.ScanExecutorPT
> org.apache.accumulo.testing.performance.tests.ScanFewFamiliesPT
> org.apache.accumulo.testing.performance.tests.ConditionalMutationsPT
> org.apache.accumulo.testing.performance.tests.RandomCachedLookupsPT
>
> On Thu, Oct 24, 2019 at 8:09 PM Tushar Dhadiwal wrote:
>
> > Hello Everyone,
> >
> > I am a Software Engineer at Microsoft, and our team is currently working
> > on making the deployment and operation of Accumulo on Azure as seamless
> > as possible. As part of this effort, we are trying to observe and measure
> > some standard Accumulo operations (e.g. scans, canary queries, ingest)
> > and how their performance varies over time on long-running Accumulo
> > clusters in Azure. We would like to come up with a metric that we can
> > use to evaluate how healthy and available an Accumulo cluster is. Over
> > time, we intend to use this metric to understand how underlying platform
> > changes in Azure affect the overall health of Accumulo workloads.
> >
> > As a starting metric, for example, we are thinking of continually
> > scanning random values across various tablet servers and capturing
> > timing information on how long those scans take. I took a quick look at
> > the accumulo-testing repo and didn't find any tests or probes that do
> > something along these lines. Does something like this seem reasonable?
> > Has anyone attempted something similar before? Does accumulo-testing seem
> > like a reasonable place for code like this?
> >
> > I'd appreciate your thoughts and feedback.
> >
> > Cheers,
> >
> > Tushar Dhadiwal
> >
>
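A minimal sketch of the kind of probe described in Tushar's message (timed scans of randomly chosen rows, spread across the key space and therefore across tablet servers), written in Java to match Accumulo's client API. It is not taken from accumulo-testing: the class name `ScanLatencyProbe`, its methods, and the `row_%08d` key format are all hypothetical. The lookup is passed in as a `Consumer<String>` so the sketch runs without a cluster; in a real probe it would presumably wrap an Accumulo `Scanner` restricted to the chosen row.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Random;
import java.util.function.Consumer;

/**
 * Hypothetical latency probe: times repeated lookups of random row keys and
 * reports latency percentiles. The lookup is injected, so this sketch runs
 * without a cluster; against Accumulo it would wrap a single-row scan.
 */
class ScanLatencyProbe {

    /** Times `samples` lookups of random keys in [0, keySpace); returns durations in ns. */
    static List<Long> sample(Consumer<String> lookup, int samples, int keySpace, long seed) {
        Random rnd = new Random(seed);
        List<Long> latencies = new ArrayList<>(samples);
        for (int i = 0; i < samples; i++) {
            // Zero-padded keys (hypothetical format) spread probes over the key space.
            String row = String.format(Locale.ROOT, "row_%08d", rnd.nextInt(keySpace));
            long start = System.nanoTime();
            lookup.accept(row);
            latencies.add(System.nanoTime() - start);
        }
        return latencies;
    }

    /** Nearest-rank percentile (p in [0, 100]) of the recorded latencies. */
    static long percentile(List<Long> latencies, double p) {
        if (latencies.isEmpty()) {
            throw new IllegalArgumentException("no samples");
        }
        List<Long> sorted = new ArrayList<>(latencies);
        Collections.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.size());
        int index = Math.min(Math.max(rank, 1), sorted.size()) - 1;
        return sorted.get(index);
    }

    public static void main(String[] args) {
        // Hypothetical stand-in lookup that just sleeps; swap in a real single-row scan.
        Consumer<String> fakeLookup = row -> {
            try {
                Thread.sleep(1);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        List<Long> lat = sample(fakeLookup, 50, 1_000_000, 42L);
        System.out.printf(Locale.ROOT, "p50=%.2f ms, p99=%.2f ms%n",
                percentile(lat, 50) / 1e6, percentile(lat, 99) / 1e6);
    }
}
```

Reporting nearest-rank percentiles such as p50 and p99, rather than a mean, keeps occasional slow scans visible, which seems important if the result is meant to serve as a cluster health signal tracked over time.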