Hello Team,

One shortcoming of Apache Accumulo and Apache HBase, as I understand it, is that they both rely on HDFS for replicated write-ahead log (WAL) management. Therefore, HDFS is a requirement even when deploying to a cloud solution. I believe Google has developed consensus-enabled WAL management, so that three instances can be stood up without any external dependencies (other than storage for the collection of RFiles/HFiles).
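To make the "three instances" point concrete: with a majority quorum of 3 replicas, a log entry counts as committed once 2 of them acknowledge it, so the log survives the loss of any one node. The Java below is a minimal, in-memory sketch of that idea under stated assumptions. `QuorumWalSketch`, `Replica`, and `replicatedAppend` are hypothetical names. This is a plain majority-quorum append, not a full consensus protocol such as Raft or Paxos, and not a description of Google's design.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, in-memory sketch of a majority-quorum append to a replicated WAL.
// No networking, no leader election, no log repair.
public class QuorumWalSketch {

    // One hypothetical WAL replica; "up" simulates whether the node responds.
    static final class Replica {
        final List<String> entries = new ArrayList<>();
        boolean up = true;

        // Returns true (an acknowledgement) only if the replica stored the entry.
        boolean append(String entry) {
            if (!up) {
                return false;
            }
            entries.add(entry);
            return true;
        }
    }

    // Majority quorum for n replicas: floor(n / 2) + 1, i.e. 2 of 3, 3 of 5.
    static int quorum(int n) {
        return n / 2 + 1;
    }

    // Sends the entry to every replica and reports it committed only if a
    // majority acknowledged it. A real consensus protocol would also have to
    // reconcile replicas whose logs diverge (for example, an uncommitted entry
    // left behind on a minority); this sketch does not.
    static boolean replicatedAppend(List<Replica> replicas, String entry) {
        int acks = 0;
        for (Replica r : replicas) {
            if (r.append(entry)) {
                acks++;
            }
        }
        return acks >= quorum(replicas.size());
    }

    public static void main(String[] args) {
        List<Replica> replicas = List.of(new Replica(), new Replica(), new Replica());
        System.out.println(replicatedAppend(replicas, "mutation-1")); // 3 of 3 up: true
        replicas.get(2).up = false;
        System.out.println(replicatedAppend(replicas, "mutation-2")); // 2 of 3 up: true
        replicas.get(1).up = false;
        System.out.println(replicatedAppend(replicas, "mutation-3")); // 1 of 3 up: false
    }
}
```

The last call shows why a real protocol needs more than counting acks: replica 0 has stored `mutation-3` even though it was never committed.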
I'd be interested to hear your thoughts on this.

On Fri, Oct 25, 2019 at 1:46 PM Mike Miller wrote:

> Hi Tushar,
>
> The closest thing we have are the performance tests in accumulo-testing,
> which is probably the best place.
> https://github.com/apache/accumulo-testing#performance-test
> The instructions for setting up the scripts are in the README. There are
> only a limited number of tests written though and they used to be
> integration tests that were moved out of the main test package.
>
> org.apache.accumulo.testing.performance.tests.DurabilityWriteSpeedPT
> org.apache.accumulo.testing.performance.tests.YieldingScanExecutorPT
> org.apache.accumulo.testing.performance.tests.ScanExecutorPT
> org.apache.accumulo.testing.performance.tests.ScanFewFamiliesPT
> org.apache.accumulo.testing.performance.tests.ConditionalMutationsPT
> org.apache.accumulo.testing.performance.tests.RandomCachedLookupsPT
>
> On Thu, Oct 24, 2019 at 8:09 PM Tushar Dhadiwal wrote:
>
> > Hello Everyone,
> >
> > I am a Software Engineer at Microsoft and our team is currently working on
> > making the deployment and operations of Accumulo on Azure as seamless as
> > possible. As part of this effort, we are attempting to observe / measure
> > some standard Accumulo operations (e.g. scan, canary queries, ingest, etc.)
> > and how their performance varies over time on long-standing Accumulo
> > clusters running in Azure. As part of this we’re looking to come up with a
> > metric that we can use to evaluate how healthy / available an Accumulo
> > cluster is. Over time we intend to use this to understand how underlying
> > platform changes in Azure can affect overall health of Accumulo workloads.
> >
> > As a starting metric for example, we are thinking of continually doing
> > scans of random values across various tablet servers and capturing timing
> > information related to how long such scans take. I took a quick look at the
> > accumulo-testing repo and didn’t find any tests or probes attempting to do
> > something along these lines. Does something like this seem reasonable? Has
> > anyone previously attempted something similar? Does accumulo-testing seem
> > like a reasonable place for code that attempts to do something like this?
> >
> > Appreciate your thoughts and feedback.
> >
> > Cheers,
> >
> > Tushar Dhadiwal
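Tushar's proposed metric (timed lookups of random rows spread across tablet servers) could start from something like the Java sketch below. It is hypothetical and self-contained: `ScanLatencyProbe`, `RowLookup`, and `Summary` are illustrative names, not part of accumulo-testing. In a real probe, `RowLookup` might wrap a Scanner obtained from `AccumuloClient.createScanner(table, Authorizations.EMPTY)` and restricted with `setRange(Range.exact(row))`, and picking rows near the table's split points would spread lookups across tablets. Both of those are assumptions left out here so the sketch runs without a cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical canary-style probe: repeatedly look up random rows and record
// how long each lookup takes, reporting failures separately from latency.
public class ScanLatencyProbe {

    // Hypothetical abstraction over "fetch one row"; returns the number of entries read.
    // Assumption: a real implementation would wrap an Accumulo Scanner.
    @FunctionalInterface
    interface RowLookup {
        int lookup(String row) throws Exception;
    }

    // Summary of one probe run; latencies are in microseconds, -1 if no lookup succeeded.
    static final class Summary {
        final int samples;
        final int failures;
        final long p50Micros;
        final long p99Micros;
        final long maxMicros;

        Summary(int samples, int failures, long p50Micros, long p99Micros, long maxMicros) {
            this.samples = samples;
            this.failures = failures;
            this.p50Micros = p50Micros;
            this.p99Micros = p99Micros;
            this.maxMicros = maxMicros;
        }

        @Override
        public String toString() {
            return String.format("samples=%d failures=%d p50=%dus p99=%dus max=%dus",
                    samples, failures, p50Micros, p99Micros, maxMicros);
        }
    }

    // Nearest-rank percentile over an already sorted, non-empty array (p in (0, 100]).
    static long percentile(long[] sorted, double p) {
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    // Runs `samples` lookups against rows drawn uniformly at random from `rows`.
    static Summary run(RowLookup lookup, List<String> rows, int samples, Random rnd) {
        List<Long> micros = new ArrayList<>();
        int failures = 0;
        for (int i = 0; i < samples; i++) {
            String row = rows.get(rnd.nextInt(rows.size()));
            long start = System.nanoTime();
            try {
                lookup.lookup(row);
                micros.add((System.nanoTime() - start) / 1_000);
            } catch (Exception e) {
                failures++; // a failed lookup counts against availability, not latency
            }
        }
        long[] sorted = micros.stream().mapToLong(Long::longValue).sorted().toArray();
        if (sorted.length == 0) {
            return new Summary(samples, failures, -1, -1, -1);
        }
        return new Summary(samples, failures, percentile(sorted, 50),
                percentile(sorted, 99), sorted[sorted.length - 1]);
    }

    public static void main(String[] args) {
        List<String> rows = List.of("row_000", "row_250", "row_500", "row_750");
        // Fake lookup standing in for a real scan; sleeps roughly 1 ms per call.
        RowLookup fake = row -> {
            Thread.sleep(1);
            return 1;
        };
        System.out.println(run(fake, rows, 200, new Random(42)));
    }
}
```

Keeping the failure count separate from the percentiles means an unreachable tablet server cannot hide behind a fast median, which seems closer to the "healthy / available" metric Tushar describes than latency alone.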
