Hey Josh,

Thank you for the thoughtful clarification to my otherwise boorish remarks.
I was surprised to learn that there are a few file systems that provide the required WAL append guarantees. Is there documentation that covers this topic and lists the file systems?

Thanks

On Fri, Oct 25, 2019, 5:28 PM Josh Elser <[email protected]> wrote:

> Forking this off because I don't think it's related to Tushar's original
> question.
>
> HBase and Accumulo both implement a WAL which can be said to
> rely on a distributed FileSystem which:
>
> 1. Is API compatible with HDFS
> 2. Guarantees that data written prior to an hflush/hsync() is durable
>
> There are actually a few filesystems capable of this: HDFS (duh),
> Azure's Windows Azure Storage Blob (WASB), Azure's Data Lake Store
> (ADLS), and Azure's Blob Filesystem (ABFS).
>
> Azure has had a pretty long interaction with the upstream Hadoop project
> (and some ties in with the HBase project) to make sure that we know how
> to configure their Hadoop drivers that work with those Azure blob stores
> to make that durability guarantee.
>
> That said, it's wrong to say that HBase/Accumulo in a cloud solution
> require HDFS. It is accurate to say that S3 (via the S3A adapter) does
> not provide the durability guarantees that HBase/Accumulo need for WALs
> (but EMRFS does, from what I've heard through the grapevine, but
> requires you to be using EMR)
>
> On 10/25/19 1:49 PM, David Mollitor wrote:
> > Hello Team,
> >
> > One shortcoming of Apache Accumulo and Apache HBase, as I understand it,
> > is that they both rely on the HDFS for replicated WAL management.
> > Therefore, HDFS is a requirement even if deploying to a cloud solution. I
> > believe Google has developed a consensus enabled WAL management so that
> > three instances can be stood up without any external dependencies (other
> > than storage for the collection of rfile/hfile).
> >
> > Be interested to hear your thoughts on this.
> >
> > On Fri, Oct 25, 2019 at 1:46 PM Mike Miller <[email protected]> wrote:
> >
> >> Hi Tushar,
> >>
> >> The closest thing we have are the performance tests in accumulo-testing,
> >> which is probably the best place.
> >> https://github.com/apache/accumulo-testing#performance-test
> >> The instructions for setting up the scripts are in the README. There are
> >> only a limited number of tests written though and they used to be
> >> integration tests that were moved out of the main test package.
> >>
> >> org.apache.accumulo.testing.performance.tests.DurabilityWriteSpeedPT
> >> org.apache.accumulo.testing.performance.tests.YieldingScanExecutorPT
> >> org.apache.accumulo.testing.performance.tests.ScanExecutorPT
> >> org.apache.accumulo.testing.performance.tests.ScanFewFamiliesPT
> >> org.apache.accumulo.testing.performance.tests.ConditionalMutationsPT
> >> org.apache.accumulo.testing.performance.tests.RandomCachedLookupsPT
> >>
> >> On Thu, Oct 24, 2019 at 8:09 PM Tushar Dhadiwal <[email protected]>
> >> wrote:
> >>
> >>> Hello Everyone,
> >>>
> >>>
> >>> I am a Software Engineer at Microsoft and our team is currently working
> >>> on making the deployment and operations of Accumulo on Azure as seamless
> >>> as possible. As part of this effort, we are attempting to observe /
> >>> measure some standard Accumulo operations (e.g. scan, canary queries,
> >>> ingest, etc.) and how their performance varies over time on long standing
> >>> Accumulo clusters running in Azure. As part of this we’re looking to come
> >>> up with a metric that we can use to evaluate how healthy / available an
> >>> Accumulo cluster is. Over time we intend to use this to understand how
> >>> underlying platform changes in Azure can affect overall health of
> >>> Accumulo workloads.
> >>>
> >>> As a starting metric for example, we are thinking of continually doing
> >>> scans of random values across various tablet servers and capturing
> >>> timing information related to how long such scans take. I took a quick
> >>> look at the accumulo-testing repo and didn’t find any tests or probes
> >>> attempting to do something along these lines. Does something like this
> >>> seem reasonable? Has anyone previously attempted something similar? Does
> >>> accumulo-testing seem like a reasonable place for code that attempts to
> >>> do something like this?
> >>>
> >>> Appreciate your thoughts and feedback.
> >>>
> >>> Cheers,
> >>>
> >>> Tushar Dhadiwal
> >>>
> >>
> >
>
