Re: Accumulo on s3

William Slacum Mon, 25 Apr 2016 10:41:14 -0700

Ephemeral storage and EBS are friendlier options. Ephemeral storage is
generally the fastest and the most HDFS-friendly.


On Mon, Apr 25, 2016 at 1:13 PM, Dylan Hutchison <[email protected]> wrote:

> Hey Josh,
>
> Are there other platforms on AWS (or another cloud provider) that
> Accumulo/HDFS are friendly to run on?  I thought I remembered you and
> others running the agitation tests on Amazon instances during
> release-testing time.  If there are alternatives, what advantages would S3
> have over the current method?
>
> On Mon, Apr 25, 2016 at 8:09 AM, Josh Elser <[email protected]> wrote:
>
> > I'm not sure about the guarantees of S3 (much less those of the s3 or s3a
> > Hadoop FileSystem implementations), but, historically, the common issue is
> > a missing or incorrect implementation of sync(). For durability (read: not
> > losing your data), Accumulo *must* know that when it calls sync() on a
> > file, the data is persisted.
> >
> > I don't know definitively what S3 guarantees (or claims to guarantee),
> > but I would be very wary until I had run some tests (Accumulo has a good
> > test for this, continuous ingest, which can run for days and verify data
> > integrity).
> >
> > You might have luck reaching out to the Hadoop community to get some
> > understanding from them about what can reasonably be expected with the
> > current S3 FileSystem implementations, and then run your own tests to
> > make sure that data is not lost.
> >
> >
> > vdelmeglio wrote:
> >
> >> Hi everyone,
> >>
> >> I recently got this answer on Stack Overflow (link:
> >> http://stackoverflow.com/questions/36602719/accumulo-cluster-in-aws-with-s3-not-really-stable/36772874#36772874
> >> ):
> >>
> >>
> >>> Yes, I would expect that running Accumulo with S3 would result in
> >>> problems. Even though S3 has a FileSystem implementation, it does not
> >>> behave like a normal file system. Some examples of the differences are
> >>> that operations we would expect to be atomic are not atomic in S3,
> >>> exceptions may mean different things than we expect, and we assume our
> >>> view of files and their metadata is consistent rather than the eventual
> >>> consistency S3 provides.
> >>>
> >>> It's possible these issues could be mitigated if we made some
> >>> modifications to the Accumulo code, but as far as I know no one has
> >>> tried running Accumulo on S3 to figure out the problems and whether
> >>> those could be fixed or not.
> >>>
> >>
> >> Since we're currently running an Accumulo cluster on AWS with S3 for
> >> evaluation purposes, this answer makes me wonder: could someone explain
> >> to me why running Accumulo on S3 is not a good idea? Specifically, which
> >> operations are expected to be atomic in Accumulo?
> >>
> >> Is there possibly a roadmap for S3 compatibility?
> >>
> >> Thanks!
> >> Valerio
> >>
> >>
> >>
> >> --
> >> View this message in context:
> >> http://apache-accumulo.1065345.n5.nabble.com/Accumulo-on-s3-tp16737.html
> >> Sent from the Developers mailing list archive at Nabble.com.
> >>
> >
>
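
For readers following the thread, the sync() contract Josh describes can be
illustrated with a minimal sketch. It uses a local file and Python's os.fsync
as a stand-in; it does not exercise HDFS, S3, or any Accumulo code, and the
names durable_append, replay, and wal.log are hypothetical, chosen only for
illustration.

```python
import os
import tempfile


def durable_append(path, record):
    """Append one record; return only after the OS has been asked to persist it."""
    with open(path, "ab") as f:
        f.write(record + b"\n")
        f.flush()             # push Python's user-space buffer down to the OS
        os.fsync(f.fileno())  # ask the OS to flush its cache to the storage device


def replay(path):
    """Read every record back, as a recovery step would after a restart."""
    with open(path, "rb") as f:
        return [line.rstrip(b"\n") for line in f]


# Hypothetical log file in a fresh temporary directory.
log_path = os.path.join(tempfile.mkdtemp(), "wal.log")
for mutation in (b"row1:value1", b"row2:value2"):
    durable_append(log_path, mutation)

recovered = replay(log_path)
```

The point of the sketch is the ordering: a write is treated as safe only once
the sync call has returned. If a file system's sync returns before the data
is actually persisted, that assumption no longer holds, which is the concern
raised above.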
