Ephemeral storage and EBS are both friendlier than S3. Ephemeral storage is generally the fastest and most HDFS-friendly.

On Mon, Apr 25, 2016 at 1:13 PM, Dylan Hutchison <[email protected]> wrote:

> Hey Josh,
>
> Are there other platforms on AWS (or another cloud provider) that
> Accumulo/HDFS are friendly to run on? I thought I remembered you and
> others running the agitation tests on Amazon instances during
> release-testing time. If there are alternatives, what advantages would S3
> have over the current method?
>
> On Mon, Apr 25, 2016 at 8:09 AM, Josh Elser <[email protected]> wrote:
>
> > I'm not sure on the guarantees of s3 (much less the s3 or s3a Hadoop
> > FileSystem implementations), but, historically, the common issue is
> > lacking/incorrect implementations of sync(). For durability (read-as: not
> > losing your data), Accumulo *must* know that when it calls sync() on a
> > file, the data is persisted.
> >
> > I don't know definitively what S3 guarantees (or asserts to guarantee),
> > but I would be very afraid until I ran some testing (we have one good test
> > in Accumulo that can run for days and verify data integrity called
> > continuous ingest).
> >
> > You might have luck reaching out to the Hadoop community to get some
> > understanding from them about what can reasonably be expected with the
> > current S3 FileSystem implementations, and then run your own tests to make
> > sure that data is not lost.
> >
> > vdelmeglio wrote:
> >
> >> Hi everyone,
> >>
> >> I recently got this answer on stackoverflow (link:
> >> http://stackoverflow.com/questions/36602719/accumulo-cluster-in-aws-with-s3-not-really-stable/36772874#36772874
> >> ):
> >>
> >>> Yes, I would expect that running Accumulo with S3 would result in
> >>> problems. Even though S3 has a FileSystem implementation, it does not
> >>> behave like a normal file system. Some examples of the differences are
> >>> that operations we would expect to be atomic are not atomic in S3,
> >>> exceptions may mean different things than we expect, and we assume our
> >>> view of files and their metadata is consistent rather than the eventual
> >>> consistency S3 provides.
> >>>
> >>> It's possible these issues could be mitigated if we made some
> >>> modifications to the Accumulo code, but as far as I know no one has tried
> >>> running Accumulo on S3 to figure out the problems and whether those could
> >>> be fixed or not.
> >>
> >> Since we're currently running an accumulo cluster on aws with s3 for
> >> evaluation purpose, this answer make me wonder, should someone explain me
> >> why running accumulo on s3 is not a good idea? in the specific, which
> >> operations are expected to be atomic on accumulo?
> >>
> >> Is there eventually a roadmap for s3 compatibility?
> >>
> >> Thanks!
> >> Valerio
> >>
> >> --
> >> View this message in context:
> >> http://apache-accumulo.1065345.n5.nabble.com/Accumulo-on-s3-tp16737.html
> >> Sent from the Developers mailing list archive at Nabble.com.
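
To illustrate the sync() requirement Josh describes in the quoted message, here is a minimal sketch in Python that uses a local file as a stand-in. It is not Accumulo's or Hadoop's code: `durable_append`, `read_records`, and the `wal.log` path are hypothetical names chosen for illustration. The only point is that a write is acknowledged after the flush-and-sync call returns, which is the contract a FileSystem implementation has to honor for acknowledged data to survive a crash.

```python
import os
import tempfile


def durable_append(path, record):
    """Append one record; return only after the OS reports it persisted."""
    with open(path, "ab") as f:
        f.write(record + b"\n")
        f.flush()              # push Python's userspace buffer to the OS
        os.fsync(f.fileno())   # ask the OS to persist the data to the device
    return True                # acknowledge only after fsync has returned


def read_records(path):
    """Read back every record, as a recovery step would after a restart."""
    with open(path, "rb") as f:
        return f.read().splitlines()


# Hypothetical log file in a temporary directory, for illustration only.
log_path = os.path.join(tempfile.mkdtemp(), "wal.log")
acked = [durable_append(log_path, b"mutation-%d" % i) for i in range(3)]
recovered = read_records(log_path)
```

If the sync step were a no-op, or returned before the data was actually stored, `durable_append` would still report success while a crash could lose the record. That is the failure mode worth testing for before trusting a given storage backend.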
