Re: Accumulo on s3

Dylan Hutchison Mon, 25 Apr 2016 10:14:27 -0700

Hey Josh,

Are there other platforms on AWS (or another cloud provider) that
Accumulo and HDFS run well on?  I thought I remembered you and
others running the agitation tests on Amazon instances during
release testing.  If there are alternatives, what advantages would S3
have over the current method?


On Mon, Apr 25, 2016 at 8:09 AM, Josh Elser wrote:

> I'm not sure about the guarantees of S3 (much less the s3 or s3a Hadoop
> FileSystem implementations), but, historically, the common issue is a
> missing or incorrect implementation of sync(). For durability (read: not
> losing your data), Accumulo *must* know that when it calls sync() on a
> file, the data is persisted.
>
> I don't know definitively what S3 guarantees (or claims to guarantee),
> but I would be very wary until I had run some tests. Accumulo has one good
> test, continuous ingest, that can run for days and verify data integrity.
>
> You might have luck reaching out to the Hadoop community to get some
> understanding from them about what can reasonably be expected with the
> current S3 FileSystem implementations, and then run your own tests to make
> sure that data is not lost.
>
>
> vdelmeglio wrote:
>
>> Hi everyone,
>>
>> I recently got this answer on Stack Overflow:
>>
>> http://stackoverflow.com/questions/36602719/accumulo-cluster-in-aws-with-s3-not-really-stable/36772874#36772874
>>
>>
>>> Yes, I would expect that running Accumulo with S3 would result in
>>> problems. Even though S3 has a FileSystem implementation, it does not
>>> behave like a normal file system. Some examples of the differences are
>>> that operations we would expect to be atomic are not atomic in S3,
>>> exceptions may mean different things than we expect, and we assume our
>>> view of files and their metadata is consistent rather than the eventual
>>> consistency S3 provides.
>>>
>>> It's possible these issues could be mitigated if we made some
>>> modifications to the Accumulo code, but as far as I know no one has tried
>>> running Accumulo on S3 to figure out the problems and whether those could
>>> be fixed or not.
>>>
>>
>> Since we're currently running an Accumulo cluster on AWS with S3 for
>> evaluation purposes, this answer makes me wonder: could someone explain
>> why running Accumulo on S3 is not a good idea? Specifically, which
>> operations does Accumulo expect to be atomic?
>>
>> Is there a roadmap for S3 compatibility?
>>
>> Thanks!
>> Valerio
>>
>>
>>
>>
>
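
For anyone following the thread, here is a minimal local-disk sketch of the
contract Josh describes for sync(): the writer may only treat a record as
safe once the sync call has returned. This is plain Python against a local
file, not Accumulo or Hadoop code; the names append_durably, read_records,
and wal.log are made up for illustration, and os.fsync only stands in for
whatever the underlying FileSystem implementation does. The concern with an
S3-backed FileSystem is that the equivalent call might return without the
data being persisted, and the caller would have no way to tell.

```python
import os
import tempfile


def append_durably(path, record):
    """Append one record; return only after the OS reports it persisted."""
    with open(path, "ab") as log:
        log.write(record)
        log.flush()             # push Python's userspace buffer to the OS
        os.fsync(log.fileno())  # ask the OS to write the file to the device


def read_records(path):
    """Read back every newline-terminated record written so far."""
    with open(path, "rb") as log:
        return log.read().splitlines()


if __name__ == "__main__":
    # Hypothetical write-ahead log in a scratch directory.
    wal = os.path.join(tempfile.mkdtemp(), "wal.log")
    append_durably(wal, b"row1 -> value1\n")
    append_durably(wal, b"row2 -> value2\n")
    print(read_records(wal))
```

A long-running test such as continuous ingest is still what would show
whether a given store actually honors this contract; the sketch only shows
what the caller assumes.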
