Re: Sling Discovery implementation on AWS S3

Stefan Egli Mon, 12 May 2014 08:05:37 -0700

I'd agree with Chetan and Ian here in that S3 sounds feasible for
persisting properties in a topology (rather than relying on the
non-persistent nature of discovery-properties). The implementation of
discovery itself I see as a separate discussion.
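
To make the property-persistence part a bit more concrete, here is a
minimal sketch of the "prefix"-based key layout Timothée mentions below.
Everything in it is an assumption for illustration: the in-memory
InMemoryBucket stands in for a real S3 bucket (no AWS SDK calls are
made), and the topology/<instance id>/properties/<name> layout and the
function names are hypothetical, not an existing Sling or AWS API.

```python
from typing import Dict, List


class InMemoryBucket:
    """Hypothetical stand-in for an S3 bucket: flat keys mapped to binaries."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, body: bytes) -> None:
        self._objects[key] = body

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def list_keys(self, prefix: str) -> List[str]:
        # S3 can list keys by prefix; emulate that with a sorted scan.
        return sorted(k for k in self._objects if k.startswith(prefix))


def property_key(instance_id: str, name: str) -> str:
    # Assumed layout: topology/<instance id>/properties/<name>
    return f"topology/{instance_id}/properties/{name}"


def publish_properties(bucket: InMemoryBucket, instance_id: str,
                       props: Dict[str, str]) -> None:
    # Store each property as a plain binary object so that non-Sling
    # consumers could read it directly.
    for name, value in props.items():
        bucket.put(property_key(instance_id, name), value.encode("utf-8"))


def read_properties(bucket: InMemoryBucket, instance_id: str) -> Dict[str, str]:
    # The trailing slash keeps "sling-1" from matching "sling-10".
    prefix = f"topology/{instance_id}/properties/"
    return {
        key[len(prefix):]: bucket.get(key).decode("utf-8")
        for key in bucket.list_keys(prefix)
    }
```

With a layout like this, reading one instance's properties is a single
prefix listing, and the values survive an instance restart, which is the
persistence aspect I find attractive.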


Cheers,
Stefan

On 5/12/14 1:00 PM, "Chetan Mehrotra" <[email protected]> wrote:

>If the use case is only Discovery, it might be simpler to run Apache
>ZooKeeper [1] and use Apache Curator [2], as noted in SLING-2939.
>When running on AWS, one could use Netflix Exhibitor [3], which
>manages the ZooKeeper instances and backs up their state in S3.
>
>The benefit of this approach is that ZooKeeper abstracts away the
>complexities of leader election (which is hard!) and can also be used
>in on-premises installations if required.
>
>Chetan Mehrotra
>[1] http://zookeeper.apache.org/
>[2] http://curator.apache.org/
>[3] https://github.com/Netflix/exhibitor
>
>On Mon, May 12, 2014 at 2:54 PM, Timothée Maret
><[email protected]> wrote:
>> Hi,
>>
>> 2014-05-12 9:02 GMT+01:00 Ian Boston <[email protected]>:
>>
>>> Hi,
>>> +1 for distribution of properties via S3, makes perfect sense. Perhaps
>>> abstracting behind an API so that any low latency globally distributed
>>> storage provider could be used.
>>>
>>>
>> Yes, discussing this offline with Felix, an alternative could be to
>> implement a ResourceProvider for S3.
>> S3 is really low level (key-value pair) with objects being binaries +
>> metadata.
>> We could implement the path structure based on the "prefix" property in
>> [3] and stick to storing binaries only so that other S3 consumers can
>> access the data directly (without using a Sling API).
>>
>>
>>> Not sure about discovery. Although [0] describes the AWS VM, it
>>> doesn't, without further validation, describe whether the Sling
>>> instance is running and available. It's perfectly possible for the VM
>>> to be in a running state with no viable Sling instance running. I
>>> don't think that is hard to achieve, but it needs to be done to support
>>> the discovery use case.
>>>
>>
>> Exactly. Out of the box, the AWS API has no concept of a Sling instance,
>> and we should implement it.
>> According to [2], we could *not* leverage instance metadata, since it
>> can't be modified at runtime.
>> Thus, we would need to have the Sling instances publish their state in
>> S3.
>>
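
As a rough illustration of instances publishing their state, the sketch
below has each instance write a "last seen" timestamp under its own key
and treats an instance as live only if that timestamp is recent. All
names here are assumptions: the dict stands in for the S3 bucket, and the
instances/<sling id>/last-seen key, publish_state, live_instances and the
60-second TTL are hypothetical. Note that this is effectively a heartbeat
written to storage, and on an eventually consistent store a reader may
see a stale timestamp for a while.

```python
import time
from typing import Dict, List, Optional

STATE_PREFIX = "instances/"   # hypothetical key prefix
STATE_SUFFIX = "/last-seen"   # hypothetical object name
TTL_SECONDS = 60.0            # hypothetical freshness window


def publish_state(store: Dict[str, bytes], sling_id: str,
                  now: Optional[float] = None) -> None:
    # Each Sling instance periodically overwrites its own timestamp object.
    timestamp = time.time() if now is None else now
    store[f"{STATE_PREFIX}{sling_id}{STATE_SUFFIX}"] = repr(timestamp).encode("ascii")


def live_instances(store: Dict[str, bytes], now: Optional[float] = None,
                   ttl: float = TTL_SECONDS) -> List[str]:
    # An instance counts as live if its timestamp is at most `ttl` seconds old.
    current = time.time() if now is None else now
    alive = []
    for key, body in store.items():
        if not (key.startswith(STATE_PREFIX) and key.endswith(STATE_SUFFIX)):
            continue
        sling_id = key[len(STATE_PREFIX):-len(STATE_SUFFIX)]
        if current - float(body.decode("ascii")) <= ttl:
            alive.append(sling_id)
    return sorted(alive)
```

A VM that is running without a viable Sling instance would simply stop
refreshing its key and drop out of the live set once the TTL expires.
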
>>
>>> I think we are talking about instances running on independent
>>> repositories here, since if all instances share the same repository
>>> (ie are a Jackrabbit cluster), then the repository already has a
>>> mechanism of communicating running instances via the repository.
>>>
>>
>> +1
>>
>>
>>>
>>> Best Regards
>>> Ian
>>>
>>> On 12 May 2014 07:06, Carsten Ziegeler <[email protected]> wrote:
>>> > Hi Timothée,
>>> >
>>> > yes, I think this is valuable - the idea of the discovery API is to
>>> > abstract the discovery, and if we can benefit in certain scenarios
>>> > from already available mechanisms/information, I think it makes total
>>> > sense to use that instead of adding the same on top of it.
>>> >
>>> > Right now, the topology is formed of clusters containing instances -
>>> > where all instances in a cluster share the same repository, but
>>> > instances in different clusters use a different one. Is this kind of
>>> > topology somehow possible by using the AWS API? Or would all instances
>>> > end up in a single cluster?
>>> >
>>> > Regards
>>> > Carsten
>>> >
>>> >
>>> > 2014-05-11 18:54 GMT+02:00 Timothée Maret <[email protected]>:
>>> >
>>> >> Hi,
>>> >>
>>> >> I would like to discuss a potential implementation of the Sling
>>> >> Discovery APIs over an eventually consistent distributed storage
>>> >> such as AWS S3.
>>> >> Assuming the instances that are part of the topology run in AWS, we
>>> >> could leverage AWS APIs and services in order to implement the
>>> >> Discovery mechanism.
>>> >>
>>> >> The discovery of instances could be implemented implicitly using the
>>> >> EC2 REST API [0] without sending heartbeats, the properties for each
>>> >> instance could be stored in AWS S3 and distributed eventually, and
>>> >> the leader election could be implemented with [1] or similar.
>>> >>
>>> >> The benefits (over the Sling impl) would be
>>> >> * Arguably the highest availability we can get from the environment
>>> >> * Reduced bandwidth consumption (no heartbeats)
>>> >> * Environment-specific information is implicitly distributed (local
>>> >>   IP, external IP, hostname, region, etc.)
>>> >>
>>> >> Of course, it would bind the implementation to an environment (AWS
>>> >> in this case); however, I believe we could apply the same mechanism
>>> >> to other eventually consistent storage.
>>> >>
>>> >> Wdyt? Is this something that would be valuable for Sling?
>>> >>
>>> >> Regards,
>>> >>
>>> >> Timothee
>>> >>
>>> >> [0] http://docs.aws.amazon.com/AWSEC2/latest/APIReference/ApiReference-query-DescribeInstances.html
>>> >> [1] http://gsyc.es/~anto/papers/2007-dsn.pdf
>>> >>
>>> >
>>> >
>>> >
>>> > --
>>> > Carsten Ziegeler
>>> > [email protected]
>>>
>>
>> Regards
>>
>> Timothee
>>
>> [2] http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AESDG-chapter-instancedata.html
>> [3] http://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketGET.html
