I agree with Chetan and Ian that S3 sounds feasible for persisting properties in a topology (rather than relying on the non-persistent nature of discovery properties). I see the implementation of discovery itself as a separate discussion.
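To make the property-persistence part concrete, here is a minimal sketch, not a proposal for the actual implementation. It assumes each instance writes its properties as one JSON object under a per-cluster key prefix, so a reader can list a cluster by prefix. The in-memory store merely stands in for S3, and all names (InMemoryStore, property_key, publish_properties, read_cluster_properties) and the key layout topology/<cluster>/<instance>.json are hypothetical.

```python
import json
from typing import Dict, List


class InMemoryStore:
    """Hypothetical stand-in for a key-value object store such as S3."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, body: bytes) -> None:
        self._objects[key] = body

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def list_keys(self, prefix: str) -> List[str]:
        # Analogous to listing a bucket filtered by a key prefix.
        return sorted(k for k in self._objects if k.startswith(prefix))


def property_key(cluster_id: str, instance_id: str) -> str:
    # Assumed layout: one object per instance, grouped by cluster.
    return f"topology/{cluster_id}/{instance_id}.json"


def publish_properties(store: InMemoryStore, cluster_id: str,
                       instance_id: str, properties: Dict[str, str]) -> None:
    # Plain JSON bytes, so other consumers can read the objects directly.
    body = json.dumps(properties, sort_keys=True).encode("utf-8")
    store.put(property_key(cluster_id, instance_id), body)


def read_cluster_properties(store: InMemoryStore,
                            cluster_id: str) -> Dict[str, Dict[str, str]]:
    # Collect the properties of every instance published under the cluster.
    prefix = f"topology/{cluster_id}/"
    suffix = ".json"
    result: Dict[str, Dict[str, str]] = {}
    for key in store.list_keys(prefix):
        if not key.endswith(suffix):
            continue
        instance_id = key[len(prefix):-len(suffix)]
        result[instance_id] = json.loads(store.get(key).decode("utf-8"))
    return result


if __name__ == "__main__":
    store = InMemoryStore()
    publish_properties(store, "cluster-a", "instance-1", {"region": "eu-west-1"})
    print(read_cluster_properties(store, "cluster-a"))
```

This covers only storing and reading properties. It says nothing about whether an instance is alive, which is the separate discovery question.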
Cheers,
Stefan

On 5/12/14 1:00 PM, "Chetan Mehrotra" <[email protected]> wrote:

>If the use case is only for Discovery, it might be simpler to run Apache
>Zookeeper [1] and use Apache Curator [2], as noted in SLING-2939.
>While running on AWS one can possibly use Netflix Exhibitor [3], which
>manages the Zookeeper instances and backs up their state in S3.
>
>The benefit of this approach is that Zookeeper abstracts away all the
>complexities of leader election (which is hard!) and can also be used
>in on-prem installations if required.
>
>Chetan Mehrotra
>[1] http://zookeeper.apache.org/
>[2] http://curator.apache.org/
>[3] https://github.com/Netflix/exhibitor
>
>On Mon, May 12, 2014 at 2:54 PM, Timothée Maret
><[email protected]> wrote:
>> Hi,
>>
>> 2014-05-12 9:02 GMT+01:00 Ian Boston <[email protected]>:
>>
>>> Hi,
>>> +1 for distribution of properties via S3, makes perfect sense. Perhaps
>>> abstracting behind an API so that any low latency globally distributed
>>> storage provider could be used.
>>>
>>>
>> Yes, discussing this offline with Felix, an alternative could be to
>> implement a ResourceProvider for S3.
>> S3 is really low level (key-value pairs), with objects being binaries +
>> metadata.
>> We could implement the path structure based on the "prefix" property in [3]
>> and stick to storing binaries only so that other S3 consumers can access
>> the data directly (without using a Sling API).
>>
>>
>>> Not sure about discovery. Although [0] describes the AWS VM, it
>>> doesn't, without further validation, describe whether the Sling
>>> instance is running and available. It's perfectly possible for the VM
>>> to be in a running state with no viable Sling instance running. I
>>> don't think that's hard to achieve, but it needs to be done to support
>>> the discovery use case.
>>>
>>
>> Exactly, ootb, the AWS API has no concept of a Sling instance and we
>> should implement it.
>> According to [2] we could *not* leverage instance metadata since it
>> can't be modified at runtime.
>> Thus, we would need to have the Sling instances publish their state in
>> S3.
>>
>>
>>> I think we are talking about instances running on independent
>>> repositories here, since if all instances share the same repository
>>> (i.e. are a Jackrabbit cluster), then the repository already has a
>>> mechanism of communicating running instances via the repository.
>>>
>>
>> +1
>>
>>
>>>
>>> Best Regards
>>> Ian
>>>
>>> On 12 May 2014 07:06, Carsten Ziegeler <[email protected]> wrote:
>>> > Hi Timothée,
>>> >
>>> > yes I think this is valuable - the idea of the discovery API is to
>>> > abstract the discovery, and if we can benefit in certain scenarios
>>> > from already available mechanisms/information I think it makes total
>>> > sense to use that instead of adding the same on top of it.
>>> >
>>> > Right now, the topology is formed of clusters containing instances -
>>> > where all instances in a cluster share the same repository, but
>>> > instances in different clusters use a different one. Is this kind of
>>> > topology somehow possible by using the AWS API? Or would all
>>> > instances end up in a single cluster?
>>> >
>>> > Regards
>>> > Carsten
>>> >
>>> >
>>> > 2014-05-11 18:54 GMT+02:00 Timothée Maret <[email protected]>:
>>> >
>>> >> Hi,
>>> >>
>>> >> I would like to discuss a potential implementation of the Sling
>>> >> Discovery APIs over an eventually consistent distributed storage
>>> >> such as AWS S3.
>>> >> Assuming the instances that are part of the topology run in AWS,
>>> >> we could leverage AWS APIs and services in order to implement the
>>> >> Discovery mechanism.
>>> >>
>>> >> The discovery of instances could be implemented implicitly using
>>> >> the EC2 REST API [0] without sending heartbeats, the properties for
>>> >> each instance could be stored in AWS S3 and distributed eventually,
>>> >> and the leader election could be implemented with [1] or similar.
>>> >>
>>> >> The benefits (over the Sling impl) would be
>>> >> * Arguably the highest availability we can get from the environment
>>> >> * Reduced bandwidth consumption (no heartbeats)
>>> >> * Environment-specific information is implicitly distributed
>>> >>   (local ip, external ip, hostname, region, etc.)
>>> >>
>>> >> Of course, it would bind the implementation to an environment (AWS
>>> >> in this case), however I believe we could apply the same mechanism
>>> >> to other eventually consistent storage.
>>> >>
>>> >> Wdyt? Is this something that would be valuable for Sling?
>>> >>
>>> >> Regards,
>>> >>
>>> >> Timothee
>>> >>
>>> >> [0]
>>> >> http://docs.aws.amazon.com/AWSEC2/latest/APIReference/ApiReference-query-DescribeInstances.html
>>> >> [1] http://gsyc.es/~anto/papers/2007-dsn.pdf
>>> >>
>>> >
>>> >
>>> >
>>> > --
>>> > Carsten Ziegeler
>>> > [email protected]
>>>
>>
>> Regards
>>
>> Timothee
>>
>> [2]
>> http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AESDG-chapter-instancedata.html
>> [3] http://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketGET.html
