My last effort and then lets move on, email is not the best venue to discuss design and sometimes whiteboards are necessary but I let me see if I can convey my thoughts :-).
On Tue, Feb 25, 2014 at 11:07 AM, Kanak Biscuitwala <[email protected]> wrote: > Alright, here's where I think we agree so far: > > - Single class for building common rebalancer configs > - Some form of factory for getting a HelixConnection > - Keeping building constructors empty, but validating on build > - Having an admin scoped to a single cluster is probably cleaner > > Here's what we still need to discuss: > > - The complexity of the helix connection factory: > > I'm not sure what benefit to the user (or developer) this buys us. I mean > theoretically we could reuse open HelixConnections, but I don't think that's > what you're suggesting. Right now, it seems like more code for us and the > user to do the same thing. > [Sandeep] The connection factory buys us the ability to keep a consistent API when someone writes a replacement to store configuration state in something other than Zookeeper. Sure, we can keep the API to the user simple but the underlying complexity buys us flexibility, we could say replacing Zookeeper or for that matter providing an abstraction to zookeeper is not priority now so lets not worry about that for now which is fine. But if we are going to provide alternatives to zookeeper then that abstraction of spi and connection factories or some variant of it is necessary. > - How to let applications gracefully shut down: > > We provide FINALIZE callbacks to users so they can do what is necessary > before we shut down. [Sandeep] Sounds good > > - The necessity of a multi-cluster admin: > > There's only one method that we need to support that is beyond the scope of a > single cluster: listClusters(). The rest can and should be scoped to a single > cluster; it's just a question of where we stick this one method. > > - The role of an admin in the cluster, and whether or not it should be the > source of all APIs or if the connection should be: [Sandeep] Just to clarify I was not asking that we have all APIs on admin. I probably was not clear so my apologies on the confusion, what I was pointing to is why expose a connection which can be hidden under the HelixAdmin by passing a string url to the HelixAdmin. The HelixAdmin can then create/manage connections underneath. One less object exposed to the end user. > > Just so we're on the same page, let me briefly describe how administration > typically works in systems that use Helix: > > 1. There are tools that change the cluster state that will exclusively work > through an administrator. [Sandeep] Seems like there is a separation of API right here because there is exclusive ability for an admin > 2. Participants and spectators don't change the cluster state most of the > time, but at some point may need to save a config or store a property, and > thus need administration support. [Sandeep] Sure, any participant or spectator should be able to save state. That location should be separated from where cluster state is stored, the storage and who gets access to what should be directed from the access pattern from the API which is why I think there is a separation. > > Generally Helix systems are driven through participant state transition > callbacks and are generally low-touch with respect to configurations over > time. I would recommend looking at the other wiki pages first for some > context. So I disagree that everything should hang off of HelixAdmin. [Sandeep] Surely I am lacking context and will read through the other wiki pages. I just don't think Helix needs to expose another object to the user, managing connections should not be the responsibility of the client if we can do away with that object being surfaced to the client its one less object to manage through an API change process. In the end you guys are more savvy on Helix than I am so it will be your call anyway, I am going to push you towards trying to reduce the API surface area so if it can be done then great if not then I would imagine we will have a justification for why not. > > By the way, being able to share a connection is something that was actually > requested by a user. There are legitimate reasons for wanting a single live > connection across multiple roles on a single node. [Sandeep] Sure I am not denying that there cant be legitimate reasons all I am suggesting is hide that away from the user. The user will only ask for the shared connection if he knows about the connection. Too many touch points on an API comes to mind when we are exposing too many objects. Also if we want to share connections I would imagine we can create APIs which allow creating objects with a shared connection. > > - Whether or not multiple admins attached to a single connection should be > legal: > > The administrator is passive. It doesn't run things in the background and > absolutely has nothing to clean up. [Sandeep] I was only reacting to the shutdown API the admin instance can invoke and asking what happens if connections are shared in that case. Seemed like a legitimate question to ask. > > Kanak > > ---------------------------------------- >> Date: Tue, 25 Feb 2014 08:24:30 -0800 >> Subject: Re: API Review: Helix Admin - Feedback >> From: [email protected] >> To: [email protected] >> >> Actually coming to think of it I am starting to think why even open up >> the HelixConnection? Why cant someone create a HelixAdmin instance >> directly and we manage the connections internally. The lesser the >> objects exposed through API the better. >> >> Thanks, >> >> Sandeep >> >> On Tue, Feb 25, 2014 at 12:27 AM, Sandeep Nayak <[email protected]> wrote: >>> [Sandeep] See responses inline. >>> >>> On Mon, Feb 24, 2014 at 11:05 PM, Kanak Biscuitwala <[email protected]> >>> wrote: >>>> My responses are inline. Others please feel free to chime in. >>>> >>>> General comments: >>>> >>>> - I think HelixDataAccessor serves as a good basis for an SPI (which I >>>> have largely ignored in the wiki pages I posted). It exposes leaf nodes as >>>> HelixProperty instances and is entirely decoupled from ZooKeeper. >>>> - Things I want to avoid: class/interface/code-size explosion and >>>> overabstraction. I'm still not sure what the right balance is. >>> >>>> >>>> ---------------------------------------- >>>>> Date: Mon, 24 Feb 2014 22:02:29 -0800 >>>>> Subject: API Review: Helix Admin - Feedback >>>>> From: [email protected] >>>>> To: [email protected] >>>>> >>>>> Hey guys, >>>>> >>>>> Here is my first set of feedback, thought I should start this in >>>>> chunks rather than all in one go, so picking 3 items at a time. >>>>> >>>>> Connection Management >>>>> >>>>> * HelixConnection seems like it belongs to API >>>> >>>> Could you clarify what you mean by "belongs to API?" >>> >>> [Sandeep] So my notion of API is all interfaces/contracts which a >>> Helix user can use safely. API is our contract to Helix users, as long >>> as they stick to the API we make sure the contract is retained. In >>> that regard ZKHelixAdmin should not be in API, it should belong >>> somewhere in a core component. API is what we can hand over to a Helix >>> user and say here, adhere to this and these contracts will be >>> respected in future versions, we will only remove them through a >>> deprecation process and you will get enough time to migrate. >>> >>>> >>>>> * ZKHelixConnection is a specialization specific to Zookeeper >>>>> * We need an factory pattern which allows creating Connections based >>>>> on a provider-type. A provider type could be an enum which has >>>>> Zookeeper to begin with and we can add other types later. The factory >>>>> should be a part of SPI, one implementation of SPI is Zookeeper. >>>> >>>> Do you mean: >>>> >>>> HelixConnection connection = >>>> HelixConnectionFactory.getConnection(ConnectionType.ZOOKEEPER, zkAddress); >>>> >>>> Can you write example code for this? >>> >>> [Sandeep] This is not API so not visible to the end user, this is >>> internal to Helix core. >>> >>> HelixConnectionFactory becomes an interface which connection providers >>> need to implement. One provider is zookeeper. >>> >>> enum HelixStoreProvider{ //Did not figure a better name, but since we >>> store data in zookeeper thought of it as store >>> ZOOKEEPER, >>> >>> IN_MEMORY; //Just a sample not that someone will write one, but >>> demonstrates multiplicity of providers >>> } >>> >>> interface HelixConnectionFactory{ >>> getConnection(String address); >>> >>> String getName(); //you can make the return type an enum if we >>> control all providers added >>> OR >>> HelixStoreProvider getProvider(); //if you use an enum then it >>> would be as such >>> >>> } >>> >>> ZookeeperConnectionFactory implements HelixConnectionFactory{ >>> getConnection(String address){ >>> >>> } >>> >>> String getName(){ >>> return "zookeeper"; >>> } >>> OR >>> >>> HelixStoreProvider getProvider(){ >>> return HelixStoreProvider.ZOOKEEPER; >>> } >>> } >>> >>> HelixConnectionFactoryRegistry contains all registered connection >>> factories, its a map retained >>> Map<String, HelixConnectionFactory> connectionFactories or >>> Map<HelixStoreProvider, HelixConnectionFactory> >>> >>> We provider a mechanism for connection factories to be registered in >>> the registries, a common model is like the java service discovery >>> model. The registry would load all configured connection factories we >>> place as a comma class names in >>> META-INF/services/org.apache.helix.spi.HelixConnectionfactory. The >>> file contains org.apache.helix.spi.impl.ZookeeperHelixConnectionFactory >>> >>> >>> So in the call would look like such. >>> >>> new >>> HelixConnectionBuilder.usingProvider(HelixStoreProvider.ZOOKEEPER).toAddress(zkAddress).build(); >>> >>>> >>>>> * Should the zkAddress be a String? Should it be a special type like >>>>> HelixProviderURL? Having something like String requires us to document >>>>> it clearly but having a a specialization like a HelixProviderURL >>>>> allows a guided building of such URL. Not sure if its an overkill but >>>>> If we use a provider URL we should allow creating them through a >>>>> builder pattern. >>>> >>>> I would say this is overkill. Each layer of abstraction is more code for >>>> the users. Most deployments know where ZK is and it's provided to them as >>>> a string. >>> >>> [Sandeep] Yes, I thought so as much but was not sure if we need to >>> safeguard against invalid urls, seems like its well understood >>> pattern. >>>> >>>>> * We need to declare exceptions when connect fails >>>> >>>> I've ignored failure cases entirely for now. >>> >>> [Sandeep] Lets make sure we dont lose track of exceptions, when we get >>> to some level of completion with the API lets iterate through >>> exceptions. >>> >>>> >>>>> * We need to allow for a time delay for the shutdown, something like a >>>>> future.get(timeout) >>>>> >>>> >>>> I'm not super experienced with this, so could you give an example of why >>>> this is preferred to alternative methods of preemption? >>> >>> [Sandeep] The reason for this is allow for a shutdown API which does >>> not shutdown immediately but allows the users to trigger shutdown with >>> a grace period. The delay just allows them to ask helix for a grace >>> period incase they need to do some cleanup on their side. >>> >>> The flip side we provide interceptors so they can inject logic before >>> shutdown. >>> >>>> >>>>> >>>>> Get an Administrator >>>>> >>>>> * HelixAdministrator seems too generic, it should be >>>>> HelixClusterAdministrator given that its an admin for that cluster? >>>>> * We should have a HelixAdministrator which can be an administrator to >>>>> all clusters. How should we retrieve such an administrator? >>>> >>>> I would rather have one administrator that manages all clusters than two >>>> administrator classes. >>> >>> [Sandeep] I think we should simplify and say 1 admin manages 1 >>> cluster, if we say 1 admin manages multiple clusters we then need to >>> identify the cluster in each API when the user adds participant, >>> shutsdown cluster etc. If an admin wants to manage multiple clusters >>> then we should allow disconnect from one cluster and connect to >>> another and manage that one. It just keeps the context clear, if we do >>> 1 to many it just seems confusing. >>> >>>> >>>>> * Having to get a connection first and then getting admin object seems >>>>> inverted. Why not allow >>>>> HelixAdmin.create(HelixProviderURL).getConnection(). It should return >>>>> a specialization of the HelixConnection i.e. HelixAdminConnection. We >>>>> could then hang specific privileges on the HelixAdminConnection and >>>>> not have them on the HelixConnection. >>>> >>>> It's in this direction so that user can share one connection while >>>> managing different roles. For instance, a single node may be a participant >>>> and administrator for one cluster, and a spectator for another. >>> >>> [Sandeep] Look at it like this, should a participant have all APIs on >>> a connection that an admin has? But let me take a step back from your >>> examples it seems like a connection can only be used to get an admin >>> and everything else hangs off the admin, could there be other uses of >>> a connection or is it only a means to get an admin handle? >>> >>>> >>>>> * What happens if an administrator is already created? Should we allow >>>>> two administrators? >>>> >>>> I think this is acceptable. I would be more hesitant if a new connection >>>> was opened for each administrator, but if they're sharing the same >>>> connection, I don't think it's a big deal. >>> >>> [Sandeep] So what happens if I do this >>> T1 -> Admin 1 using Connection 1 on Thread 1 >>> T2 -> Admin 2 using Connection 1 on Thread 2 >>> T3 -> Admin 2 closes Connection 1 on Thread 2 >>> T3 -> Admin 1 tries to use Connection 1 on Thread 1 >>> >>> We would need to track all admin objects and invalidate them when a >>> connection is discarded. Its doable not overly difficult but we are >>> undertaking complexity here. >>> >>>> >>>>> >>>>> Resource Create >>>>> >>>>> * ResourceConfig seems like API >>>>> * Should we go all the way ResourceConfiguration instead of >>>>> ResourceConfig? >>>> >>>> It means more typing, but I'm fine either way. >>>> >>>>> * ResourceConfigBuilder should have >>>>> ResourceConfig.Builder().withId("string") and the remaining should >>>>> follow like in the example >>>> >>>> Sure. We'll need to validate on build() that an id was provided. >>>> >>>>> * RebalancerConfig seems like API >>>>> * Can we figure RebalancerConfig with types? So we dont have three >>>>> versions we leverage a type to generate/assemble a builder? >>>> >>>> I don't follow, can you give an example? >>>> >>>> We want a RebalancerConfig to be an arbitrary string-serializable class. >>>> We can provide provide three special RebalancerConfigs for the three >>>> built-in rebalancing modes. These (and their builders) all derive from >>>> PartitionedRebalancerConfig. >>> >>> [Sandeep] So I was thinking something like >>> RebalancerConfig.Builder(resourceId).usingStrategy(RebalancerStrategy.FullAuto).... >>> >>> RebalancerStrategy then becomes an enum which internally allows the >>> builder to add a pre-baked recipe for FullAuto strategy with defaults, >>> the defaults can be overridden by the user ofcourse if necessary. But >>> imagine how simple it would be for a majority of new users who could >>> do with baked in recipes while they onboard. >>> >>>> >>>>> * How do we determine if the resource is already added? That should >>>>> dictate the exception. >>>> >>>> A resource is already added if its IdealState is already added. >>>> >>>>> >>>>> I think I will pause and wait for some responses/feedback on how this >>>>> medium of feedback feels before I move forward. >>>>> >>>>> Thanks, >>>>> >>>>> Sandeep >>>> >
