Actually coming to think of it I am starting to think why even open up the HelixConnection? Why cant someone create a HelixAdmin instance directly and we manage the connections internally. The lesser the objects exposed through API the better.
Thanks, Sandeep On Tue, Feb 25, 2014 at 12:27 AM, Sandeep Nayak <[email protected]> wrote: > [Sandeep] See responses inline. > > On Mon, Feb 24, 2014 at 11:05 PM, Kanak Biscuitwala <[email protected]> > wrote: >> My responses are inline. Others please feel free to chime in. >> >> General comments: >> >> - I think HelixDataAccessor serves as a good basis for an SPI (which I have >> largely ignored in the wiki pages I posted). It exposes leaf nodes as >> HelixProperty instances and is entirely decoupled from ZooKeeper. >> - Things I want to avoid: class/interface/code-size explosion and >> overabstraction. I'm still not sure what the right balance is. > >> >> ---------------------------------------- >>> Date: Mon, 24 Feb 2014 22:02:29 -0800 >>> Subject: API Review: Helix Admin - Feedback >>> From: [email protected] >>> To: [email protected] >>> >>> Hey guys, >>> >>> Here is my first set of feedback, thought I should start this in >>> chunks rather than all in one go, so picking 3 items at a time. >>> >>> Connection Management >>> >>> * HelixConnection seems like it belongs to API >> >> Could you clarify what you mean by "belongs to API?" > > [Sandeep] So my notion of API is all interfaces/contracts which a > Helix user can use safely. API is our contract to Helix users, as long > as they stick to the API we make sure the contract is retained. In > that regard ZKHelixAdmin should not be in API, it should belong > somewhere in a core component. API is what we can hand over to a Helix > user and say here, adhere to this and these contracts will be > respected in future versions, we will only remove them through a > deprecation process and you will get enough time to migrate. > >> >>> * ZKHelixConnection is a specialization specific to Zookeeper >>> * We need an factory pattern which allows creating Connections based >>> on a provider-type. A provider type could be an enum which has >>> Zookeeper to begin with and we can add other types later. The factory >>> should be a part of SPI, one implementation of SPI is Zookeeper. >> >> Do you mean: >> >> HelixConnection connection = >> HelixConnectionFactory.getConnection(ConnectionType.ZOOKEEPER, zkAddress); >> >> Can you write example code for this? > > [Sandeep] This is not API so not visible to the end user, this is > internal to Helix core. > > HelixConnectionFactory becomes an interface which connection providers > need to implement. One provider is zookeeper. > > enum HelixStoreProvider{ //Did not figure a better name, but since we > store data in zookeeper thought of it as store > ZOOKEEPER, > > IN_MEMORY; //Just a sample not that someone will write one, but > demonstrates multiplicity of providers > } > > interface HelixConnectionFactory{ > getConnection(String address); > > String getName(); //you can make the return type an enum if we > control all providers added > OR > HelixStoreProvider getProvider(); //if you use an enum then it > would be as such > > } > > ZookeeperConnectionFactory implements HelixConnectionFactory{ > getConnection(String address){ > > } > > String getName(){ > return "zookeeper"; > } > OR > > HelixStoreProvider getProvider(){ > return HelixStoreProvider.ZOOKEEPER; > } > } > > HelixConnectionFactoryRegistry contains all registered connection > factories, its a map retained > Map<String, HelixConnectionFactory> connectionFactories or > Map<HelixStoreProvider, HelixConnectionFactory> > > We provider a mechanism for connection factories to be registered in > the registries, a common model is like the java service discovery > model. The registry would load all configured connection factories we > place as a comma class names in > META-INF/services/org.apache.helix.spi.HelixConnectionfactory. The > file contains org.apache.helix.spi.impl.ZookeeperHelixConnectionFactory > > > So in the call would look like such. > > new > HelixConnectionBuilder.usingProvider(HelixStoreProvider.ZOOKEEPER).toAddress(zkAddress).build(); > >> >>> * Should the zkAddress be a String? Should it be a special type like >>> HelixProviderURL? Having something like String requires us to document >>> it clearly but having a a specialization like a HelixProviderURL >>> allows a guided building of such URL. Not sure if its an overkill but >>> If we use a provider URL we should allow creating them through a >>> builder pattern. >> >> I would say this is overkill. Each layer of abstraction is more code for the >> users. Most deployments know where ZK is and it's provided to them as a >> string. > > [Sandeep] Yes, I thought so as much but was not sure if we need to > safeguard against invalid urls, seems like its well understood > pattern. >> >>> * We need to declare exceptions when connect fails >> >> I've ignored failure cases entirely for now. > > [Sandeep] Lets make sure we dont lose track of exceptions, when we get > to some level of completion with the API lets iterate through > exceptions. > >> >>> * We need to allow for a time delay for the shutdown, something like a >>> future.get(timeout) >>> >> >> I'm not super experienced with this, so could you give an example of why >> this is preferred to alternative methods of preemption? > > [Sandeep] The reason for this is allow for a shutdown API which does > not shutdown immediately but allows the users to trigger shutdown with > a grace period. The delay just allows them to ask helix for a grace > period incase they need to do some cleanup on their side. > > The flip side we provide interceptors so they can inject logic before > shutdown. > >> >>> >>> Get an Administrator >>> >>> * HelixAdministrator seems too generic, it should be >>> HelixClusterAdministrator given that its an admin for that cluster? >>> * We should have a HelixAdministrator which can be an administrator to >>> all clusters. How should we retrieve such an administrator? >> >> I would rather have one administrator that manages all clusters than two >> administrator classes. > > [Sandeep] I think we should simplify and say 1 admin manages 1 > cluster, if we say 1 admin manages multiple clusters we then need to > identify the cluster in each API when the user adds participant, > shutsdown cluster etc. If an admin wants to manage multiple clusters > then we should allow disconnect from one cluster and connect to > another and manage that one. It just keeps the context clear, if we do > 1 to many it just seems confusing. > >> >>> * Having to get a connection first and then getting admin object seems >>> inverted. Why not allow >>> HelixAdmin.create(HelixProviderURL).getConnection(). It should return >>> a specialization of the HelixConnection i.e. HelixAdminConnection. We >>> could then hang specific privileges on the HelixAdminConnection and >>> not have them on the HelixConnection. >> >> It's in this direction so that user can share one connection while managing >> different roles. For instance, a single node may be a participant and >> administrator for one cluster, and a spectator for another. > > [Sandeep] Look at it like this, should a participant have all APIs on > a connection that an admin has? But let me take a step back from your > examples it seems like a connection can only be used to get an admin > and everything else hangs off the admin, could there be other uses of > a connection or is it only a means to get an admin handle? > >> >>> * What happens if an administrator is already created? Should we allow >>> two administrators? >> >> I think this is acceptable. I would be more hesitant if a new connection was >> opened for each administrator, but if they're sharing the same connection, I >> don't think it's a big deal. > > [Sandeep] So what happens if I do this > T1 -> Admin 1 using Connection 1 on Thread 1 > T2 -> Admin 2 using Connection 1 on Thread 2 > T3 -> Admin 2 closes Connection 1 on Thread 2 > T3 -> Admin 1 tries to use Connection 1 on Thread 1 > > We would need to track all admin objects and invalidate them when a > connection is discarded. Its doable not overly difficult but we are > undertaking complexity here. > >> >>> >>> Resource Create >>> >>> * ResourceConfig seems like API >>> * Should we go all the way ResourceConfiguration instead of ResourceConfig? >> >> It means more typing, but I'm fine either way. >> >>> * ResourceConfigBuilder should have >>> ResourceConfig.Builder().withId("string") and the remaining should >>> follow like in the example >> >> Sure. We'll need to validate on build() that an id was provided. >> >>> * RebalancerConfig seems like API >>> * Can we figure RebalancerConfig with types? So we dont have three >>> versions we leverage a type to generate/assemble a builder? >> >> I don't follow, can you give an example? >> >> We want a RebalancerConfig to be an arbitrary string-serializable class. We >> can provide provide three special RebalancerConfigs for the three built-in >> rebalancing modes. These (and their builders) all derive from >> PartitionedRebalancerConfig. > > [Sandeep] So I was thinking something like > RebalancerConfig.Builder(resourceId).usingStrategy(RebalancerStrategy.FullAuto).... > > RebalancerStrategy then becomes an enum which internally allows the > builder to add a pre-baked recipe for FullAuto strategy with defaults, > the defaults can be overridden by the user ofcourse if necessary. But > imagine how simple it would be for a majority of new users who could > do with baked in recipes while they onboard. > >> >>> * How do we determine if the resource is already added? That should >>> dictate the exception. >> >> A resource is already added if its IdealState is already added. >> >>> >>> I think I will pause and wait for some responses/feedback on how this >>> medium of feedback feels before I move forward. >>> >>> Thanks, >>> >>> Sandeep >>
