Apologies for the late response, got tied down with priority items at work. About the plan it sounds good, I will join IRC later in the evening and we can do Hangout too. See you guys there.
On Tue, Feb 25, 2014 at 2:41 PM, kishore g <[email protected]> wrote: > Great discussion, I see we are converging on some items. We can move this > discussion to JIRA but i dont think that will be of much help. There is an > other option. > > We setup a time to discuss these, much easier to explain the pros and cons. > -- IRC > -- Hangout session. > > I am fine with either option. After that we can send the summary to the > group. What do you think. > > On the connectionFactory, I agree with what Sandeep wrote with the registry > pattern. However, I am little worried about loading things from classpath > automatically, I am fine with us loading Zookeeper impl from within the > code (its ugly) but i prefer ability to debug the code and helps in knowing > the usage of a particular method. Sandeep, if I understand correctly you > are saying the HelixStoreProvider is a dynamic builder that can be > customized per SPI. I am not sure if we need getConnection(Address) in > ConnectionFactory. we need something like getConnection(ConnectionOptions), > lets say we add database as a backend, then we might need additional > options to create the connection. We can work on the details when we start > writing the api's. At a high I agree with the builder pattern + service > registry. > > Regarding re-use of connections, we absolutely don't want connection to be > re-used across participants. That will create more issues. I think the > problem with existing api is that creating a manager requires cluster name > and we did not really have a connection object. The re-use is a pretty rare > case and its a nice to have feature but its ok to if its hard to re-use at > the cost of simplifying the most common use cases. > > On the participant,spectator, administrator topic, I have ended up using > the admin module for most of the operations. So it might be a good idea to > have administrator as the single point for all CRUD operations. The only > pain point will be passing in cluster name for every operations even after > being connected as the participant to the cluster. > > Lets have a discussion on IRC/Hangout and start working on a branch. > Sandeep, we should be able to give you permission to commit on a branch. I > think once we start writing the API we get more clarity. > > thanks > Kishore G > > > > On Tue, Feb 25, 2014 at 11:56 AM, Sandeep Nayak <[email protected]> wrote: > >> My last effort and then lets move on, email is not the best venue to >> discuss design and sometimes whiteboards are necessary but I let me >> see if I can convey my thoughts :-). >> >> On Tue, Feb 25, 2014 at 11:07 AM, Kanak Biscuitwala <[email protected]> >> wrote: >> > Alright, here's where I think we agree so far: >> > >> > - Single class for building common rebalancer configs >> > - Some form of factory for getting a HelixConnection >> > - Keeping building constructors empty, but validating on build >> > - Having an admin scoped to a single cluster is probably cleaner >> > >> > Here's what we still need to discuss: >> > >> > - The complexity of the helix connection factory: >> > >> > I'm not sure what benefit to the user (or developer) this buys us. I >> mean theoretically we could reuse open HelixConnections, but I don't think >> that's what you're suggesting. Right now, it seems like more code for us >> and the user to do the same thing. >> > >> >> [Sandeep] The connection factory buys us the ability to keep a >> consistent API when someone writes a replacement to store >> configuration state in something other than Zookeeper. Sure, we can >> keep the API to the user simple but the underlying complexity buys us >> flexibility, we could say replacing Zookeeper or for that matter >> providing an abstraction to zookeeper is not priority now so lets not >> worry about that for now which is fine. But if we are going to provide >> alternatives to zookeeper then that abstraction of spi and connection >> factories or some variant of it is necessary. >> >> > - How to let applications gracefully shut down: >> > >> > We provide FINALIZE callbacks to users so they can do what is necessary >> before we shut down. >> >> [Sandeep] Sounds good >> >> > >> > - The necessity of a multi-cluster admin: >> > >> > There's only one method that we need to support that is beyond the scope >> of a single cluster: listClusters(). The rest can and should be scoped to a >> single cluster; it's just a question of where we stick this one method. >> > >> > - The role of an admin in the cluster, and whether or not it should be >> the source of all APIs or if the connection should be: >> >> [Sandeep] Just to clarify I was not asking that we have all APIs on >> admin. I probably was not clear so my apologies on the confusion, what >> I was pointing to is why expose a connection which can be hidden under >> the HelixAdmin by passing a string url to the HelixAdmin. The >> HelixAdmin can then create/manage connections underneath. One less >> object exposed to the end user. >> > >> > Just so we're on the same page, let me briefly describe how >> administration typically works in systems that use Helix: >> > >> > 1. There are tools that change the cluster state that will exclusively >> work through an administrator. >> >> [Sandeep] Seems like there is a separation of API right here because >> there is exclusive ability for an admin >> >> > 2. Participants and spectators don't change the cluster state most of >> the time, but at some point may need to save a config or store a property, >> and thus need administration support. >> >> [Sandeep] Sure, any participant or spectator should be able to save >> state. That location should be separated from where cluster state is >> stored, the storage and who gets access to what should be directed >> from the access pattern from the API which is why I think there is a >> separation. >> >> > >> > Generally Helix systems are driven through participant state transition >> callbacks and are generally low-touch with respect to configurations over >> time. I would recommend looking at the other wiki pages first for some >> context. So I disagree that everything should hang off of HelixAdmin. >> >> [Sandeep] Surely I am lacking context and will read through the other >> wiki pages. I just don't think Helix needs to expose another object to >> the user, managing connections should not be the responsibility of the >> client if we can do away with that object being surfaced to the client >> its one less object to manage through an API change process. In the >> end you guys are more savvy on Helix than I am so it will be your call >> anyway, I am going to push you towards trying to reduce the API >> surface area so if it can be done then great if not then I would >> imagine we will have a justification for why not. >> >> > >> > By the way, being able to share a connection is something that was >> actually requested by a user. There are legitimate reasons for wanting a >> single live connection across multiple roles on a single node. >> >> [Sandeep] Sure I am not denying that there cant be legitimate reasons >> all I am suggesting is hide that away from the user. The user will >> only ask for the shared connection if he knows about the connection. >> Too many touch points on an API comes to mind when we are exposing too >> many objects. Also if we want to share connections I would imagine we >> can create APIs which allow creating objects with a shared connection. >> >> > >> > - Whether or not multiple admins attached to a single connection should >> be legal: >> > >> > The administrator is passive. It doesn't run things in the background >> and absolutely has nothing to clean up. >> >> [Sandeep] I was only reacting to the shutdown API the admin instance >> can invoke and asking what happens if connections are shared in that >> case. Seemed like a legitimate question to ask. >> >> > >> > Kanak >> > >> > ---------------------------------------- >> >> Date: Tue, 25 Feb 2014 08:24:30 -0800 >> >> Subject: Re: API Review: Helix Admin - Feedback >> >> From: [email protected] >> >> To: [email protected] >> >> >> >> Actually coming to think of it I am starting to think why even open up >> >> the HelixConnection? Why cant someone create a HelixAdmin instance >> >> directly and we manage the connections internally. The lesser the >> >> objects exposed through API the better. >> >> >> >> Thanks, >> >> >> >> Sandeep >> >> >> >> On Tue, Feb 25, 2014 at 12:27 AM, Sandeep Nayak <[email protected]> >> wrote: >> >>> [Sandeep] See responses inline. >> >>> >> >>> On Mon, Feb 24, 2014 at 11:05 PM, Kanak Biscuitwala < >> [email protected]> wrote: >> >>>> My responses are inline. Others please feel free to chime in. >> >>>> >> >>>> General comments: >> >>>> >> >>>> - I think HelixDataAccessor serves as a good basis for an SPI (which >> I have largely ignored in the wiki pages I posted). It exposes leaf nodes >> as HelixProperty instances and is entirely decoupled from ZooKeeper. >> >>>> - Things I want to avoid: class/interface/code-size explosion and >> overabstraction. I'm still not sure what the right balance is. >> >>> >> >>>> >> >>>> ---------------------------------------- >> >>>>> Date: Mon, 24 Feb 2014 22:02:29 -0800 >> >>>>> Subject: API Review: Helix Admin - Feedback >> >>>>> From: [email protected] >> >>>>> To: [email protected] >> >>>>> >> >>>>> Hey guys, >> >>>>> >> >>>>> Here is my first set of feedback, thought I should start this in >> >>>>> chunks rather than all in one go, so picking 3 items at a time. >> >>>>> >> >>>>> Connection Management >> >>>>> >> >>>>> * HelixConnection seems like it belongs to API >> >>>> >> >>>> Could you clarify what you mean by "belongs to API?" >> >>> >> >>> [Sandeep] So my notion of API is all interfaces/contracts which a >> >>> Helix user can use safely. API is our contract to Helix users, as long >> >>> as they stick to the API we make sure the contract is retained. In >> >>> that regard ZKHelixAdmin should not be in API, it should belong >> >>> somewhere in a core component. API is what we can hand over to a Helix >> >>> user and say here, adhere to this and these contracts will be >> >>> respected in future versions, we will only remove them through a >> >>> deprecation process and you will get enough time to migrate. >> >>> >> >>>> >> >>>>> * ZKHelixConnection is a specialization specific to Zookeeper >> >>>>> * We need an factory pattern which allows creating Connections based >> >>>>> on a provider-type. A provider type could be an enum which has >> >>>>> Zookeeper to begin with and we can add other types later. The factory >> >>>>> should be a part of SPI, one implementation of SPI is Zookeeper. >> >>>> >> >>>> Do you mean: >> >>>> >> >>>> HelixConnection connection = >> HelixConnectionFactory.getConnection(ConnectionType.ZOOKEEPER, zkAddress); >> >>>> >> >>>> Can you write example code for this? >> >>> >> >>> [Sandeep] This is not API so not visible to the end user, this is >> >>> internal to Helix core. >> >>> >> >>> HelixConnectionFactory becomes an interface which connection providers >> >>> need to implement. One provider is zookeeper. >> >>> >> >>> enum HelixStoreProvider{ //Did not figure a better name, but since we >> >>> store data in zookeeper thought of it as store >> >>> ZOOKEEPER, >> >>> >> >>> IN_MEMORY; //Just a sample not that someone will write one, but >> >>> demonstrates multiplicity of providers >> >>> } >> >>> >> >>> interface HelixConnectionFactory{ >> >>> getConnection(String address); >> >>> >> >>> String getName(); //you can make the return type an enum if we >> >>> control all providers added >> >>> OR >> >>> HelixStoreProvider getProvider(); //if you use an enum then it >> >>> would be as such >> >>> >> >>> } >> >>> >> >>> ZookeeperConnectionFactory implements HelixConnectionFactory{ >> >>> getConnection(String address){ >> >>> >> >>> } >> >>> >> >>> String getName(){ >> >>> return "zookeeper"; >> >>> } >> >>> OR >> >>> >> >>> HelixStoreProvider getProvider(){ >> >>> return HelixStoreProvider.ZOOKEEPER; >> >>> } >> >>> } >> >>> >> >>> HelixConnectionFactoryRegistry contains all registered connection >> >>> factories, its a map retained >> >>> Map<String, HelixConnectionFactory> connectionFactories or >> >>> Map<HelixStoreProvider, HelixConnectionFactory> >> >>> >> >>> We provider a mechanism for connection factories to be registered in >> >>> the registries, a common model is like the java service discovery >> >>> model. The registry would load all configured connection factories we >> >>> place as a comma class names in >> >>> META-INF/services/org.apache.helix.spi.HelixConnectionfactory. The >> >>> file contains org.apache.helix.spi.impl.ZookeeperHelixConnectionFactory >> >>> >> >>> >> >>> So in the call would look like such. >> >>> >> >>> new >> HelixConnectionBuilder.usingProvider(HelixStoreProvider.ZOOKEEPER).toAddress(zkAddress).build(); >> >>> >> >>>> >> >>>>> * Should the zkAddress be a String? Should it be a special type like >> >>>>> HelixProviderURL? Having something like String requires us to >> document >> >>>>> it clearly but having a a specialization like a HelixProviderURL >> >>>>> allows a guided building of such URL. Not sure if its an overkill but >> >>>>> If we use a provider URL we should allow creating them through a >> >>>>> builder pattern. >> >>>> >> >>>> I would say this is overkill. Each layer of abstraction is more code >> for the users. Most deployments know where ZK is and it's provided to them >> as a string. >> >>> >> >>> [Sandeep] Yes, I thought so as much but was not sure if we need to >> >>> safeguard against invalid urls, seems like its well understood >> >>> pattern. >> >>>> >> >>>>> * We need to declare exceptions when connect fails >> >>>> >> >>>> I've ignored failure cases entirely for now. >> >>> >> >>> [Sandeep] Lets make sure we dont lose track of exceptions, when we get >> >>> to some level of completion with the API lets iterate through >> >>> exceptions. >> >>> >> >>>> >> >>>>> * We need to allow for a time delay for the shutdown, something like >> a >> >>>>> future.get(timeout) >> >>>>> >> >>>> >> >>>> I'm not super experienced with this, so could you give an example of >> why this is preferred to alternative methods of preemption? >> >>> >> >>> [Sandeep] The reason for this is allow for a shutdown API which does >> >>> not shutdown immediately but allows the users to trigger shutdown with >> >>> a grace period. The delay just allows them to ask helix for a grace >> >>> period incase they need to do some cleanup on their side. >> >>> >> >>> The flip side we provide interceptors so they can inject logic before >> shutdown. >> >>> >> >>>> >> >>>>> >> >>>>> Get an Administrator >> >>>>> >> >>>>> * HelixAdministrator seems too generic, it should be >> >>>>> HelixClusterAdministrator given that its an admin for that cluster? >> >>>>> * We should have a HelixAdministrator which can be an administrator >> to >> >>>>> all clusters. How should we retrieve such an administrator? >> >>>> >> >>>> I would rather have one administrator that manages all clusters than >> two administrator classes. >> >>> >> >>> [Sandeep] I think we should simplify and say 1 admin manages 1 >> >>> cluster, if we say 1 admin manages multiple clusters we then need to >> >>> identify the cluster in each API when the user adds participant, >> >>> shutsdown cluster etc. If an admin wants to manage multiple clusters >> >>> then we should allow disconnect from one cluster and connect to >> >>> another and manage that one. It just keeps the context clear, if we do >> >>> 1 to many it just seems confusing. >> >>> >> >>>> >> >>>>> * Having to get a connection first and then getting admin object >> seems >> >>>>> inverted. Why not allow >> >>>>> HelixAdmin.create(HelixProviderURL).getConnection(). It should return >> >>>>> a specialization of the HelixConnection i.e. HelixAdminConnection. We >> >>>>> could then hang specific privileges on the HelixAdminConnection and >> >>>>> not have them on the HelixConnection. >> >>>> >> >>>> It's in this direction so that user can share one connection while >> managing different roles. For instance, a single node may be a participant >> and administrator for one cluster, and a spectator for another. >> >>> >> >>> [Sandeep] Look at it like this, should a participant have all APIs on >> >>> a connection that an admin has? But let me take a step back from your >> >>> examples it seems like a connection can only be used to get an admin >> >>> and everything else hangs off the admin, could there be other uses of >> >>> a connection or is it only a means to get an admin handle? >> >>> >> >>>> >> >>>>> * What happens if an administrator is already created? Should we >> allow >> >>>>> two administrators? >> >>>> >> >>>> I think this is acceptable. I would be more hesitant if a new >> connection was opened for each administrator, but if they're sharing the >> same connection, I don't think it's a big deal. >> >>> >> >>> [Sandeep] So what happens if I do this >> >>> T1 -> Admin 1 using Connection 1 on Thread 1 >> >>> T2 -> Admin 2 using Connection 1 on Thread 2 >> >>> T3 -> Admin 2 closes Connection 1 on Thread 2 >> >>> T3 -> Admin 1 tries to use Connection 1 on Thread 1 >> >>> >> >>> We would need to track all admin objects and invalidate them when a >> >>> connection is discarded. Its doable not overly difficult but we are >> >>> undertaking complexity here. >> >>> >> >>>> >> >>>>> >> >>>>> Resource Create >> >>>>> >> >>>>> * ResourceConfig seems like API >> >>>>> * Should we go all the way ResourceConfiguration instead of >> ResourceConfig? >> >>>> >> >>>> It means more typing, but I'm fine either way. >> >>>> >> >>>>> * ResourceConfigBuilder should have >> >>>>> ResourceConfig.Builder().withId("string") and the remaining should >> >>>>> follow like in the example >> >>>> >> >>>> Sure. We'll need to validate on build() that an id was provided. >> >>>> >> >>>>> * RebalancerConfig seems like API >> >>>>> * Can we figure RebalancerConfig with types? So we dont have three >> >>>>> versions we leverage a type to generate/assemble a builder? >> >>>> >> >>>> I don't follow, can you give an example? >> >>>> >> >>>> We want a RebalancerConfig to be an arbitrary string-serializable >> class. We can provide provide three special RebalancerConfigs for the three >> built-in rebalancing modes. These (and their builders) all derive from >> PartitionedRebalancerConfig. >> >>> >> >>> [Sandeep] So I was thinking something like >> >>> >> RebalancerConfig.Builder(resourceId).usingStrategy(RebalancerStrategy.FullAuto).... >> >>> >> >>> RebalancerStrategy then becomes an enum which internally allows the >> >>> builder to add a pre-baked recipe for FullAuto strategy with defaults, >> >>> the defaults can be overridden by the user ofcourse if necessary. But >> >>> imagine how simple it would be for a majority of new users who could >> >>> do with baked in recipes while they onboard. >> >>> >> >>>> >> >>>>> * How do we determine if the resource is already added? That should >> >>>>> dictate the exception. >> >>>> >> >>>> A resource is already added if its IdealState is already added. >> >>>> >> >>>>> >> >>>>> I think I will pause and wait for some responses/feedback on how this >> >>>>> medium of feedback feels before I move forward. >> >>>>> >> >>>>> Thanks, >> >>>>> >> >>>>> Sandeep >> >>>> >> > >>
