Re: API Review: Helix Admin - Feedback

Sandeep Nayak Tue, 25 Feb 2014 19:22:13 -0800

Apologies for the late response, got tied down with priority items at
work. About the plan it sounds good, I will join IRC later in the
evening and we can do Hangout too. See you guys there.


On Tue, Feb 25, 2014 at 2:41 PM, kishore g <[email protected]> wrote:
> Great discussion, I see we are converging on some items. We can move this
> discussion to JIRA but i dont think that will be of much help. There is an
> other option.
>
> We setup a time to discuss these, much easier to explain the pros and cons.
> -- IRC
> -- Hangout session.
>
> I am fine with either option. After that we can send the summary to the
> group. What do you think.
>
> On the connectionFactory, I agree with what Sandeep wrote with the registry
> pattern. However, I am little worried about loading things from classpath
> automatically, I am fine with us loading Zookeeper impl from within the
> code (its ugly) but i prefer ability to debug the code and helps in knowing
> the usage of a particular method. Sandeep, if I understand correctly you
> are saying the HelixStoreProvider is a dynamic builder that can be
> customized per SPI. I am not sure if we need getConnection(Address) in
> ConnectionFactory. we need something like getConnection(ConnectionOptions),
> lets say we add database as a backend, then we might need additional
> options to create the connection. We can work on the details when we start
> writing the api's. At a high I agree with the builder pattern + service
> registry.
>
> Regarding re-use of connections, we absolutely don't want connection to be
> re-used across participants. That will create more issues. I think the
> problem with existing api is that creating a manager requires cluster name
> and we did not really have a connection object. The re-use is a pretty rare
> case and its a nice to have feature but its ok to if its hard to re-use at
> the cost of simplifying the most common use cases.
>
> On the participant,spectator, administrator topic, I have ended up using
> the admin module for most of the operations. So it might be a good idea to
> have administrator as the single point for all CRUD operations. The only
> pain point will be passing in cluster name for every operations even after
> being connected as the participant to the cluster.
>
> Lets have a discussion on IRC/Hangout and start working on a branch.
> Sandeep, we should be able to give you permission to commit on a branch. I
> think once we start writing the API we get more clarity.
>
> thanks
> Kishore G
>
>
>
> On Tue, Feb 25, 2014 at 11:56 AM, Sandeep Nayak <[email protected]> wrote:
>
>> My last effort and then lets move on, email is not the best venue to
>> discuss design and sometimes whiteboards are necessary but I let me
>> see if I can convey my thoughts :-).
>>
>> On Tue, Feb 25, 2014 at 11:07 AM, Kanak Biscuitwala <[email protected]>
>> wrote:
>> > Alright, here's where I think we agree so far:
>> >
>> > - Single class for building common rebalancer configs
>> > - Some form of factory for getting a HelixConnection
>> > - Keeping building constructors empty, but validating on build
>> > - Having an admin scoped to a single cluster is probably cleaner
>> >
>> > Here's what we still need to discuss:
>> >
>> > - The complexity of the helix connection factory:
>> >
>> > I'm not sure what benefit to the user (or developer) this buys us. I
>> mean theoretically we could reuse open HelixConnections, but I don't think
>> that's what you're suggesting. Right now, it seems like more code for us
>> and the user to do the same thing.
>> >
>>
>> [Sandeep] The connection factory buys us the ability to keep a
>> consistent API when someone writes a replacement to store
>> configuration state in something other than Zookeeper. Sure, we can
>> keep the API to the user simple but the underlying complexity buys us
>> flexibility, we could say replacing Zookeeper or for that matter
>> providing an abstraction to zookeeper is not priority now so lets not
>> worry about that for now which is fine. But if we are going to provide
>> alternatives to zookeeper then that abstraction of spi and connection
>> factories or some variant of it is necessary.
>>
>> > - How to let applications gracefully shut down:
>> >
>> > We provide FINALIZE callbacks to users so they can do what is necessary
>> before we shut down.
>>
>> [Sandeep] Sounds good
>>
>> >
>> > - The necessity of a multi-cluster admin:
>> >
>> > There's only one method that we need to support that is beyond the scope
>> of a single cluster: listClusters(). The rest can and should be scoped to a
>> single cluster; it's just a question of where we stick this one method.
>> >
>> > - The role of an admin in the cluster, and whether or not it should be
>> the source of all APIs or if the connection should be:
>>
>> [Sandeep] Just to clarify I was not asking that we have all APIs on
>> admin. I probably was not clear so my apologies on the confusion, what
>> I was pointing to is why expose a connection which can be hidden under
>> the HelixAdmin by passing a string url to the HelixAdmin. The
>> HelixAdmin can then create/manage connections underneath. One less
>> object exposed to the end user.
>> >
>> > Just so we're on the same page, let me briefly describe how
>> administration typically works in systems that use Helix:
>> >
>> > 1. There are tools that change the cluster state that will exclusively
>> work through an administrator.
>>
>> [Sandeep] Seems like there is a separation of API right here because
>> there is exclusive ability for an admin
>>
>> > 2. Participants and spectators don't change the cluster state most of
>> the time, but at some point may need to save a config or store a property,
>> and thus need administration support.
>>
>> [Sandeep] Sure, any participant or spectator should be able to save
>> state. That location should be separated from where cluster state is
>> stored, the storage and who gets access to what should be directed
>> from the access pattern from the API which is why I think there is a
>> separation.
>>
>> >
>> > Generally Helix systems are driven through participant state transition
>> callbacks and are generally low-touch with respect to configurations over
>> time. I would recommend looking at the other wiki pages first for some
>> context. So I disagree that everything should hang off of HelixAdmin.
>>
>> [Sandeep] Surely I am lacking context and will read through the other
>> wiki pages. I just don't think Helix needs to expose another object to
>> the user, managing connections should not be the responsibility of the
>> client if we can do away with that object being surfaced to the client
>> its one less object to manage through an API change process. In the
>> end you guys are more savvy on Helix than I am so it will be your call
>> anyway, I am going to push you towards trying to reduce the API
>> surface area so if it can be done then great if not then I would
>> imagine we will have a justification for why not.
>>
>> >
>> > By the way, being able to share a connection is something that was
>> actually requested by a user. There are legitimate reasons for wanting a
>> single live connection across multiple roles on a single node.
>>
>> [Sandeep] Sure I am not denying that there cant be legitimate reasons
>> all I am suggesting is hide that away from the user. The user will
>> only ask for the shared connection if he knows about the connection.
>> Too many touch points on an API comes to mind when we are exposing too
>> many objects. Also if we want to share connections I would imagine we
>> can create APIs which allow creating objects with a shared connection.
>>
>> >
>> > - Whether or not multiple admins attached to a single connection should
>> be legal:
>> >
>> > The administrator is passive. It doesn't run things in the background
>> and absolutely has nothing to clean up.
>>
>> [Sandeep] I was only reacting to the shutdown API the admin instance
>> can invoke and asking what happens if connections are shared in that
>> case. Seemed like a legitimate question to ask.
>>
>> >
>> > Kanak
>> >
>> > ----------------------------------------
>> >> Date: Tue, 25 Feb 2014 08:24:30 -0800
>> >> Subject: Re: API Review: Helix Admin - Feedback
>> >> From: [email protected]
>> >> To: [email protected]
>> >>
>> >> Actually coming to think of it I am starting to think why even open up
>> >> the HelixConnection? Why cant someone create a HelixAdmin instance
>> >> directly and we manage the connections internally. The lesser the
>> >> objects exposed through API the better.
>> >>
>> >> Thanks,
>> >>
>> >> Sandeep
>> >>
>> >> On Tue, Feb 25, 2014 at 12:27 AM, Sandeep Nayak <[email protected]>
>> wrote:
>> >>> [Sandeep] See responses inline.
>> >>>
>> >>> On Mon, Feb 24, 2014 at 11:05 PM, Kanak Biscuitwala <
>> [email protected]> wrote:
>> >>>> My responses are inline. Others please feel free to chime in.
>> >>>>
>> >>>> General comments:
>> >>>>
>> >>>> - I think HelixDataAccessor serves as a good basis for an SPI (which
>> I have largely ignored in the wiki pages I posted). It exposes leaf nodes
>> as HelixProperty instances and is entirely decoupled from ZooKeeper.
>> >>>> - Things I want to avoid: class/interface/code-size explosion and
>> overabstraction. I'm still not sure what the right balance is.
>> >>>
>> >>>>
>> >>>> ----------------------------------------
>> >>>>> Date: Mon, 24 Feb 2014 22:02:29 -0800
>> >>>>> Subject: API Review: Helix Admin - Feedback
>> >>>>> From: [email protected]
>> >>>>> To: [email protected]
>> >>>>>
>> >>>>> Hey guys,
>> >>>>>
>> >>>>> Here is my first set of feedback, thought I should start this in
>> >>>>> chunks rather than all in one go, so picking 3 items at a time.
>> >>>>>
>> >>>>> Connection Management
>> >>>>>
>> >>>>> * HelixConnection seems like it belongs to API
>> >>>>
>> >>>> Could you clarify what you mean by "belongs to API?"
>> >>>
>> >>> [Sandeep] So my notion of API is all interfaces/contracts which a
>> >>> Helix user can use safely. API is our contract to Helix users, as long
>> >>> as they stick to the API we make sure the contract is retained. In
>> >>> that regard ZKHelixAdmin should not be in API, it should belong
>> >>> somewhere in a core component. API is what we can hand over to a Helix
>> >>> user and say here, adhere to this and these contracts will be
>> >>> respected in future versions, we will only remove them through a
>> >>> deprecation process and you will get enough time to migrate.
>> >>>
>> >>>>
>> >>>>> * ZKHelixConnection is a specialization specific to Zookeeper
>> >>>>> * We need an factory pattern which allows creating Connections based
>> >>>>> on a provider-type. A provider type could be an enum which has
>> >>>>> Zookeeper to begin with and we can add other types later. The factory
>> >>>>> should be a part of SPI, one implementation of SPI is Zookeeper.
>> >>>>
>> >>>> Do you mean:
>> >>>>
>> >>>> HelixConnection connection =
>> HelixConnectionFactory.getConnection(ConnectionType.ZOOKEEPER, zkAddress);
>> >>>>
>> >>>> Can you write example code for this?
>> >>>
>> >>> [Sandeep] This is not API so not visible to the end user, this is
>> >>> internal to Helix core.
>> >>>
>> >>> HelixConnectionFactory becomes an interface which connection providers
>> >>> need to implement. One provider is zookeeper.
>> >>>
>> >>> enum HelixStoreProvider{ //Did not figure a better name, but since we
>> >>> store data in zookeeper thought of it as store
>> >>> ZOOKEEPER,
>> >>>
>> >>> IN_MEMORY; //Just a sample not that someone will write one, but
>> >>> demonstrates multiplicity of providers
>> >>> }
>> >>>
>> >>> interface HelixConnectionFactory{
>> >>> getConnection(String address);
>> >>>
>> >>> String getName(); //you can make the return type an enum if we
>> >>> control all providers added
>> >>> OR
>> >>> HelixStoreProvider getProvider(); //if you use an enum then it
>> >>> would be as such
>> >>>
>> >>> }
>> >>>
>> >>> ZookeeperConnectionFactory implements HelixConnectionFactory{
>> >>> getConnection(String address){
>> >>>
>> >>> }
>> >>>
>> >>> String getName(){
>> >>> return "zookeeper";
>> >>> }
>> >>> OR
>> >>>
>> >>> HelixStoreProvider getProvider(){
>> >>> return HelixStoreProvider.ZOOKEEPER;
>> >>> }
>> >>> }
>> >>>
>> >>> HelixConnectionFactoryRegistry contains all registered connection
>> >>> factories, its a map retained
>> >>> Map<String, HelixConnectionFactory> connectionFactories or
>> >>> Map<HelixStoreProvider, HelixConnectionFactory>
>> >>>
>> >>> We provider a mechanism for connection factories to be registered in
>> >>> the registries, a common model is like the java service discovery
>> >>> model. The registry would load all configured connection factories we
>> >>> place as a comma class names in
>> >>> META-INF/services/org.apache.helix.spi.HelixConnectionfactory. The
>> >>> file contains org.apache.helix.spi.impl.ZookeeperHelixConnectionFactory
>> >>>
>> >>>
>> >>> So in the call would look like such.
>> >>>
>> >>> new
>> HelixConnectionBuilder.usingProvider(HelixStoreProvider.ZOOKEEPER).toAddress(zkAddress).build();
>> >>>
>> >>>>
>> >>>>> * Should the zkAddress be a String? Should it be a special type like
>> >>>>> HelixProviderURL? Having something like String requires us to
>> document
>> >>>>> it clearly but having a a specialization like a HelixProviderURL
>> >>>>> allows a guided building of such URL. Not sure if its an overkill but
>> >>>>> If we use a provider URL we should allow creating them through a
>> >>>>> builder pattern.
>> >>>>
>> >>>> I would say this is overkill. Each layer of abstraction is more code
>> for the users. Most deployments know where ZK is and it's provided to them
>> as a string.
>> >>>
>> >>> [Sandeep] Yes, I thought so as much but was not sure if we need to
>> >>> safeguard against invalid urls, seems like its well understood
>> >>> pattern.
>> >>>>
>> >>>>> * We need to declare exceptions when connect fails
>> >>>>
>> >>>> I've ignored failure cases entirely for now.
>> >>>
>> >>> [Sandeep] Lets make sure we dont lose track of exceptions, when we get
>> >>> to some level of completion with the API lets iterate through
>> >>> exceptions.
>> >>>
>> >>>>
>> >>>>> * We need to allow for a time delay for the shutdown, something like
>> a
>> >>>>> future.get(timeout)
>> >>>>>
>> >>>>
>> >>>> I'm not super experienced with this, so could you give an example of
>> why this is preferred to alternative methods of preemption?
>> >>>
>> >>> [Sandeep] The reason for this is allow for a shutdown API which does
>> >>> not shutdown immediately but allows the users to trigger shutdown with
>> >>> a grace period. The delay just allows them to ask helix for a grace
>> >>> period incase they need to do some cleanup on their side.
>> >>>
>> >>> The flip side we provide interceptors so they can inject logic before
>> shutdown.
>> >>>
>> >>>>
>> >>>>>
>> >>>>> Get an Administrator
>> >>>>>
>> >>>>> * HelixAdministrator seems too generic, it should be
>> >>>>> HelixClusterAdministrator given that its an admin for that cluster?
>> >>>>> * We should have a HelixAdministrator which can be an administrator
>> to
>> >>>>> all clusters. How should we retrieve such an administrator?
>> >>>>
>> >>>> I would rather have one administrator that manages all clusters than
>> two administrator classes.
>> >>>
>> >>> [Sandeep] I think we should simplify and say 1 admin manages 1
>> >>> cluster, if we say 1 admin manages multiple clusters we then need to
>> >>> identify the cluster in each API when the user adds participant,
>> >>> shutsdown cluster etc. If an admin wants to manage multiple clusters
>> >>> then we should allow disconnect from one cluster and connect to
>> >>> another and manage that one. It just keeps the context clear, if we do
>> >>> 1 to many it just seems confusing.
>> >>>
>> >>>>
>> >>>>> * Having to get a connection first and then getting admin object
>> seems
>> >>>>> inverted. Why not allow
>> >>>>> HelixAdmin.create(HelixProviderURL).getConnection(). It should return
>> >>>>> a specialization of the HelixConnection i.e. HelixAdminConnection. We
>> >>>>> could then hang specific privileges on the HelixAdminConnection and
>> >>>>> not have them on the HelixConnection.
>> >>>>
>> >>>> It's in this direction so that user can share one connection while
>> managing different roles. For instance, a single node may be a participant
>> and administrator for one cluster, and a spectator for another.
>> >>>
>> >>> [Sandeep] Look at it like this, should a participant have all APIs on
>> >>> a connection that an admin has? But let me take a step back from your
>> >>> examples it seems like a connection can only be used to get an admin
>> >>> and everything else hangs off the admin, could there be other uses of
>> >>> a connection or is it only a means to get an admin handle?
>> >>>
>> >>>>
>> >>>>> * What happens if an administrator is already created? Should we
>> allow
>> >>>>> two administrators?
>> >>>>
>> >>>> I think this is acceptable. I would be more hesitant if a new
>> connection was opened for each administrator, but if they're sharing the
>> same connection, I don't think it's a big deal.
>> >>>
>> >>> [Sandeep] So what happens if I do this
>> >>> T1 -> Admin 1 using Connection 1 on Thread 1
>> >>> T2 -> Admin 2 using Connection 1 on Thread 2
>> >>> T3 -> Admin 2 closes Connection 1 on Thread 2
>> >>> T3 -> Admin 1 tries to use Connection 1 on Thread 1
>> >>>
>> >>> We would need to track all admin objects and invalidate them when a
>> >>> connection is discarded. Its doable not overly difficult but we are
>> >>> undertaking complexity here.
>> >>>
>> >>>>
>> >>>>>
>> >>>>> Resource Create
>> >>>>>
>> >>>>> * ResourceConfig seems like API
>> >>>>> * Should we go all the way ResourceConfiguration instead of
>> ResourceConfig?
>> >>>>
>> >>>> It means more typing, but I'm fine either way.
>> >>>>
>> >>>>> * ResourceConfigBuilder should have
>> >>>>> ResourceConfig.Builder().withId("string") and the remaining should
>> >>>>> follow like in the example
>> >>>>
>> >>>> Sure. We'll need to validate on build() that an id was provided.
>> >>>>
>> >>>>> * RebalancerConfig seems like API
>> >>>>> * Can we figure RebalancerConfig with types? So we dont have three
>> >>>>> versions we leverage a type to generate/assemble a builder?
>> >>>>
>> >>>> I don't follow, can you give an example?
>> >>>>
>> >>>> We want a RebalancerConfig to be an arbitrary string-serializable
>> class. We can provide provide three special RebalancerConfigs for the three
>> built-in rebalancing modes. These (and their builders) all derive from
>> PartitionedRebalancerConfig.
>> >>>
>> >>> [Sandeep] So I was thinking something like
>> >>>
>> RebalancerConfig.Builder(resourceId).usingStrategy(RebalancerStrategy.FullAuto)....
>> >>>
>> >>> RebalancerStrategy then becomes an enum which internally allows the
>> >>> builder to add a pre-baked recipe for FullAuto strategy with defaults,
>> >>> the defaults can be overridden by the user ofcourse if necessary. But
>> >>> imagine how simple it would be for a majority of new users who could
>> >>> do with baked in recipes while they onboard.
>> >>>
>> >>>>
>> >>>>> * How do we determine if the resource is already added? That should
>> >>>>> dictate the exception.
>> >>>>
>> >>>> A resource is already added if its IdealState is already added.
>> >>>>
>> >>>>>
>> >>>>> I think I will pause and wait for some responses/feedback on how this
>> >>>>> medium of feedback feels before I move forward.
>> >>>>>
>> >>>>> Thanks,
>> >>>>>
>> >>>>> Sandeep
>> >>>>
>> >
>>

Re: API Review: Helix Admin - Feedback

Reply via email to