Hi Guys,

I have attached the latest patch for service discovery at https://issues.apache.org/jira/browse/TAJO-611.

1. I only added the service and didn't modify any code to use the service.
2. ZooKeeper should be added to Tajo; this work hasn't started yet.

We can have a discussion on how to introduce ZooKeeper to Tajo.

Thanks,
Xuhui

On Thu, Apr 17, 2014 at 4:32 PM, Azuryy Yu <[email protected]> wrote:
> Xuhui,
>
> ZK is not based on PAXOS; instead, it uses Zab (ZooKeeper Atomic Broadcast),
> which is different from PAXOS.
>
> On Thu, Apr 17, 2014 at 4:19 PM, Xuhui Liu <[email protected]> wrote:
> > It seems ZK is based on PAXOS. Then it will be much simpler. We can focus
> > on how to use ZK well.
> >
> > Cheers,
> > Xuhui
> >
> > On Thu, Apr 17, 2014 at 4:14 PM, Xuhui Liu <[email protected]> wrote:
> > > Talking about the HA of TajoMaster: keeping consistency between the
> > > primary master and the slave masters will be a big challenge. Have we
> > > ever thought about the PAXOS protocol? It's designed to keep consistency
> > > in a distributed environment.
> > >
> > > Thanks,
> > > Daniel
> > >
> > > On Wed, Apr 16, 2014 at 7:56 PM, Hyunsik Choi <[email protected]> wrote:
> > >> Hi Alvin,
> > >>
> > >> First of all, thank you Alvin for your contribution. Your proposal looks
> > >> nice and reasonable to me.
> > >>
> > >> BTW, as others mentioned, TAJO-704 and TAJO-611 seem to overlap somewhat.
> > >> We need to arrange the tasks to avoid duplicated work.
> > >>
> > >> In my opinion, the TajoMaster HA feature involves three sub-features:
> > >>   1) Leader election among multiple TajoMasters - one of the TajoMasters
> > >>      is always the leader TajoMaster.
> > >>   2) Service discovery on the TajoClient side - TajoClient API calls
> > >>      should be resilient even when the original TajoMaster is not
> > >>      available.
> > >>   3) Cluster resource management and catalog information that TajoMaster
> > >>      keeps in main memory - this information should not be lost.
> > >>
> > >> I think that (1) and (2) duplicate TAJO-611 for service discovery.
> > >> So, it would be nice if TAJO-704 focused only on (3).
> > >> It's because TAJO-611 already started a few weeks ago and TAJO-704 may be
> > >> at a relatively earlier stage. *Instead, you can continue the work with
> > >> Xuhui and Min.* Someone can divide the service discovery issue into more
> > >> subtasks.
> > >>
> > >> In addition, I'd like to discuss (3) further. Currently, a running
> > >> TajoMaster keeps two kinds of information: cluster resource information
> > >> for all workers and catalog information. In order to guarantee the HA of
> > >> this data, TajoMaster should either persistently materialize it or
> > >> consistently synchronize it across multiple TajoMasters. BTW, we will
> > >> replace the resource management feature of TajoMaster with a
> > >> decentralized one in the new scheduler issue. As a result, I think that
> > >> TajoMaster HA needs to focus only on the high availability of catalog
> > >> information. The HA of the catalog can be easily achieved by database
> > >> replication, or we can make our own module for it. I prefer the former.
> > >>
> > >> Hi Xuhui and Min,
> > >>
> > >> Could you share the brief progress of the service discovery issue? If so,
> > >> we can easily figure out how to start the service discovery work together.
> > >>
> > >> Warm regards,
> > >> Hyunsik
> > >>
> > >> On Wed, Apr 16, 2014 at 3:36 PM, Min Zhou <[email protected]> wrote:
> > >> > Actually, we are thinking not only about HA, but also about service
> > >> > discovery, which the future Tajo scheduler would rely on. The Tajo
> > >> > scheduler can get all the active workers from that service.
> > >> >
> > >> > Regards,
> > >> > Min
> > >> >
> > >> > On Tue, Apr 15, 2014 at 10:05 PM, Xuhui Liu <[email protected]> wrote:
> > >> > > Hi Alvin,
> > >> > >
> > >> > > TAJO-611 will introduce Curator as a service discovery service to
> > >> > > Tajo, and Curator is based on ZK. Maybe we can work together.
> > >> > >
> > >> > > Thanks,
> > >> > > Xuhui
> > >> > >
> > >> > > On Wed, Apr 16, 2014 at 12:17 PM, Min Zhou <[email protected]> wrote:
> > >> > > > Hi Alvin,
> > >> > > >
> > >> > > > I think this JIRA overlaps somewhat with TAJO-611. Can you
> > >> > > > cooperate on it?
> > >> > > >
> > >> > > > Thanks,
> > >> > > > Min
> > >> > > >
> > >> > > > On Tue, Apr 15, 2014 at 7:22 PM, Henry Saputra <[email protected]> wrote:
> > >> > > > > Jaehwa, I think we should think about a pluggable mechanism that
> > >> > > > > would allow some kind of distributed system like ZK to be used
> > >> > > > > if wanted.
> > >> > > > >
> > >> > > > > - Henry
> > >> > > > >
> > >> > > > > On Tue, Apr 15, 2014 at 7:15 PM, Jaehwa Jung <[email protected]> wrote:
> > >> > > > > > Hi, Alvin
> > >> > > > > >
> > >> > > > > > I'm sorry for the late response, and thank you very much for
> > >> > > > > > your contribution. I agree with your opinion on ZooKeeper.
> > >> > > > > > But ZooKeeper requires an additional dependency that some
> > >> > > > > > users may not want.
> > >> > > > > >
> > >> > > > > > I'd like to suggest adding an abstraction layer for handling
> > >> > > > > > TajoMaster HA. When I created TAJO-740, I wished that
> > >> > > > > > TajoMaster HA would have a generic interface and a basic
> > >> > > > > > implementation using HDFS. Next, your proposed ZooKeeper
> > >> > > > > > implementation will be added there. It will allow users to
> > >> > > > > > choose their desired implementation according to their
> > >> > > > > > environments.
> > >> > > > > >
> > >> > > > > > In addition, I'd like to propose that TajoMaster embed the HA
> > >> > > > > > module, and it would be great if HA works well by launching a
> > >> > > > > > backup TajoMaster. Deploying an additional process besides the
> > >> > > > > > TajoMaster and TajoWorker processes may put more burden on
> > >> > > > > > users.
> > >> > > > > >
> > >> > > > > > *Cheers*
> > >> > > > > > *Jaehwa*
> > >> > > > > >
> > >> > > > > > 2014-04-13 14:36 GMT+09:00 Jihoon Son <[email protected]>:
> > >> > > > > >> Hi Alvin.
> > >> > > > > >> Thanks for your suggestion.
> > >> > > > > >>
> > >> > > > > >> Overall, your suggestion looks very reasonable to me!
> > >> > > > > >> I'll check the POC.
> > >> > > > > >>
> > >> > > > > >> Many thanks,
> > >> > > > > >> Jihoon
> > >> > > > > >>
> > >> > > > > >> Hi All,
> > >> > > > > >>
> > >> > > > > >> After doing a lot of research, in my opinion we should utilize
> > >> > > > > >> ZooKeeper for Tajo Master HA. I have created a small POC and
> > >> > > > > >> shared it on my GitHub repository
> > >> > > > > >> ([email protected]:alvinhenrick/zooKeeper-poc.git).
> > >> > > > > >>
> > >> > > > > >> Just to make things a little bit easier and more maintainable,
> > >> > > > > >> I am utilizing Apache Curator, the fluent ZooKeeper client API
> > >> > > > > >> developed at Netflix, which is now an Apache open source
> > >> > > > > >> project.
> > >> > > > > >>
> > >> > > > > >> I have attached the diagram to convey my message to the team
> > >> > > > > >> members. I will upload it to JIRA once everyone agrees with the
> > >> > > > > >> proposed solution.
> > >> > > > > >>
> > >> > > > > >> Here is what the flow is going to look like.
> > >> > > > > >>
> > >> > > > > >> TajoMasterZkController ==>
> > >> > > > > >>
> > >> > > > > >>    1. This component will start, connect to the ZooKeeper
> > >> > > > > >>       quorum, and fight ( :) ) to obtain the latch / lock to
> > >> > > > > >>       become the master.
> > >> > > > > >>    2. Once the lock is obtained, the Apache Curator API will
> > >> > > > > >>       invoke the takeLeadership() method, which will then start
> > >> > > > > >>       the TajoMaster.
> > >> > > > > >>    3. As long as the TajoMaster is running, the Controller will
> > >> > > > > >>       keep the lock and update the metadata on the ZooKeeper
> > >> > > > > >>       server with the HOSTNAME and RPC PORT.
> > >> > > > > >>    4. The other participants will keep waiting for the latch /
> > >> > > > > >>       lock to be released by ZooKeeper to obtain the leadership.
> > >> > > > > >>    5. The advantage is that we can have as many Tajo Masters as
> > >> > > > > >>       we want, but only one can be the leader, and it will
> > >> > > > > >>       consume resources only after obtaining the latch / lock.
> > >> > > > > >>
> > >> > > > > >> TajoWorkerZkController ==>
> > >> > > > > >>
> > >> > > > > >>    1. This component will start and connect to ZooKeeper (it
> > >> > > > > >>       will create an EPHEMERAL ZNODE) and wait for events from
> > >> > > > > >>       ZooKeeper.
> > >> > > > > >>    2. The first listener will listen for successful
> > >> > > > > >>       registration.
> > >> > > > > >>    3. The second listener, on the master node, will listen for
> > >> > > > > >>       any changes to the master node received from the ZooKeeper
> > >> > > > > >>       server.
> > >> > > > > >>    4. If failover occurs, the data on the master ZNODE will be
> > >> > > > > >>       changed, the new HOSTNAME and RPC PORT can be obtained,
> > >> > > > > >>       and the TajoWorker can establish a new RPC connection
> > >> > > > > >>       with the TajoMaster.
> > >> > > > > >>
> > >> > > > > >> To demonstrate, I have created a small Readme.txt file on
> > >> > > > > >> GitHub on how to run the example.
> > >> > > > > >> Please read the log statements on the console.
> > >> > > > > >>
> > >> > > > > >> Similar to TajoWorkerZkController, we can also implement
> > >> > > > > >> TajoClientZkController.
> > >> > > > > >>
> > >> > > > > >> Any help or advice is appreciated.
> > >> > > > > >>
> > >> > > > > >> Thanks!
> > >> > > > > >> Warm Regards,
> > >> > > > > >> Alvin.
> > >> > > >
> > >> > > > --
> > >> > > > My research interests are distributed systems, parallel computing
> > >> > > > and bytecode based virtual machine.
> > >> > > >
> > >> > > > My profile:
> > >> > > > http://www.linkedin.com/in/coderplay
> > >> > > > My blog:
> > >> > > > http://coderplay.javaeye.com
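
The TajoMasterZkController and TajoWorkerZkController flows quoted above can be sketched roughly as follows. This is only a minimal in-memory simulation in plain Java, not the POC code and not Curator's API: the names MasterFailoverSketch, InMemoryLatch, MasterInfo, join, leave and watch are hypothetical, the host names and port are made-up values, and a real implementation would presumably run against a ZooKeeper quorum through Curator instead of a local queue.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

class MasterFailoverSketch {

    /** Stand-in for the data the leader would publish on the master znode. */
    static final class MasterInfo {
        final String hostname;
        final int rpcPort;

        MasterInfo(String hostname, int rpcPort) {
            this.hostname = hostname;
            this.rpcPort = rpcPort;
        }

        @Override
        public String toString() {
            return hostname + ":" + rpcPort;
        }
    }

    /** Hypothetical in-memory stand-in for a ZooKeeper-backed latch / lock. */
    static final class InMemoryLatch {
        // Candidates in arrival order; the head of the queue holds the latch.
        private final Deque<MasterInfo> candidates = new ArrayDeque<>();
        // Worker-side listeners that want to hear about leader changes.
        private final List<Consumer<MasterInfo>> watchers = new ArrayList<>();

        /** A master joins the election; it leads only if nobody else is queued. */
        synchronized void join(MasterInfo candidate) {
            boolean wasEmpty = candidates.isEmpty();
            candidates.addLast(candidate);
            if (wasEmpty) {
                notifyWatchers();
            }
        }

        /** A master goes away; if it was the leader, the next candidate takes over. */
        synchronized void leave(MasterInfo candidate) {
            boolean wasLeader = candidates.peekFirst() == candidate;
            candidates.remove(candidate);
            if (wasLeader) {
                notifyWatchers();
            }
        }

        /** Current leader, or null when no master has joined. */
        synchronized MasterInfo leader() {
            return candidates.peekFirst();
        }

        /** A worker registers a listener; it is told the current leader right away, if any. */
        synchronized void watch(Consumer<MasterInfo> listener) {
            watchers.add(listener);
            MasterInfo current = candidates.peekFirst();
            if (current != null) {
                listener.accept(current);
            }
        }

        private void notifyWatchers() {
            MasterInfo current = candidates.peekFirst();
            if (current != null) {
                for (Consumer<MasterInfo> watcher : watchers) {
                    watcher.accept(current);
                }
            }
        }
    }

    public static void main(String[] args) {
        InMemoryLatch latch = new InMemoryLatch();
        MasterInfo primary = new MasterInfo("master-1", 26001); // made-up values
        MasterInfo backup = new MasterInfo("master-2", 26001);

        // Worker side: (re)connect whenever the published master changes.
        latch.watch(info -> System.out.println("worker: open RPC connection to " + info));

        latch.join(primary);  // first candidate obtains the latch
        latch.join(backup);   // second candidate waits
        latch.leave(primary); // failover: the backup becomes the leader
    }
}
```

In this sketch the worker is notified twice, once when master-1 obtains the latch and once when master-2 takes over after the failover; the backup joining the queue does not trigger a notification, which mirrors step 4 of the master flow (other participants simply wait).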
