It seems ZK is based on a PAXOS-like protocol (Zab). Then it will be much simpler. We can focus on how to use ZK well.

Cheers,
Xuhui

On Thu, Apr 17, 2014 at 4:14 PM, Xuhui Liu wrote:

> Talking about the HA of TajoMaster: keeping the primary master and the
> slave masters consistent will be a big challenge. Have we ever thought
> about the PAXOS protocol? It is designed to keep state consistent in a
> distributed environment.
>
> Thanks,
> Daniel
>
>
> On Wed, Apr 16, 2014 at 7:56 PM, Hyunsik Choi wrote:
>
>> Hi Alvin,
>>
>> First of all, thank you Alvin for your contribution. Your proposal looks
>> nice and reasonable to me.
>>
>> BTW, as others mentioned, TAJO-704 and TAJO-611 seem to overlap somewhat.
>> We need to arrange the tasks to avoid duplicated work.
>>
>> In my opinion, the TajoMaster HA feature involves three sub-features:
>> 1) Leader election among multiple TajoMasters - one of the TajoMasters
>> is always the leader TajoMaster.
>> 2) Service discovery on the TajoClient side - TajoClient API calls should
>> be resilient even when the original TajoMaster is not available.
>> 3) Cluster resource management and catalog information that TajoMaster
>> keeps in main memory - this information should not be lost.
>>
>> I think that (1) and (2) duplicate TAJO-611 for service discovery.
>> So, it would be nice if TAJO-704 focused only on (3), because
>> TAJO-611 already started a few weeks ago and TAJO-704 may be at a
>> relatively earlier stage. *Instead, you can continue the work with Xuhui
>> and Min.* Someone can divide the service discovery issue into more
>> subtasks.
>>
>> In addition, I'd like to discuss (3) further. Currently, a running
>> TajoMaster keeps two kinds of information: cluster resource information
>> for all workers and catalog information. In order to guarantee the HA of
>> the data, TajoMaster should either persistently materialize them or
>> consistently synchronize them across multiple TajoMasters. BTW, we will
>> replace the resource management feature of TajoMaster with a
>> decentralized one in the new scheduler issue. As a result, I think that
>> TajoMaster HA needs to focus only on the high availability of catalog
>> information. The HA of the catalog can be achieved easily by database
>> replication, or we can make our own module for it. I prefer the former.
>>
>> Hi Xuhui and Min,
>>
>> Could you share the brief progress of the service discovery issue? If so,
>> we can easily figure out how to start the service discovery work together.
>>
>> Warm regards,
>> Hyunsik
>>
>>
>> On Wed, Apr 16, 2014 at 3:36 PM, Min Zhou wrote:
>>
>> > Actually, we are thinking not only about HA, but also about service
>> > discovery, which the future Tajo scheduler would rely on. The Tajo
>> > scheduler can get all the active workers from that service.
>> >
>> > Regards,
>> > Min
>> >
>> >
>> > On Tue, Apr 15, 2014 at 10:05 PM, Xuhui Liu wrote:
>> >
>> > > Hi Alvin,
>> > >
>> > > TAJO-611 will introduce Curator as a service discovery service to
>> > > Tajo, and Curator is based on ZK. Maybe we can work together.
>> > >
>> > > Thanks,
>> > > Xuhui
>> > >
>> > >
>> > > On Wed, Apr 16, 2014 at 12:17 PM, Min Zhou wrote:
>> > >
>> > > > Hi Alvin,
>> > > >
>> > > > I think this jira overlaps somewhat with TAJO-611. Can you
>> > > > cooperate on it?
>> > > >
>> > > > Thanks,
>> > > > Min
>> > > >
>> > > >
>> > > > On Tue, Apr 15, 2014 at 7:22 PM, Henry Saputra wrote:
>> > > >
>> > > > > Jaehwa, I think we should think about a pluggable mechanism that
>> > > > > would allow some kind of distributed system like ZK to be used if
>> > > > > wanted.
>> > > > >
>> > > > > - Henry
>> > > > >
>> > > > > On Tue, Apr 15, 2014 at 7:15 PM, Jaehwa Jung wrote:
>> > > > > > Hi Alvin,
>> > > > > >
>> > > > > > I'm sorry for the late response, and thank you very much for
>> > > > > > your contribution. I agree with your opinion on zookeeper. But
>> > > > > > zookeeper adds a dependency that some users may not want.
>> > > > > >
>> > > > > > I'd like to suggest adding an abstraction layer for handling
>> > > > > > TajoMaster HA. When I created TAJO-740, I hoped that TajoMaster
>> > > > > > HA would have a generic interface and a basic implementation
>> > > > > > using HDFS. Your proposed zookeeper implementation could then be
>> > > > > > added there. It will allow users to choose the implementation
>> > > > > > that suits their environment.
>> > > > > >
>> > > > > > In addition, I'd like to propose that TajoMaster embed the HA
>> > > > > > module; it would be great if HA worked simply by launching a
>> > > > > > backup TajoMaster. Deploying an additional process besides the
>> > > > > > TajoMaster and TajoWorker processes may put more burden on users.
>> > > > > >
>> > > > > > *Cheers*
>> > > > > > *Jaehwa*
>> > > > > >
>> > > > > >
>> > > > > > 2014-04-13 14:36 GMT+09:00 Jihoon Son:
>> > > > > >
>> > > > > >> Hi Alvin.
>> > > > > >> Thanks for your suggestion.
>> > > > > >>
>> > > > > >> Overall, your suggestion looks very reasonable to me!
>> > > > > >> I'll check the POC.
>> > > > > >>
>> > > > > >> Many thanks,
>> > > > > >> Jihoon
>> > > > > >>
>> > > > > >> Hi All,
>> > > > > >> After doing a lot of research, in my opinion we should utilize
>> > > > > >> zookeeper for Tajo Master HA. I have created a small POC and
>> > > > > >> shared it on my Github repository
>> > > > > >> ([email protected]:alvinhenrick/zooKeeper-poc.git).
>> > > > > >>
>> > > > > >> Just to make things a little easier and more maintainable, I am
>> > > > > >> utilizing Apache Curator, the fluent Zookeeper client API
>> > > > > >> developed at Netflix, which is now an Apache open source project.
>> > > > > >>
>> > > > > >> I have attached a diagram to convey my message to the team
>> > > > > >> members. I will upload it to JIRA once everyone agrees with the
>> > > > > >> proposed solution.
>> > > > > >>
>> > > > > >> Here is what the flow is going to look like.
>> > > > > >>
>> > > > > >> TajoMasterZkController ==>
>> > > > > >>
>> > > > > >> 1. This component will start, connect to the zookeeper quorum,
>> > > > > >>    and fight ( :) ) to obtain the latch/lock to become the master.
>> > > > > >> 2. Once the lock is obtained, the Apache Curator API will invoke
>> > > > > >>    the takeLeadership() method, which at that point will start
>> > > > > >>    the TajoMaster.
>> > > > > >> 3. As long as the TajoMaster is running, the controller will keep
>> > > > > >>    the lock and update the metadata on the zookeeper server with
>> > > > > >>    the HOSTNAME and RPC PORT.
>> > > > > >> 4. The other participants will keep waiting for the latch/lock to
>> > > > > >>    be released by zookeeper so they can obtain the leadership.
>> > > > > >> 5. The advantage is that we can have as many TajoMasters as we
>> > > > > >>    want, but only one can be the leader, and it will consume
>> > > > > >>    resources only after obtaining the latch/lock.
>> > > > > >>
>> > > > > >> TajoWorkerZkController ==>
>> > > > > >>
>> > > > > >> 1. This component will start, connect to zookeeper (creating an
>> > > > > >>    EPHEMERAL ZNODE), and wait for events from zookeeper.
>> > > > > >> 2. The first listener will listen for successful registration.
>> > > > > >> 3. The second listener, on the master node, will listen for any
>> > > > > >>    changes to the master node received from the zookeeper server.
>> > > > > >> 4. If a failover occurs, the data on the master ZNODE will be
>> > > > > >>    changed, the new HOSTNAME and RPC PORT can be obtained, and the
>> > > > > >>    TajoWorker can establish a new RPC connection with the
>> > > > > >>    TajoMaster.
>> > > > > >>
>> > > > > >> To demonstrate, I have created a small Readme.txt file on Github
>> > > > > >> on how to run the example. Please read the log statements on the
>> > > > > >> console.
>> > > > > >>
>> > > > > >> Similar to TajoWorkerZkController, we can also implement
>> > > > > >> TajoClientZkController.
>> > > > > >>
>> > > > > >> Any help or advice is appreciated.
>> > > > > >>
>> > > > > >> Thanks!
>> > > > > >> Warm Regards,
>> > > > > >> Alvin.
>> > > > > >>
>> > > >
>> > > > --
>> > > > My research interests are distributed systems, parallel computing and
>> > > > bytecode based virtual machine.
>> > > >
>> > > > My profile:
>> > > > http://www.linkedin.com/in/coderplay
>> > > > My blog:
>> > > > http://coderplay.javaeye.com
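
For anyone who wants to see the shape of the two controller flows quoted above without a ZooKeeper quorum, here is a minimal in-memory sketch. It is not Alvin's POC and it does not use ZooKeeper or Curator: FakeCoordinator is a hypothetical stand-in for the quorum, MasterController and WorkerController are illustrative names, and the hosts and ports are made up. It only shows the ordering of events: one candidate holds the latch, it publishes its hostname and RPC port, the others wait, and a watcher is told the new address after a failover.

```python
"""In-memory stand-in for the election flow; no ZooKeeper or Curator involved."""
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Address = Tuple[str, int]  # (hostname, RPC port)


@dataclass
class FakeCoordinator:
    """Hypothetical replacement for a ZooKeeper quorum, for illustration only."""
    candidates: List[str] = field(default_factory=list)  # join order = election order
    on_elected: Dict[str, Callable[[], None]] = field(default_factory=dict)
    master_address: Optional[Address] = None  # data stored on the "master znode"
    watchers: List[Callable[[Address], None]] = field(default_factory=list)
    leader: Optional[str] = None

    def join(self, name: str, take_leadership: Callable[[], None]) -> None:
        self.candidates.append(name)
        self.on_elected[name] = take_leadership
        self._elect()

    def leave(self, name: str) -> None:
        """Simulates a lost session: the candidate's ephemeral entry disappears."""
        self.candidates.remove(name)
        del self.on_elected[name]
        if self.leader == name:
            self.leader = None
        self._elect()

    def publish(self, address: Address) -> None:
        self.master_address = address
        for notify in list(self.watchers):
            notify(address)

    def watch_master(self, notify: Callable[[Address], None]) -> None:
        self.watchers.append(notify)
        if self.master_address is not None:
            notify(self.master_address)

    def _elect(self) -> None:
        # The longest-waiting candidate gets the latch; the rest keep waiting.
        if self.leader is None and self.candidates:
            self.leader = self.candidates[0]
            self.on_elected[self.leader]()


class MasterController:
    """Sketch of the master-side flow (steps 1 to 4 of the first list)."""

    def __init__(self, coordinator: FakeCoordinator, host: str, port: int) -> None:
        self.coordinator = coordinator
        self.address: Address = (host, port)
        self.name = f"{host}:{port}"
        self.is_leader = False
        coordinator.join(self.name, self.take_leadership)

    def take_leadership(self) -> None:
        # Starting the real master is elided; only the address is published.
        self.is_leader = True
        self.coordinator.publish(self.address)

    def crash(self) -> None:
        self.is_leader = False
        self.coordinator.leave(self.name)


class WorkerController:
    """Sketch of the worker-side flow: watch the master entry and reconnect."""

    def __init__(self, coordinator: FakeCoordinator) -> None:
        self.master: Optional[Address] = None
        self.reconnects = 0
        coordinator.watch_master(self.on_master_changed)

    def on_master_changed(self, address: Address) -> None:
        # A real worker would open a new RPC connection here.
        self.master = address
        self.reconnects += 1
```

Creating two MasterController instances and one WorkerController against the same FakeCoordinator, then calling crash() on the first master, walks through the failover path. A real implementation would also have to handle session expiry, reconnects and stale leaders, which this sketch ignores.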
