Xuhui,

ZK is not based on PAXOS; instead, it uses Zab (ZooKeeper Atomic Broadcast),
which is different from PAXOS.
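As background for the Zab point: Zab totally orders every state change by a 64-bit transaction id (zxid), whose high 32 bits hold the leader epoch and whose low 32 bits hold a per-epoch counter. The sketch below only illustrates that ordering; it is not ZooKeeper code, and `make_zxid` / `split_zxid` are hypothetical helper names.

```python
# Illustrative only: how a Zab-style zxid packs (epoch, counter) into 64 bits.
# make_zxid / split_zxid are hypothetical helpers, not ZooKeeper APIs.


def make_zxid(epoch: int, counter: int) -> int:
    """Pack a leader epoch (high 32 bits) and a counter (low 32 bits)."""
    if not (0 <= epoch < 2**32 and 0 <= counter < 2**32):
        raise ValueError("epoch and counter must each fit in 32 bits")
    return (epoch << 32) | counter


def split_zxid(zxid: int) -> tuple[int, int]:
    """Recover (epoch, counter) from a packed zxid."""
    return zxid >> 32, zxid & 0xFFFFFFFF


# A proposal from a newer epoch orders after every proposal from an
# older epoch, regardless of the counter values.
old = make_zxid(epoch=1, counter=500)
new = make_zxid(epoch=2, counter=0)
print(hex(old), hex(new), new > old)
```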



On Thu, Apr 17, 2014 at 4:19 PM, Xuhui Liu wrote:

> It seems ZK is based on PAXOS. Then it will be much simpler: we can focus on
> how to use ZK well.
>
> Cheers,
> Xuhui
>
>
> On Thu, Apr 17, 2014 at 4:14 PM, Xuhui Liu wrote:
>
> > Regarding the HA of TajoMaster: keeping consistency between the primary
> > master and the slave masters will be a big challenge. Have we ever thought
> > about the PAXOS protocol? It's designed to keep state consistent in a
> > distributed environment.
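Both PAXOS and Zab commit a decision only when a majority (quorum) of servers accepts it, which is why ZooKeeper ensembles are usually sized with an odd number of servers. A small sketch of that arithmetic, with hypothetical helper names:

```python
from itertools import combinations


def quorum_size(n: int) -> int:
    """Smallest strict majority of an n-server ensemble."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Servers that can fail while a majority is still reachable."""
    return n - quorum_size(n)


def majorities_intersect(n: int) -> bool:
    """Brute-force check that any two majorities share at least one server.

    This overlap is what prevents two disjoint groups of servers from
    committing conflicting decisions.
    """
    q = quorum_size(n)
    quorums = [set(c) for c in combinations(range(n), q)]
    return all(a & b for a in quorums for b in quorums)


for n in (3, 4, 5):
    print(n, quorum_size(n), tolerated_failures(n), majorities_intersect(n))
```

Note that a 4-server ensemble tolerates no more failures than a 3-server one, while requiring a larger quorum.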
> >
> > Thanks,
> > Daniel
> >
> >
> > On Wed, Apr 16, 2014 at 7:56 PM, Hyunsik Choi wrote:
> >
> >> Hi Alvin,
> >>
> >> First of all, thank you, Alvin, for your contribution. Your proposal looks
> >> nice and reasonable to me.
> >>
> >> BTW, as others mentioned, TAJO-704 and TAJO-611 seem to overlap
> >> somewhat. We need to arrange the tasks to avoid duplicated work.
> >>
> >> In my opinion, the TajoMaster HA feature involves three sub-features:
> >>   1) Leader election among multiple TajoMasters - at any given time, one
> >>      of the TajoMasters is the leader.
> >>   2) Service discovery on the TajoClient side - TajoClient API calls should
> >>      remain resilient even when the original TajoMaster is unavailable.
> >>   3) Cluster resource management and catalog information that TajoMaster
> >>      keeps in main memory - this information should not be lost.
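For sub-feature (2), client-side resilience could look roughly like the sketch below: on a connection failure, the client re-resolves the current leader's address from the discovery service and retries. `resolve_master` and `call` are hypothetical stand-ins (for example, a read of the leader's HOSTNAME:PORT from ZooKeeper and a TajoClient RPC); they are not existing Tajo APIs.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_failover(
    resolve_master: Callable[[], str],
    call: Callable[[str], T],
    max_attempts: int = 3,
    backoff_s: float = 0.0,
) -> T:
    """Invoke `call` against the current master, re-resolving on failure.

    resolve_master: hypothetical lookup returning the leader's address.
    call: hypothetical RPC taking that address; it raises ConnectionError
          when the master it talks to is unavailable.
    """
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        address = resolve_master()  # ask discovery on every attempt
        try:
            return call(address)
        except ConnectionError as err:
            last_error = err
            time.sleep(backoff_s * (attempt + 1))  # simple linear backoff
    raise RuntimeError(f"no reachable master after {max_attempts} attempts") from last_error


# Simulated failover: master-a is down, then discovery points at master-b.
leaders = iter(["master-a", "master-b"])


def fake_call(addr: str) -> str:
    if addr == "master-a":
        raise ConnectionError("master-a is down")
    return f"ok from {addr}"


print(call_with_failover(lambda: next(leaders), fake_call))  # ok from master-b
```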
> >>
> >> I think that (1) and (2) duplicate TAJO-611 for service discovery.
> >> So, it would be nice if TAJO-704 focused only on (3), because TAJO-611
> >> already started a few weeks ago and TAJO-704 is at a relatively earlier
> >> stage. *Instead, you can continue the work with Xuhui and Min.*
> >> Someone can divide the service discovery issue into more subtasks.
> >>
> >> In addition, I'd like to discuss (3) further. Currently, a running
> >> TajoMaster keeps two kinds of information: cluster resource information
> >> for all workers and catalog information. In order to guarantee the HA of
> >> this data, TajoMaster should either persistently materialize it or
> >> consistently synchronize it across multiple TajoMasters. BTW, we will
> >> replace TajoMaster's resource management with a decentralized approach
> >> as part of the new scheduler issue. As a result, I think that TajoMaster
> >> HA needs to focus only on the high availability of catalog information.
> >> HA of the catalog can easily be achieved by database replication, or we
> >> can build our own module for it. In my view, I prefer the former.
> >>
> >> Hi Xuhui and Min,
> >>
> >> Could you briefly share the progress of the service discovery issue? Then
> >> we can easily figure out how to start on service discovery together.
> >>
> >> Warm regards,
> >> Hyunsik
> >>
> >>
> >>
> >> On Wed, Apr 16, 2014 at 3:36 PM, Min Zhou wrote:
> >>
> >> > Actually, we are thinking not only about HA, but also about service
> >> > discovery, which the future Tajo scheduler would rely on. The Tajo
> >> > scheduler can get all the active workers from that service.
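A scheduler reading the set of active workers from a discovery service relies on ephemeral registrations: an entry lives only as long as the worker's session. The in-memory sketch below imitates that with session timeouts; `WorkerRegistry` and its methods are hypothetical, and in a real deployment ZooKeeper (or Curator's service discovery recipe) would own the session tracking.

```python
from dataclasses import dataclass, field


@dataclass
class WorkerRegistry:
    """In-memory imitation of ephemeral worker registrations (hypothetical)."""

    session_timeout: float
    _last_seen: dict[str, float] = field(default_factory=dict, init=False)

    def register(self, worker: str, now: float) -> None:
        # Like creating an ephemeral node: the entry is tied to a live session.
        self._last_seen[worker] = now

    def heartbeat(self, worker: str, now: float) -> None:
        # Only a still-registered worker can extend its session.
        if worker in self._last_seen:
            self._last_seen[worker] = now

    def active_workers(self, now: float) -> list[str]:
        # Drop sessions that missed their timeout, as ZooKeeper deletes the
        # ephemeral node of an expired session.
        expired = [w for w, t in self._last_seen.items()
                   if now - t > self.session_timeout]
        for w in expired:
            del self._last_seen[w]
        return sorted(self._last_seen)


registry = WorkerRegistry(session_timeout=10.0)
registry.register("worker-1", now=0.0)
registry.register("worker-2", now=0.0)
registry.heartbeat("worker-1", now=8.0)
print(registry.active_workers(now=15.0))  # ['worker-1']
```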
> >> >
> >> >
> >> > Regards,
> >> > Min
> >> >
> >> >
> >> > On Tue, Apr 15, 2014 at 10:05 PM, Xuhui Liu wrote:
> >> >
> >> > > Hi Alvin,
> >> > >
> >> > > TAJO-611 will introduce Curator to Tajo as a service discovery
> >> > > service, and Curator is based on ZK. Maybe we can work together.
> >> > >
> >> > > Thanks,
> >> > > Xuhui
> >> > >
> >> > >
> >> > > On Wed, Apr 16, 2014 at 12:17 PM, Min Zhou wrote:
> >> > >
> >> > > > Hi Alvin,
> >> > > >
> >> > > > I think this JIRA issue somewhat overlaps with TAJO-611. Can you
> >> > > > cooperate?
> >> > > >
> >> > > > Thanks,
> >> > > > Min
> >> > > >
> >> > > >
> >> > > > On Tue, Apr 15, 2014 at 7:22 PM, Henry Saputra wrote:
> >> > > >
> >> > > > > Jaehwa, I think we should consider a pluggable mechanism that would
> >> > > > > allow some kind of distributed system like ZK to be used if desired.
> >> > > > >
> >> > > > > - Henry
> >> > > > >
> >> > > > > On Tue, Apr 15, 2014 at 7:15 PM, Jaehwa Jung wrote:
> >> > > > > > Hi, Alvin
> >> > > > > >
> >> > > > > > I'm sorry for the late response, and thank you very much for
> >> > > > > > your contribution. I agree with your opinion on ZooKeeper. But
> >> > > > > > ZooKeeper requires an additional dependency that some users may
> >> > > > > > not want.
> >> > > > > >
> >> > > > > > I'd like to suggest adding an abstraction layer for handling
> >> > > > > > TajoMaster HA. When I created TAJO-740, I wished that TajoMaster
> >> > > > > > HA would have a generic interface and a basic implementation
> >> > > > > > using HDFS. Next, your proposed ZooKeeper implementation could be
> >> > > > > > added there. This would allow users to choose their desired
> >> > > > > > implementation according to their environments.
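One possible shape for the abstraction layer proposed here (and for the pluggable mechanism Henry mentions above) is a small interface with interchangeable backends. Everything below is hypothetical: the class and method names are not existing Tajo code, and the file-based backend only stands in for an HDFS-based implementation by using exclusive file creation on a local filesystem as the lock. A ZooKeeper backend would implement the same interface.

```python
import os
import tempfile
from abc import ABC, abstractmethod
from typing import Optional


class MasterHAService(ABC):
    """Hypothetical pluggable interface for TajoMaster HA backends."""

    @abstractmethod
    def try_become_active(self, address: str) -> bool:
        """Try to become the active master; publish `address` on success."""

    @abstractmethod
    def active_master(self) -> Optional[str]:
        """Return the published address of the active master, if any."""

    @abstractmethod
    def step_down(self, address: str) -> None:
        """Release leadership held by `address`."""


class FileLockHAService(MasterHAService):
    """Stand-in for a filesystem-based backend (not real HDFS code).

    Creating the lock file with O_CREAT | O_EXCL succeeds for exactly one
    caller, which then writes its address into the file.
    """

    def __init__(self, lock_path: str) -> None:
        self.lock_path = lock_path

    def try_become_active(self, address: str) -> bool:
        try:
            fd = os.open(self.lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            return False  # someone else already holds the lock
        with os.fdopen(fd, "w") as f:
            f.write(address)
        return True

    def active_master(self) -> Optional[str]:
        try:
            with open(self.lock_path) as f:
                return f.read()
        except FileNotFoundError:
            return None

    def step_down(self, address: str) -> None:
        # Only the current holder may release the lock.
        if self.active_master() == address:
            os.remove(self.lock_path)


lock = os.path.join(tempfile.mkdtemp(), "active-master.lock")
ha: MasterHAService = FileLockHAService(lock)
print(ha.try_become_active("master-a"), ha.try_become_active("master-b"))
print(ha.active_master())
ha.step_down("master-a")
print(ha.try_become_active("master-b"), ha.active_master())
```

Unlike a ZooKeeper ephemeral node, this lock file is not removed when its holder crashes, which is one reason a real filesystem-based backend would need extra care (for example, leases or fencing).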
> >> > > > > >
> >> > > > > > In addition, I'd like to propose that TajoMaster embed the HA
> >> > > > > > module; it would be great if HA worked simply by launching a
> >> > > > > > backup TajoMaster. Deploying additional processes besides the
> >> > > > > > TajoMaster and TajoWorker processes may place more burden on
> >> > > > > > users.
> >> > > > > >
> >> > > > > > *Cheers*
> >> > > > > > *Jaehwa*
> >> > > > > >
> >> > > > > >
> >> > > > > > 2014-04-13 14:36 GMT+09:00 Jihoon Son:
> >> > > > > >
> >> > > > > >> Hi Alvin.
> >> > > > > >> Thanks for your suggestion.
> >> > > > > >>
> >> > > > > >> Overall, your suggestion looks very reasonable to me!
> >> > > > > >> I'll check the POC.
> >> > > > > >>
> >> > > > > >> Many thanks,
> >> > > > > >> Jihoon
> >> > > > > >>
> >> > > > > >> Hi All,
> >> > > > > >>             After doing a lot of research, in my opinion we
> >> > > > > >> should utilize ZooKeeper for TajoMaster HA. I have created a
> >> > > > > >> small POC and shared it on my Github repository
> >> > > > > >> ([email protected]:alvinhenrick/zooKeeper-poc.git).
> >> > > > > >>
> >> > > > > >>             Just to make things a little bit easier and more
> >> > > > > >> maintainable, I am utilizing Apache Curator, the fluent ZooKeeper
> >> > > > > >> client API developed at Netflix, which is now an Apache open
> >> > > > > >> source project.
> >> > > > > >>
> >> > > > > >>             I have attached a diagram to convey my message to
> >> > > > > >> the team members. I will upload it to JIRA once everyone agrees
> >> > > > > >> with the proposed solution.
> >> > > > > >>
> >> > > > > >>             Here is what the flow is going to look like.
> >> > > > > >>
> >> > > > > >>             TajoMasterZkController   ==>
> >> > > > > >>
> >> > > > > >>
> >> > > > > >>    1. This component will start, connect to the ZooKeeper quorum,
> >> > > > > >>       and fight ( :) ) to obtain the latch/lock to become the
> >> > > > > >>       master.
> >> > > > > >>    2. Once the lock is obtained, the Apache Curator API will invoke
> >> > > > > >>       the takeLeadership() method, and at that point the controller
> >> > > > > >>       will start the TajoMaster.
> >> > > > > >>    3. As long as the TajoMaster is running, the controller will
> >> > > > > >>       keep the lock and update the metadata on the ZooKeeper server
> >> > > > > >>       with the HOSTNAME and RPC PORT.
> >> > > > > >>    4. The other participants will keep waiting for the latch/lock
> >> > > > > >>       to be released by ZooKeeper so they can obtain leadership.
> >> > > > > >>    5. The advantage is that we can have as many TajoMasters as we
> >> > > > > >>       want, but only one can be the leader, and a TajoMaster
> >> > > > > >>       consumes resources only after obtaining the latch/lock.
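The latch/lock behaviour in the TajoMasterZkController steps can be illustrated with the standard ZooKeeper leader-election recipe: each candidate creates an ephemeral sequential node, the lowest sequence number leads, and when the leader's session ends its node disappears and the next candidate takes over. The sketch below simulates that in memory; `ElectionSimulator` is hypothetical and is not Curator's `LeaderSelector`, which wraps a comparable mechanism and calls `takeLeadership()` on the winner.

```python
import itertools
from typing import Callable, Optional


class ElectionSimulator:
    """In-memory imitation of ZooKeeper's ephemeral-sequential election recipe."""

    def __init__(self, on_leader: Callable[[str], None]) -> None:
        self._seq = itertools.count()
        self._nodes: dict[int, str] = {}  # sequence number -> candidate id
        self._on_leader = on_leader  # plays the role of takeLeadership()
        self._leader: Optional[str] = None

    def join(self, candidate: str) -> None:
        # Like creating an EPHEMERAL_SEQUENTIAL node: unique, increasing number.
        self._nodes[next(self._seq)] = candidate
        self._elect()

    def session_lost(self, candidate: str) -> None:
        # Ephemeral nodes vanish when their owner's session ends.
        self._nodes = {s: c for s, c in self._nodes.items() if c != candidate}
        self._elect()

    def leader(self) -> Optional[str]:
        return self._leader

    def _elect(self) -> None:
        # The candidate holding the lowest sequence number is the leader.
        new_leader = self._nodes[min(self._nodes)] if self._nodes else None
        if new_leader != self._leader:
            self._leader = new_leader
            if new_leader is not None:
                self._on_leader(new_leader)  # e.g. start the TajoMaster


started: list[str] = []
election = ElectionSimulator(on_leader=started.append)
for master in ("master-a", "master-b", "master-c"):
    election.join(master)
election.session_lost("master-a")  # leader crashes; master-b takes over
print(election.leader(), started)  # master-b ['master-a', 'master-b']
```

In the real recipe each candidate watches only the node just below its own rather than the leader's node, which avoids waking every participant on each change; this simulation skips watches entirely.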
> >> > > > > >>
> >> > > > > >>
> >> > > > > >>            TajoWorkerZkController ==>
> >> > > > > >>
> >> > > > > >>    1. This component will start, connect to ZooKeeper (creating an
> >> > > > > >>       EPHEMERAL ZNODE), and wait for events from ZooKeeper.
> >> > > > > >>    2. The first listener will listen for successful registration.
> >> > > > > >>    3. The second listener, on the master node, will listen for any
> >> > > > > >>       changes to the master node received from the ZooKeeper
> >> > > > > >>       server.
> >> > > > > >>    4. If a failover occurs, the data on the master ZNODE will
> >> > > > > >>       change, the new HOSTNAME and RPC PORT can be obtained, and
> >> > > > > >>       the TajoWorker can establish a new RPC connection with the
> >> > > > > >>       TajoMaster.
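Step 4 of the TajoWorkerZkController flow (reconnecting when the master ZNODE's data changes) could be sketched as a callback that parses the new HOSTNAME:PORT and swaps the RPC connection. `MasterWatcher`, `parse_master_data`, the `connect` factory, and the `b"HOSTNAME:PORT"` payload format are all hypothetical. With ZooKeeper, a classic watch fires once per change and must be re-registered; Curator's `NodeCache`, for example, handles that for you.

```python
from typing import Callable, Optional, Tuple


def parse_master_data(data: bytes) -> Tuple[str, int]:
    """Parse a hypothetical b"HOSTNAME:PORT" payload stored in the master znode."""
    host, _, port = data.decode("utf-8").rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"malformed master address: {data!r}")
    return host, int(port)


class MasterWatcher:
    """Keeps a worker's RPC connection pointed at the current master."""

    def __init__(self, connect: Callable[[str, int], object]) -> None:
        self._connect = connect  # hypothetical RPC connection factory
        self.address: Optional[Tuple[str, int]] = None
        self.connection: Optional[object] = None

    def on_master_data_changed(self, data: Optional[bytes]) -> None:
        # data is None while no master has published itself (e.g. mid-failover).
        if data is None:
            self.address, self.connection = None, None
            return
        address = parse_master_data(data)
        if address != self.address:  # ignore no-op updates
            self.address = address
            # Closing the previous connection is omitted in this sketch.
            self.connection = self._connect(*address)


watcher = MasterWatcher(connect=lambda host, port: f"rpc://{host}:{port}")
watcher.on_master_data_changed(b"master-a.example.com:1234")
watcher.on_master_data_changed(None)                          # failover in progress
watcher.on_master_data_changed(b"master-b.example.com:1234")  # new leader published
print(watcher.connection)  # rpc://master-b.example.com:1234
```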
> >> > > > > >>
> >> > > > > >>           To demonstrate, I have created a small Readme.txt file
> >> > > > > >> on Github explaining how to run the example. Please read the log
> >> > > > > >> statements on the console.
> >> > > > > >>
> >> > > > > >>           Similar to TajoWorkerZkController, we can also
> >> > > > > >> implement a TajoClientZkController.
> >> > > > > >>
> >> > > > > >>           Any help or advice is appreciated.
> >> > > > > >>
> >> > > > > >> Thanks!
> >> > > > > >> Warm Regards,
> >> > > > > >> Alvin.
> >> > > > > >>
> >> > > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > > --
> >> > > > My research interests are distributed systems, parallel computing
> >> > > > and bytecode based virtual machine.
> >> > > >
> >> > > > My profile:
> >> > > > http://www.linkedin.com/in/coderplay
> >> > > > My blog:
> >> > > > http://coderplay.javaeye.com
> >> > > >
> >> > >
> >> >
> >> >
> >> >
> >>
> >
> >
>
