Re: HAWQ standby master sync process

Kyle Dunn Thu, 08 Sep 2016 10:22:43 -0700

Ming -

Thank you for the info, this is very helpful in understanding how WAL
shipment happens.

One question I have: is the destination host configured in
walsendserver.c, and if so, where? Alternatively, does the standby master
initiate the request as a client, rather than the active master pushing
out WALs as they become available? I ask because a more robust DR solution
than what I'm currently working on would allow multiple standby targets
(e.g. one traditional standby and one DR mirror).

At the moment I've opted for an approach that stops the active HAWQ master,
creates a tarball of the entire MASTER_DATA_DIRECTORY, archives it on HDFS,
then invokes distcp via Apache Falcon to mirror /hawq_default in HDFS to
the DR site. After a DR event there would be some manual process to restore
said archive and update the hostname / DFS references to reflect the actual
DR environment.
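
To make the snapshot step concrete, here is a minimal sketch of the tarball
creation only. It is not the actual orchestration code: the function name
and arguments are hypothetical, and stopping the master, uploading to HDFS,
and triggering Falcon are all left out. It assumes the archive is written
outside the data directory. Note that it uses the tarfile `filter` argument,
which needs Python 2.7 or later; Python 2.6 would need the older `exclude`
argument instead.

```python
import os
import tarfile


def snapshot_master_dir(master_data_dir, archive_path, exclude_dirs=("pg_log",)):
    """Write an uncompressed tarball of master_data_dir.

    Top-level directories named in exclude_dirs are kept as empty
    entries, but their contents are skipped. Hypothetical helper.
    """
    base = os.path.basename(os.path.normpath(master_data_dir))

    def _skip_excluded(tarinfo):
        # Member names look like "<base>/pg_log/some_file".
        parts = tarinfo.name.split("/")
        if len(parts) > 2 and parts[1] in exclude_dirs:
            return None  # drop anything underneath an excluded directory
        return tarinfo

    # Mode "w" means no compression, which keeps the outage window short.
    tar = tarfile.open(archive_path, "w")
    try:
        tar.add(master_data_dir, arcname=base, filter=_skip_excluded)
    finally:
        tar.close()
    return archive_path
```

Keeping the empty pg_log directory in the archive means a restored master
should not need it recreated by hand, though that is an assumption about
the restore procedure rather than something I have verified.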

This approach is a step in the right direction but the act of creating the
tarball necessitates a brief HAWQ master outage (currently ~1 minute when
excluding pg_log contents and not compressing), whereas extending the
walserver code could avoid any outage by allowing WAL replication to have
multiple destinations.
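
As a rough illustration of what I mean by multiple destinations, here is a
conceptual sketch in Python. It is not how walsendserver.c works today and
none of these names exist in HAWQ; the class, the sender callables, and the
per-target offset bookkeeping are all assumptions made for the example. The
idea is only that each target tracks its own acknowledged position, so a
slow or unreachable DR mirror does not hold back the local standby.

```python
class WalFanOut(object):
    """Hypothetical fan-out of WAL chunks to several named targets."""

    def __init__(self, senders):
        # senders: dict of target name -> callable(offset, data) that
        # raises IOError when the chunk could not be delivered.
        self.senders = dict(senders)
        # Next byte offset each target is expected to receive.
        self.acked = dict((name, 0) for name in self.senders)

    def send(self, offset, data):
        """Ship one chunk; return the sorted names of targets that missed it."""
        missed = []
        for name in sorted(self.senders):
            if self.acked[name] != offset:
                # Target is lagging; it must catch up before taking new WAL.
                missed.append(name)
                continue
            try:
                self.senders[name](offset, data)
            except IOError:
                missed.append(name)
            else:
                self.acked[name] = offset + len(data)
        return missed
```

A real implementation would also need a catch-up path for lagging targets
and a policy for which targets must acknowledge before a commit returns.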

The top-level code for orchestrating this process is currently written to
be Python 2.6 compatible. I'd like the dev team to review it, if possible,
as a first step toward a future PR for "HAWQ DR" via Falcon.

Thoughts?

-Kyle

On Mon, Sep 5, 2016 at 9:41 AM Ming Li <[email protected]> wrote:

> Hi,
>
> For the general idea, please refer to this PostgreSQL presentation:
>
> https://www.pgcon.org/2008/schedule/attachments/61_Synchronous%20Log%20Shipping%20Replication.pdf
>
>
> Here is some info about the standby code.
>
> The standby-related code is here:
> src/backend/postmaster/walredoserver.c
> src/backend/postmaster/walsendserver.c
>
> The big picture:
> - A backend generates WAL and passes it to the forked "WAL Sender" process.
> The call stack is: XLogQDMirrorWrite() => WalSendServerClientSendRequest()
>
> - The "WAL Sender" process is forked and loops, processing requests and
> responses. The call stack is:
> walsendserver_forkexec() -> walsendserver_start() -> ServiceMain() ->
> ServiceListenLoop() -> ServiceProcessRequest() ->
> serviceConfig->ServiceRequest()
> -> WalSendServer_ServiceRequest()
>
> - The "WAL Sender" sends WAL to the "WAL Receiver", which runs on the
> standby node. The call stack is:
> WalSendServer_ServiceRequest() => WalSendServerDoRequest() =>
> disconnectMirrorQD_SendClose() => write_qd_sync() => PQsendQuery()
>
> - On the standby side, the APIs are similar, e.g. walredoserver_forkexec()
> vs. walsendserver_forkexec()
>
> Hope it helps you! ~_~
>
>
>
> On Thu, Aug 11, 2016 at 1:09 AM, Kyle Dunn <[email protected]> wrote:
>
> > Hello,
> >
> > I'm investigating DR options for HAWQ and was curious about the existing
> > master catalog synchronization process. My question is mainly around what
> > this process does at a high level and where I might look in the code base
> > or management tools to see about extending it for additional standby
> > masters (e.g. one in a geographically distant data center and/or a
> > different logical HAWQ cluster). The assumption is the HDFS blocks would be
> > replicated by something like distcp via Falcon.
> >
> > I believe there are obvious things to address like DFS / namenode URI
> > parameters, FQDNs, and certainly failure scenarios / edge cases, but I'm
> > mainly trying to get a dialog started to see what input, ideas, and
> > considerations others have. One thing I'm specifically interested in is
> > whether / how WAL can be used (@Keaton).
> >
> >
> > Thanks,
> > Kyle
> > --
> > *Kyle Dunn | Data Engineering | Pivotal*
> > Direct: 303.905.3171 <3039053171> | Email: [email protected]
> >
>
-- 
*Kyle Dunn | Data Engineering | Pivotal*
Direct: 303.905.3171 <3039053171> | Email: [email protected]
