Github user kdunn-pivotal commented on the issue:

    https://github.com/apache/incubator-hawq/pull/940
  
    Here is the step-by-step process. It may have some gaps, but it likely covers 90% of the steps:
    
    # HAWQSYNC initial setup runbook:
    
    1. Ensure network connectivity between source and DR sites 
    
    | Port  | Function | Servers                                                                    |
    |-------|----------|----------------------------------------------------------------------------|
    | 11000 | Oozie    | From Falcon server in each env to Oozie server in other env                |
    | 15000 | Falcon   | From HAWQ master to Falcon server in other env                             |
    | 50010 | Datanode | From Falcon server & datanodes in each env to datanodes in other env       |
    | 50070 | Namenode | From Falcon server to namenodes (primary and standby) in other env         |
    | 8020  | Namenode | From datanodes to namenodes (primary and standby) in other env             |
    | 8050  | YARN RM  | From Falcon server in each env to YARN ResourceManager server in other env |
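
    A quick way to smoke-test these ports before going further is a bash `/dev/tcp` probe. A rough sketch — the `check_port` helper and the `*.dr.example.com` host names are illustrative, not part of the runbook; substitute your own endpoints:

    ```shell
    #!/bin/bash
    # Hypothetical port probe: succeeds if a TCP connect to host:port completes
    # within 3 seconds. Host names below are placeholders for your DR endpoints.
    check_port() {
        local host=$1 port=$2
        if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
            echo "OK   ${host}:${port}"
        else
            echo "FAIL ${host}:${port}"
            return 1
        fi
    }

    # Example invocations (substitute real hosts):
    # check_port oozie.dr.example.com 11000     # Oozie
    # check_port falcon.dr.example.com 15000    # Falcon
    # check_port dn1.dr.example.com 50010       # Datanode
    # check_port nn1.dr.example.com 50070       # Namenode HTTP
    # check_port nn1.dr.example.com 8020        # Namenode RPC
    # check_port rm.dr.example.com 8050         # YARN ResourceManager
    ```

    Run it in both directions (source → DR and DR → source), since the table above requires connectivity from each env to the other.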
    
    2. Install Falcon and Oozie on source and DR HAWQ clusters
    
    3. Make prerequisite directories on both clusters (source, DR):
    
    ```
    $ sudo su falcon -l -c 'hdfs dfs -mkdir -p /tmp/{staging,working}'
    $ sudo su falcon -l -c 'hdfs dfs -chmod 777 /tmp/staging'
    $ sudo su hdfs -l -c 'hdfs dfs -mkdir -p /apps/data-mirroring/workflows/lib'
    $ sudo su hdfs -l -c 'hdfs dfs -chmod -R 777 /apps/data-mirroring'
    $ sudo su hdfs -l -c 'hdfs dfs -mkdir -p /user/falcon && hdfs dfs -chown falcon:falcon /user/falcon'
    $ sudo su hdfs -l -c 'hdfs dfs -mkdir -p /user/gpadmin && hdfs dfs -chown gpadmin:gpadmin /user/gpadmin'
    ```
    
    4. Set up cluster entities for the source and DR clusters:
    
    ```
    gpadmin@source $ curl -H "Content-Type:text/xml" -X POST "http://<FALCON_HOST>:15000/api/entities/submit/cluster?user.name=falcon" -d '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <cluster name="primaryIDP" description="" colo="SOURCE" 
xmlns="uri:falcon:cluster:0.1">
        <interfaces>
            <interface type="readonly" endpoint="hftp://sandbox.hortonworks.com:50070" version="2.2.0"/>
            <interface type="write" endpoint="hdfs://sandbox.hortonworks.com:8020" version="2.2.0"/>
            <interface type="execute" endpoint="sandbox.hortonworks.com:8050" version="2.2.0"/>
            <interface type="workflow" endpoint="http://sandbox.hortonworks.com:11000/oozie/" version="4.0.0"/>
            <interface type="messaging" endpoint="tcp://sandbox.hortonworks.com:61616?daemon=true" version="5.1.6"/>
        </interfaces>
        <locations>
            <location name="staging" path="/tmp/staging"/>
            <location name="temp" path="/tmp"/>
            <location name="working" path="/tmp/working"/>
        </locations>
        <ACL owner="hdfs" group="users" permission="0755"/>
    </cluster>'
    ```
    
    ```
    gpadmin@dr $ curl -H "Content-Type:text/xml" -X POST "http://<FALCON_HOST>:15000/api/entities/submit/cluster?user.name=falcon" -d '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <cluster name="drIDP" description="" colo="DR" 
xmlns="uri:falcon:cluster:0.1">
        <interfaces>
            <interface type="readonly" endpoint="hftp://sandbox2.hortonworks.com:50070" version="2.2.0"/>
            <interface type="write" endpoint="hdfs://sandbox2.hortonworks.com:8020" version="2.2.0"/>
            <interface type="execute" endpoint="sandbox2.hortonworks.com:8050" version="2.2.0"/>
            <interface type="workflow" endpoint="http://sandbox2.hortonworks.com:11000/oozie/" version="4.0.0"/>
            <interface type="messaging" endpoint="tcp://sandbox2.hortonworks.com:61616?daemon=true" version="5.1.6"/>
        </interfaces>
        <locations>
            <location name="staging" path="/tmp/staging"/>
            <location name="temp" path="/tmp"/>
            <location name="working" path="/tmp/working"/>
        </locations>
        <ACL owner="hdfs" group="users" permission="0755"/>
    </cluster>'
    ```
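
    Falcon tends to return opaque 400 errors for malformed entity XML, so it can help to check the payload locally before POSTing. A minimal sketch — the `validate_entity` helper and the file name are illustrative, and it assumes `python3` is on the path:

    ```shell
    # Save the entity XML to a file (e.g. primary-cluster.xml), then check it is
    # well-formed: python3's stdlib parser exits non-zero on malformed XML.
    validate_entity() {
        python3 -c "import sys, xml.dom.minidom as m; m.parse(sys.argv[1])" "$1" \
            && echo "well-formed: $1"
    }

    # validate_entity primary-cluster.xml
    ```

    This only checks well-formedness, not Falcon's schema; Falcon itself validates the entity against `cluster-0.1.xsd` on submit.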
    
    5. Stage the distcp-based replication workflow on both source and DR HDFS:
    ```
    gpadmin@{source,dr} $ hdfs dfs -put - /apps/data-mirroring/workflows/hdfs-replication-workflow-v2.xml <<EOF
    <!--
           Licensed to the Apache Software Foundation (ASF) under one
      or more contributor license agreements.  See the NOTICE file
      distributed with this work for additional information
      regarding copyright ownership.  The ASF licenses this file
      to you under the Apache License, Version 2.0 (the
      "License"); you may not use this file except in compliance
      with the License.  You may obtain a copy of the License at
    
          http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License.
      -->
    <workflow-app xmlns='uri:oozie:workflow:0.3' name='falcon-dr-fs-workflow'>
        <start to='dr-replication'/>
        <!-- Replication action -->
        <action name="dr-replication">
            <distcp xmlns="uri:oozie:distcp-action:0.2">
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <configuration>
                    <property>
                        <name>mapred.job.priority</name>
                        <value>${jobPriority}</value>
                    </property>
                    <property>
                        <name>mapred.job.queue.name</name>
                        <value>${queueName}</value>
                    </property>
                </configuration>
                <arg>-update</arg>
                <arg>-delete</arg>
                <arg>-m</arg>
                <arg>${distcpMaxMaps}</arg>
                <arg>-bandwidth</arg>
                <arg>${distcpMapBandwidth}</arg>
                <arg>-strategy</arg>
                <arg>dynamic</arg>
                <arg>${drSourceClusterFS}${drSourceDir}</arg>
                <arg>${drTargetClusterFS}${drTargetDir}</arg>
            </distcp>
            <ok to="end"/>
            <error to="fail"/>
        </action>
        <kill name="fail">
            <message>
                Workflow action failed, error 
message[${wf:errorMessage(wf:lastErrorNode())}]
            </message>
        </kill>
        <end name="end"/>
    </workflow-app>
    EOF
    ```
    
    # Sync operation runbook:
    1. Run hawqsync-extract to capture known-good HDFS file sizes (protects against HDFS/catalog inconsistency if a failure occurs during the sync)
    
    2. Run ETL batch
    
    3. Run hawqsync-falcon, which performs the following steps:
    (keeping the source in safe mode during the sync is only allowable when a remote Falcon "pulls" the data, meaning the distcp job executes on the DR site)
      1. Stop both HAWQ masters (source and target)
      2. Archive source MASTER_DATA_DIRECTORY (MDD) tarball to HDFS
      3. Restart source HAWQ master
      4. Enable HDFS safe mode and force source checkpoint
      5. Disable remote HDFS safe mode
      6. Execute Apache Falcon-based distcp sync process
      7. Disable source HDFS safe mode
      8. Enable HDFS safe mode and force remote checkpoint
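
    The safe-mode and checkpoint steps above map onto standard `hdfs dfsadmin` commands. A rough sketch of what hawqsync-falcon drives — not the actual implementation; the DR NameNode URI is a placeholder, and `DRY_RUN=1` (the default) only prints the commands:

    ```shell
    #!/bin/bash
    # Illustrative mapping of steps 4-8 to hdfs dfsadmin commands.
    DRY_RUN=${DRY_RUN:-1}
    DR_FS=hdfs://dr-namenode:8020   # placeholder for the remote NameNode

    # Echo each command; execute it only when DRY_RUN is not 1.
    run() { echo "+ $*"; [ "$DRY_RUN" = 1 ] || "$@"; }

    run sudo -u hdfs hdfs dfsadmin -safemode enter              # 4: freeze source HDFS
    run sudo -u hdfs hdfs dfsadmin -saveNamespace               # 4: force checkpoint
    run sudo -u hdfs hdfs dfsadmin -fs "$DR_FS" -safemode leave # 5: open remote HDFS
    # 6: Falcon-driven distcp sync executes here
    run sudo -u hdfs hdfs dfsadmin -safemode leave              # 7: open source HDFS
    run sudo -u hdfs hdfs dfsadmin -fs "$DR_FS" -safemode enter # 8: freeze remote HDFS
    run sudo -u hdfs hdfs dfsadmin -fs "$DR_FS" -saveNamespace  # 8: remote checkpoint
    ```

    Note `-saveNamespace` requires safe mode to already be on, which is why each checkpoint is forced only after the corresponding `-safemode enter`.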
    
    # DR event runbook:
    1. Copy new catalog to local filesystem
    ```
    [gpadmin@dr-hawqmaster ~]$ hdfs dfs -copyToLocal /hawq_default/hawqMdd-2016-10-11-1028.tar .
    ```
    
    2. Archive previous catalog
    ```
    root@dr-hawqmaster:~ # cd /data/hawq
    root@dr-hawqmaster:/data/hawq # mv master master.save-11oct2016
    ```
    
    3. Unpack new catalog
    ```
    root@dr-hawqmaster:/data/hawq # tar xpf ~gpadmin/hawqMdd-2016-10-11-1028.tar -C `pwd`
    ```
    
    4. Restart Master with new catalog in place
    ```
    [gpadmin@dr-hawqmaster ~]$ hawq start master -a
    ```
    
    5. Update Standby Master Identity
    ```
    [gpadmin@dr-hawqmaster ~]$ export PGOPTIONS="-c gp_session_role=UTILITY -c allow_system_table_mods=dml"
    [gpadmin@dr-hawqmaster ~]$ psql template1 <<SQL
    UPDATE gp_segment_configuration SET hostname = 'clppn1prhdbmn02.infosolco.net', address = '10.228.45.12' WHERE role = 's' ;
    SQL
    ```
    
    6. Start cluster segments
    ```
    [gpadmin@dr-hawqmaster ~]$ hawq start allsegments -a
    ```
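
    Once the segments are up, a couple of sanity checks are worth running from the DR master. A hedged sketch — the `run`/`DRY_RUN` wrapper is illustrative (default prints the commands instead of executing them); the query mirrors the standby update in step 5:

    ```shell
    #!/bin/bash
    # Illustrative post-failover checks; DRY_RUN=1 (default) just prints them.
    DRY_RUN=${DRY_RUN:-1}
    run() { echo "+ $*"; [ "$DRY_RUN" = 1 ] || "$@"; }

    run hawq state    # overall master/segment status summary
    # Confirm the standby row now points at the DR standby host:
    run psql -d template1 -At -c \
        "SELECT hostname, address FROM gp_segment_configuration WHERE role = 's';"
    ```

    A quick `SELECT count(*)` against a known table is also a cheap way to confirm the restored catalog agrees with the replicated HDFS data.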

