Murali Ramasami created FALCON-2090:
---------------------------------------

             Summary: HDFS Snapshot failed with UnknownHostException when scheduling in HA Mode
                 Key: FALCON-2090
                 URL: https://issues.apache.org/jira/browse/FALCON-2090
             Project: Falcon
          Issue Type: Bug
          Components: replication
    Affects Versions: trunk
            Reporter: Murali Ramasami
             Fix For: trunk


In NameNode HA mode, scheduling an HDFS snapshot replication fails with "java.net.UnknownHostException: mycluster1". The unresolved host in the error message, mycluster1, is the nameservice of the source cluster (primaryCluster). Please see the complete stack trace below.
Stack Trace:
{noformat}

Log Contents:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/grid/0/hadoop/yarn/local/filecache/371/slf4j-log4j12-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/grid/0/hadoop/yarn/local/filecache/213/mapreduce.tar.gz/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Error: java.lang.IllegalArgumentException: java.net.UnknownHostException: mycluster1
        at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:411)
        at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:429)
        at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.initialize(WebHdfsFileSystem.java:207)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2730)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:98)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2764)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2746)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:385)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:178)
        at org.apache.falcon.hive.util.EventUtils.initializeFS(EventUtils.java:145)
        at org.apache.falcon.hive.mapreduce.CopyMapper.setup(CopyMapper.java:47)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.net.UnknownHostException: mycluster1
        ... 19 more
{noformat}
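The frames above show WebHdfsFileSystem.initialize falling through to SecurityUtil.buildTokenService, which suggests the mapper's HDFS client treated mycluster1 as an ordinary hostname rather than as a logical HA nameservice. A minimal Java sketch of that decision follows. It assumes (the report does not confirm this) that the job configuration lacks HA client entries for mycluster1, it uses a plain Map in place of Hadoop's Configuration, and the class and method names are hypothetical; it is a simplified model, not Hadoop's code. dfs.nameservices and dfs.client.failover.proxy.provider.<nameservice> are standard HDFS HA client keys; a working HA client also needs dfs.ha.namenodes.<nameservice> and the per-NameNode address keys, which the sketch omits.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model (NOT Hadoop's source) of the check an HDFS client
// makes before deciding how to build the token service name for a webhdfs:// URI.
final class LogicalUriCheck {

    // Standard HDFS HA client keys; a plain Map stands in for Hadoop's Configuration.
    static final String NAMESERVICES = "dfs.nameservices";
    static final String PROVIDER_PREFIX = "dfs.client.failover.proxy.provider.";

    // True only if the URI host is listed in dfs.nameservices AND has a failover
    // proxy provider configured, i.e. the client would treat it as a logical HA name.
    static boolean isLogicalHaUri(Map<String, String> conf, URI uri) {
        String host = uri.getHost();
        boolean listed = Arrays.stream(conf.getOrDefault(NAMESERVICES, "").split(","))
                .map(String::trim)
                .anyMatch(ns -> ns.equals(host));
        return listed && conf.containsKey(PROVIDER_PREFIX + host);
    }

    static String describe(Map<String, String> conf, URI uri) {
        return isLogicalHaUri(conf, uri)
                ? uri.getHost() + ": logical HA nameservice, no DNS lookup of the name itself"
                : uri.getHost() + ": treated as a physical host, so it must resolve via DNS";
    }

    public static void main(String[] args) {
        URI source = URI.create("webhdfs://mycluster1:20070");

        // Client configuration with no HA entries for mycluster1.
        Map<String, String> bare = new HashMap<>();
        System.out.println(describe(bare, source));

        // Hypothetical client configuration declaring both clusters' nameservices.
        Map<String, String> ha = new HashMap<>();
        ha.put(NAMESERVICES, "mycluster1,mycluster2");
        ha.put(PROVIDER_PREFIX + "mycluster1",
                "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
        ha.put(PROVIDER_PREFIX + "mycluster2",
                "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
        System.out.println(describe(ha, source));
    }
}
```

Run as-is, the first line reports mycluster1 as a physical host that must resolve via DNS, and the second reports it as a logical nameservice.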

Steps to Reproduce:

primaryCluster:
============
{noformat}
<?xml version="1.0" encoding="UTF-8"?>
<cluster xmlns="uri:falcon:cluster:0.1" colo="USWestOregon" description="oregonHadoopCluster" name="primaryCluster">
   <interfaces>
      <interface type="readonly" endpoint="webhdfs://mycluster1:20070" version="0.20.2" />
      <interface type="write" endpoint="hdfs://mycluster1:8020" version="0.20.2" />
      <interface type="execute" endpoint="mramasami-falcon-multi-ha-bug-12.openstacklocal:8050" version="0.20.2" />
      <interface type="workflow" endpoint="http://mramasami-falcon-multi-ha-bug-14.openstacklocal:11000/oozie" version="3.1" />
      <interface type="messaging" endpoint="tcp://mramasami-falcon-multi-ha-bug-9.openstacklocal:61616?daemon=true" version="5.1.6" />
      <interface type="registry" endpoint="thrift://mramasami-falcon-multi-ha-bug-14.openstacklocal:9083" version="0.11.0" />
   </interfaces>
   <locations>
      <location name="staging" path="/tmp/fs" />
      <location name="temp" path="/tmp" />
      <location name="working" path="/tmp/fw" />
   </locations>
   <ACL owner="hrt_qa" group="users" permission="0755" />
   <properties>
      <property name="dfs.namenode.kerberos.principal" value="nn/_h...@example.com" />
      <property name="hive.metastore.kerberos.principal" value="hive/_h...@example.com" />
      <property name="hive.metastore.sasl.enabled" value="true" />
      <property name="hadoop.rpc.protection" value="authentication" />
      <property name="hive.metastore.uris" value="thrift://mramasami-falcon-multi-ha-bug-14.openstacklocal:9083" />
      <property name="hive.server2.uri" value="hive2://mramasami-falcon-multi-ha-bug-14.openstacklocal:10000" />
   </properties>
</cluster>
{noformat}

falcon entity -submit -type cluster -file primaryCluster.xml --> primaryCluster


backupCluster:
============
{noformat}
<?xml version="1.0" encoding="UTF-8"?>
<cluster xmlns="uri:falcon:cluster:0.1" colo="USWestOregon" description="oregonHadoopCluster" name="backupCluster">
   <interfaces>
      <interface type="readonly" endpoint="webhdfs://mycluster2:20070" version="0.20.2" />
      <interface type="write" endpoint="hdfs://mycluster2:8020" version="0.20.2" />
      <interface type="execute" endpoint="mramasami-falcon-multi-ha-bug-5.openstacklocal:8050" version="0.20.2" />
      <interface type="workflow" endpoint="http://mramasami-falcon-multi-ha-bug-6.openstacklocal:11000/oozie" version="3.1" />
      <interface type="messaging" endpoint="tcp://mramasami-falcon-multi-ha-bug-1.openstacklocal:61616" version="5.1.6" />
      <interface type="registry" endpoint="thrift://mramasami-falcon-multi-ha-bug-6.openstacklocal:9083" version="0.11.0" />
   </interfaces>
   <locations>
      <location name="staging" path="/tmp/fs" />
      <location name="temp" path="/tmp" />
      <location name="working" path="/tmp/fw" />
   </locations>
   <ACL owner="hrt_qa" group="users" permission="0755" />
   <properties>
      <property name="dfs.namenode.kerberos.principal" value="nn/_h...@example.com" />
      <property name="hive.metastore.kerberos.principal" value="hive/_h...@example.com" />
      <property name="hive.metastore.sasl.enabled" value="true" />
      <property name="hadoop.rpc.protection" value="authentication" />
      <property name="hive.metastore.uris" value="thrift://mramasami-falcon-multi-ha-bug-6.openstacklocal:9083" />
      <property name="hive.server2.uri" value="hive2://mramasami-falcon-multi-ha-bug-6.openstacklocal:10000" />
   </properties>
</cluster>
{noformat}

falcon entity -submit -type cluster -file backupCluster.xml --> backupCluster
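Both cluster entities point their readonly and write interfaces at nameservice IDs (mycluster1, mycluster2) rather than at individual NameNode hosts. The Java sketch below, using only the JDK DOM parser on trimmed copies of the two entities, collects those IDs. The helper class is hypothetical. If the failure does come from a client configuration that does not define these IDs as HA nameservices (an assumption, not something this report establishes), this is the set of names such a client would need to know about.

```java
import java.io.ByteArrayInputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: lists the HDFS authorities named by a Falcon cluster entity.
final class ClusterNameservices {

    // Trimmed copies of primaryCluster and backupCluster (HDFS interfaces only).
    static final String PRIMARY_XML =
            "<cluster xmlns=\"uri:falcon:cluster:0.1\" name=\"primaryCluster\"><interfaces>"
            + "<interface type=\"readonly\" endpoint=\"webhdfs://mycluster1:20070\" version=\"0.20.2\"/>"
            + "<interface type=\"write\" endpoint=\"hdfs://mycluster1:8020\" version=\"0.20.2\"/>"
            + "</interfaces></cluster>";
    static final String BACKUP_XML =
            "<cluster xmlns=\"uri:falcon:cluster:0.1\" name=\"backupCluster\"><interfaces>"
            + "<interface type=\"readonly\" endpoint=\"webhdfs://mycluster2:20070\" version=\"0.20.2\"/>"
            + "<interface type=\"write\" endpoint=\"hdfs://mycluster2:8020\" version=\"0.20.2\"/>"
            + "</interfaces></cluster>";

    // Host part of every readonly/write endpoint, in document order, without duplicates.
    static Set<String> hdfsAuthorities(String clusterXml) throws Exception {
        // The default factory is not namespace-aware, so plain tag names match.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(clusterXml.getBytes(StandardCharsets.UTF_8)));
        NodeList interfaces = doc.getElementsByTagName("interface");
        Set<String> hosts = new LinkedHashSet<>();
        for (int i = 0; i < interfaces.getLength(); i++) {
            Element iface = (Element) interfaces.item(i);
            String type = iface.getAttribute("type");
            if (type.equals("readonly") || type.equals("write")) {
                hosts.add(URI.create(iface.getAttribute("endpoint")).getHost());
            }
        }
        return hosts;
    }

    public static void main(String[] args) throws Exception {
        Set<String> all = new LinkedHashSet<>(hdfsAuthorities(PRIMARY_XML));
        all.addAll(hdfsAuthorities(BACKUP_XML));
        // Prints: mycluster1,mycluster2
        System.out.println(String.join(",", all));
    }
}
```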

HDFS Snapshot Replication:
=========================

Source:
======

hdfs dfs -mkdir -p /tmp/falcon-regression/HDFSSnapshotTest/source
hdfs dfs -put /grid/0/hadoopqe/tests/ha/falcon/combinedActions/mr_input/2015/01/02/NYSE-2000-2001.tsv /tmp/falcon-regression/HDFSSnapshotTest/source

Create Snapshot:
===============

hdfs dfsadmin -allowSnapshot /tmp/falcon-regression/HDFSSnapshotTest/source   [as hdfs]
hdfs dfs -createSnapshot /tmp/falcon-regression/HDFSSnapshotTest/source   [as hrt_qa]

hdfs lsSnapshottableDir   [as hrt_qa]

hdfs dfs -ls /tmp/falcon-regression/HDFSSnapshotTest/source/.snapshot
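When hdfs dfs -createSnapshot is given no snapshot name, as above, HDFS names the snapshot after its creation time, so the .snapshot listing shows an entry such as s20160509-062500.000. The Java sketch below reproduces that naming with java.time, assuming the pattern 's'yyyyMMdd-HHmmss.SSS used by Hadoop's Snapshot class (taken from the Hadoop source, not from this report; HDFS formats the time in the NameNode's local time zone). The class name and the timestamp are illustrative only.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper mirroring how HDFS names a snapshot created without a name.
final class SnapshotNames {

    // Assumed to match Hadoop's default snapshot name pattern.
    static final DateTimeFormatter DEFAULT_NAME =
            DateTimeFormatter.ofPattern("'s'yyyyMMdd-HHmmss.SSS");

    static String defaultSnapshotName(LocalDateTime createdAt) {
        return createdAt.format(DEFAULT_NAME);
    }

    // Snapshots of a snapshottable directory appear under its .snapshot subdirectory.
    static String snapshotPath(String snapshottableDir, String snapshotName) {
        return snapshottableDir + "/.snapshot/" + snapshotName;
    }

    public static void main(String[] args) {
        String dir = "/tmp/falcon-regression/HDFSSnapshotTest/source";
        // Illustrative creation time only.
        String name = defaultSnapshotName(LocalDateTime.of(2016, 5, 9, 6, 25, 0));
        // Prints: /tmp/falcon-regression/HDFSSnapshotTest/source/.snapshot/s20160509-062500.000
        System.out.println(snapshotPath(dir, name));
    }
}
```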


Target:
======

hdfs dfs -mkdir -p /tmp/falcon-regression/HDFSSnapshotTest/target

hdfs dfsadmin -allowSnapshot /tmp/falcon-regression/HDFSSnapshotTest/target
hdfs dfs -ls /tmp/falcon-regression/HDFSSnapshotTest/target/.snapshot


hdfs-snapshot.properties
==========================
{noformat}
jobName=HDFSSnapshotTest
jobClusterName=primaryCluster
jobValidityStart=2016-05-09T06:25Z
jobValidityEnd=2017-05-09T08:00Z
jobFrequency=days(1)
sourceCluster=primaryCluster
sourceSnapshotDir=/tmp/falcon-regression/HDFSSnapshotTest/source
sourceSnapshotRetentionAgeLimit=days(1)
sourceSnapshotRetentionNumber=3
targetCluster=backupCluster
targetSnapshotDir=/tmp/falcon-regression/HDFSSnapshotTest/target
targetSnapshotRetentionAgeLimit=days(1)
targetSnapshotRetentionNumber=3
jobAclOwner=hrt_qa
jobAclGroup=users
jobAclPermission="0x755"
{noformat}
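Assuming the extension properties are read with java.util.Properties (this report does not say how Falcon loads the file), the Java sketch below shows how the source and target pairs resolve. With this loader the double quotes around 0x755 become part of the jobAclPermission value; whether Falcon strips them is not established here. The class name is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper: loads hdfs-snapshot.properties content with java.util.Properties.
final class SnapshotJobProperties {

    static Properties load(String text) throws IOException {
        Properties p = new Properties();
        p.load(new StringReader(text));
        return p;
    }

    // "cluster:dir -> cluster:dir" for the replication pair.
    static String summary(Properties p) {
        return p.getProperty("sourceCluster") + ":" + p.getProperty("sourceSnapshotDir")
                + " -> " + p.getProperty("targetCluster") + ":" + p.getProperty("targetSnapshotDir");
    }

    public static void main(String[] args) throws IOException {
        String text = String.join("\n",
                "sourceCluster=primaryCluster",
                "sourceSnapshotDir=/tmp/falcon-regression/HDFSSnapshotTest/source",
                "targetCluster=backupCluster",
                "targetSnapshotDir=/tmp/falcon-regression/HDFSSnapshotTest/target",
                "jobAclPermission=\"0x755\"");
        Properties p = load(text);
        System.out.println(summary(p));
        // The quotes are kept as literal characters of the value: "0x755"
        System.out.println(p.getProperty("jobAclPermission"));
    }
}
```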

falcon extension -extensionName hdfs-snapshot-mirroring -submitAndSchedule -file hdfs-snapshot.properties



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
