[ 
https://issues.apache.org/jira/browse/HDFS-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16689764#comment-16689764
 ] 

Erik Krogen edited comment on HDFS-14017 at 11/16/18 5:58 PM:
--------------------------------------------------------------

Hm... Something is pretty wrong with Jenkins. It's not actually running any of 
the tests; it's failing with errors like:
{code}
[ERROR] ExecutionException The forked VM terminated without properly saying 
goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd 
/testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-client && 
/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -Xmx2048m 
-XX:+HeapDumpOnOutOfMemoryError -jar 
/testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-client/target/surefire/surefirebooter375579229167329239.jar
 /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-client/target/surefire 
2018-11-16T17-42-57_928-jvmRun1 surefire586051240617267363tmp 
surefire_07291752059438872666tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 1
{code}
It looks like the last patch where it actually ran tests was v009. We were 
seeing the same issue on HDFS-14035, but I don't see it on other Jenkins runs 
against trunk (rather than the HDFS-12943 branch).

[~vagarychen], I see you didn't merge in trunk after committing HDFS-14035, so 
I'm going to do so now, re-run Jenkins, and see if things get better.


> ObserverReadProxyProviderWithIPFailover should work with HA configuration
> -------------------------------------------------------------------------
>
>                 Key: HDFS-14017
>                 URL: https://issues.apache.org/jira/browse/HDFS-14017
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Chen Liang
>            Assignee: Chen Liang
>            Priority: Major
>         Attachments: HDFS-14017-HDFS-12943.001.patch, 
> HDFS-14017-HDFS-12943.002.patch, HDFS-14017-HDFS-12943.003.patch, 
> HDFS-14017-HDFS-12943.004.patch, HDFS-14017-HDFS-12943.005.patch, 
> HDFS-14017-HDFS-12943.006.patch, HDFS-14017-HDFS-12943.008.patch, 
> HDFS-14017-HDFS-12943.009.patch, HDFS-14017-HDFS-12943.010.patch, 
> HDFS-14017-HDFS-12943.011.patch, HDFS-14017-HDFS-12943.012.patch, 
> HDFS-14017-HDFS-12943.013.patch, HDFS-14017-HDFS-12943.014.patch
>
>
> Currently {{ObserverReadProxyProviderWithIPFailover}} extends 
> {{ObserverReadProxyProvider}}, and the only difference is changing the proxy 
> factory to use {{IPFailoverProxyProvider}}. However, this is not enough, 
> because when the {{ObserverReadProxyProvider}} constructor is invoked via 
> super(...), the following line:
> {code:java}
> nameNodeProxies = getProxyAddresses(uri,
>         HdfsClientConfigKeys.DFS_NAMENODE_RPC_ADDRESS_KEY);
> {code}
> will try to resolve all of the configured NN addresses to perform configured 
> failover. But in the case of IPFailover, this does not really apply.
>  
> A second, closely related issue concerns delegation tokens. For example, in 
> the current IPFailover setup, say we have a virtual host nn.xyz.com, which 
> points to either of two physical nodes, nn1.xyz.com or nn2.xyz.com. In 
> current HDFS, there is always only one DT being exchanged, and it has 
> hostname nn.xyz.com. The server only issues this DT, and the client only 
> knows the host nn.xyz.com, so all is good. But with Observer reads, even 
> with IPFailover, the client will no longer contact nn.xyz.com; it will 
> actively reach out to nn1.xyz.com and nn2.xyz.com. During this process, the 
> current code will look for a DT associated with hostname nn1.xyz.com or 
> nn2.xyz.com, which differs from the DT issued by the NN, causing token 
> authentication to fail. This happens in 
> {{AbstractDelegationTokenSelector#selectToken}}. The new IPFailover proxy 
> provider will need to resolve this as well.
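To illustrate the first problem in the description, here is a toy sketch (not Hadoop code; the class, method, and config maps are all hypothetical stand-ins for hdfs-site.xml entries) of why a lookup keyed on {{dfs.namenode.rpc-address}} enumerates multiple NameNodes under configured failover but only ever sees the single virtual host under IPFailover:

```java
import java.util.*;

// Toy model of the address lookup ObserverReadProxyProvider's constructor
// performs; real Hadoop resolves these from the Configuration object.
public class AddressLookupSketch {
    // Collect every configured NN RPC address (hypothetical helper).
    static List<String> resolveNamenodeAddresses(Map<String, String> conf) {
        List<String> addrs = new ArrayList<>();
        for (Map.Entry<String, String> e : conf.entrySet()) {
            if (e.getKey().startsWith("dfs.namenode.rpc-address")) {
                addrs.add(e.getValue());
            }
        }
        return addrs;
    }

    public static void main(String[] args) {
        // Configured-failover setup: one key per physical NameNode,
        // so the lookup yields one proxy per NN.
        Map<String, String> haConf = new HashMap<>();
        haConf.put("dfs.namenode.rpc-address.mycluster.nn1", "nn1.xyz.com:8020");
        haConf.put("dfs.namenode.rpc-address.mycluster.nn2", "nn2.xyz.com:8020");
        System.out.println(resolveNamenodeAddresses(haConf)); // two addresses

        // IPFailover setup: only the virtual host is configured, so the
        // same lookup cannot enumerate the physical NameNodes at all.
        Map<String, String> ipfConf = new HashMap<>();
        ipfConf.put("dfs.namenode.rpc-address", "nn.xyz.com:8020");
        System.out.println(resolveNamenodeAddresses(ipfConf)); // one address
    }
}
```

This is only meant to show why `getProxyAddresses` has nothing useful to enumerate in the IPFailover case, not how the fix should look.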
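The delegation-token mismatch in the second paragraph can be sketched the same way. This is a toy model of service-name-keyed token selection (mirroring, but not reproducing, what {{AbstractDelegationTokenSelector#selectToken}} does; the class and map-based credentials are hypothetical):

```java
import java.util.*;

// Toy model: a DT is selected by exact match on its "service" key,
// the way AbstractDelegationTokenSelector#selectToken matches tokens.
public class TokenSelectSketch {
    static String selectToken(String service, Map<String, String> creds) {
        return creds.get(service); // null means "no matching token"
    }

    public static void main(String[] args) {
        // The NN issued a single DT keyed by the virtual host.
        Map<String, String> creds = new HashMap<>();
        creds.put("nn.xyz.com:8020", "DT-for-nn.xyz.com");

        // Classic IPFailover client looks up the virtual host: match.
        System.out.println(selectToken("nn.xyz.com:8020", creds));

        // An observer-reading client contacts the physical hosts directly,
        // so its lookup key no longer matches the token's service: null,
        // and authentication fails.
        System.out.println(selectToken("nn1.xyz.com:8020", creds));
    }
}
```

The sketch only demonstrates why the lookup misses; the actual fix (e.g. rewriting the token service or the lookup key) is what the patches on this issue work out.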



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
