[ 
https://issues.apache.org/jira/browse/HDDS-5919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Doroszlai resolved HDDS-5919.
------------------------------------
    Fix Version/s: 1.3.0
       Resolution: Fixed

> In Kubernetes OM HA has circular dependency on the service availability
> -----------------------------------------------------------------------
>
>                 Key: HDDS-5919
>                 URL: https://issues.apache.org/jira/browse/HDDS-5919
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: OM
>    Affects Versions: 1.1.0
>            Reporter: Shawn
>            Assignee: Shawn
>            Priority: Critical
>              Labels: kubernetes, pull-request-available
>             Fix For: 1.3.0
>
>
> In Kubernetes, OM HA requires the FQDN of each OM to be specified in the 
> configuration. The OM address has the form <om_pod_name>.<om_service_name>, 
> and during initialization the OM needs to resolve this FQDN. However, the 
> FQDN is resolvable only once the OM pod is in the ready state, because the 
> OM service includes only pods that are ready. This is a circular dependency.
>  
> My current workaround is to replace the FQDN with the local host name 
> (om-0 instead of om-0.omservice) in the ozone-site.xml config before OM 
> initialization. The side effect is that the Recon component cannot be 
> launched: when Recon looks up the list of OM peers, the returned list is 
> something like om-0 (the leader), om-1.omservice, om-2.omservice, and the 
> leader om-0 cannot be accessed.
>  
> Ozone currently seems to target bare-metal deployments, where IPs do not 
> change. We should also take Kubernetes environments into account, where IPs 
> can be dynamic (a node is rescheduled, or the whole application is redeployed 
> for an upgrade).
>  
> 2021-11-01 18:55:55 ERROR OzoneManagerServiceProviderImpl:315 - Unable to obtain Ozone Manager DB Snapshot.
> java.net.UnknownHostException: Error while authenticating with endpoint: [http://test-ozone-om-uat-0:9874/dbCheckpoint]
>     at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>     at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
>     at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.wrapExceptionWithMessage(KerberosAuthenticator.java:232)
>     at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:216)
>     at org.apache.hadoop.security.authentication.client.AuthenticatedURL.openConnection(AuthenticatedURL.java:348)
>     at org.apache.hadoop.hdfs.web.URLConnectionFactory.openConnection(URLConnectionFactory.java:186)
>     at org.apache.hadoop.ozone.recon.ReconUtils.makeHttpCall(ReconUtils.java:237)
>     at org.apache.hadoop.ozone.recon.spi.impl.OzoneManagerServiceProviderImpl.lambda$getOzoneManagerDBSnapshot$1(OzoneManagerServiceProviderImpl.java:298)
>     at java.base/java.security.AccessController.doPrivileged(Native Method)
>     at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
>     at org.apache.hadoop.security.SecurityUtil.doAsUser(SecurityUtil.java:535)
>     at org.apache.hadoop.security.SecurityUtil.doAsLoginUser(SecurityUtil.java:516)
>     at org.apache.hadoop.ozone.recon.spi.impl.OzoneManagerServiceProviderImpl.getOzoneManagerDBSnapshot(OzoneManagerServiceProviderImpl.java:297)
>     at org.apache.hadoop.ozone.recon.spi.impl.OzoneManagerServiceProviderImpl.updateReconOmDBWithNewSnapshot(OzoneManagerServiceProviderImpl.java:329)
>     at org.apache.hadoop.ozone.recon.spi.impl.OzoneManagerServiceProviderImpl.syncDataFromOM(OzoneManagerServiceProviderImpl.java:427)
>     at org.apache.hadoop.ozone.recon.spi.impl.OzoneManagerServiceProviderImpl.lambda$start$0(OzoneManagerServiceProviderImpl.java:233)
>     at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>     at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
>     at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: java.net.UnknownHostException: test-ozone-om-uat-0
>     at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:220)
>     at java.base/java.net.Socket.connect(Socket.java:609)
>     at java.base/sun.net.NetworkClient.doConnect(NetworkClient.java:177)
>     at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:474)
>     at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:569)
>     at java.base/sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
>     at java.base/sun.net.www.http.HttpClient.New(HttpClient.java:341)
>     at java.base/sun.net.www.http.HttpClient.New(HttpClient.java:362)
>     at java.base/sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1253)
>     at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1187)
>     at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1081)
>     at java.base/sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:1015)
>     at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:189)
>     ... 19 more
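
One way to make a component tolerate the window in which a peer name such as om-0.omservice does not yet resolve is to retry the DNS lookup instead of failing on the first UnknownHostException. The sketch below only illustrates that idea and is not the change that resolved this issue. The class PeerAddressResolver, its Lookup interface, and the resolveWithRetry method are hypothetical names introduced here, not Ozone APIs.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.time.Duration;

/** Hypothetical helper (not part of Ozone): retries DNS lookup of a peer address. */
final class PeerAddressResolver {

  /** Lookup abstraction so the retry loop can be exercised without real DNS. */
  @FunctionalInterface
  interface Lookup {
    InetAddress resolve(String host) throws UnknownHostException;
  }

  private PeerAddressResolver() { }

  /**
   * Tries to resolve the host up to maxAttempts times, sleeping for the given
   * delay between failed attempts. Rethrows the last UnknownHostException if
   * every attempt fails.
   */
  static InetAddress resolveWithRetry(String host, int maxAttempts,
      Duration delay, Lookup lookup)
      throws UnknownHostException, InterruptedException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    UnknownHostException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return lookup.resolve(host);
      } catch (UnknownHostException e) {
        // The name may simply not be published yet (pod not ready); try again.
        last = e;
        if (attempt < maxAttempts) {
          Thread.sleep(delay.toMillis());
        }
      }
    }
    throw last;
  }
}
```

With real DNS, the lookup argument would be InetAddress::getByName, for example resolveWithRetry("om-0.omservice", 10, Duration.ofSeconds(3), InetAddress::getByName). The attempt count and delay are arbitrary illustration values.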



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
