[
https://issues.apache.org/jira/browse/HDDS-5919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Attila Doroszlai resolved HDDS-5919.
------------------------------------
Fix Version/s: 1.3.0
Resolution: Fixed
> In Kubernetes, OM HA has a circular dependency on service availability
> ----------------------------------------------------------------------
>
> Key: HDDS-5919
> URL: https://issues.apache.org/jira/browse/HDDS-5919
> Project: Apache Ozone
> Issue Type: Bug
> Components: OM
> Affects Versions: 1.1.0
> Reporter: Shawn
> Assignee: Shawn
> Priority: Critical
> Labels: kubernetes, pull-request-available
> Fix For: 1.3.0
>
>
> In Kubernetes, OM HA requires the FQDN of each OM to be specified in the
> configuration, and the OM address has the form
> <om_pod_name>.<om_service_name>. During initialization, the OM needs to
> resolve this FQDN, but the FQDN is resolvable only once the OM is in the
> ready state (the OM service includes only pods in the ready state). This is
> a circular dependency.
>
> My current workaround is to replace the FQDN with the local host name
> (om-0 instead of om-0.omservice) in the ozone-site.xml config before OM
> initialization. The side effect is that the Recon component cannot be
> launched: when Recon looks up the list of OM peers, the returned list is
> something like om-0 (the leader), om-1.omservice, om-2.omservice, and the
> leader om-0 cannot be reached.
>
> It seems that Ozone currently targets bare-metal deployments, where IPs do
> not change. We should also take Kubernetes environments into account, where
> IPs can be dynamic (a node is rescheduled, or the whole app is redeployed
> for an upgrade).
>
> 2021-11-01 18:55:55 ERROR OzoneManagerServiceProviderImpl:315 - Unable to obtain Ozone Manager DB Snapshot.
> 2021-11-01 18:55:55 ERROR OzoneManagerServiceProviderImpl:315 - Unable to obtain Ozone Manager DB Snapshot.
> java.net.UnknownHostException: Error while authenticating with endpoint: [http://test-ozone-om-uat-0:9874/dbCheckpoint]
>     at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>     at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
>     at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.wrapExceptionWithMessage(KerberosAuthenticator.java:232)
>     at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:216)
>     at org.apache.hadoop.security.authentication.client.AuthenticatedURL.openConnection(AuthenticatedURL.java:348)
>     at org.apache.hadoop.hdfs.web.URLConnectionFactory.openConnection(URLConnectionFactory.java:186)
>     at org.apache.hadoop.ozone.recon.ReconUtils.makeHttpCall(ReconUtils.java:237)
>     at org.apache.hadoop.ozone.recon.spi.impl.OzoneManagerServiceProviderImpl.lambda$getOzoneManagerDBSnapshot$1(OzoneManagerServiceProviderImpl.java:298)
>     at java.base/java.security.AccessController.doPrivileged(Native Method)
>     at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
>     at org.apache.hadoop.security.SecurityUtil.doAsUser(SecurityUtil.java:535)
>     at org.apache.hadoop.security.SecurityUtil.doAsLoginUser(SecurityUtil.java:516)
>     at org.apache.hadoop.ozone.recon.spi.impl.OzoneManagerServiceProviderImpl.getOzoneManagerDBSnapshot(OzoneManagerServiceProviderImpl.java:297)
>     at org.apache.hadoop.ozone.recon.spi.impl.OzoneManagerServiceProviderImpl.updateReconOmDBWithNewSnapshot(OzoneManagerServiceProviderImpl.java:329)
>     at org.apache.hadoop.ozone.recon.spi.impl.OzoneManagerServiceProviderImpl.syncDataFromOM(OzoneManagerServiceProviderImpl.java:427)
>     at org.apache.hadoop.ozone.recon.spi.impl.OzoneManagerServiceProviderImpl.lambda$start$0(OzoneManagerServiceProviderImpl.java:233)
>     at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>     at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
>     at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: java.net.UnknownHostException: test-ozone-om-uat-0
>     at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:220)
>     at java.base/java.net.Socket.connect(Socket.java:609)
>     at java.base/sun.net.NetworkClient.doConnect(NetworkClient.java:177)
>     at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:474)
>     at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:569)
>     at java.base/sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
>     at java.base/sun.net.www.http.HttpClient.New(HttpClient.java:341)
>     at java.base/sun.net.www.http.HttpClient.New(HttpClient.java:362)
>     at java.base/sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1253)
>     at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1187)
>     at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1081)
>     at java.base/sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:1015)
>     at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:189)
>     ... 19 more
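
For illustration only: the circular dependency in the description comes from an OM resolving its own FQDN through DNS before its pod is ready. One way to avoid that lookup, sketched below, is to identify the local entry in the configured peer list by comparing the first DNS label against the pod's host name, while leaving the FQDNs in the configuration untouched so that peers and Recon still see om-0.omservice. The class and method names are hypothetical and not part of Apache Ozone, the host names are the ones used in the description, and this is not necessarily how the fix for this issue was implemented.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical helper for illustration; not part of Apache Ozone. */
final class OmSelfAddressSketch {

  private OmSelfAddressSketch() { }

  /** Returns the first DNS label of a host name, e.g. "om-0" for "om-0.omservice". */
  static String firstLabel(String host) {
    int dot = host.indexOf('.');
    return dot < 0 ? host : host.substring(0, dot);
  }

  /**
   * Finds the configured peer address that refers to the local pod by
   * comparing first labels. No DNS lookup is performed, so this works even
   * while the pod is not yet ready and its FQDN is not resolvable.
   */
  static Optional<String> findSelf(List<String> configuredPeers, String localHostName) {
    String local = firstLabel(localHostName);
    return configuredPeers.stream()
        .filter(peer -> firstLabel(peer).equalsIgnoreCase(local))
        .findFirst();
  }

  public static void main(String[] args) {
    // Peer list as it would appear in the configuration (assumed names).
    List<String> peers = List.of("om-0.omservice", "om-1.omservice", "om-2.omservice");
    // The pod only knows its short host name at startup.
    System.out.println(findSelf(peers, "om-0").orElse("<not found>"));
  }
}
```

Run as written, this prints om-0.omservice. The sketch assumes that pod names are unique within the peer list, which holds for StatefulSet-style names such as om-0, om-1, om-2.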