[ https://issues.apache.org/jira/browse/HDDS-8749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17729371#comment-17729371 ]
David Ayres commented on HDDS-8749:
-----------------------------------
Attila, I did miss that detail in the documentation; thanks for pointing it
out. Even with the service ID set, it still errors out, reporting that the ID
is null:
WARN ha.OMProxyInfo: OzoneManager address testcluster:9862 for serviceID null
remains unresolved for node ID null Check your ozone-site.xml file to ensure
ozone manager addresses are configured properly.
The documentation is not quite clear on this: does the service ID also need to
be a DNS record?
My ozone-site.xml is set up as follows:
<!-- OM HA Settings -->
<property>
  <name>ozone.om.ratis.enable</name>
  <value>true</value>
</property>
<property>
  <name>ozone.om.service.ids</name>
  <value>testcluster</value>
</property>
<property>
  <name>ozone.om.nodes.testcluster</name>
  <value>om1,om2,om3</value>
</property>
<property>
  <name>ozone.om.address.testcluster.om1</name>
  <value>ddl07oom01.root.local</value>
</property>
<property>
  <name>ozone.om.address.testcluster.om2</name>
  <value>ddl07oom02.root.local</value>
</property>
<property>
  <name>ozone.om.address.testcluster.om3</name>
  <value>ddl07oom03.root.local</value>
</property>
<!-- SCM HA Settings -->
<property>
  <name>ozone.scm.ratis.enable</name>
  <value>true</value>
</property>
<property>
  <name>ozone.scm.service.ids</name>
  <value>testcluster</value>
</property>
<property>
  <name>ozone.scm.nodes.testcluster</name>
  <value>scm1,scm2,scm3</value>
</property>
<property>
  <name>ozone.scm.address.testcluster.scm1</name>
  <value>ddl07oscm01.root.local</value>
</property>
<property>
  <name>ozone.scm.address.testcluster.scm2</name>
  <value>ddl07oscm02.root.local</value>
</property>
<property>
  <name>ozone.scm.address.testcluster.scm3</name>
  <value>ddl07oscm03.root.local</value>
</property>
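For reference, here is a minimal sketch of how a Hadoop client's core-site.xml
might point at the OM service ID instead of a single OM host. It assumes the
client also has the ozone-site.xml settings above on its classpath, so that
testcluster can be mapped to om1, om2 and om3; under that assumption the
service ID is a logical name rather than a hostname. The fs.defaultFS value is
an illustration and is not taken from this cluster's actual configuration.

```xml
<!-- core-site.xml: hypothetical client-side sketch, not the reporter's file -->
<property>
  <name>fs.defaultFS</name>
  <!-- Assumption: "testcluster" is the value of ozone.om.service.ids,
       used here as a logical name and not as a DNS record -->
  <value>ofs://testcluster/</value>
</property>
<property>
  <name>fs.ofs.impl</name>
  <value>org.apache.hadoop.fs.ozone.RootedOzoneFileSystem</value>
</property>
```

The warning above treats testcluster:9862 as an address with serviceID null,
which would be consistent with the client not seeing the ozone.om.service.ids
setting. That is a guess, not a confirmed diagnosis.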
> [Hadoop OFS] HDFS commands fail when not set as the leader of OMHA
> ------------------------------------------------------------------
>
> Key: HDDS-8749
> URL: https://issues.apache.org/jira/browse/HDDS-8749
> Project: Apache Ozone
> Issue Type: Bug
> Components: OFS, OM HA
> Affects Versions: 1.3.0
> Environment: OS: Red Hat 8
> Reporter: David Ayres
> Priority: Minor
>
> When setting the defaultFS in Hadoop's core-site.xml, it seems you are only
> allowed to declare one OM node; if the declared node is not the leader,
> commands fail with the following error:
> INFO retry.RetryInvocationHandler: com.google.protobuf.ServiceException:
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ozone.om.exceptions.OMNotLeaderException):
> OM:om1 is not the leader. Could not determine the leader node.
>
> , while invoking $Proxy13.submitRequest over
> nodeId=null,nodeAddress=ddl07oom01.vuhl.root.mrc.local:9862 after 1 failover
> attempts. Trying to failover after sleeping for 4000ms. Current retry count:
> 1.
>
> HDFS commands only work when the declared node is the leader, which defeats
> the purpose of HA: if the OM were to fail over, HDFS commands would cease to
> work.
>
> There also does not seem to be any documentation yet on how HA works with
> OFS/O3FS, and I am not sure whether this is a feature in the works.