[ https://issues.apache.org/jira/browse/IGNITE-11871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexey Zinoviev updated IGNITE-11871:
-------------------------------------
    Fix Version/s:     (was: 2.9)
                   3.0

> [ML] IP resolver in TensorFlow cluster manager doesn't work properly
> --------------------------------------------------------------------
>
>                 Key: IGNITE-11871
>                 URL: https://issues.apache.org/jira/browse/IGNITE-11871
>             Project: Ignite
>          Issue Type: Bug
>          Components: ml
>    Affects Versions: 2.7, 2.8
>            Reporter: Alexey Zinoviev
>            Assignee: Alexey Zinoviev
>            Priority: Critical
>             Fix For: 3.0
>
>
> The TensorFlow cluster manager must resolve a NodeId into an IP address or
> hostname in order to pass that address/name to a TensorFlow worker.
> Currently it uses a "return first" strategy and returns the first available
> address/name. As a result, when a server has more than one network
> interface, the cluster resolver may return different addresses/names for
> the same server.
> To fix this problem we need to update
> [TensorFlowServerAddressSpec|https://github.com/apache/ignite/blob/master/modules/tensorflow/src/main/java/org/apache/ignite/tensorflow/cluster/spec/TensorFlowServerAddressSpec.java]
> so that it always returns the same address/name for the same server.
> If a server has multiple network interfaces, we need to find a "GCD": a
> network that contains all Ignite nodes.
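
One possible reading of the "GCD" idea in the description, as a minimal sketch and not the actual fix: collect the networks each node is attached to, intersect them across all nodes, and give every node its address inside one shared network. Everything here is an assumption made for illustration: the class CommonSubnetResolver and its methods are hypothetical and not part of Ignite, node IDs are plain strings, addresses are IPv4 only, and a single fixed prefix length stands in for real interface netmasks.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper, not part of Ignite: IPv4 only, one fixed prefix length.
class CommonSubnetResolver {
    /** Network part of a dotted-quad IPv4 address under the given prefix length (0..32). */
    static long network(String ip, int prefixLen) {
        long addr = 0;
        for (String octet : ip.split("\\."))
            addr = (addr << 8) | Integer.parseInt(octet);
        long mask = (0xFFFFFFFFL << (32 - prefixLen)) & 0xFFFFFFFFL;
        return addr & mask;
    }

    /** Maps every node ID to one of its addresses inside a network shared by all nodes. */
    static Map<String, String> resolve(Map<String, List<String>> nodeAddrs, int prefixLen) {
        // Intersect the per-node sets of networks to find those common to all nodes.
        Set<Long> common = null;
        for (List<String> addrs : nodeAddrs.values()) {
            Set<Long> nets = new TreeSet<>();
            for (String a : addrs)
                nets.add(network(a, prefixLen));
            if (common == null)
                common = nets;
            else
                common.retainAll(nets);
        }
        if (common == null || common.isEmpty())
            throw new IllegalStateException("No network is shared by all nodes");

        // TreeSet iteration is sorted, so the choice does not depend on input order.
        long chosen = common.iterator().next();

        Map<String, String> res = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : nodeAddrs.entrySet()) {
            // Sort a copy so a node with several addresses in the chosen
            // network still gets the same one on every call.
            List<String> sorted = new ArrayList<>(e.getValue());
            Collections.sort(sorted);
            for (String a : sorted) {
                if (network(a, prefixLen) == chosen) {
                    res.put(e.getKey(), a);
                    break;
                }
            }
        }
        return res;
    }
}
```

Unlike a "return first" strategy, the result depends only on the set of addresses per node, not on the order in which interfaces are enumerated. A real implementation would also need to handle hostnames, IPv6 and per-interface netmasks, which this sketch leaves out.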