[ 
https://issues.apache.org/jira/browse/FLINK-28947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl updated FLINK-28947:
----------------------------------
    Component/s: Runtime / Coordination

> Curator framework fails with NullPointerException
> -------------------------------------------------
>
>                 Key: FLINK-28947
>                 URL: https://issues.apache.org/jira/browse/FLINK-28947
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.15.1
>            Reporter: Juha
>            Priority: Major
>
> I'm getting the following error in the JobManager, and as a result the 
> JobManager exits.
> {code:java}
> Aug 12 06:37:30 server_name java[173]: [2022-08-12 06:37:30,491] ERROR 
> Background exception was not retry-able or retry gave up 
> (org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.CuratorFrameworkImpl:733)
> Aug 12 06:37:30 server_name java[173]: java.lang.NullPointerException: null
> Aug 12 06:37:30 server_name java[173]:         at 
> org.apache.flink.shaded.curator5.org.apache.curator.utils.Compatibility.getHostAddress(Compatibility.java:116)
>  ~[flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
> Aug 12 06:37:30 server_name java[173]:         at 
> org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker.configToConnectionString(EnsembleTracker.java:185)
>  ~[flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
> Aug 12 06:37:30 server_name java[173]:         at 
> org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker.processConfigData(EnsembleTracker.java:206)
>  ~[flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
> Aug 12 06:37:30 server_name java[173]:         at 
> org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker.access$300(EnsembleTracker.java:50)
>  ~[flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
> Aug 12 06:37:30 server_name java[173]:         at 
> org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker$2.processResult(EnsembleTracker.java:150)
>  ~[flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
> Aug 12 06:37:30 server_name java[173]:         at 
> org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.CuratorFrameworkImpl.sendToBackgroundCallback(CuratorFrameworkImpl.java:926)
>  [flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
> Aug 12 06:37:30 server_name java[173]:         at 
> org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:683)
>  [flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
> Aug 12 06:37:30 server_name java[173]:         at 
> org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.WatcherRemovalFacade.processBackgroundOperation(WatcherRemovalFacade.java:152)
>  [flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
> Aug 12 06:37:30 server_name java[173]:         at 
> org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.GetConfigBuilderImpl$2.processResult(GetConfigBuilderImpl.java:222)
>  [flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
> Aug 12 06:37:30 server_name java[173]:         at 
> org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:598)
>  [flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
> Aug 12 06:37:30 server_name java[173]:         at 
> org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
>  [flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
> Aug 12 06:37:30 server_name java[173]: [2022-08-12 06:37:30,493] ERROR 
> Unhandled error in curator framework, error message: Background exception was 
> not retry-able or retry gave up 
> (org.apache.flink.runtime.util.ZooKeeperUtils:292)
> Aug 12 06:37:30 server_name java[173]: java.lang.NullPointerException: null
> Aug 12 06:37:30 server_name java[173]:         at 
> org.apache.flink.shaded.curator5.org.apache.curator.utils.Compatibility.getHostAddress(Compatibility.java:116)
>  ~[flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
> Aug 12 06:37:30 server_name java[173]:         at 
> org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker.configToConnectionString(EnsembleTracker.java:185)
>  ~[flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
> Aug 12 06:37:30 server_name java[173]:         at 
> org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker.processConfigData(EnsembleTracker.java:206)
>  ~[flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
> Aug 12 06:37:30 server_name java[173]:         at 
> org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker.access$300(EnsembleTracker.java:50)
>  ~[flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
> Aug 12 06:37:30 server_name java[173]:         at 
> org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker$2.processResult(EnsembleTracker.java:150)
>  ~[flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
> Aug 12 06:37:30 server_name java[173]:         at 
> org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.CuratorFrameworkImpl.sendToBackgroundCallback(CuratorFrameworkImpl.java:926)
>  [flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
> Aug 12 06:37:30 server_name java[173]:         at 
> org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:683)
>  [flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
> Aug 12 06:37:30 server_name java[173]:         at 
> org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.WatcherRemovalFacade.processBackgroundOperation(WatcherRemovalFacade.java:152)
>  [flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
> Aug 12 06:37:30 server_name java[173]:         at 
> org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.GetConfigBuilderImpl$2.processResult(GetConfigBuilderImpl.java:222)
>  [flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
> Aug 12 06:37:30 server_name java[173]:         at 
> org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:598)
>  [flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
> Aug 12 06:37:30 server_name java[173]:         at 
> org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
>  [flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
> Aug 12 06:37:30 server_name java[173]: [2022-08-12 06:37:30,494] ERROR Fatal 
> error occurred while executing the TaskManager. Shutting it down... 
> (org.apache.flink.runtime.taskexecutor.TaskManagerRunner:427)
> Aug 12 06:37:30 server_name java[173]: java.lang.NullPointerException: null
> Aug 12 06:37:30 server_name java[173]:         at 
> org.apache.flink.shaded.curator5.org.apache.curator.utils.Compatibility.getHostAddress(Compatibility.java:116)
>  ~[flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
> Aug 12 06:37:30 server_name java[173]:         at 
> org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker.configToConnectionString(EnsembleTracker.java:185)
>  ~[flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
> Aug 12 06:37:30 server_name java[173]:         at 
> org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker.processConfigData(EnsembleTracker.java:206)
>  ~[flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
> Aug 12 06:37:30 server_name java[173]:         at 
> org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker.access$300(EnsembleTracker.java:50)
>  ~[flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
> Aug 12 06:37:30 server_name java[173]:         at 
> org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker$2.processResult(EnsembleTracker.java:150)
>  ~[flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
> Aug 12 06:37:30 server_name java[173]:         at 
> org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.CuratorFrameworkImpl.sendToBackgroundCallback(CuratorFrameworkImpl.java:926)
>  [flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
> Aug 12 06:37:30 server_name java[173]:         at 
> org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:683)
>  [flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
> Aug 12 06:37:30 server_name java[173]:         at 
> org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.WatcherRemovalFacade.processBackgroundOperation(WatcherRemovalFacade.java:152)
>  [flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
> Aug 12 06:37:30 server_name java[173]:         at 
> org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.GetConfigBuilderImpl$2.processResult(GetConfigBuilderImpl.java:222)
>  [flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
> Aug 12 06:37:30 server_name java[173]:         at 
> org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:598)
>  [flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
> Aug 12 06:37:30 server_name java[173]:         at 
> org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
>  [flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
> {code}
> Steps to reproduce:
>  * Create three servers.
>  * Run a Flink JobManager and TaskManager on each of them (let's call these A, B 
> and C). Use ZooKeeper HA Services.
>  * Everything works as expected.
>  * Add a new server (D).
>  * Shut down server C.
>  * The error appears on both servers A and D. I didn't check B and C.
> This reproduces on (apparently) every attempt.
> I'm using Flink 1.15.1. Actually I'm migrating from 1.13.X to 1.15.X. I'm not 
> totally sure whether this ever happens on 1.13.X, but it seems to _always_ 
> happen on 1.15.1.
> I attached a debugger to the JobManager to see what's going on:
> {code:java}
> main-EventThread[1] where
>   [1] 
> org.apache.flink.shaded.curator5.org.apache.curator.utils.Compatibility.getHostAddress
>  (Compatibility.java:116)
>   [2] 
> org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker.configToConnectionString
>  (EnsembleTracker.java:185)
>   [3] 
> org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker.processConfigData
>  (EnsembleTracker.java:206)
>   [4] 
> org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker.access$300
>  (EnsembleTracker.java:50)
>   [5] 
> org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker$2.processResult
>  (EnsembleTracker.java:150)
>   [6] 
> org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.CuratorFrameworkImpl.sendToBackgroundCallback
>  (CuratorFrameworkImpl.java:926)
>   [7] 
> org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation
>  (CuratorFrameworkImpl.java:683)
>   [8] 
> org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.WatcherRemovalFacade.processBackgroundOperation
>  (WatcherRemovalFacade.java:152)
>   [9] 
> org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.GetConfigBuilderImpl$2.processResult
>  (GetConfigBuilderImpl.java:222)
>   [10] 
> org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn$EventThread.processEvent
>  (ClientCnxn.java:598)
>   [11] 
> org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn$EventThread.run
>  (ClientCnxn.java:510)
> main-EventThread[1] dump address
>  address = {
>     holder: instance of 
> java.net.InetSocketAddress$InetSocketAddressHolder(id=8302)
>     serialVersionUID: 5076001401234631237
>     serialPersistentFields: instance of java.io.ObjectStreamField[3] (id=8303)
>     UNSAFE: instance of jdk.internal.misc.Unsafe(id=8304)
>     FIELDS_OFFSET: 12
>     java.net.SocketAddress.serialVersionUID: 5215720748342549866
> }
> main-EventThread[1] dump address.holder
>  address.holder = {
>     hostname: "host_name_here"
>     addr: null
>     port: 2888
> }
> main-EventThread[1] print address.getAddress()
>  address.getAddress() = null
> {code}
> (The hostname has been changed).
> Line 116 of Compatibility.java 
> (https://github.com/apache/curator/blob/d65669b64f003326c98843b32b997e3ffab1e442/curator-client/src/main/java/org/apache/curator/utils/Compatibility.java#L116)
>  reads:
> {code}
>         return (address != null) ? address.getAddress().getHostAddress() : 
> "unknown";
> {code}
> Here {{address.getAddress()}} returns {{null}}, so the chained 
> {{getHostAddress()}} call throws the NullPointerException and causes the crash.
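
The null result can be reproduced outside Flink with the JDK alone. The following is a minimal sketch, not the Curator code or its eventual fix: the class name and the `hostAddressSafe` helper are hypothetical, and it assumes the ZooKeeper server address was left unresolved, which would match the `addr: null` shown in the debugger dump. `hostAddressUnsafe` repeats the expression quoted from Compatibility.java.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;

public class UnresolvedAddressDemo {

    // Same expression as the quoted line 116: only the socket address itself
    // is null-checked, getAddress() is dereferenced unconditionally.
    static String hostAddressUnsafe(InetSocketAddress address) {
        return (address != null) ? address.getAddress().getHostAddress() : "unknown";
    }

    // Hypothetical null-safe variant (an assumption, not the upstream fix):
    // falls back to the host name when no IP address has been resolved.
    static String hostAddressSafe(InetSocketAddress address) {
        if (address == null) {
            return "unknown";
        }
        InetAddress resolved = address.getAddress();
        return (resolved != null) ? resolved.getHostAddress() : address.getHostString();
    }

    public static void main(String[] args) {
        // createUnresolved() never performs a DNS lookup, so the holder keeps
        // the host name and port but has no InetAddress.
        InetSocketAddress address =
                InetSocketAddress.createUnresolved("host_name_here", 2888);

        System.out.println(address.isUnresolved());   // true
        System.out.println(address.getAddress());     // null
        System.out.println(hostAddressSafe(address)); // host_name_here

        try {
            hostAddressUnsafe(address);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException, as in the stack trace");
        }
    }
}
```

An `InetSocketAddress` built with the regular constructor ends up in the same state when the host name cannot be resolved at construction time, so a server entry whose host name does not resolve on the client side is one plausible way to reach this code path.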



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
