[ https://issues.apache.org/jira/browse/FLINK-2979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14991725#comment-14991725 ]

Till Rohrmann commented on FLINK-2979:
--------------------------------------

The failure might be caused by

{code}
java.lang.Exception: Could not restore checkpointed state to operators and functions
    at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreStateLazy(StreamTask.java:414)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:208)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:584)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.Exception: Failed to restore state to function: Could not invoke truncate.
    at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.restoreState(AbstractUdfStreamOperator.java:165)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreStateLazy(StreamTask.java:406)
    ... 3 more
Caused by: java.lang.RuntimeException: Could not invoke truncate.
    at org.apache.flink.streaming.connectors.fs.RollingSink.restoreState(RollingSink.java:695)
    at org.apache.flink.streaming.connectors.fs.RollingSink.restoreState(RollingSink.java:120)
    at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.restoreState(AbstractUdfStreamOperator.java:162)
    ... 4 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.flink.streaming.connectors.fs.RollingSink.restoreState(RollingSink.java:678)
    ... 6 more
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to TRUNCATE_FILE /string-non-rolling-out/part-2-2 for DFSClient_NONMAPREDUCE_-401178409_229 on 127.0.0.1 because DFSClient_NONMAPREDUCE_-401178409_229 is already the current lease holder.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2885)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.truncateInternal(FSNamesystem.java:2082)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.truncateInt(FSNamesystem.java:2028)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.truncate(FSNamesystem.java:1998)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.truncate(NameNodeRpcServer.java:926)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.truncate(ClientNamenodeProtocolServerSideTranslatorPB.java:599)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2045)

    at org.apache.hadoop.ipc.Client.call(Client.java:1476)
    at org.apache.hadoop.ipc.Client.call(Client.java:1407)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    at com.sun.proxy.$Proxy23.truncate(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.truncate(ClientNamenodeProtocolTranslatorPB.java:313)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy24.truncate(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.truncate(DFSClient.java:2024)
    at org.apache.hadoop.hdfs.DistributedFileSystem$13.doCall(DistributedFileSystem.java:689)
    at org.apache.hadoop.hdfs.DistributedFileSystem$13.doCall(DistributedFileSystem.java:685)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.truncate(DistributedFileSystem.java:696)
    ... 11 more
{code}

> RollingSink does not work with Hadoop 2.7.1
> -------------------------------------------
>
>                 Key: FLINK-2979
>                 URL: https://issues.apache.org/jira/browse/FLINK-2979
>             Project: Flink
>          Issue Type: Bug
>          Components: Streaming Connectors
>    Affects Versions: 0.10
>            Reporter: Till Rohrmann
>
> When executing the {{RollingSinkFaultToleranceITCase}} with Hadoop 2.7.1,
> then the test either does not finish because it's stuck in an endless restart
> loop with the following exception
> {code}
> java.lang.Exception: Could not restore checkpointed state to operators and functions
>     at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreStateLazy(StreamTask.java:414)
>     at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:208)
>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:584)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.Exception: Failed to restore state to function: In-Progress file hdfs://127.0.0.1:52884/string-non-rolling-out/part-0-1 was neither moved to pending nor is still in progress.
>     at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.restoreState(AbstractUdfStreamOperator.java:165)
>     at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreStateLazy(StreamTask.java:406)
>     ... 3 more
> Caused by: java.lang.RuntimeException: In-Progress file hdfs://127.0.0.1:52884/string-non-rolling-out/part-0-1 was neither moved to pending nor is still in progress.
>     at org.apache.flink.streaming.connectors.fs.RollingSink.restoreState(RollingSink.java:670)
>     at org.apache.flink.streaming.connectors.fs.RollingSink.restoreState(RollingSink.java:120)
>     at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.restoreState(AbstractUdfStreamOperator.java:162)
>     ... 4 more
> {code}
> or it fails because the number of read strings differs from the exactly-once
> result (some strings are read multiple times).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)