Xuehai-Chen opened a new issue #2950:
URL: https://github.com/apache/hudi/issues/2950


   **_Tips before filing an issue_**
   
   - Have you gone through our 
[FAQs](https://cwiki.apache.org/confluence/display/HUDI/FAQ)?
   
   - Join the mailing list to engage in conversations and get faster support at 
[email protected].
   
   - If you have triaged this as a bug, then file an 
[issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.
   
   **Describe the problem you faced**
   
   The Streaming query crash after about 10 mins running.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Build from source with branch [master], the version is 0.9.0-SNAPSHOT. 
HUDI-1880 is already included.
   2. Start a flink streaming job, write data to hudi table [tableA].
   3. Streaming query [tableA] using flink sql-client.
   
   **Expected behavior**
   Streaming query running until I cancel it.
   
   
   **Environment Description**
   
   * Hudi version : 0.9.0-SNAPSHOT, with HUDI-1880
   
   * Spark version : NONE
   
   * Hive version : NONE
   
   * Hadoop version : 3.2.1
   
   * Storage (HDFS/S3/GCS..) : HDFS
   
   * Running on Docker? (yes/no) : no
   
   **Additional context**
   
![image](https://user-images.githubusercontent.com/12780483/118247231-878b0200-b4d5-11eb-9902-d7c43e3a5dfd.png)
   
   
![image](https://user-images.githubusercontent.com/12780483/118247150-6cb88d80-b4d5-11eb-97d5-772f6dc26600.png)
   
   
   **Stacktrace**
   
   ```2021-05-14 16:11:00,734 WARN  org.apache.flink.table.client.cli.CliClient 
                 [] - Could not execute SQL statement.
   org.apache.flink.table.client.gateway.SqlExecutionException: Error while 
retrieving result.
        at 
org.apache.flink.table.client.gateway.local.result.CollectStreamResult.lambda$startRetrieval$0(CollectStreamResult.java:96)
 ~[flink-sql-client_2.11-1.12.2.jar:1.12.2]
        at 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
 ~[?:1.8.0_65]
        at 
java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
 ~[?:1.8.0_65]
        at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) 
~[?:1.8.0_65]
        at 
java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:561) 
~[?:1.8.0_65]
        at 
java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:739)
 ~[?:1.8.0_65]
        at 
java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:443)
 ~[?:1.8.0_65]
        at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) 
~[?:1.8.0_65]
        at 
java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) 
~[?:1.8.0_65]
        at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) 
~[?:1.8.0_65]
        at 
java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157) 
~[?:1.8.0_65]
   Caused by: java.util.concurrent.CompletionException: 
org.apache.flink.client.program.ProgramInvocationException: Job failed (JobID: 
cdcba5438cefb67a14df58bff39d9d6e)
        at 
org.apache.flink.client.deployment.ClusterClientJobClientAdapter.lambda$null$6(ClusterClientJobClientAdapter.java:125)
 ~[flink-dist_2.11-1.12.2.jar:1.12.2]
        at 
java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602) 
~[?:1.8.0_65]
        at 
java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
 ~[?:1.8.0_65]
        at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) 
~[?:1.8.0_65]
        at 
java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) 
~[?:1.8.0_65]
        at 
org.apache.flink.client.program.rest.RestClusterClient.lambda$pollResourceAsync$22(RestClusterClient.java:665)
 ~[flink-dist_2.11-1.12.2.jar:1.12.2]
        at 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
 ~[?:1.8.0_65]
        at 
java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
 ~[?:1.8.0_65]
        at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) 
~[?:1.8.0_65]
        at 
java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) 
~[?:1.8.0_65]
        at 
org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$9(FutureUtils.java:394)
 ~[flink-dist_2.11-1.12.2.jar:1.12.2]
        at 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
 ~[?:1.8.0_65]
        at 
java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
 ~[?:1.8.0_65]
        at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) 
~[?:1.8.0_65]
        at 
java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:561) 
~[?:1.8.0_65]
        at 
java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:929)
 ~[?:1.8.0_65]
        at 
java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
 ~[?:1.8.0_65]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
~[?:1.8.0_65]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
~[?:1.8.0_65]
        at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_65]
   Caused by: org.apache.flink.client.program.ProgramInvocationException: Job 
failed (JobID: cdcba5438cefb67a14df58bff39d9d6e)
        at 
org.apache.flink.client.deployment.ClusterClientJobClientAdapter.lambda$null$6(ClusterClientJobClientAdapter.java:125)
 ~[flink-dist_2.11-1.12.2.jar:1.12.2]
        at 
java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602) 
~[?:1.8.0_65]
        at 
java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
 ~[?:1.8.0_65]
        at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) 
~[?:1.8.0_65]
        at 
java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) 
~[?:1.8.0_65]
        at 
org.apache.flink.client.program.rest.RestClusterClient.lambda$pollResourceAsync$22(RestClusterClient.java:665)
 ~[flink-dist_2.11-1.12.2.jar:1.12.2]
        at 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
 ~[?:1.8.0_65]
        at 
java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
 ~[?:1.8.0_65]
        at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) 
~[?:1.8.0_65]
        at 
java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) 
~[?:1.8.0_65]
        at 
org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$9(FutureUtils.java:394)
 ~[flink-dist_2.11-1.12.2.jar:1.12.2]
        at 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
 ~[?:1.8.0_65]
        at 
java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
 ~[?:1.8.0_65]
        at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) 
~[?:1.8.0_65]
        at 
java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:561) 
~[?:1.8.0_65]
        at 
java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:929)
 ~[?:1.8.0_65]
        at 
java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
 ~[?:1.8.0_65]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
~[?:1.8.0_65]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
~[?:1.8.0_65]
        at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_65]
   Caused by: org.apache.flink.runtime.client.JobExecutionException: Job 
execution failed.
        at 
org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:144)
 ~[flink-dist_2.11-1.12.2.jar:1.12.2]
        at 
org.apache.flink.client.deployment.ClusterClientJobClientAdapter.lambda$null$6(ClusterClientJobClientAdapter.java:123)
 ~[flink-dist_2.11-1.12.2.jar:1.12.2]
        at 
java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602) 
~[?:1.8.0_65]
        at 
java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
 ~[?:1.8.0_65]
        at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) 
~[?:1.8.0_65]
        at 
java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) 
~[?:1.8.0_65]
        at 
org.apache.flink.client.program.rest.RestClusterClient.lambda$pollResourceAsync$22(RestClusterClient.java:665)
 ~[flink-dist_2.11-1.12.2.jar:1.12.2]
        at 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
 ~[?:1.8.0_65]
        at 
java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
 ~[?:1.8.0_65]
        at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) 
~[?:1.8.0_65]
        at 
java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) 
~[?:1.8.0_65]
        at 
org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$9(FutureUtils.java:394)
 ~[flink-dist_2.11-1.12.2.jar:1.12.2]
        at 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
 ~[?:1.8.0_65]
        at 
java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
 ~[?:1.8.0_65]
        at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) 
~[?:1.8.0_65]
        at 
java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:561) 
~[?:1.8.0_65]
        at 
java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:929)
 ~[?:1.8.0_65]
        at 
java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
 ~[?:1.8.0_65]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
~[?:1.8.0_65]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
~[?:1.8.0_65]
        at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_65]
   Caused by: org.apache.flink.runtime.JobException: Recovery is suppressed by 
NoRestartBackoffTimeStrategy
        at 
org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:118)
 ~[flink-dist_2.11-1.12.2.jar:1.12.2]
        at 
org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:80)
 ~[flink-dist_2.11-1.12.2.jar:1.12.2]
        at 
org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:233)
 ~[flink-dist_2.11-1.12.2.jar:1.12.2]
        at 
org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:224)
 ~[flink-dist_2.11-1.12.2.jar:1.12.2]
        at 
org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:215)
 ~[flink-dist_2.11-1.12.2.jar:1.12.2]
        at 
org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:669)
 ~[flink-dist_2.11-1.12.2.jar:1.12.2]
        at 
org.apache.flink.runtime.scheduler.SchedulerNG.updateTaskExecutionState(SchedulerNG.java:89)
 ~[flink-dist_2.11-1.12.2.jar:1.12.2]
        at 
org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:447)
 ~[flink-dist_2.11-1.12.2.jar:1.12.2]
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
~[?:1.8.0_65]
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
~[?:1.8.0_65]
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 ~[?:1.8.0_65]
        at java.lang.reflect.Method.invoke(Method.java:497) ~[?:1.8.0_65]
        at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:305)
 ~[flink-dist_2.11-1.12.2.jar:1.12.2]
        at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:212)
 ~[flink-dist_2.11-1.12.2.jar:1.12.2]
        at 
org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:77)
 ~[flink-dist_2.11-1.12.2.jar:1.12.2]
        at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:158)
 ~[flink-dist_2.11-1.12.2.jar:1.12.2]
        at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) 
~[flink-dist_2.11-1.12.2.jar:1.12.2]
        at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) 
~[flink-dist_2.11-1.12.2.jar:1.12.2]
        at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) 
~[flink-dist_2.11-1.12.2.jar:1.12.2]
        at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) 
~[flink-dist_2.11-1.12.2.jar:1.12.2]
        at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) 
~[flink-dist_2.11-1.12.2.jar:1.12.2]
        at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) 
~[flink-dist_2.11-1.12.2.jar:1.12.2]
        at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) 
~[flink-dist_2.11-1.12.2.jar:1.12.2]
        at akka.actor.Actor$class.aroundReceive(Actor.scala:517) 
~[flink-dist_2.11-1.12.2.jar:1.12.2]
        at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) 
~[flink-dist_2.11-1.12.2.jar:1.12.2]
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) 
~[flink-dist_2.11-1.12.2.jar:1.12.2]
        at akka.actor.ActorCell.invoke(ActorCell.scala:561) 
~[flink-dist_2.11-1.12.2.jar:1.12.2]
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) 
~[flink-dist_2.11-1.12.2.jar:1.12.2]
        at akka.dispatch.Mailbox.run(Mailbox.scala:225) 
~[flink-dist_2.11-1.12.2.jar:1.12.2]
        at akka.dispatch.Mailbox.exec(Mailbox.scala:235) 
~[flink-dist_2.11-1.12.2.jar:1.12.2]
        at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) 
~[flink-dist_2.11-1.12.2.jar:1.12.2]
        at 
akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) 
~[flink-dist_2.11-1.12.2.jar:1.12.2]
        at 
akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) 
~[flink-dist_2.11-1.12.2.jar:1.12.2]
        at 
akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) 
~[flink-dist_2.11-1.12.2.jar:1.12.2]
   Caused by: org.apache.hudi.exception.HoodieException: No files found for 
reading in user provided path.
        at 
org.apache.hudi.source.StreamReadMonitoringFunction.monitorDirAndForwardSplits(StreamReadMonitoringFunction.java:228)
 ~[hudi-flink-bundle_2.11-0.9.0-rc1.jar:0.9.0-rc1]
        at 
org.apache.hudi.source.StreamReadMonitoringFunction.run(StreamReadMonitoringFunction.java:180)
 ~[hudi-flink-bundle_2.11-0.9.0-rc1.jar:0.9.0-rc1]
        at 
org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:110)
 ~[flink-dist_2.11-1.12.2.jar:1.12.2]
        at 
org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:66) 
~[flink-dist_2.11-1.12.2.jar:1.12.2]
        at 
org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:263)
 ~[flink-dist_2.11-1.12.2.jar:1.12.2]
   ```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to