Kuai Yu created GOBBLIN-484:
-------------------------------

             Summary: Propagate fork exception to task commit
                 Key: GOBBLIN-484
                 URL: https://issues.apache.org/jira/browse/GOBBLIN-484
             Project: Apache Gobblin
          Issue Type: Improvement
            Reporter: Kuai Yu
            Assignee: Kuai Yu

Today, if an exception occurs at the task level, we do not propagate it to the commit phase. As a result, in fork.commit we only see exceptions like this:

2018/04/30 08:03:19.369 ERROR [Task] [Task-committing-pool-0] [gobblin-cluster-worker] [DYNAMICS-CONTACT-438563007_1525075320170] Task task_DYNAMICS-CONTACT-438563007_1525075320170_0 failed
org.apache.gobblin.runtime.ForkException: Fork branches [0] failed for task task_DYNAMICS-CONTACT-438563007_1525075320170_0
    at org.apache.gobblin.runtime.Task.commit(Task.java:884)
    at org.apache.gobblin.runtime.GobblinMultiTaskAttempt$1$1.call(GobblinMultiTaskAttempt.java:167)
    at org.apache.gobblin.runtime.GobblinMultiTaskAttempt$1$1.call(GobblinMultiTaskAttempt.java:162)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at org.apache.gobblin.util.executors.MDCPropagatingRunnable.run(MDCPropagatingRunnable.java:39)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

However, the root cause occurred earlier, before the commit phase, in the task run() stage, where some records failed to process:

2018/04/30 08:03:19.352 ERROR [Task] [TaskExecutor-1] [gobblin-cluster-worker] [DYNAMICS-CONTACT-438563007_1525075320170] Processing record incurs an unexpected exception:
java.lang.IllegalStateException: Fork 0 of task task_DYNAMICS-CONTACT-438563007_1525075320170_0 has failed and is no longer running
    at org.apache.gobblin.runtime.fork.Fork.putRecord(Fork.java:285)
    at org.apache.gobblin.runtime.Task.processRecord(Task.java:778)
    at org.apache.gobblin.runtime.Task.runSynchronousModel(Task.java:459)
    at org.apache.gobblin.runtime.Task.run(Task.java:341)
    at org.apache.gobblin.runtime.TaskExecutor$TrackingTask.run(TaskExecutor.java:443)
    at org.apache.gobblin.util.executors.MDCPropagatingRunnable.run(MDCPropagatingRunnable.java:39)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
2018/04/30 08:03:19.353 ERROR [Task] [TaskExecutor-1] [gobblin-cluster-worker] [DYNAMICS-CONTACT-438563007_1525075320170] Task task_DYNAMICS-CONTACT-438563007_1525075320170_0 failed
java.lang.RuntimeException
    at org.apache.gobblin.runtime.Task.runSynchronousModel(Task.java:464)
    at org.apache.gobblin.runtime.Task.run(Task.java:341)
    at org.apache.gobblin.runtime.TaskExecutor$TrackingTask.run(TaskExecutor.java:443)
    at org.apache.gobblin.util.executors.MDCPropagatingRunnable.run(MDCPropagatingRunnable.java:39)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
2018/04/30 08:03:19.368 INFO [com_2792] [TaskState

Looking further into the problem, we find that it is due to a record processing timeout in the Espresso writer:

2018/04/30 08:03:19.348 ERROR [Fork-0] [ForkExecutor-0] [gobblin-cluster-worker] [DYNAMICS-CONTACT-438563007_1525075320170] Fork 0 of task task_DYNAMICS-CONTACT-438563007_1525075320170_0 failed to process data records
java.io.IOException: java.util.concurrent.ExecutionException: org.apache.gobblin.exception.NonTransientException: Irrecoverable failure on async write
    at org.apache.gobblin.writer.RetryWriter.callWithRetry(RetryWriter.java:143)
    at org.apache.gobblin.writer.RetryWriter.writeEnvelope(RetryWriter.java:123)
    at org.apache.gobblin.runtime.fork.Fork.processRecord(Fork.java:492)
    at org.apache.gobblin.runtime.fork.AsynchronousFork.processRecord(AsynchronousFork.java:103)
    at org.apache.gobblin.runtime.fork.AsynchronousFork.processRecords(AsynchronousFork.java:86)
    at org.apache.gobblin.runtime.fork.Fork.run(Fork.java:238)
    at org.apache.gobblin.util.executors.MDCPropagatingRunnable.run(MDCPropagatingRunnable.java:39)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException: org.apache.gobblin.exception.NonTransientException: Irrecoverable failure on async write
    at ligobblin.shaded.com.github.rholder.retry.Retryer$ExceptionAttempt.<init>(Retryer.java:254)
    at ligobblin.shaded.com.github.rholder.retry.Retryer.call(Retryer.java:163)
    at ligobblin.shaded.com.github.rholder.retry.Retryer$RetryerCallable.call(Retryer.java:318)
    at org.apache.gobblin.writer.RetryWriter.callWithRetry(RetryWriter.java:141)
    ... 11 more
Caused by: org.apache.gobblin.exception.NonTransientException: Irrecoverable failure on async write
    at org.apache.gobblin.writer.AsyncWriterManager.maybeThrow(AsyncWriterManager.java:309)
    at org.apache.gobblin.writer.AsyncWriterManager.write(AsyncWriterManager.java:271)
    at org.apache.gobblin.writer.AsyncWriterManager.writeEnvelope(AsyncWriterManager.java:259)
    at org.apache.gobblin.writer.CloseOnFlushWriterWrapper.writeEnvelope(CloseOnFlushWriterWrapper.java:93)
    at org.apache.gobblin.instrumented.writer.InstrumentedDataWriterDecorator.writeEnvelope(InstrumentedDataWriterDecorator.java:75)
    at org.apache.gobblin.writer.PartitionedDataWriter.writeEnvelope(PartitionedDataWriter.java:161)
    at org.apache.gobblin.writer.ThrottleWriter.writeEnvelope(ThrottleWriter.java:131)
    at org.apache.gobblin.writer.RetryWriter$2.call(RetryWriter.java:118)
    at org.apache.gobblin.writer.RetryWriter$2.call(RetryWriter.java:115)
    at ligobblin.shaded.com.github.rholder.retry.AttemptTimeLimiters$NoAttemptTimeLimit.call(AttemptTimeLimiters.java:78)
    at ligobblin.shaded.com.github.rholder.retry.Retryer.call(Retryer.java:160)
    ... 13 more
Caused by: java.lang.RuntimeException: java.io.IOException: java.util.concurrent.TimeoutException
    at org.apache.gobblin.proxies.EspressoProxy.getRecordsPerGetRequest(EspressoProxy.java:199)
    at org.apache.gobblin.proxies.EspressoProxy.get(EspressoProxy.java:216)
    at org.apache.gobblin.writer.http.espresso.EspressoWriter.changeExist(EspressoWriter.java:81)
    at org.apache.gobblin.writer.http.espresso.EspressoMultiputWriter$1.call(EspressoMultiputWriter.java:89)
    at org.apache.gobblin.writer.http.espresso.EspressoMultiputWriter$1.call(EspressoMultiputWriter.java:86)
    ... 4 more
Caused by: java.io.IOException: java.util.concurrent.TimeoutException
    at com.linkedin.espresso.client.r2d2impl.R2D2EspressoClient.execute(R2D2EspressoClient.java:560)
    at org.apache.gobblin.proxies.EspressoProxy.getRecordsPerGetRequest(EspressoProxy.java:162)
    ... 8 more

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
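
For illustration only, here is one way the propagation requested in this issue could look. This is a minimal sketch and not Gobblin's actual code: SketchFork, SketchForkException, commit and rootCause are hypothetical names made up for this example. The idea is that the fork remembers the first failure it hits while processing records, and the commit step attaches that failure as the cause of the exception it throws, so the commit-time log carries the original timeout.

```java
import java.io.IOException;
import java.util.Optional;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicReference;

public class ForkFailurePropagationSketch {

    /** Hypothetical stand-in for a fork branch; this is not Gobblin's Fork class. */
    static class SketchFork {
        private final int index;
        // First failure seen while processing records; stays null if there was none.
        private final AtomicReference<Throwable> failure = new AtomicReference<>();

        SketchFork(int index) {
            this.index = index;
        }

        /** Called from the record-processing path when a write fails. */
        void recordFailure(Throwable t) {
            // Keep only the first failure; later ones are usually follow-on errors.
            failure.compareAndSet(null, t);
        }

        Optional<Throwable> getFailure() {
            return Optional.ofNullable(failure.get());
        }

        int getIndex() {
            return index;
        }
    }

    /** Hypothetical stand-in for the exception raised at commit time. */
    static class SketchForkException extends Exception {
        SketchForkException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Commit step: if the fork failed earlier, throw with that failure as the cause. */
    static void commit(String taskId, SketchFork fork) throws SketchForkException {
        Optional<Throwable> failure = fork.getFailure();
        if (failure.isPresent()) {
            throw new SketchForkException(
                "Fork branches [" + fork.getIndex() + "] failed for task " + taskId,
                failure.get());
        }
    }

    /** Walks the cause chain down to the innermost throwable. */
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        SketchFork fork = new SketchFork(0);
        // Simulate the writer timeout surfacing while records are processed.
        fork.recordFailure(new IOException(new TimeoutException()));
        try {
            commit("task_example_0", fork);
        } catch (SketchForkException e) {
            System.out.println(e.getMessage());
            System.out.println("Root cause: " + rootCause(e).getClass().getName());
        }
    }
}
```

With the cause attached, a stack trace logged at commit time would include "Caused by:" sections leading back to the original failure, instead of only the "Fork branches [0] failed" message.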