Kuai Yu created GOBBLIN-484:
-------------------------------

             Summary: Propagate fork exception to task commit
                 Key: GOBBLIN-484
                 URL: https://issues.apache.org/jira/browse/GOBBLIN-484
             Project: Apache Gobblin
          Issue Type: Improvement
            Reporter: Kuai Yu
            Assignee: Kuai Yu


>>> Today, if an exception occurs at the task level, it is not propagated to 
>>> the commit phase. As a result, in fork.commit we only see a generic 
>>> exception like this:

2018/04/30 08:03:19.369 ERROR [Task] [Task-committing-pool-0] [gobblin-cluster-worker] [DYNAMICS-CONTACT-438563007_1525075320170] Task task_DYNAMICS-CONTACT-438563007_1525075320170_0 failed
org.apache.gobblin.runtime.ForkException: Fork branches [0] failed for task task_DYNAMICS-CONTACT-438563007_1525075320170_0
    at org.apache.gobblin.runtime.Task.commit(Task.java:884)
    at org.apache.gobblin.runtime.GobblinMultiTaskAttempt$1$1.call(GobblinMultiTaskAttempt.java:167)
    at org.apache.gobblin.runtime.GobblinMultiTaskAttempt$1$1.call(GobblinMultiTaskAttempt.java:162)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at org.apache.gobblin.util.executors.MDCPropagatingRunnable.run(MDCPropagatingRunnable.java:39)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

>>> However, the root cause occurred earlier, before the commit phase: in the 
>>> task run() stage, some records failed to process:

2018/04/30 08:03:19.352 ERROR [Task] [TaskExecutor-1] [gobblin-cluster-worker] [DYNAMICS-CONTACT-438563007_1525075320170] Processing record incurs an unexpected exception:
java.lang.IllegalStateException: Fork 0 of task task_DYNAMICS-CONTACT-438563007_1525075320170_0 has failed and is no longer running
    at org.apache.gobblin.runtime.fork.Fork.putRecord(Fork.java:285)
    at org.apache.gobblin.runtime.Task.processRecord(Task.java:778)
    at org.apache.gobblin.runtime.Task.runSynchronousModel(Task.java:459)
    at org.apache.gobblin.runtime.Task.run(Task.java:341)
    at org.apache.gobblin.runtime.TaskExecutor$TrackingTask.run(TaskExecutor.java:443)
    at org.apache.gobblin.util.executors.MDCPropagatingRunnable.run(MDCPropagatingRunnable.java:39)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
2018/04/30 08:03:19.353 ERROR [Task] [TaskExecutor-1] [gobblin-cluster-worker] [DYNAMICS-CONTACT-438563007_1525075320170] Task task_DYNAMICS-CONTACT-438563007_1525075320170_0 failed
java.lang.RuntimeException
    at org.apache.gobblin.runtime.Task.runSynchronousModel(Task.java:464)
    at org.apache.gobblin.runtime.Task.run(Task.java:341)
    at org.apache.gobblin.runtime.TaskExecutor$TrackingTask.run(TaskExecutor.java:443)
    at org.apache.gobblin.util.executors.MDCPropagatingRunnable.run(MDCPropagatingRunnable.java:39)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
2018/04/30 08:03:19.368 INFO [com_2792] [TaskState

>>> Looking further into the problem, we find that it is due to a record 
>>> processing timeout in the Espresso writer:

2018/04/30 08:03:19.348 ERROR [Fork-0] [ForkExecutor-0] [gobblin-cluster-worker] [DYNAMICS-CONTACT-438563007_1525075320170] Fork 0 of task task_DYNAMICS-CONTACT-438563007_1525075320170_0 failed to process data records
java.io.IOException: java.util.concurrent.ExecutionException: org.apache.gobblin.exception.NonTransientException: Irrecoverable failure on async write
    at org.apache.gobblin.writer.RetryWriter.callWithRetry(RetryWriter.java:143)
    at org.apache.gobblin.writer.RetryWriter.writeEnvelope(RetryWriter.java:123)
    at org.apache.gobblin.runtime.fork.Fork.processRecord(Fork.java:492)
    at org.apache.gobblin.runtime.fork.AsynchronousFork.processRecord(AsynchronousFork.java:103)
    at org.apache.gobblin.runtime.fork.AsynchronousFork.processRecords(AsynchronousFork.java:86)
    at org.apache.gobblin.runtime.fork.Fork.run(Fork.java:238)
    at org.apache.gobblin.util.executors.MDCPropagatingRunnable.run(MDCPropagatingRunnable.java:39)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException: org.apache.gobblin.exception.NonTransientException: Irrecoverable failure on async write
    at ligobblin.shaded.com.github.rholder.retry.Retryer$ExceptionAttempt.<init>(Retryer.java:254)
    at ligobblin.shaded.com.github.rholder.retry.Retryer.call(Retryer.java:163)
    at ligobblin.shaded.com.github.rholder.retry.Retryer$RetryerCallable.call(Retryer.java:318)
    at org.apache.gobblin.writer.RetryWriter.callWithRetry(RetryWriter.java:141)
    ... 11 more
Caused by: org.apache.gobblin.exception.NonTransientException: Irrecoverable failure on async write
    at org.apache.gobblin.writer.AsyncWriterManager.maybeThrow(AsyncWriterManager.java:309)
    at org.apache.gobblin.writer.AsyncWriterManager.write(AsyncWriterManager.java:271)
    at org.apache.gobblin.writer.AsyncWriterManager.writeEnvelope(AsyncWriterManager.java:259)
    at org.apache.gobblin.writer.CloseOnFlushWriterWrapper.writeEnvelope(CloseOnFlushWriterWrapper.java:93)
    at org.apache.gobblin.instrumented.writer.InstrumentedDataWriterDecorator.writeEnvelope(InstrumentedDataWriterDecorator.java:75)
    at org.apache.gobblin.writer.PartitionedDataWriter.writeEnvelope(PartitionedDataWriter.java:161)
    at org.apache.gobblin.writer.ThrottleWriter.writeEnvelope(ThrottleWriter.java:131)
    at org.apache.gobblin.writer.RetryWriter$2.call(RetryWriter.java:118)
    at org.apache.gobblin.writer.RetryWriter$2.call(RetryWriter.java:115)
    at ligobblin.shaded.com.github.rholder.retry.AttemptTimeLimiters$NoAttemptTimeLimit.call(AttemptTimeLimiters.java:78)
    at ligobblin.shaded.com.github.rholder.retry.Retryer.call(Retryer.java:160)
    ... 13 more
Caused by: java.lang.RuntimeException: java.io.IOException: java.util.concurrent.TimeoutException
    at org.apache.gobblin.proxies.EspressoProxy.getRecordsPerGetRequest(EspressoProxy.java:199)
    at org.apache.gobblin.proxies.EspressoProxy.get(EspressoProxy.java:216)
    at org.apache.gobblin.writer.http.espresso.EspressoWriter.changeExist(EspressoWriter.java:81)
    at org.apache.gobblin.writer.http.espresso.EspressoMultiputWriter$1.call(EspressoMultiputWriter.java:89)
    at org.apache.gobblin.writer.http.espresso.EspressoMultiputWriter$1.call(EspressoMultiputWriter.java:86)
    ... 4 more
Caused by: java.io.IOException: java.util.concurrent.TimeoutException
    at com.linkedin.espresso.client.r2d2impl.R2D2EspressoClient.execute(R2D2EspressoClient.java:560)
    at org.apache.gobblin.proxies.EspressoProxy.getRecordsPerGetRequest(EspressoProxy.java:162)
    ... 8 more
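
>>> One possible shape for the improvement, as a minimal sketch only: the fork 
>>> remembers the first exception that made it fail during run(), and the 
>>> commit phase attaches it as the cause of the exception it throws. All 
>>> names below (ForkFailurePropagationSketch, SketchFork, 
>>> SketchForkException, recordFailure) are hypothetical stand-ins and are 
>>> not Gobblin's actual classes or methods.

```java
import java.io.IOException;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch; none of these types are Gobblin's real classes.
class ForkFailurePropagationSketch {

  /** Stand-in for a fork branch that remembers why it failed. */
  static class SketchFork {
    private final int index;
    private final AtomicReference<Throwable> failure = new AtomicReference<>();

    SketchFork(int index) {
      this.index = index;
    }

    /** Keep only the first failure; later ones are usually consequences of it. */
    void recordFailure(Throwable t) {
      failure.compareAndSet(null, t);
    }

    boolean hasFailed() {
      return failure.get() != null;
    }

    Throwable getFailure() {
      return failure.get();
    }

    int getIndex() {
      return index;
    }
  }

  /** Stand-in for the exception thrown at commit time. */
  static class SketchForkException extends Exception {
    SketchForkException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  /** Commit phase: fail with the recorded root cause attached as the cause. */
  static void commit(String taskId, SketchFork fork) throws SketchForkException {
    if (fork.hasFailed()) {
      throw new SketchForkException(
          "Fork branch " + fork.getIndex() + " failed for task " + taskId,
          fork.getFailure());
    }
  }

  public static void main(String[] args) {
    SketchFork fork = new SketchFork(0);
    // Run phase: the writer times out and the fork records the reason.
    fork.recordFailure(new IOException(new TimeoutException()));
    try {
      commit("task_0", fork);
    } catch (SketchForkException e) {
      // The commit-time trace now carries a "Caused by:" chain to the timeout.
      e.printStackTrace();
    }
  }
}
```

>>> With something along these lines, the exception logged at commit time 
>>> would carry the writer timeout in its "Caused by:" chain, so the root 
>>> cause would no longer have to be found by searching earlier log lines.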
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
