autumnust opened a new pull request #3265:

URL: https://github.com/apache/gobblin/pull/3265
Dear Gobblin maintainers,

Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!

### JIRA
- [ ] My PR addresses the following [Gobblin JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR"
    - https://issues.apache.org/jira/browse/GOBBLIN-XXX

### Description
- [ ] Here are some details about my PR, including screenshots (if applicable):

The test in `TestSingleTask` has been failing when executed on CI. It is also reproducible locally by running:

```
./gradlew :gobblin-cluster:test --tests org.apache.gobblin.cluster.TestSingleTask
```

The real problem is more complicated, and a proper fix is out of scope for now. The issue can be described as follows:

- Without an override of the `shutdown` method in the `dummyExtractor`, calling it throws an exception. Note that this code path is only executed when `shutdownRequested` is set (which only happens when a `Task` is being cancelled), as written in `Task.java#510`. It is not reached through the regular shutdown of the extractor.
- Before the change in GOBBLIN-1416, this did not surface because the sync-barrier was placed in the wrong place. I will expand on this in more detail later. In short, the log suggests that the cancel call and the run call were executed sequentially (see Paragraph I at the bottom).
- After GOBBLIN-1416, this test fails because `org.apache.gobblin.runtime.GobblinMultiTaskAttempt#run #167` returns as a result of the cancel, so the real cancel code path is executed and the `shutdown` method in the `dummyExtractor` is called. However, the default implementation in the `Extractor` interface throws an error, which eventually blows up the task.
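The failure mode in the first and third bullets can be sketched in plain Java. This is a simplified, hypothetical model, not Gobblin code: `SimpleExtractor`, `ShutdownUnsupportedException`, `NaiveDummyExtractor`, and `DummyExtractor` are illustrative names, and the real `Extractor` interface has more methods. It shows why a test extractor that inherits a throwing default `shutdown()` fails once the cancel path invokes it, and how overriding `shutdown()` in the test extractor, as the first bullet suggests, avoids the exception.

```java
import java.io.Closeable;
import java.io.IOException;

/**
 * Hypothetical, simplified stand-ins for Gobblin's extractor types.
 * None of these names are the real Gobblin API.
 */
public class ExtractorShutdownSketch {

  /** Hypothetical checked exception signalling that shutdown is unsupported. */
  static class ShutdownUnsupportedException extends Exception {
    ShutdownUnsupportedException(String message) {
      super(message);
    }
  }

  /** Hypothetical extractor contract whose default shutdown() refuses to shut down. */
  interface SimpleExtractor extends Closeable {
    String readRecord() throws IOException;

    // Models the behaviour described in the PR: the default implementation throws,
    // so a cancelled task fails unless the extractor overrides this method.
    default void shutdown() throws ShutdownUnsupportedException {
      throw new ShutdownUnsupportedException(
          getClass().getName() + ": extractor does not support shutdown");
    }
  }

  /** Test-only extractor that keeps the throwing default shutdown() (the failing case). */
  static class NaiveDummyExtractor implements SimpleExtractor {
    @Override public String readRecord() { return null; }
    @Override public void close() {}
  }

  /** Test-only extractor that overrides shutdown() so cancellation succeeds. */
  static class DummyExtractor implements SimpleExtractor {
    private volatile boolean shutdownRequested = false;

    @Override public String readRecord() {
      // Stop emitting records once shutdown has been requested.
      return shutdownRequested ? null : "record";
    }

    @Override public void shutdown() {
      shutdownRequested = true;
    }

    @Override public void close() {}

    boolean isShutdownRequested() { return shutdownRequested; }
  }

  public static void main(String[] args) throws Exception {
    DummyExtractor fixed = new DummyExtractor();
    fixed.shutdown();
    System.out.println("fixed extractor shut down: " + fixed.isShutdownRequested());

    try {
      new NaiveDummyExtractor().shutdown();
    } catch (ShutdownUnsupportedException e) {
      System.out.println("naive extractor failed: " + e.getMessage());
    }
  }
}
```

Running `main` shows the overriding extractor shutting down cleanly while the naive one throws. Having the override set a flag that `readRecord()` checks is one possible design choice; an empty override would also prevent the exception.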
Supporting evidence for this explanation is that the fork status was printed as "RUNNING" in the failing test, which means the return from `org.apache.gobblin.runtime.GobblinMultiTaskAttempt#run #167` does not handle the state of the forks within the cancelled task.

A follow-up to this PR should address:

- The sync-barrier, `org.apache.gobblin.cluster.SingleTask#_taskAttemptBuilt`, should synchronize on the creation of the `List<Task>` in the task attempt, rather than on the creation of the `taskAttempt` and the cancel call. GOBBLIN-1416 tried to solve this problem in a different way; I believe the two approaches could be unified.
- The handling of fork state when a task is being shut down probably needs to be fixed as well.

Paragraph I

```
2021-04-19 16:23:09 PDT INFO [pool-32-thread-1] org.apache.gobblin.cluster.SingleTask 225 - Task cancelled: Shutdown starting for tasks with jobId: testJob
2021-04-19 16:23:09 PDT INFO [pool-32-thread-1] org.apache.gobblin.runtime.GobblinMultiTaskAttempt 255 - Shutting down tasks
2021-04-19 16:23:09 PDT INFO [pool-32-thread-1] org.apache.gobblin.cluster.SingleTask 227 - Task cancelled: Shutdown complete for tasks with jobId: testJob
2021-04-19 16:23:09 PDT INFO [pool-32-thread-2] org.apache.gobblin.runtime.GobblinMultiTaskAttempt$2 510 - Task creation attempt 1
2021-04-19 16:23:09 PDT WARN [pool-32-thread-2] org.apache.gobblin.metrics.MetricContext$Builder 714 - MetricContext with specified name already exists, appending UUID to the given name: 2f98aa98-1c81-4770-85e9-7731ee3afff1
2021-04-19 16:23:09 PDT INFO [pool-32-thread-2] org.apache.gobblin.runtime.TaskExecutor 259 - Submitting task randomTask
2021-04-19 16:23:09 PDT WARN [TaskExecutor-0] org.apache.gobblin.runtime.Task 362 - Synchronous task execution model is deprecated. Please consider using stream model.
2021-04-19 16:23:09 PDT INFO [pool-32-thread-2] org.apache.gobblin.runtime.GobblinMultiTaskAttempt 167 - Waiting for submitted tasks of job testJob to complete in container ...
2021-04-19 16:23:09 PDT INFO [pool-32-thread-2] org.apache.gobblin.runtime.GobblinMultiTaskAttempt 175 - 1 out of 1 tasks of job testJob are running in container
2021-04-19 16:23:09 PDT INFO [TaskExecutor-0] org.apache.gobblin.runtime.TaskExecutor 280 - Submitting fork 0 of task randomTask
2021-04-19 16:23:09 PDT INFO [TaskExecutor-0] org.apache.gobblin.runtime.Task 460 - Task mode streaming = false
2021-04-19 16:23:09 PDT INFO [ForkExecutor-0] org.apache.gobblin.runtime.TaskContext 375 - Found configured writer builder as org.apache.gobblin.cluster.InMemoryWuSingleTask$DummyDataWriterBuilder
```

### Tests

`./gradlew :gobblin-cluster:test --tests org.apache.gobblin.cluster.TestSingleTask` now passes, and the log output looks correct after setting `testLogging.showStandardStreams = true`.

### Commits
- [ ] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"
