Kip Kohn created GOBBLIN-2188:
---------------------------------

Summary: Allow use of stateful writer/converter `Initializer`s with Gobblin-on-Temporal
Key: GOBBLIN-2188
URL: https://issues.apache.org/jira/browse/GOBBLIN-2188
Project: Apache Gobblin
Issue Type: New Feature
Components: gobblin-api
Reporter: Kip Kohn
Assignee: Hung Tran
Stateful writer/converter `Initializer`s, such as `JdbcWriterInitializer`, work fine with Gobblin-on-MR (GoMR) but are disrupted by Gobblin-on-Temporal (GoT). Although GoMR does launch an MR application, the remainder of the `MRJobLauncher` execution runs within the same process. `Initializer`s must execute at the end of Work Discovery, before `WorkUnit` processing may begin, but are `.close()`d only after Job Commit completes. Crucially, with GoMR, the same `Initializer` instances remain in memory throughout. With GoT, in contrast, Work Discovery and Commit execute completely independently, creating new objects/instances, perhaps even on a different host/container.

Problem: Some `Initializer`s, such as the `JdbcWriter`'s `JdbcWriterInitializer`, are stateful. (In its case, the state is the temp/staging table's name, kept so that the table may be dropped upon successful Commit.) Such state originates during Work Discovery (the `GenerateWorkUnitsImpl` activity in GoT), yet must be available during Commit (the `CommitActivityImpl` in GoT).

Solution: Use the Memento (GoF) pattern to enable `Initializer`s to convey arbitrary state from one concrete `Initializer` instance to another of the same type. Leverage the `JobState` to tunnel mementos, since it is serialized at the end of Work Discovery and loaded again as the Commit activity begins.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
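A minimal Java sketch of the proposed Memento approach, under stated assumptions. `StatefulInitializer`, `Memento`, `commemorate()`, `recall(...)`, `StagingTableInitializer` and the `initializer.memento` key are all hypothetical names chosen for illustration, not Gobblin's actual API. A `java.util.Properties` serialized to a string stands in for the `JobState`. Work Discovery creates one instance and stashes its memento. Commit builds a fresh instance and recalls the memento, so its `close()` drops the same staging table that Work Discovery created.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Properties;
import java.util.UUID;

public class InitializerMementoSketch {

  /** HYPOTHETICAL: opaque state one Initializer conveys to a later instance of the same type. */
  interface Memento {
    String serialize();
  }

  /** HYPOTHETICAL stand-in for a stateful writer/converter Initializer (not Gobblin's real interface). */
  interface StatefulInitializer {
    void initialize();                // runs at the end of Work Discovery
    Optional<Memento> commemorate();  // captures state produced by initialize()
    void recall(String serialized);   // restores that state into a fresh instance
    void close();                     // runs only after Commit completes
  }

  /** HYPOTHETICAL initializer that creates a staging table and must drop it after Commit. */
  static class StagingTableInitializer implements StatefulInitializer {
    static final List<String> ACTIONS = new ArrayList<>(); // records side effects for the demo
    private String stagingTable; // the state that must survive from Work Discovery to Commit

    @Override public void initialize() {
      stagingTable = "stage_" + UUID.randomUUID().toString().replace("-", "");
      ACTIONS.add("CREATE " + stagingTable);
    }

    @Override public Optional<Memento> commemorate() {
      if (stagingTable == null) {
        return Optional.empty();
      }
      final String name = stagingTable;
      Memento m = () -> name;
      return Optional.of(m);
    }

    @Override public void recall(String serialized) {
      stagingTable = serialized;
    }

    @Override public void close() {
      if (stagingTable != null) {
        ACTIONS.add("DROP " + stagingTable);
      }
    }
  }

  // HYPOTHETICAL job-state key under which the memento is tunneled.
  static final String MEMENTO_KEY = "initializer.memento";

  /** Work Discovery: initialize, then stash the memento in the job state before serializing it. */
  static String workDiscovery() throws IOException {
    StatefulInitializer init = new StagingTableInitializer();
    init.initialize();
    Properties jobState = new Properties();
    init.commemorate().ifPresent(m -> jobState.setProperty(MEMENTO_KEY, m.serialize()));
    StringWriter out = new StringWriter();
    jobState.store(out, null); // stands in for serializing the JobState
    return out.toString();
  }

  /** Commit: possibly another process or host, so a NEW instance recalls the memento. */
  static void commit(String serializedJobState) throws IOException {
    Properties jobState = new Properties();
    jobState.load(new StringReader(serializedJobState));
    StatefulInitializer init = new StagingTableInitializer();
    String memento = jobState.getProperty(MEMENTO_KEY);
    if (memento != null) {
      init.recall(memento);
    }
    // ... commit work units here ...
    init.close(); // drops the same table that Work Discovery created
  }

  public static void main(String[] args) throws IOException {
    commit(workDiscovery());
    System.out.println(StagingTableInitializer.ACTIONS);
  }
}
```

The memento stays opaque to the job machinery: only the concrete `Initializer` type knows how to interpret it. That is what lets arbitrary per-type state cross the process boundary between the two activities.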