Kip Kohn created GOBBLIN-2188:
---------------------------------

             Summary: Allow use of stateful writer/converter `Initializer`s 
with Gobblin-on-Temporal
                 Key: GOBBLIN-2188
                 URL: https://issues.apache.org/jira/browse/GOBBLIN-2188
             Project: Apache Gobblin
          Issue Type: New Feature
          Components: gobblin-api
            Reporter: Kip Kohn
            Assignee: Hung Tran


Stateful writer/converter `Initializer`s, such as `JdbcWriterInitializer`, work 
fine with Gobblin-on-MR (GoMR), but are disrupted by Gobblin-on-Temporal (GoT).  
Although GoMR does launch a separate MR application, the remainder of the 
`MRJobLauncher` execution stays within a single process.  `Initializer`s must 
execute at the end of Work Discovery, before `WorkUnit` processing may begin, 
but are `.close()`d only after Job Commit completes.  Crucially, with GoMR, the 
same `Initializer` instances remain in memory throughout.  With GoT, in 
contrast, Work Discovery and Commit execute completely independently, each 
creating new objects/instances, perhaps even on a different host/container.

Problem: Some `Initializer`s, such as the `JdbcWriter`'s 
`JdbcWriterInitializer`, are stateful.  (In its case, the state is the 
temp/staging table's name, kept so that the table may be dropped upon 
successful Commit.)  This state originates during Work Discovery (the 
`GenerateWorkUnitsImpl` activity in GoT) yet must be available during Commit 
(the `CommitActivityImpl` in GoT).

Solution: Use the Memento (GoF) Pattern to enable `Initializer`s to convey 
arbitrary state from one concrete `Initializer` instance to another of the same 
type.  Leverage the `JobState` to tunnel mementos, since it is serialized at 
the end of Work Discovery, to be loaded later as the Commit activity begins.
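
A minimal sketch of how such a memento hand-off might look, for illustration 
only.  Every name here is hypothetical: `MementoCapable`, 
`StagingTableInitializer`, `saveMemento`, `restoreMemento`, and the property 
key are not Gobblin APIs, and a plain `java.util.Properties` stands in for the 
`JobState`.  The toy initializer only records which table it would drop; it 
performs no JDBC work.

```java
import java.util.Optional;
import java.util.Properties;

class InitializerMementoSketch {

    /** Hypothetical contract; NOT the actual Gobblin `Initializer` interface. */
    interface MementoCapable {
        /** Capture whatever state a later instance needs (empty if none). */
        Optional<String> captureMemento();

        /** Rehydrate a fresh instance from previously captured state. */
        void restoreFromMemento(String memento);
    }

    /** Toy stand-in for a stateful initializer such as `JdbcWriterInitializer`. */
    static final class StagingTableInitializer implements MementoCapable, AutoCloseable {
        private String stagingTable;  // state created during Work Discovery
        private String droppedTable;  // records what close() cleaned up

        void initialize(String targetTable) {
            stagingTable = targetTable + "_staging";
        }

        @Override
        public Optional<String> captureMemento() {
            return Optional.ofNullable(stagingTable);
        }

        @Override
        public void restoreFromMemento(String memento) {
            stagingTable = memento;
        }

        @Override
        public void close() {
            // A real initializer would drop the staging table here.
            droppedTable = stagingTable;
        }

        String stagingTable() { return stagingTable; }
        String droppedTable() { return droppedTable; }
    }

    // Hypothetical property key used to tunnel the memento.
    static final String MEMENTO_KEY = "sketch.initializer.memento";

    /** End of Work Discovery: stash the memento in the (stand-in) job state. */
    static void saveMemento(Properties jobState, MementoCapable initializer) {
        initializer.captureMemento().ifPresent(m -> jobState.setProperty(MEMENTO_KEY, m));
    }

    /** Start of Commit: hand the memento to a brand-new instance. */
    static void restoreMemento(Properties jobState, MementoCapable initializer) {
        String memento = jobState.getProperty(MEMENTO_KEY);
        if (memento != null) {
            initializer.restoreFromMemento(memento);
        }
    }

    public static void main(String[] args) {
        Properties jobState = new Properties();

        // Work Discovery side: create state and capture it.
        StagingTableInitializer atDiscovery = new StagingTableInitializer();
        atDiscovery.initialize("orders");
        saveMemento(jobState, atDiscovery);

        // Commit side: a different instance, possibly in another process.
        StagingTableInitializer atCommit = new StagingTableInitializer();
        restoreMemento(jobState, atCommit);
        atCommit.close();
        System.out.println("dropped: " + atCommit.droppedTable());
    }
}
```

Encoding the memento as a `String` keeps it trivially serializable alongside 
the rest of the job state; an initializer with richer state would need its own 
encoding, which this sketch does not address.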




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
