sv2000 commented on pull request #3158:
URL: https://github.com/apache/gobblin/pull/3158#issuecomment-797141187


   @Will-Lo Thanks for the PR. Having gone through it, I have a different 
proposal for how dataset-specific logic can be injected: invoke one or 
more configurable handler classes inside AbstractJobLauncher right after 
the work unit creation step. For your use case, a handler could read the 
DATASET_URN property, which is set by several source classes, and use it to 
set dataset-specific staging/output dirs on each work unit. 
Please take a look at the TaskStateCollectorServiceHandler class as an example; 
it is invoked on task completion. What I am proposing here is an analogous 
solution, but invoked before a job is launched.  
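   To make the proposal concrete, here is a minimal, self-contained sketch of a 
pre-launch handler hook. Everything in it is hypothetical: WorkUnit is a tiny 
stand-in for Gobblin's class, and WorkUnitHandler, DatasetDirHandler and 
runHandlers are illustrative names, not existing Gobblin APIs. The property keys 
are assumed to mirror Gobblin's configuration keys and should be checked 
against ConfigurationKeys.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of a pre-launch work unit handler hook; all names are hypothetical. */
public class PreLaunchHandlers {

  // Assumed to mirror Gobblin's configuration keys; verify against ConfigurationKeys.
  static final String DATASET_URN_KEY = "dataset.urn";
  static final String WRITER_STAGING_DIR = "writer.staging.dir";
  static final String WRITER_OUTPUT_DIR = "writer.output.dir";

  /** Minimal stand-in for a Gobblin WorkUnit: just a bag of string properties. */
  static class WorkUnit {
    final Map<String, String> props = new HashMap<>();
    String getProp(String key) { return props.get(key); }
    void setProp(String key, String value) { props.put(key, value); }
  }

  /** Hypothetical handler contract, invoked once after work units are created. */
  interface WorkUnitHandler {
    void handle(List<WorkUnit> workUnits);
  }

  /** Sets per-dataset staging and output dirs derived from the dataset URN. */
  static class DatasetDirHandler implements WorkUnitHandler {
    private final String stagingRoot;
    private final String outputRoot;

    DatasetDirHandler(String stagingRoot, String outputRoot) {
      this.stagingRoot = stagingRoot;
      this.outputRoot = outputRoot;
    }

    @Override
    public void handle(List<WorkUnit> workUnits) {
      for (WorkUnit wu : workUnits) {
        String urn = wu.getProp(DATASET_URN_KEY);
        if (urn == null) {
          continue; // leave work units without a dataset URN untouched
        }
        wu.setProp(WRITER_STAGING_DIR, stagingRoot + "/" + urn);
        wu.setProp(WRITER_OUTPUT_DIR, outputRoot + "/" + urn);
      }
    }
  }

  /** What the launcher might do right after work unit creation, before launch. */
  static void runHandlers(List<WorkUnitHandler> handlers, List<WorkUnit> workUnits) {
    for (WorkUnitHandler handler : handlers) {
      handler.handle(workUnits);
    }
  }

  public static void main(String[] args) {
    WorkUnit wu = new WorkUnit();
    wu.setProp(DATASET_URN_KEY, "db/events");
    List<WorkUnit> workUnits = new ArrayList<>();
    workUnits.add(wu);

    List<WorkUnitHandler> handlers = new ArrayList<>();
    handlers.add(new DatasetDirHandler("/tmp/staging", "/data/output"));
    runHandlers(handlers, workUnits);

    System.out.println(wu.getProp(WRITER_OUTPUT_DIR)); // prints /data/output/db/events
  }
}
```

   In a real implementation the handler list would presumably be built from a 
configured list of class names (e.g. via reflection), in the same spirit as the 
task state collector handler, so new handlers can be plugged in without touching 
the launcher.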
   
   This high-level approach can be reused for future use cases, e.g. a 
Gobblin pipeline that writes to a table (e.g. Iceberg), where we could define a 
handler that ensures the table is created before the job starts. 
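   As an illustration of that second use case, here is a hedged sketch of a 
handler that creates any missing destination table once per dataset before 
launch. TableCatalog and InMemoryCatalog are hypothetical (this is not the 
Iceberg API), work units are modeled as plain property maps, and the 
"dataset.urn" key is assumed to mirror Gobblin's DATASET_URN property.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch: ensure a table exists for every dataset before the job starts. */
public class EnsureTablesHandler {

  // Assumed to mirror Gobblin's dataset URN key; verify against ConfigurationKeys.
  static final String DATASET_URN_KEY = "dataset.urn";

  /** Hypothetical catalog abstraction; a real one might wrap an Iceberg catalog. */
  interface TableCatalog {
    boolean tableExists(String name);
    void createTable(String name);
  }

  /** In-memory catalog used only for this example; counts create calls. */
  static class InMemoryCatalog implements TableCatalog {
    final Set<String> tables = new HashSet<>();
    int createCalls = 0;

    public boolean tableExists(String name) { return tables.contains(name); }

    public void createTable(String name) {
      tables.add(name);
      createCalls++;
    }
  }

  private final TableCatalog catalog;

  EnsureTablesHandler(TableCatalog catalog) { this.catalog = catalog; }

  /** Collects distinct dataset URNs, then creates each missing table exactly once. */
  void handle(List<Map<String, String>> workUnits) {
    Set<String> urns = new LinkedHashSet<>();
    for (Map<String, String> wu : workUnits) {
      String urn = wu.get(DATASET_URN_KEY);
      if (urn != null) {
        urns.add(urn);
      }
    }
    for (String urn : urns) {
      if (!catalog.tableExists(urn)) {
        catalog.createTable(urn);
      }
    }
  }

  public static void main(String[] args) {
    InMemoryCatalog catalog = new InMemoryCatalog();
    catalog.tables.add("db.existing"); // pre-existing table, not created by the handler
    List<Map<String, String>> workUnits = List.of(
        Map.of(DATASET_URN_KEY, "db.events"),
        Map.of(DATASET_URN_KEY, "db.events"),
        Map.of(DATASET_URN_KEY, "db.existing"));
    new EnsureTablesHandler(catalog).handle(workUnits);
    System.out.println("tables=" + catalog.tables.size() + " creates=" + catalog.createCalls);
    // prints tables=2 creates=1
  }
}
```

   Deduplicating URNs first keeps the catalog round trips proportional to the 
number of datasets rather than the number of work units, and the exists check 
makes the handler safe to run on every launch.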
   
   Happy to discuss offline if you have questions. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.