[
https://issues.apache.org/jira/browse/GOBBLIN-1948?focusedWorklogId=888544&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-888544
]
ASF GitHub Bot logged work on GOBBLIN-1948:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 03/Nov/23 02:48
Start Date: 03/Nov/23 02:48
Worklog Time Spent: 10m
Work Description: Will-Lo commented on code in PR #3819:
URL: https://github.com/apache/gobblin/pull/3819#discussion_r1381115387
##########
gobblin-runtime/src/main/java/org/apache/gobblin/runtime/api/FlowSpec.java:
##########
@@ -423,6 +423,17 @@ public boolean isScheduled() {
return getConfig().hasPath(ConfigurationKeys.JOB_SCHEDULE_KEY);
}
+ /**
+ * Create a new FlowSpec object with the added property defined by path and value parameters
+ * @param path key for new property
+ * @param value
+ */
+ public FlowSpec addProperty(String path, String value) {
+ Properties properties = this.getConfigAsProperties();
+ properties.setProperty(path, value);
+ return new Builder(this.getUri()).withConfigAsProperties(properties).build();
Review Comment:
My main concern here is object churn: given the size of the configs, and since
this would happen once per execution, it should increase JVM GC pressure.
That said, enforcing such a rule may be over-optimization until you have
instrumented results under high load. The alternative would be messy: we would
need a way for compileSpec() to take in a flow execution ID and enforce using
that ID instead of the one provided in the flow config or derived from the
system time. That would also lead to a bit of an anti-pattern, with the
compiler expected to use either the flow config's ID (which you provide here)
or its own generated ID.
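
To make the trade-off concrete, here is a minimal sketch of the copy-on-add
pattern from the diff. It uses a hypothetical MiniFlowSpec class rather than
the real FlowSpec and its Builder, and it assumes the config is copied
defensively on every call, which is the per-execution allocation the comment
is worried about. Whether getConfigAsProperties() actually returns a copy is
not confirmed here, and the property key in the usage note is only
illustrative.

```java
import java.util.Properties;

// Hypothetical stand-in for FlowSpec; this is not the Gobblin class.
final class MiniFlowSpec {
  private final String uri;
  private final Properties config;

  MiniFlowSpec(String uri, Properties config) {
    this.uri = uri;
    // Defensive copy so later changes to the caller's Properties do not leak in.
    this.config = copyOf(config);
  }

  String getUri() {
    return uri;
  }

  String getProperty(String path) {
    return config.getProperty(path);
  }

  // Returns a new spec carrying the extra property; this instance is unchanged.
  MiniFlowSpec addProperty(String path, String value) {
    Properties copy = copyOf(config); // one full copy of the config per call
    copy.setProperty(path, value);
    return new MiniFlowSpec(uri, copy); // the constructor copies once more
  }

  private static Properties copyOf(Properties source) {
    Properties copy = new Properties();
    copy.putAll(source);
    return copy;
  }
}
```

Calling addProperty("flow.executionId", "123") on a spec leaves the original
untouched and returns a second object holding a full copy of the config, so
the allocation cost grows with config size and with the number of executions.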
Issue Time Tracking
-------------------
Worklog Id: (was: 888544)
Time Spent: 1h 20m (was: 1h 10m)
> Use Consist flowExecutionId Across Participants
> -----------------------------------------------
>
> Key: GOBBLIN-1948
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1948
> Project: Apache Gobblin
> Issue Type: New Feature
> Components: gobblin-service
> Reporter: Urmi Mustafi
> Assignee: Abhishek Tiwari
> Priority: Major
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> A consistent flowExecutionId is used during lease arbitration between hosts
> and that id is available to the DagActionStoreChangeMonitor but not used when
> passing launch events to the Orchestrator for recompilation and eventual
> execution. The unintended consequence of this is that a different
> flowExecutionId will be used across each participant when the flow is
> recompiled before passing to the DagManager. This messes up the job status of
> the most recent flow execution of each flow as it appears there are N flow
> execution Ids, 1 for each of the N hosts. Only one will be executed but the
> other N-1 are stuck in "compiling" job status state.
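
The behavior described above can be sketched as follows. This is a toy model
under stated assumptions, not Gobblin code: FlowIdSketch and both of its
methods are hypothetical, and each host's clock is passed in as a supplier so
the example is deterministic.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.LongSupplier;

// Hypothetical model of the N hosts; none of these names come from Gobblin.
final class FlowIdSketch {

  // Current behavior as described: every host derives an ID from its own
  // clock at recompilation time, so up to N distinct IDs are produced.
  static Set<Long> idsFromLocalClocks(List<LongSupplier> hostClocks) {
    Set<Long> ids = new HashSet<>();
    for (LongSupplier clock : hostClocks) {
      ids.add(clock.getAsLong());
    }
    return ids;
  }

  // Intended behavior: every host reuses the ID agreed on during lease
  // arbitration, so all hosts report the same single ID.
  static Set<Long> idsFromArbitration(int hostCount, long arbitratedId) {
    Set<Long> ids = new HashSet<>();
    for (int host = 0; host < hostCount; host++) {
      ids.add(arbitratedId);
    }
    return ids;
  }
}
```

With three hosts whose clocks read slightly different times, the first method
yields three IDs (one executed, two left in "compiling"), while the second
yields exactly one.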
--
This message was sent by Atlassian Jira
(v8.20.10#820010)