Joel Baranick created GOBBLIN-227:
-------------------------------------
Summary: JobLauncherUtils.cleanTaskStagingData fails for jobs with forks
Key: GOBBLIN-227
URL: https://issues.apache.org/jira/browse/GOBBLIN-227
Project: Apache Gobblin
Issue Type: Bug
Reporter: Joel Baranick
*Precondition:*
The job uses Hocon configuration and has two forks configured.
*Summary:*
When {{AbstractJobLauncher}} calls {{JobLauncherUtils.cleanTaskStagingData}}, it
tries to look up {{writer.staging.dir}} in the configuration and fails.
*Details:*
Hocon cannot represent the following configuration, because a key cannot hold
both a string value and nested keys:
{code:none}
writer.staging.dir=/foo
writer.staging.dir.0=/foo
writer.staging.dir.1=/foo
{code}
Initially {{writer.staging.dir}} is of type String, but when the Hocon parser
encounters {{writer.staging.dir.0}}, it decides that {{writer.staging.dir}} is
now of type Object, overwriting the prior value with {{_\{"0": "/foo"\}_}}.
The effective Hocon configuration is:
{code:javascript}
{
"writer": {
"staging": {
"dir": {
"0": "/foo",
"1": "/foo"
}
}
}
}
{code}
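The overwrite described above can be reproduced with a minimal sketch. This is a simplified stand-in for dotted-key expansion, not the actual Hocon parser; the class {{DottedKeySketch}} and its {{put}}/{{get}} helpers are hypothetical names introduced only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of dotted-key expansion; not Hocon parser code.
final class DottedKeySketch {

    // Inserts a dotted key into a nested map, one path segment per level.
    // If an intermediate segment holds a non-map value (e.g. a String),
    // that value is replaced by a new map, mirroring the overwrite above.
    @SuppressWarnings("unchecked")
    static void put(Map<String, Object> root, String dottedKey, Object value) {
        String[] parts = dottedKey.split("\\.");
        Map<String, Object> node = root;
        for (int i = 0; i < parts.length - 1; i++) {
            Object child = node.get(parts[i]);
            if (!(child instanceof Map)) {
                child = new LinkedHashMap<String, Object>();
                node.put(parts[i], child);
            }
            node = (Map<String, Object>) child;
        }
        node.put(parts[parts.length - 1], value);
    }

    // Returns the value at a dotted path, or null if any segment is missing.
    @SuppressWarnings("unchecked")
    static Object get(Map<String, Object> root, String dottedKey) {
        Object node = root;
        for (String part : dottedKey.split("\\.")) {
            if (!(node instanceof Map)) {
                return null;
            }
            node = ((Map<String, Object>) node).get(part);
        }
        return node;
    }

    public static void main(String[] args) {
        Map<String, Object> config = new LinkedHashMap<>();
        put(config, "writer.staging.dir", "/foo");
        put(config, "writer.staging.dir.0", "/foo");
        put(config, "writer.staging.dir.1", "/foo");
        // The String at writer.staging.dir has been replaced by a nested map.
        System.out.println(get(config, "writer.staging.dir") instanceof Map);
        System.out.println(get(config, "writer.staging.dir.0"));
    }
}
```

After the second {{put}}, the path {{writer.staging.dir}} no longer resolves to a String, which matches the effective configuration shown above.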
Fork-specific configuration uses the same config keys as regular configuration,
except that the fork number is appended, like {{.1}}. The code that looks up
fork-specific configuration doesn't automatically fall back to the regular
configuration. For example, if the code is trying to find
{{writer.staging.dir.0}} and it isn't configured, the job will fail. This means
that all forks must configure fork-specific versions of {{writer.staging.dir}}.
When {{AbstractJobLauncher}} calls {{JobLauncherUtils.cleanTaskStagingData}}, it
cleans up the staging data based on the current job's configuration. Because of
this, {{fork.branches}} is always set to {{1}}. The call to
{{WriterUtils.getWriterStagingDir(state, numBranches, branchId)}} is made with
{{numBranches=1}} and {{branchId=0}}. This results in the method looking for
{{writer.staging.dir}}. Unfortunately, when using Hocon configuration, no string
value exists at {{writer.staging.dir}} and the job fails.
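The failing lookup can be sketched as follows. This is an illustration under stated assumptions, not Gobblin's implementation: {{ForkKeySketch}}, {{keyForBranch}} and {{stagingDir}} are hypothetical names, and the rule that a single-branch lookup uses the unsuffixed key is taken from the behaviour described in this report.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the key lookup; not Gobblin source code.
final class ForkKeySketch {

    // Assumed naming rule: single-branch jobs use the plain key,
    // multi-branch jobs append ".<branchId>" to it.
    static String keyForBranch(String key, int numBranches, int branchId) {
        return numBranches > 1 ? key + "." + branchId : key;
    }

    // Looks up the staging dir with no fallback between the plain key and
    // the fork-specific keys, as described in this report.
    static String stagingDir(Map<String, String> props, int numBranches, int branchId) {
        String key = keyForBranch("writer.staging.dir", numBranches, branchId);
        String value = props.get(key);
        if (value == null) {
            throw new IllegalStateException("Missing required property " + key);
        }
        return value;
    }

    public static void main(String[] args) {
        // String-valued leaves that survive in the effective Hocon configuration.
        Map<String, String> props = new HashMap<>();
        props.put("writer.staging.dir.0", "/foo");
        props.put("writer.staging.dir.1", "/foo");

        // A lookup made with the real fork count finds the fork-specific key.
        System.out.println(stagingDir(props, 2, 0));

        // The cleanup path uses numBranches=1 and branchId=0, so it asks for
        // the plain key, which has no string value.
        try {
            stagingDir(props, 1, 0);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch the cleanup call fails even though every fork has its staging directory configured, because only the fork-specific keys carry string values.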