[ https://issues.apache.org/jira/browse/GOBBLIN-227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joel Baranick updated GOBBLIN-227:
----------------------------------
    Description: 
*Precondition:*
The job uses Hocon configuration and has two forks configured.

*Summary:*
When {{AbstractJobLauncher}} calls {{JobLauncherUtils.cleanTaskStagingData}}, it
tries to look up {{writer.staging.dir}} in the configuration and fails.

*Details:*
Hocon configuration cannot represent the following properties, because a key
cannot hold both a value and child keys:
{code:none}
writer.staging.dir=/foo
writer.staging.dir.0=/foo
writer.staging.dir.1=/foo
{code}
Initially, {{writer.staging.dir}} is of type String, but when the Hocon parser
encounters {{writer.staging.dir.0}}, it treats {{writer.staging.dir}} as type
Object instead, overwriting the prior value with {{_\{"0": "/foo"\}_}}.
The effective Hocon configuration is:

{code:javascript}
{
  "writer": {
    "staging": {
      "dir": {
        "0": "/foo",
        "1": "/foo"
      }
    }
  }
}
{code}
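The following Java sketch models how the dotted keys above collapse into that tree. It is a simplified model of the Hocon duplicate-key rule, not the Typesafe Config parser: the class {{HoconPathModel}} and its {{parse}} method are hypothetical, and only plain {{key=value}} pairs are handled. When a path needs an object where a String already sits, the String is replaced; when both are objects, they are merged.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified model of Hocon path handling (NOT Typesafe Config):
// every dotted key is a path; if an intermediate node is not an object, a later
// path that needs an object there silently replaces it.
class HoconPathModel {

    @SuppressWarnings("unchecked")
    static Map<String, Object> parse(String[][] entries) {
        Map<String, Object> root = new LinkedHashMap<>();
        for (String[] entry : entries) {
            String[] path = entry[0].split("\\.");
            Map<String, Object> node = root;
            for (int i = 0; i < path.length - 1; i++) {
                Object child = node.get(path[i]);
                if (!(child instanceof Map)) {
                    // A String (or missing value) here is overwritten by a new object.
                    child = new LinkedHashMap<String, Object>();
                    node.put(path[i], child);
                }
                node = (Map<String, Object>) child;
            }
            // The last path segment holds the scalar value; a later assignment wins.
            node.put(path[path.length - 1], entry[1]);
        }
        return root;
    }

    public static void main(String[] args) {
        String[][] props = {
            {"writer.staging.dir", "/foo"},
            {"writer.staging.dir.0", "/foo"},
            {"writer.staging.dir.1", "/foo"},
        };
        System.out.println(parse(props));
    }
}
```

Running {{main}} prints a tree in which {{dir}} holds only the keys {{0}} and {{1}}; the plain {{writer.staging.dir}} value is lost, matching the effective configuration.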

Fork-specific configuration uses the same keys as regular configuration, with
the fork number appended (for example, {{.1}}). The code that looks up
fork-specific configuration doesn't automatically fall back to the regular
configuration. For example, if the code is trying to find
{{writer.staging.dir.0}} and it isn't configured, the job fails. This means
that every fork must configure its own fork-specific version of
{{writer.staging.dir}}.
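A minimal Java sketch of that lookup, assuming the fork-specific name is built by appending {{.<branchId>}} only when there is more than one branch. The class {{ForkKeyLookup}} and its methods are hypothetical and do not reproduce Gobblin's actual helpers; {{getWithFallback}} only illustrates what an automatic fallback to the plain key would look like.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical helpers (not Gobblin's API) modelling fork-specific property lookup.
class ForkKeyLookup {

    // Assumed naming rule: with more than one branch, append ".<branchId>".
    static String keyForBranch(String key, int numBranches, int branchId) {
        return numBranches > 1 ? key + "." + branchId : key;
    }

    // Strict lookup as described in the issue: no fallback to the plain key.
    static String getStrict(Map<String, String> props, String key, int numBranches, int branchId) {
        String name = keyForBranch(key, numBranches, branchId);
        String value = props.get(name);
        if (value == null) {
            throw new NoSuchElementException("Missing property: " + name);
        }
        return value;
    }

    // Illustrative alternative: try "key.<branchId>" first, then "key".
    // Returns null if neither property is set.
    static String getWithFallback(Map<String, String> props, String key, int numBranches, int branchId) {
        String value = props.get(keyForBranch(key, numBranches, branchId));
        return value != null ? value : props.get(key);
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("writer.staging.dir", "/foo");
        System.out.println(getWithFallback(props, "writer.staging.dir", 2, 0)); // /foo
        try {
            getStrict(props, "writer.staging.dir", 2, 0);
        } catch (NoSuchElementException e) {
            System.out.println(e.getMessage()); // Missing property: writer.staging.dir.0
        }
    }
}
```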

When {{AbstractJobLauncher}} calls {{JobLauncherUtils.cleanTaskStagingData}},
it cleans up the staging data based on the current job's configuration. Because
of this, {{fork.branches}} is always set to {{1}}. The call to
{{WriterUtils.getWriterStagingDir(state, numBranches, branchId)}} is made with
{{numBranches=1}} and {{branchId=0}}, so the method looks for
{{writer.staging.dir}}. Unfortunately, with Hocon configuration the value
{{writer.staging.dir}} doesn't exist, and the job fails.
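The cleanup failure can be reproduced in isolation with the hypothetical class below, assuming the branch suffix is appended only when {{numBranches > 1}}. With {{numBranches=1}} the suffix is dropped, so the lookup asks for the plain key, while the flattened Hocon configuration only contains the {{.0}} and {{.1}} variants. The names {{CleanupLookupRepro}}, {{lookupStagingDir}} and {{effectiveHoconProps}} are illustrative, not Gobblin's API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical reproduction of the cleanup lookup; names are illustrative only.
class CleanupLookupRepro {

    // Assumed naming rule: append ".<branchId>" only when there are several branches.
    static String keyForBranch(String key, int numBranches, int branchId) {
        return numBranches > 1 ? key + "." + branchId : key;
    }

    // Returns the staging dir for a branch, or empty if the property is missing.
    static Optional<String> lookupStagingDir(Map<String, String> props, int numBranches, int branchId) {
        return Optional.ofNullable(props.get(keyForBranch("writer.staging.dir", numBranches, branchId)));
    }

    // Flattened view of the effective Hocon config: only the per-fork keys survive.
    static Map<String, String> effectiveHoconProps() {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("writer.staging.dir.0", "/foo");
        props.put("writer.staging.dir.1", "/foo");
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> props = effectiveHoconProps();
        // Job-level cleanup sees fork.branches=1, so branch 0 maps to the plain key.
        System.out.println(lookupStagingDir(props, 1, 0)); // Optional.empty
        // A lookup with two branches finds the per-fork value.
        System.out.println(lookupStagingDir(props, 2, 0)); // Optional[/foo]
    }
}
```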



> JobLauncherUtils.cleanTaskStagingData fails for jobs with forks
> ---------------------------------------------------------------
>
>                 Key: GOBBLIN-227
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-227
>             Project: Apache Gobblin
>          Issue Type: Bug
>            Reporter: Joel Baranick
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
