[ https://issues.apache.org/jira/browse/GOBBLIN-1952?focusedWorklogId=889333&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-889333 ]
ASF GitHub Bot logged work on GOBBLIN-1952:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 07/Nov/23 19:54
Start Date: 07/Nov/23 19:54
Worklog Time Spent: 10m
Work Description: Will-Lo commented on code in PR #3822:
URL: https://github.com/apache/gobblin/pull/3822#discussion_r1385478350
##########
gobblin-service/src/main/java/org/apache/gobblin/service/modules/spec/JobExecutionPlan.java:
##########
@@ -112,10 +112,10 @@ private static JobSpec buildJobSpec(FlowSpec flowSpec, Config jobConfig, Long fl
  // job names are assumed to be unique within a dag.
  int hash = flowInputPath.hashCode();
  jobName = Joiner.on(JOB_NAME_COMPONENT_SEPARATION_CHAR).join(flowGroup, flowName, jobName, edgeId, hash);
- // jobNames are commonly used as a directory name, which is limited to 255 characters
+ // jobNames are commonly used as a directory name, which is limited to 255 characters (account for potential prefixes added/file name lengths)
  if (jobName.length() >= MAX_JOB_NAME_LENGTH) {
-   // shorten job length to be 128 characters (flowGroup) + (hashed) flowName, hashCode length
-   jobName = Joiner.on(JOB_NAME_COMPONENT_SEPARATION_CHAR).join(flowGroup, flowName.hashCode(), hash);
+   // shorten job length but make it uniquely identifiable in multihop flows or concurrent jobs, max length 139 characters (128 flow group + hash)
+   jobName = Joiner.on(JOB_NAME_COMPONENT_SEPARATION_CHAR).join(flowGroup, jobName.hashCode());
Review Comment:
I think this will only be applied to long flow names, so it should be fine. I
wanted to hash the full jobName rather than reuse the prior hash, so that it
covers scenarios where the same input path appears in a multi-hop flow with
multiple edges.
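
To illustrate the scenario described in the comment, here is a minimal sketch in plain Java. It is not the Gobblin implementation: the class name, the helper methods, the "_" separator, and the sample values are all hypothetical, and String.join stands in for Guava's Joiner. It compares the old shortening (flowGroup, flowName hash, input path hash) with the new one (flowGroup, hash of the full jobName) for two edges that share an input path.

```java
/** Hypothetical sketch of the two shortening schemes; not Gobblin code. */
final class JobNameHashSketch {
  // Assumed separator for illustration; the real value is JOB_NAME_COMPONENT_SEPARATION_CHAR.
  static final String SEP = "_";

  // Full job name as built before any shortening: group, flow, job, edge, input-path hash.
  static String fullName(String flowGroup, String flowName, String jobName,
                         String edgeId, String flowInputPath) {
    return String.join(SEP, flowGroup, flowName, jobName, edgeId,
        String.valueOf(flowInputPath.hashCode()));
  }

  // Old scheme: ignores jobName and edgeId, so edges sharing an input path get the same name.
  static String shortenOld(String flowGroup, String flowName, String flowInputPath) {
    return String.join(SEP, flowGroup,
        String.valueOf(flowName.hashCode()),
        String.valueOf(flowInputPath.hashCode()));
  }

  // New scheme: hashes the full job name, which already includes jobName and edgeId.
  static String shortenNew(String flowGroup, String fullJobName) {
    return String.join(SEP, flowGroup, String.valueOf(fullJobName.hashCode()));
  }

  public static void main(String[] args) {
    String group = "group", flow = "flow", path = "/data/in";
    String a = fullName(group, flow, "copy", "edgeA", path);
    String b = fullName(group, flow, "copy", "edgeB", path);

    // Old scheme: both edges map to the same shortened name.
    System.out.println(shortenOld(group, flow, path).equals(shortenOld(group, flow, path))); // true
    // New scheme: the edge id feeds into the hash, so the shortened names differ here.
    System.out.println(shortenNew(group, a).equals(shortenNew(group, b))); // false
  }
}
```

A 32-bit hash can still collide for unrelated names, so this reduces the chance of a clash rather than ruling it out.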
Issue Time Tracking
-------------------
Worklog Id: (was: 889333)
Time Spent: 1h 20m (was: 1h 10m)
> GaaS JobNames with long lengths cause issues with HDFS folders
> --------------------------------------------------------------
>
> Key: GOBBLIN-1952
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1952
> Project: Apache Gobblin
> Issue Type: Bug
> Components: gobblin-service
> Reporter: William Lo
> Assignee: Abhishek Tiwari
> Priority: Major
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> Gobblin-as-a-Service creates job names from flow groups, flow names, edges,
> and job names from the template. This tends to produce a very long string,
> which causes issues in Gobblin jobs that use the job name to create working
> directories or state stores. Although existing code already shortens job
> names, we want to be more aggressive with the maximum job name length to
> reduce the odds of exceeding 255 characters (the maximum length of an HDFS
> path component).
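
The length budget described above can be checked with a small calculation. This is only a sketch under stated assumptions: a flow group capped at 128 characters (taken from the code comment in the PR), a one-character separator, and a signed 32-bit hash printed in decimal. The class and constant names are hypothetical.

```java
/** Hypothetical worst-case length check for a shortened job name; not Gobblin code. */
final class JobNameLengthSketch {
  static final int MAX_FLOW_GROUP_LENGTH = 128;     // assumed cap, per the PR's code comment
  static final int MAX_HDFS_COMPONENT_LENGTH = 255; // limit quoted in the issue description
  static final String SEP = "_";                    // assumed one-character separator

  // Longest shortened name: flow group + separator + longest decimal int.
  static int worstCaseLength() {
    // Integer.MIN_VALUE prints as "-2147483648", which is 11 characters including the sign.
    int longestHash = String.valueOf(Integer.MIN_VALUE).length();
    return MAX_FLOW_GROUP_LENGTH + SEP.length() + longestHash;
  }

  // Characters left for prefixes or suffixes before hitting the component limit.
  static int headroom() {
    return MAX_HDFS_COMPONENT_LENGTH - worstCaseLength();
  }

  public static void main(String[] args) {
    System.out.println(worstCaseLength()); // 140
    System.out.println(headroom());        // 115
  }
}
```

Under these assumptions the bound is 140 characters, one more than the 139 quoted in the code comment, because a negative hash prints with a minus sign. Either way the shortened name stays well under 255.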