[ 
https://issues.apache.org/jira/browse/GOBBLIN-1653?focusedWorklogId=774863&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-774863
 ]

ASF GitHub Bot logged work on GOBBLIN-1653:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 26/May/22 01:06
            Start Date: 26/May/22 01:06
    Worklog Time Spent: 10m 
      Work Description: umustafi commented on code in PR #3514:
URL: https://github.com/apache/gobblin/pull/3514#discussion_r882231254


##########
gobblin-service/src/main/java/org/apache/gobblin/service/modules/spec/JobExecutionPlan.java:
##########
@@ -108,8 +109,13 @@ private static JobSpec buildJobSpec(FlowSpec flowSpec, Config jobConfig, Long fl
 
       // Modify the job name to include the flow group, flow name, edge id, and a random string to avoid collisions since
       // job names are assumed to be unique within a dag.
-      jobName = Joiner.on(JOB_NAME_COMPONENT_SEPARATION_CHAR).join(flowGroup, flowName, jobName, edgeId, flowInputPath.hashCode());
-
+      int hash = flowInputPath.hashCode();
+      jobName = Joiner.on(JOB_NAME_COMPONENT_SEPARATION_CHAR).join(flowGroup, flowName, jobName, edgeId, hash);
+      // jobNames are commonly used as a directory name, which is limited to 255 characters
+      if (jobName.length() >= MAX_JOB_NAME_LENGTH) {
+        // shorten job length to be 128 characters (flowGroup) + hashCode length
+        jobName = Joiner.on(JOB_NAME_COMPONENT_SEPARATION_CHAR).join(flowGroup, hash);

Review Comment:
   Is there any incentive to make jobName the first 128 characters of the
joined jobName you create above, instead of taking only flowGroup when the
name length exceeds the max length? I'm guessing the hash code isn't
255-128=127 characters long, but truncating would save us some space if that's
a concern for the column.
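
To make the two options concrete, here is a minimal sketch that puts the fallback from the diff next to the truncation asked about above. It is not the PR's code: the class name, the method names, the "_" separator, and the value 255 for MAX_JOB_NAME_LENGTH are assumptions for illustration, and String.join stands in for Guava's Joiner so the snippet has no dependencies.

```java
// Illustrative sketch only. The separator and the 255 limit are assumed values;
// the real constants live in JobExecutionPlan.java and are not shown in this thread.
final class JobNameShorteningSketch {
  static final int MAX_JOB_NAME_LENGTH = 255; // assumed directory-name limit
  static final String SEPARATOR = "_";        // assumed separator

  // Full name: flowGroup, flowName, jobName, edgeId and the input-path hash.
  static String fullName(String flowGroup, String flowName, String jobName,
      String edgeId, int hash) {
    return String.join(SEPARATOR, flowGroup, flowName, jobName, edgeId,
        Integer.toString(hash));
  }

  // Option from the diff: if the full name is too long, keep only flowGroup and the hash.
  static String shortenByFallback(String flowGroup, String flowName, String jobName,
      String edgeId, int hash) {
    String full = fullName(flowGroup, flowName, jobName, edgeId, hash);
    if (full.length() >= MAX_JOB_NAME_LENGTH) {
      return String.join(SEPARATOR, flowGroup, Integer.toString(hash));
    }
    return full;
  }

  // Option from the review comment: if the full name is too long, keep its first
  // maxPrefix characters. Note that this drops the trailing hash.
  static String shortenByTruncation(String flowGroup, String flowName, String jobName,
      String edgeId, int hash, int maxPrefix) {
    String full = fullName(flowGroup, flowName, jobName, edgeId, hash);
    if (full.length() >= MAX_JOB_NAME_LENGTH) {
      return full.substring(0, maxPrefix);
    }
    return full;
  }
}
```

A Java int rendered in decimal is at most 11 characters (Integer.MIN_VALUE), so under these assumptions the fallback name is at most 128 + 1 + 11 = 140 characters. Truncating to 128 characters is shorter, but it cuts off the hash, so two jobs whose names share the same first 128 characters would end up with the same name.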





Issue Time Tracking
-------------------

    Worklog Id:     (was: 774863)
    Time Spent: 0.5h  (was: 20m)

> Long flownames and flowgroup combinations can exceed maximum component length 
> of folder
> ---------------------------------------------------------------------------------------
>
>                 Key: GOBBLIN-1653
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1653
>             Project: Apache Gobblin
>          Issue Type: Bug
>          Components: gobblin-service
>            Reporter: William Lo
>            Assignee: Abhishek Tiwari
>            Priority: Major
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Gobblin uses jobName to create folder paths for temporary work folders. In
> GaaS, the jobName is composed of the flowGroup, flowName, edge ID, and a
> hash. This combination can exceed the maximum folder component length if the
> flowName and flowGroup approach their maximum lengths (128 characters each).
> Instead of enforcing a shorter flowGroup/flowName (which would require many DB
> migrations), we should shorten the jobName sent to Gobblin, as it is only used
> for temporary file storage.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
