[jira] [Work logged] (GOBBLIN-2033) Append HelixInstanceName to task runner staging paths

ASF GitHub Bot (Jira) Mon, 08 Apr 2024 14:26:29 -0700


     [ https://issues.apache.org/jira/browse/GOBBLIN-2033?focusedWorklogId=913526&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-913526 ]


ASF GitHub Bot logged work on GOBBLIN-2033:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 08/Apr/24 21:25
            Start Date: 08/Apr/24 21:25
    Worklog Time Spent: 10m 
      Work Description: rongshen commented on PR #3910:
URL: https://github.com/apache/gobblin/pull/3910#issuecomment-2043669144

   > Two questions here:
   > 
   > 1. Why do we use taskAttemptId instead of a random UUID? Isn't it still
   > possible that the same task gets retried in the same container?

   Yes, the same path can be reused. In that case, the orphan file removal
   function in the publisher cleans up the path before new data is written.
   This change fixes an issue found while testing orphan file deletion: a
   container that is shutting down can keep running even after Helix has
   reassigned its task.
   
   > 2. Why do we want to avoid removing orphan files from the previous run?
   > Any concern related to that?

   Orphan file removal should be enabled together with this change. This change
   is a prerequisite for orphan file removal because it ensures that the
   taskOutput path is unique for each work unit.
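   The answers above describe the publisher clearing a reused task output path
   before writing new data. A minimal sketch of that "clean, then write" step,
   assuming a local filesystem via java.nio.file; OrphanCleanup, removeOrphans,
   and writeOutput are hypothetical names, not Gobblin's publisher API. The
   sketch also shows why unique paths matter: the cleanup is only safe if no
   other writer (such as a container still shutting down) can write into the
   same directory after it runs.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch, not the Gobblin publisher's code: wipe whatever a previous
// attempt left in the task output directory, then write fresh output.
final class OrphanCleanup {
    private OrphanCleanup() {}

    // Recursively deletes dir and everything under it; no-op if it does not exist.
    static void removeOrphans(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> entries;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order visits children before their parents,
            // so each directory is already empty when it is deleted.
            entries = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : entries) {
            Files.delete(p);
        }
    }

    // Clears leftovers, recreates the directory, and writes one output file.
    static Path writeOutput(Path taskOutputDir, String fileName, String data) throws IOException {
        removeOrphans(taskOutputDir);
        Files.createDirectories(taskOutputDir);
        return Files.write(taskOutputDir.resolve(fileName), data.getBytes(StandardCharsets.UTF_8));
    }
}
```

   Deleting and rewriting is deliberately simple; if a stale writer from an
   old container is still alive, it can recreate files after the cleanup,
   which is the race that a per-attempt, per-instance staging path avoids.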
   




Issue Time Tracking
-------------------

    Worklog Id:     (was: 913526)
    Time Spent: 40m  (was: 0.5h)

> Append HelixInstanceName to task runner staging paths
> -----------------------------------------------------
>
>                 Key: GOBBLIN-2033
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-2033
>             Project: Apache Gobblin
>          Issue Type: Improvement
>            Reporter: Matthew Ho
>            Priority: Major
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> While a task runner is shutting down, it can write bad files to the staging
> area. A new task runner that starts up afterward should not reuse those old
> files.
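The issue proposes appending the Helix instance name to task runner staging
paths. A minimal sketch of such a path builder, assuming a layout of
<baseStagingDir>/<helixInstanceName>/<taskAttemptId>; the class name, method
name, and exact layout are hypothetical, not Gobblin's actual implementation.
It shows that the same task attempt retried on a different Helix instance gets
a separate directory, so a runner that is still shutting down cannot write
into its replacement's staging area.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not Gobblin's API: builds a staging directory that is
// unique per Helix instance and per task attempt.
final class StagingPaths {
    private StagingPaths() {}

    // Assumed layout: <baseStagingDir>/<helixInstanceName>/<taskAttemptId>
    static Path forAttempt(String baseStagingDir, String helixInstanceName, String taskAttemptId) {
        if (helixInstanceName == null || helixInstanceName.isEmpty()) {
            throw new IllegalArgumentException("helixInstanceName must be non-empty");
        }
        if (taskAttemptId == null || taskAttemptId.isEmpty()) {
            throw new IllegalArgumentException("taskAttemptId must be non-empty");
        }
        return Paths.get(baseStagingDir, helixInstanceName, taskAttemptId);
    }

    public static void main(String[] args) {
        Path oldRunner = forAttempt("/tmp/staging", "worker_1", "attempt_42");
        Path newRunner = forAttempt("/tmp/staging", "worker_2", "attempt_42");
        // Same attempt id, different Helix instances: different directories.
        System.out.println(oldRunner.equals(newRunner)); // false
    }
}
```

Keying on the instance name rather than a random UUID keeps paths
deterministic and traceable to the container that wrote them.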



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
