[ https://issues.apache.org/jira/browse/GOBBLIN-2033?focusedWorklogId=913526&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-913526 ]
ASF GitHub Bot logged work on GOBBLIN-2033:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 08/Apr/24 21:25
Start Date: 08/Apr/24 21:25
Worklog Time Spent: 10m
Work Description: rongshen commented on PR #3910:
URL: https://github.com/apache/gobblin/pull/3910#issuecomment-2043669144
> Two questions here:
>
> 1. Why do we try to use taskAttemptId instead of a random UUID? It's
> still possible that the same task gets retried in the same container?
Yes, the same path can be reused. In that case the orphan file removal
function in the publisher cleans up the path before new data is written. This
change fixes an issue found while testing orphan file deletion: a container
that is shutting down can keep running a task even after Helix has reassigned
that task.
> 2. Why do we want to avoid removing orphan files from the previous run?
> Any concern related to that?
Orphan file removal should be enabled together with this change. This change
is a prerequisite for orphan file removal because it ensures that the
taskOutput path is unique for each work unit.
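
As a rough illustration of the idea only (this is not the code in PR #3910),
a per-runner, per-attempt output path could be derived as sketched below. The
class name, method names, and directory layout are hypothetical and are not
Gobblin APIs. The sketch assumes the Helix instance name and the task attempt
id are already available as plain strings.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; none of these names come from Gobblin.
final class StagingPathSketch {
    private StagingPathSketch() {}

    /**
     * Builds stagingRoot/helixInstanceName/taskAttemptId (an assumed layout).
     * Two runners with different instance names never share a directory,
     * even if they end up running the same task attempt.
     */
    static Path taskOutputPath(Path stagingRoot, String helixInstanceName, String taskAttemptId) {
        return stagingRoot
            .resolve(sanitize(helixInstanceName))
            .resolve(sanitize(taskAttemptId));
    }

    /** Replaces characters that are unsafe in a single path segment with '_'. */
    static String sanitize(String segment) {
        if (segment == null || segment.isEmpty()
                || segment.equals(".") || segment.equals("..")) {
            throw new IllegalArgumentException("invalid path segment: " + segment);
        }
        return segment.replaceAll("[^A-Za-z0-9._-]", "_");
    }

    public static void main(String[] args) {
        Path root = Paths.get("/tmp/staging");
        // Same attempt id, different runners: the paths differ.
        System.out.println(taskOutputPath(root, "runner_1", "attempt_0"));
        System.out.println(taskOutputPath(root, "runner_2", "attempt_0"));
    }
}
```

With a layout like this, a container that is still shutting down writes under
its own instance directory, so it cannot touch the files of the runner that
took over the task. A retry inside the same container can still reuse a path,
which is why the orphan file cleanup remains necessary.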
Issue Time Tracking
-------------------
Worklog Id: (was: 913526)
Time Spent: 40m (was: 0.5h)
> Append HelixInstanceName to task runner staging paths
> -----------------------------------------------------
>
> Key: GOBBLIN-2033
> URL: https://issues.apache.org/jira/browse/GOBBLIN-2033
> Project: Apache Gobblin
> Issue Type: Improvement
> Reporter: Matthew Ho
> Priority: Major
> Time Spent: 40m
> Remaining Estimate: 0h
>
> During the shutdown of a task runner, it's possible to write bad files to the
> staging area. On startup of a new task runner, it should not try to reuse
> that old file.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)