Arpit Varshney created GOBBLIN-2105:
---------------------------------------

             Summary: Ensure the destination path does not exist before 
renaming during Gobblin compaction.
                 Key: GOBBLIN-2105
                 URL: https://issues.apache.org/jira/browse/GOBBLIN-2105
             Project: Apache Gobblin
          Issue Type: Improvement
          Components: gobblin-compaction
            Reporter: Arpit Varshney
            Assignee: Issac Buenrostro


As part of Gobblin compaction (deduplication), compacted files are moved from 
staging to their final location at the end of the process. This movement is 
handled by the 
org.apache.gobblin.compaction.action.CompactionCompleteFileOperationAction#onCompactionJobComplete
 method, which determines the appropriate destination path and moves the 
compacted files accordingly.



Current Issue:

- If the flag compaction.rename.source.dir.enabled is set to false (not in 
append mode) and recompaction.write.to.new.folder is set to true, the 
destination is a new directory whose name is derived from the execution count 
recorded in the state file.
- The state file, however, is generated only after the move to the final 
location. If the move fails, the state file will not reflect that attempt and 
the execution count it holds will be incorrect.
- In the next execution, the determined destination path might therefore 
already exist. HDFS rename does not replace an existing destination directory; 
it moves the source underneath it, so the compacted output ends up in an 
additional, unintended child directory.
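
The nesting behavior in the last point can be reproduced with a small 
stand-alone sketch. It does not use the Hadoop FileSystem API; 
hdfsStyleRename is a hypothetical helper that mimics, on the local file 
system, the rename behavior described above. The class name, directory names 
and file name are likewise made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class RenameNestingDemo {

    // Hypothetical stand-in for the HDFS rename behavior described above:
    // if dst already exists as a directory, src is moved underneath it
    // instead of taking its place.
    static Path hdfsStyleRename(Path src, Path dst) throws IOException {
        Path target = Files.isDirectory(dst) ? dst.resolve(src.getFileName()) : dst;
        return Files.move(src, target);
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("compaction-demo");

        // Staging directory holding the compacted output.
        Path staging = Files.createDirectory(root.resolve("staging"));
        Files.createFile(staging.resolve("part-0.avro"));

        // Destination left behind by an earlier run whose move failed.
        Path dest = Files.createDirectory(root.resolve("compacted_1"));

        // Because dest already exists, the output lands one level too deep.
        Path landed = hdfsStyleRename(staging, dest);
        System.out.println("Compacted files landed in: " + root.relativize(landed));
    }
}
```

In this sketch the files end up under the existing destination's "staging" 
child rather than directly in the destination, which is the situation the 
issue describes.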



Requirement:

We need to ensure that the determined destination path does not exist before 
the rename operation is performed.
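
One way to meet this requirement is sketched below. This is not the actual 
Gobblin change: it uses java.nio.file as a local stand-in for the Hadoop 
FileSystem API, and moveToFreshDestination and deleteRecursively are 
hypothetical names. It also assumes that anything already present at the 
destination is debris from a failed earlier attempt and can be deleted; 
failing fast or choosing a different directory would be equally valid policies.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class GuardedRenameSketch {

    // Deletes a directory tree, visiting children before their parents.
    static void deleteRecursively(Path dir) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    // Guarantees that dest does not exist at the moment of the move.
    static void moveToFreshDestination(Path staging, Path dest) throws IOException {
        if (Files.exists(dest)) {
            // Assumption: a pre-existing dest is left over from a failed run.
            deleteRecursively(dest);
        }
        Files.move(staging, dest);
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("compaction-guard");
        Path staging = Files.createDirectory(root.resolve("staging"));
        Files.createFile(staging.resolve("part-0.avro"));

        // Simulate the leftover destination from a previous failed move.
        Path dest = Files.createDirectory(root.resolve("compacted_1"));
        Files.createFile(dest.resolve("stale.avro"));

        moveToFreshDestination(staging, dest);
        System.out.println("part-0.avro directly under dest: "
                + Files.exists(dest.resolve("part-0.avro")));
    }
}
```

With the existence check in place, the compacted files land directly in the 
destination directory and no extra child directory is created.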



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
