[
https://issues.apache.org/jira/browse/GOBBLIN-1558?focusedWorklogId=673915&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-673915
]
ASF GitHub Bot logged work on GOBBLIN-1558:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 02/Nov/21 21:20
Start Date: 02/Nov/21 21:20
Worklog Time Spent: 10m
Work Description: Will-Lo commented on a change in pull request #3409:
URL: https://github.com/apache/gobblin/pull/3409#discussion_r741406740
##########
File path:
gobblin-core/src/main/java/org/apache/gobblin/publisher/BaseDataPublisher.java
##########
@@ -487,6 +480,15 @@ protected void addSingleTaskWriterOutputToExistingDir(Path writerOutputDir, Path
     }
   }
+  protected void addWriterOutputToNewDir(Path writerOutputDir, Path publisherOutputDir,
+      WorkUnitState workUnitState, int branchId, ParallelRunner parallelRunner)
+      throws IOException {
+    // Create the parent directory of the final output directory if it does not exist
+    WriterUtils.mkdirsWithRecursivePermissionWithRetry(this.publisherFileSystemByBranches.get(branchId),
+        publisherOutputDir.getParent(), this.permissions.get(branchId), retrierConfig);
Review comment:
Is there a reason we're omitting the step that sets the output directory's
group? I believe it's needed for permissions when a group is configured:
```
if (this.publisherOutputDirOwnerGroupByBranches.get(branchId).isPresent()) {
  LOG.info(String.format("Setting path %s group to %s", publisherOutputDir.toString(),
      this.publisherOutputDirOwnerGroupByBranches.get(branchId).get()));
  HadoopUtils.setGroup(this.publisherFileSystemByBranches.get(branchId), publisherOutputDir,
      this.publisherOutputDirOwnerGroupByBranches.get(branchId).get());
}
```
Issue Time Tracking
-------------------
Worklog Id: (was: 673915)
Time Spent: 40m (was: 0.5h)
> Overwrite BaseDataPublisher behavior when parent-folder doesn't exist and use
> it in TimePartitionedDataPublisher
> ----------------------------------------------------------------------------------------------------------------
>
> Key: GOBBLIN-1558
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1558
> Project: Apache Gobblin
> Issue Type: Improvement
> Components: gobblin-core
> Reporter: Joseph Allemandou
> Assignee: Abhishek Tiwari
> Priority: Minor
> Time Spent: 40m
> Remaining Estimate: 0h
>
> It's currently not possible to override the publisher's behavior when
> publishing to a non-existent parent directory (for instance, on the first
> run of a job type). This would be needed to make TimePartitionedDataPublisher
> save recordPublisherOutputDirs at the lowest granularity (detailed subfolders).
>
> BaseDataPublisher: Extract a new method `addWriterOutputToNewDir` that goes
> with the already existing `addWriterOutputToExistingDir`. No test needed; the
> change is a no-op with respect to class behavior.
> TimePartitionedDataPublisher: Override the new `addWriterOutputToNewDir`
> method to create the publisher parent folder and reuse the
> `addWriterOutputToExistingDir` method. Rename and update
> TimePartitionedStreamingDataPublisherTest class to actually test
> TimePartitionedDataPublisher.
> TimePartitionedStreamingDataPublisher: Remove publisher parent folder
> creation as it is managed in the TimePartitionedDataPublisher superclass. Add
> a test based on the copied TimePartitionedStreamingDataPublisherTest to verify
> the change also works for this child class.
>
>
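The refactoring described in the issue can be sketched roughly as below. This is only an illustration of the extract-and-override idea, not the actual Gobblin code: the classes `SketchBasePublisher` and `SketchTimePartitionedPublisher`, the `publish` and `parentOf` helpers, the string paths, and the in-memory log are all hypothetical stand-ins, and the base class's "move" step is an assumption about the default behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for BaseDataPublisher. Publishing branches on whether
// the output directory already exists; each branch is an overridable hook.
class SketchBasePublisher {
  // Records the file-system actions that would be taken (no real I/O here).
  protected final List<String> log = new ArrayList<>();

  public List<String> publish(String outputDir, boolean outputDirExists) {
    if (outputDirExists) {
      addWriterOutputToExistingDir(outputDir);
    } else {
      addWriterOutputToNewDir(outputDir);
    }
    return log;
  }

  protected void addWriterOutputToExistingDir(String outputDir) {
    log.add("merge into " + outputDir);
  }

  // The extracted hook. Assumed default: create the parent directory, then
  // move the writer output into place as a whole.
  protected void addWriterOutputToNewDir(String outputDir) {
    log.add("mkdirs " + parentOf(outputDir));
    log.add("move to " + outputDir);
  }

  protected static String parentOf(String path) {
    int idx = path.lastIndexOf('/');
    return idx <= 0 ? "/" : path.substring(0, idx);
  }
}

// Hypothetical stand-in for TimePartitionedDataPublisher: it overrides only
// the new-directory hook, creating the parent folder and then reusing the
// existing-directory path so output is handled per subfolder.
class SketchTimePartitionedPublisher extends SketchBasePublisher {
  @Override
  protected void addWriterOutputToNewDir(String outputDir) {
    log.add("mkdirs " + parentOf(outputDir));
    addWriterOutputToExistingDir(outputDir);
  }
}

class PublisherSketchDemo {
  public static void main(String[] args) {
    System.out.println(new SketchBasePublisher().publish("/data/out", false));
    System.out.println(new SketchTimePartitionedPublisher().publish("/data/out", false));
  }
}
```

Because the base class keeps its default in the extracted hook, callers of the base publisher see no behavioral change, which is consistent with the "no test needed" remark in the description. Only the subclass's first-run path differs.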