Will-Lo commented on a change in pull request #3158:
URL: https://github.com/apache/gobblin/pull/3158#discussion_r589814422



##########
File path: gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/CopySource.java
##########
@@ -370,9 +368,16 @@ public Void call() {
               workUnit.setProp(ConfigurationKeys.COPY_EXPECTED_SCHEMA, ((ConfigBasedDataset) this.copyableDataset).getExpectedSchema());
             }
           }
-          if ((this.copyableDataset instanceof HiveDataset) && (state.getPropAsBoolean(ConfigurationKeys.IS_DATASET_STAGING_DIR_USED,false))) {
-            workUnit.setProp(DATASET_STAGING_DIR_PATH, ((HiveDataset) this.copyableDataset).getProperties().getProperty(DATASET_STAGING_PATH));
+
+          // Ensure that the writer temporary directories are contained within the dataset shard
+          if ((this.copyableDataset instanceof HiveDataset) && (state.getPropAsBoolean(ConfigurationKeys.USE_SHARDED_WRITER_DIRS,false))) {
+            String datasetPath = ((HiveDataset) this.copyableDataset).getProperties().getProperty(DATASET_STAGING_PATH);
+            workUnit.setProp(ConfigurationKeys.WRITER_STAGING_DIR, datasetPath + ConfigurationKeys.STAGING_DIR_DEFAULT_SUFFIX + "/" + state

Review comment:
       @aplex Yes, I was debating whether to allow both configs, with
`WRITER_STAGING_DIR` specifying the suffix when `USE_SHARDED_WRITER_DIRS` is
enabled, or to keep them fully mutually exclusive. I will ask SRE for their
input, since this will require modifying quite a few configurations, though
that will be unavoidable regardless of which option we go with.
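
To make the two options concrete, here is a rough, standalone sketch of how each could resolve the staging directory. It is not the code in this PR: the class name, method names, property key strings, and the default suffix value are all placeholders chosen for illustration, and it uses plain `java.util.Properties` rather than Gobblin's `State`.

```java
import java.util.Properties;

public class StagingDirSketch {
  // Placeholder key names and default suffix; the real values live in
  // ConfigurationKeys and may differ.
  public static final String USE_SHARDED_WRITER_DIRS = "use.sharded.writer.dirs";
  public static final String WRITER_STAGING_DIR = "writer.staging.dir";
  public static final String DEFAULT_SUFFIX = "/task-staging";

  private static boolean isSharded(Properties props) {
    return Boolean.parseBoolean(props.getProperty(USE_SHARDED_WRITER_DIRS, "false"));
  }

  /**
   * Option A: both configs are allowed. When sharding is enabled,
   * WRITER_STAGING_DIR is read as a suffix under the dataset path.
   */
  public static String resolveWithSuffix(Properties props, String datasetPath) {
    String configured = props.getProperty(WRITER_STAGING_DIR);
    if (!isSharded(props)) {
      return configured; // unchanged behavior when sharding is off
    }
    String suffix = configured != null ? configured : DEFAULT_SUFFIX;
    return datasetPath + (suffix.startsWith("/") ? suffix : "/" + suffix);
  }

  /**
   * Option B: the configs are mutually exclusive. Setting both is rejected
   * so that a stale WRITER_STAGING_DIR is never silently ignored.
   */
  public static String resolveExclusive(Properties props, String datasetPath) {
    String configured = props.getProperty(WRITER_STAGING_DIR);
    if (isSharded(props) && configured != null) {
      throw new IllegalArgumentException(
          WRITER_STAGING_DIR + " must not be set when " + USE_SHARDED_WRITER_DIRS + " is enabled");
    }
    return isSharded(props) ? datasetPath + DEFAULT_SUFFIX : configured;
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty(USE_SHARDED_WRITER_DIRS, "true");
    System.out.println(resolveWithSuffix(props, "/data/db/table"));
    System.out.println(resolveExclusive(props, "/data/db/table"));
  }
}
```

Under option A, existing jobs that set `WRITER_STAGING_DIR` to an absolute path would have that path reinterpreted as a suffix once sharding is turned on. Under option B, the same jobs would fail fast until the setting is removed. Either way the existing configurations need to be touched.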



