yuanlihan commented on code in PR #14984:
URL: https://github.com/apache/druid/pull/14984#discussion_r1326796012


##########
extensions-core/hdfs-storage/src/main/java/org/apache/druid/storage/hdfs/HdfsDataSegmentPusher.java:
##########
@@ -124,8 +124,18 @@ public DataSegment pushToPath(File inDir, DataSegment 
segment, String storageDir
         fullyQualifiedStorageDirectory.get(),
         storageDirSuffix
     );
-
-    final String storageDir = StringUtils.format("%s/%s", 
fullyQualifiedStorageDirectory.get(), storageDirSuffix);
+    
+    final String outIndexFilePath;
+    if (storageDirSuffix.endsWith("index.zip")) {

Review Comment:
   @kfaraz Thanks for looking into this! 
   
   My first intuition was also trying to fix 
`DeepStorageIntermediaryDataManager` by passing a file path contains 
`index.zip` suffix here
   
https://github.com/apache/druid/blob/3ae5e978012e7a553e6ad3007eb210ab8da5b6fc/indexing-service/src/main/java/org/apache/druid/indexing/worker/shuffle/DeepStorageIntermediaryDataManager.java#L78
   
   But after checking other implementations of `DataSegmentPusher`, I found 
they all call method 
[pushToPath](https://github.com/apache/druid/blob/3ae5e978012e7a553e6ad3007eb210ab8da5b6fc/processing/src/main/java/org/apache/druid/segment/loading/DataSegmentPusher.java#L71)
 in their 
[push](https://github.com/apache/druid/blob/3ae5e978012e7a553e6ad3007eb210ab8da5b6fc/processing/src/main/java/org/apache/druid/segment/loading/DataSegmentPusher.java#L69)
 method. Some of them format segment file path contains `index.zip` suffix and 
pass it by parameter `storageDirSuffix` to method `pushToPath`, see below
   1. `AzureDataSegmentPusher`
   
https://github.com/apache/druid/blob/3ae5e978012e7a553e6ad3007eb210ab8da5b6fc/extensions-core/azure-extensions/src/main/java/org/apache/druid/storage/azure/AzureDataSegmentPusher.java#L124-L125
   2. `HdfsDataSegmentPusher`
   
https://github.com/apache/druid/blob/279b3818f0e02f6a2ff389532f1e6118598a7c24/extensions-core/hdfs-storage/src/main/java/org/apache/druid/storage/hdfs/HdfsDataSegmentPusher.java#L108-L115
   
   Other implementations format `storageDirSuffix` by calling 
[getStorageDir](https://github.com/apache/druid/blob/279b3818f0e02f6a2ff389532f1e6118598a7c24/processing/src/main/java/org/apache/druid/segment/loading/DataSegmentPusher.java#L89)
 in their `push` methods.
   
   To fix the issue, I'd prefer to fix the inconsistent usages of `pushToPath` 
in method `AzureDataSegmentPusher#push` and `HdfsDataSegmentPusher#push`. We 
should always pass a dir to parameter `storageDirSuffix` of method `pushToPath` 
and format segment file path with suffix `index.zip` in method `pushToPath`. 
So, we don't need to change `DeepStorageIntermediaryDataManager` as it passes a 
dir to method `pushToPath`. Does it make sense to you?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to