Re: [PR] [CELEBORN-1469] Support writing shuffle data to OSS(S3 only) [celeborn]

via GitHub Tue, 30 Jul 2024 21:09:27 -0700


maobaolong commented on code in PR #2579:
URL: https://github.com/apache/celeborn/pull/2579#discussion_r1697871184



##########
worker/src/main/scala/org/apache/celeborn/service/deploy/worker/storage/FlushTask.scala:
##########
@@ -55,3 +56,31 @@ private[worker] class HdfsFlushTask(
     hdfsStream.close()
   }
 }
+
+private[worker] class S3FlushTask(
+    buffer: CompositeByteBuf,
+    val path: Path,
+    notifier: FlushNotifier,
+    keepBuffer: Boolean) extends FlushTask(buffer, notifier, keepBuffer) {
+  override def flush(): Unit = {
+    if (StorageManager.hadoopFs.exists(path)) {
+      val conf = StorageManager.hadoopFs.getConf

Review Comment:
   @zhaohehuhu Is there another approach to integrating a new `FileSystem` 
like S3 that does not support `append`? This looks inefficient: on each flush 
it copies the existing data from S3 to the worker and then writes all of it 
back to S3, which can amplify the write volume enormously.
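
To make the concern concrete, here is a minimal sketch that compares the bytes uploaded by an emulated append (read the old object, concatenate, rewrite) with writing each flush as its own part object. Everything in it is hypothetical and for illustration only: `ObjectStore` is an in-memory stand-in for a store without `append`, and `flushByRewrite`, `flushAsPart`, and the `part-NNNNNN` key layout are assumptions, not code from this PR or from Celeborn. S3 multipart upload is one real mechanism in the same spirit, but whether it fits the flush path here is an open question.

```scala
import scala.collection.mutable

// Hypothetical in-memory stand-in for an object store with put/get but no append.
final class ObjectStore {
  private val objects = mutable.Map.empty[String, Array[Byte]]
  // Total number of bytes uploaded through put().
  var bytesWritten: Long = 0L

  def exists(key: String): Boolean = objects.contains(key)
  def get(key: String): Array[Byte] = objects(key)
  def put(key: String, data: Array[Byte]): Unit = {
    bytesWritten += data.length
    objects(key) = data
  }
  def keysWithPrefix(prefix: String): Seq[String] =
    objects.keys.filter(_.startsWith(prefix)).toSeq.sorted
}

object AppendEmulation {
  // Emulated append: fetch the existing object, concatenate, upload everything again.
  def flushByRewrite(store: ObjectStore, key: String, chunk: Array[Byte]): Unit = {
    val old = if (store.exists(key)) store.get(key) else Array.emptyByteArray
    store.put(key, old ++ chunk)
  }

  // Assumed alternative: every flush becomes a separate, zero-padded part object.
  def flushAsPart(store: ObjectStore, key: String, partIndex: Int, chunk: Array[Byte]): Unit =
    store.put(f"$key/part-$partIndex%06d", chunk)

  // Readers concatenate the parts in key order to recover the full stream.
  def readParts(store: ObjectStore, key: String): Array[Byte] =
    store.keysWithPrefix(s"$key/part-").flatMap(k => store.get(k).toSeq).toArray

  // Returns (bytes uploaded with rewrite, bytes uploaded with parts).
  def compare(flushes: Int, chunkSize: Int): (Long, Long) = {
    val rewriteStore = new ObjectStore
    val partStore = new ObjectStore
    for (i <- 0 until flushes) {
      val chunk = Array.fill[Byte](chunkSize)(i.toByte)
      flushByRewrite(rewriteStore, "shuffle/0", chunk)
      flushAsPart(partStore, "shuffle/0", i, chunk)
    }
    (rewriteStore.bytesWritten, partStore.bytesWritten)
  }
}
```

With 4 flushes of 100 bytes, `AppendEmulation.compare(4, 100)` uploads 100 + 200 + 300 + 400 = 1000 bytes for the rewrite path and 400 bytes for the part path, so the rewrite cost grows quadratically with the number of flushes. The part layout trades that for more objects and a concatenating read path.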



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
