[GitHub] [incubator-uniffle] jerqi commented on issue #1030: [Umbrella] Object Storage Support (help wanted)

via GitHub Sun, 23 Jul 2023 19:23:17 -0700


jerqi commented on issue #1030:
URL: https://github.com/apache/incubator-uniffle/issues/1030#issuecomment-1647102736


   > **Regarding upload to S3**: As long as you use the Apache HDFS S3A adapter you can stream data to an object store. However you can only append as long as you keep the stream open and you can only do so from a single client. The [S3A filesystem implementation uses buffered multi-part uploads to stream a file to an object store](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#How_S3A_writes_data_to_S3). Streaming from multiple clients should be possible in principle, but the coordination overhead and the way Java streams are implemented make things tricky.
   > 
   > **Regarding `.index` files**: For better performance you can always cache the index files, or serve them from a different location (Redis, Uniffle Coordinator, etc...).
   > 
   > **Regarding `list` support**: You can always store the list of objects somewhere else if you want to avoid any expensive file-listing operations. `spark-s3-shuffle` only uses listings when it needs to delete objects.
   
   Thanks for your input.
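
For readers following the thread, the single-writer, append-while-open behaviour described in the quoted S3 point can be sketched roughly as below. This is a toy, in-memory illustration only: `FakeObjectStore` and `BufferedMultipartWriter` are hypothetical names made up for this sketch, and nothing here is the S3A or Uniffle API. It assumes the object only becomes visible once the upload is completed on close.

```python
class FakeObjectStore:
    """In-memory stand-in for an object store (hypothetical, not a real S3 client)."""

    def __init__(self):
        self.objects = {}  # key -> bytes, visible only after the upload completes
        self.pending = {}  # key -> list of parts uploaded so far

    def start_upload(self, key):
        self.pending[key] = []

    def upload_part(self, key, data):
        self.pending[key].append(bytes(data))

    def complete_upload(self, key):
        # Concatenate all parts into the final object and drop the pending state.
        self.objects[key] = b"".join(self.pending.pop(key))


class BufferedMultipartWriter:
    """Single-client stream: buffers writes and ships a part whenever the buffer fills."""

    def __init__(self, store, key, part_size):
        self.store = store
        self.key = key
        self.part_size = part_size
        self.buffer = bytearray()
        self.closed = False
        store.start_upload(key)

    def write(self, data):
        if self.closed:
            # Once the stream is closed the object is final; no further appends.
            raise ValueError("stream is closed; cannot append to the object")
        self.buffer.extend(data)
        # Upload full-sized parts as soon as enough bytes are buffered.
        while len(self.buffer) >= self.part_size:
            self.store.upload_part(self.key, self.buffer[:self.part_size])
            del self.buffer[:self.part_size]

    def close(self):
        if self.closed:
            return
        if self.buffer:
            # Flush the final, possibly short, part.
            self.store.upload_part(self.key, self.buffer)
            self.buffer.clear()
        self.store.complete_upload(self.key)
        self.closed = True


store = FakeObjectStore()
writer = BufferedMultipartWriter(store, "shuffle_0.data", part_size=4)
writer.write(b"abcdef")  # one full part uploaded, two bytes stay buffered
writer.write(b"gh")      # buffer reaches part_size again, second part uploaded
writer.close()           # upload completed, object becomes visible
```

Appending works only between construction and `close()`, and all buffer state lives in one writer instance, which is why coordinating several clients on the same object is the hard part.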


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
