jerqi commented on issue #1030: URL: https://github.com/apache/incubator-uniffle/issues/1030#issuecomment-1647102736
> **Regarding upload to S3**: As long as you use the Apache HDFS S3A adapter you can stream data to an object store. However, you can only append as long as you keep the stream open, and you can only do so from a single client. The [S3A filesystem implementation uses buffered multi-part uploads to stream a file to an object store](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#How_S3A_writes_data_to_S3). Streaming from multiple clients should be possible in principle, but the coordination overhead and the way Java streams are implemented make things tricky.
>
> **Regarding `.index` files**: For better performance you can always cache the index files, or serve them from a different location (Redis, Uniffle Coordinator, etc.).
>
> **Regarding `list` support**: You can always store the list of objects somewhere else if you want to avoid any expensive file-listing operations. `spark-s3-shuffle` only uses listings when it needs to delete objects.

Thanks for your input.
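To make the "append only while the stream is open, from a single client" point concrete, here is a toy in-memory model of a buffered multi-part upload. It is only a sketch of the general idea and not S3A's actual code: `FakeObjectStore`, `BufferedMultipartWriter`, the 8-byte part size, and the `shuffle_0.data` key are all hypothetical names and values chosen for illustration.

```python
class FakeObjectStore:
    """In-memory stand-in for an object store (hypothetical, not a real S3 client)."""

    def __init__(self):
        self.objects = {}  # key -> bytes, visible to readers
        self.pending = {}  # key -> uploaded parts, not yet visible

    def upload_part(self, key, data):
        self.pending.setdefault(key, []).append(bytes(data))

    def complete(self, key):
        # Parts are stitched into one visible object only on completion.
        self.objects[key] = b"".join(self.pending.pop(key, []))


class BufferedMultipartWriter:
    """Single-writer stream that buffers bytes and uploads fixed-size parts."""

    def __init__(self, store, key, part_size=8):
        self.store = store
        self.key = key
        self.part_size = part_size
        self.buffer = bytearray()
        self.closed = False

    def write(self, data):
        if self.closed:
            raise ValueError("stream is closed; the object cannot be appended to")
        self.buffer.extend(data)
        # Upload every full part; the remainder stays in the local buffer.
        while len(self.buffer) >= self.part_size:
            self.store.upload_part(self.key, self.buffer[:self.part_size])
            del self.buffer[:self.part_size]

    def close(self):
        if self.closed:
            return
        if self.buffer:
            # Upload whatever is left as a final, shorter part.
            self.store.upload_part(self.key, self.buffer)
            self.buffer.clear()
        self.store.complete(self.key)
        self.closed = True


store = FakeObjectStore()
writer = BufferedMultipartWriter(store, "shuffle_0.data", part_size=8)
writer.write(b"block-one|")
writer.write(b"block-two|")
visible_before_close = "shuffle_0.data" in store.objects
writer.close()
```

In this model the object becomes readable only after `close()`, and the buffer lives inside one writer instance, which is why a second client cannot simply continue the same stream without extra coordination.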