rdblue commented on PR #4792: URL: https://github.com/apache/iceberg/pull/4792#issuecomment-1153296951
@samredai, to review this, I started looking more closely at the boto3 API. The API you're using here isn't a streaming API, and streaming is what we typically want so that we can avoid buffering a whole file in memory before writing it with a single PUT. When I looked into how to use boto3 for streaming reads and streaming writes, I quickly ran into `smart_open`, which appears to do everything we want. I think you had an S3FileIO that used `smart_open` before. Is there a reason not to use it to wrap boto3 now? I think that would let us avoid maintaining a lot of this code.
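To make the buffering concern concrete, here is a minimal sketch of a streaming write: data is flushed in fixed-size parts as it arrives, so memory use stays bounded by the part size rather than the file size. Everything in it is hypothetical and for illustration only: `FakeObjectStore` is an in-memory stand-in for S3, and `StreamingWriter` and `part_size` are made-up names. It does not call boto3 or `smart_open`; a library like `smart_open` would presumably do the equivalent against real S3 with multipart uploads.

```python
class FakeObjectStore:
    """Hypothetical in-memory stand-in for S3 that counts upload requests."""

    def __init__(self):
        self.parts = {}      # key -> list of uploaded parts (bytes)
        self.requests = 0    # number of upload calls made

    def upload_part(self, key, data):
        self.parts.setdefault(key, []).append(bytes(data))
        self.requests += 1

    def get(self, key):
        # Reassemble the object from its parts, in upload order.
        return b"".join(self.parts[key])


class StreamingWriter:
    """Hypothetical writer that uploads a part whenever part_size bytes are buffered."""

    def __init__(self, store, key, part_size):
        self._store = store
        self._key = key
        self._part_size = part_size
        self._buf = bytearray()
        self.peak_buffered = 0  # largest number of bytes held in memory at once

    def write(self, data):
        self._buf += data
        self.peak_buffered = max(self.peak_buffered, len(self._buf))
        # Flush every complete part; keep only the incomplete tail in memory.
        while len(self._buf) >= self._part_size:
            self._store.upload_part(self._key, self._buf[: self._part_size])
            del self._buf[: self._part_size]
        return len(data)

    def close(self):
        # Upload whatever is left as the final, possibly short, part.
        if self._buf:
            self._store.upload_part(self._key, self._buf)
            self._buf.clear()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()


if __name__ == "__main__":
    store = FakeObjectStore()
    payload = bytes(range(256)) * 100  # 25,600 bytes of sample data
    with StreamingWriter(store, "data/file.avro", part_size=4096) as writer:
        for start in range(0, len(payload), 1000):
            writer.write(payload[start : start + 1000])
    print(store.requests, writer.peak_buffered, store.get("data/file.avro") == payload)
```

A single-PUT writer would instead have to hold all of `payload` before sending anything, which is the cost the comment above is trying to avoid.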
