phoerious commented on pull request #15931: URL: https://github.com/apache/beam/pull/15931#issuecomment-996892709
As far as this PR itself is concerned, I don't think there are any open issues at this point, except that I cannot add tests for the changes without writing a whole new test suite with a more or less complete Boto3 stub client. The existing tests do not cover the actual S3Client, only its interface via a mock S3Client.

As for the problems addressed by my changes:

Initial problem: The S3 client is far too slow because it opens a new connection for each range request, which is very inefficient for both the client and the server. The new implementation tries to keep the stream open for sequential reads (at least 2x faster, but more like 10-20x in a real-world scenario).

Second problem, which I fixed just now in my second (and so far unreviewed) commit: The exception clauses were designed only for HTTP errors and did not work for errors at lower levels, such as TCP connection issues.
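To make the two fixes above concrete, here is a minimal sketch of the idea. It is not the code from this PR. The class `SequentialRangeReader`, its `open_stream` callable, and the `retry_on` and `max_attempts` parameters are hypothetical names chosen for illustration. With Boto3, `open_stream` could presumably wrap something like `client.get_object(Bucket=..., Key=..., Range=f"bytes={start}-")["Body"]`, but the sketch deliberately avoids a Boto3 dependency so it can run against an in-memory stream.

```python
import io
from typing import BinaryIO, Callable, Optional, Tuple, Type


class SequentialRangeReader:
    """Serves range reads from one long-lived stream while reads stay sequential.

    Hypothetical illustration only; not the S3Client implementation in Beam.
    """

    def __init__(self,
                 open_stream: Callable[[int], BinaryIO],
                 retry_on: Tuple[Type[BaseException], ...] = (OSError,),
                 max_attempts: int = 3):
        if max_attempts < 1:
            raise ValueError("max_attempts must be at least 1")
        self._open_stream = open_stream  # assumed: opens a stream at a byte offset
        self._retry_on = retry_on        # assumed: low-level errors worth retrying
        self._max_attempts = max_attempts
        self._stream: Optional[BinaryIO] = None
        self._pos = 0    # offset of the next byte the open stream will yield
        self.opens = 0   # how many streams were opened (for demonstration)

    def _close(self) -> None:
        if self._stream is not None:
            try:
                self._stream.close()
            except Exception:
                pass  # a broken stream may fail to close cleanly; drop it anyway
            self._stream = None

    def read_range(self, start: int, end: int) -> bytes:
        """Return bytes [start, end). Reopens only if the read is not sequential."""
        last_error: Optional[BaseException] = None
        for _ in range(self._max_attempts):
            try:
                if self._stream is None or start != self._pos:
                    # Non-sequential read (or no stream yet): open a new one.
                    self._close()
                    self._stream = self._open_stream(start)
                    self._pos = start
                    self.opens += 1
                # Assumes read(n) returns n bytes unless the stream ends early.
                data = self._stream.read(end - start)
                self._pos += len(data)
                return data
            except self._retry_on as exc:
                # Covers connection-level failures, not just HTTP error responses.
                last_error = exc
                self._close()
        assert last_error is not None
        raise last_error


if __name__ == "__main__":
    blob = bytes(range(100))
    reader = SequentialRangeReader(lambda start: io.BytesIO(blob[start:]))
    reader.read_range(0, 10)
    reader.read_range(10, 20)   # sequential: reuses the open stream
    reader.read_range(50, 60)   # jump: forces a reopen
    print("streams opened:", reader.opens)
```

The default `retry_on=(OSError,)` is an assumption that covers Python's built-in `ConnectionError` family. A real client would also need to list whichever exception types its HTTP library raises for dropped or timed-out connections.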
