[
https://issues.apache.org/jira/browse/BEAM-13585?focusedWorklogId=702544&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-702544
]
ASF GitHub Bot logged work on BEAM-13585:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 31/Dec/21 08:59
Start Date: 31/Dec/21 08:59
Worklog Time Spent: 10m
Work Description: phoerious commented on pull request #15931:
URL: https://github.com/apache/beam/pull/15931#issuecomment-1003313792
Done. Sorry for the delay.
Issue Time Tracking
-------------------
Worklog Id: (was: 702544)
Remaining Estimate: 0h
Time Spent: 10m
> Python SDK S3 reader vastly inefficient
> ---------------------------------------
>
> Key: BEAM-13585
> URL: https://issues.apache.org/jira/browse/BEAM-13585
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py-core
> Affects Versions: 2.34.0
> Reporter: Janek Bevendorff
> Priority: P2
> Time Spent: 10m
> Remaining Estimate: 0h
>
> This is an "after-the-fact" Jira issue for my [GitHub
> PR|https://github.com/apache/beam/pull/15931#issuecomment-999945083] to make
> S3 streaming in the Python SDK vastly more efficient.
> The issue with the old implementation is that a new connection is opened for
> each range request, which is very inefficient for both the client and the
> server, adding a lot of unnecessary latency. The new implementation tries to
> reuse an existing connection and continues reading from the same HTTP stream
> if possible.
> Speed gain: 1.7-12x in benchmarks, more like 10x in real-world applications.
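
To illustrate the difference described in the issue, here is a minimal sketch. It is not the code from the PR: FakeObjectStore, PerReadDownloader, StreamReusingDownloader and read_all are hypothetical names, and an in-memory byte stream stands in for the S3 HTTP response. It only shows the idea of continuing to read from an already open stream when the next range starts where the previous one ended, and opening a new request otherwise.

```python
import io


class FakeObjectStore:
    """Stand-in for S3 (assumption): counts how many ranged requests are opened."""

    def __init__(self, data: bytes):
        self.data = data
        self.connections_opened = 0

    def open_range(self, start: int) -> io.BytesIO:
        # Hypothetical equivalent of a GET with "Range: bytes=<start>-".
        self.connections_opened += 1
        return io.BytesIO(self.data[start:])


class PerReadDownloader:
    """Old behaviour as described in the issue: a new request per range read."""

    def __init__(self, store: FakeObjectStore):
        self.store = store

    def get_range(self, start: int, end: int) -> bytes:
        return self.store.open_range(start).read(end - start)


class StreamReusingDownloader:
    """New behaviour as described: keep reading from the open stream if possible."""

    def __init__(self, store: FakeObjectStore):
        self.store = store
        self._stream = None
        self._pos = None  # offset of the next byte the open stream will yield

    def get_range(self, start: int, end: int) -> bytes:
        if self._stream is None or start != self._pos:
            # Non-sequential access: the existing stream cannot be reused.
            self._stream = self.store.open_range(start)
            self._pos = start
        chunk = self._stream.read(end - start)
        self._pos += len(chunk)
        return chunk


def read_all(downloader, size: int, chunk_size: int) -> bytes:
    """Read an object of `size` bytes sequentially in fixed-size chunks."""
    out = bytearray()
    for start in range(0, size, chunk_size):
        out += downloader.get_range(start, min(start + chunk_size, size))
    return bytes(out)
```

In this sketch, a sequential read of the whole object opens one request per chunk with PerReadDownloader and a single request with StreamReusingDownloader. A seek to an unrelated offset still forces a new request in both cases.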