halaharr opened a new issue, #28765:
URL: https://github.com/apache/beam/issues/28765
We are seeing Dataflow pipelines take 2x to 3x longer to run on Apache Beam
SDK 2.50 than on Apache Beam SDK 2.44. While troubleshooting, we compared the
DAGs for 2.44 and 2.50. The BQ read-from-table step (a full table scan using
DIRECT_TABLE_ACCESS) takes 3 sec to read 19 records / 13 KB in 2.44, while the
exact same pipeline reading the same 19 records / 13 KB takes 1 min 5 sec in
2.50. Has this API regressed in 2.50? Throughput for this DAG step is also
much higher in 2.44 than in 2.50. The throughput graphs (elements/sec) for
both versions are below.
Throughput in ver 2.44 --> 0.15 elements/sec (High)
Throughput in ver 2.50 --> 0.083 elements/sec (Low)
<img width="598" alt="apache_beam_250"
src="https://github.com/apache/beam/assets/16997826/27292c20-c2c1-4bd6-b3be-d3a74e82c638">
<img width="611" alt="apache_beam_244"
src="https://github.com/apache/beam/assets/16997826/3a3481a8-3de1-4916-82e6-5b9cd4ff981f">
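To put the numbers above side by side, here is a small sketch that computes the ratios implied by the figures quoted in this report. It measures nothing new: the durations and throughput values are copied from the text above, and the variable names are only illustrative.

```python
# Figures quoted in the report above; nothing here is newly measured.
read_seconds = {"2.44": 3.0, "2.50": 65.0}   # 1 min 5 sec = 65 s
throughput = {"2.44": 0.15, "2.50": 0.083}   # values as read off the graphs
records = 19                                 # rows read by the step

# How much longer the BQ read step takes in 2.50 than in 2.44.
step_slowdown = read_seconds["2.50"] / read_seconds["2.44"]

# How much higher the reported throughput is in 2.44 than in 2.50.
throughput_ratio = throughput["2.44"] / throughput["2.50"]

print(f"read step slowdown (2.50 / 2.44): {step_slowdown:.1f}x")
print(f"throughput ratio (2.44 / 2.50): {throughput_ratio:.1f}x")

# Average rate over the whole step, derived from record count and duration.
for version, seconds in read_seconds.items():
    print(f"{version}: {records / seconds:.2f} records/sec averaged over the step")
```

By this arithmetic the read step is roughly 21.7x slower in 2.50, while the throughput values from the graphs differ by only about 1.8x. The step duration may therefore include time that the throughput graph does not capture.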
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.