Kasia Kucharczyk created BEAM-10659:
---------------------------------------

             Summary: ParDo Python streaming load tests timeouts on 
200-iterations case
                 Key: BEAM-10659
                 URL: https://issues.apache.org/jira/browse/BEAM-10659
             Project: Beam
          Issue Type: Bug
          Components: testing
            Reporter: Kasia Kucharczyk


The Python Dataflow ParDo load test, when run in streaming mode, times out on 
Jenkins on case 2:

{code:java}
2GB 100 byte records 200 times
{code}


It 
[iterates|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/testing/load_tests/pardo_test.py#L147]
 over the same ParDo step sequentially, 200 times in this case. 

The Jenkins job has a 2h timeout. The second case is usually 
[cancelled|https://console.cloud.google.com/dataflow/jobs/us-central1/2020-08-04_05_00_47-15183151853043328210;mainTab=JOB_METRICS?project=apache-beam-testing]
 after 1h 47 min. The most suspicious metric here is throughput, which, 
compared to other jobs, doesn't look steady. Sometimes there is a single spike 
after about 1 hour of inactivity, and sometimes there are several spikes (up to 
30 000 elements/sec).

The [Python batch 
case|https://console.cloud.google.com/dataflow/jobs/us-central1/2020-08-04_06_32_29-2466435392086580014;step=s1;mainTab=JOB_METRICS?project=apache-beam-testing]
 takes ~56 minutes, with a steady throughput of ~7000 elements/sec for 
almost the whole job run.

In comparison, the [same test case in 
Java|https://console.cloud.google.com/dataflow/jobs/us-central1/2020-08-03_05_13_48-16554947290254286391;mainTab=JOB_GRAPH?project=apache-beam-testing]
 takes ~6 minutes. Here throughput rises to ~100 000 elements/sec and then 
drops once all elements have been processed.
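
As a rough sanity check of the figures above, the record count and the per-step processing times can be estimated from the case description. This is only a back-of-the-envelope sketch: it assumes "2GB" means 2 * 10^9 bytes, that every record is exactly 100 bytes, and that the quoted throughput applies to records passing through a single step at a constant rate. The names used (TOTAL_BYTES, minutes_at, min_rate) are illustrative and do not come from the load test code.

```python
# Back-of-the-envelope check of the figures quoted in this issue.
# Assumption: "2GB" means 2 * 10**9 bytes and every record is exactly 100 bytes.
TOTAL_BYTES = 2 * 10**9
RECORD_BYTES = 100
JENKINS_TIMEOUT_S = 2 * 60 * 60  # the 2h Jenkins job timeout, in seconds

num_records = TOTAL_BYTES // RECORD_BYTES


def minutes_at(elements_per_sec):
    """Minutes needed to pass every record through one step at a constant rate."""
    return num_records / elements_per_sec / 60


# Lowest sustained rate that would still finish before the Jenkins timeout
# (ignores worker startup and shutdown time, so the real bound is higher).
min_rate = num_records / JENKINS_TIMEOUT_S

print(f"records:            {num_records:,}")
print(f"at  7 000 el/sec:   {minutes_at(7_000):.1f} min")
print(f"at 100 000 el/sec:  {minutes_at(100_000):.1f} min")
print(f"rate needed for 2h: {min_rate:.0f} elements/sec")
```

Under these assumptions the case has 20 million records, which would take about 48 minutes at ~7000 elements/sec and a little over 3 minutes at ~100 000 elements/sec. Both are roughly in line with the ~56 minute batch and ~6 minute Java runs once startup overhead is included. The streaming job would need to average about 2800 elements/sec to finish inside the 2h timeout, so long idle stretches between spikes could plausibly account for the timeout.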


--
This message was sent by Atlassian Jira
(v8.3.4#803005)
