Yes, it is possible that this had an influence: Reads are now all implemented as SDFs, and Creates involve a reshuffle to better redistribute the data. That said, a change of this magnitude is quite surprising. Where is the pipeline for, say, "Python | ParDo | 2GB, 100 byte records, 10 iterations | Batch", and how does one run it?
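
To illustrate what the Create change means in pipeline terms, here is a minimal sketch (not the actual Beam internals) of a Create followed by an explicit Reshuffle, which is roughly the shape the core now produces so that the created elements are redistributed across workers instead of staying fused with the source:

    import apache_beam as beam

    # A bounded in-memory source followed by a Reshuffle. The Reshuffle
    # forces a fusion break and a shuffle, which spreads the data across
    # workers but also adds runner-side work that could show up in
    # benchmark runtimes.
    with beam.Pipeline() as p:
        _ = (
            p
            | 'Create' >> beam.Create(range(1000))
            | 'Redistribute' >> beam.Reshuffle()
            | 'Process' >> beam.Map(lambda x: x * 2)
        )
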
On Fri, Dec 20, 2019 at 6:50 AM Kamil Wasilewski <[email protected]> wrote:
>
> Hi all,
>
> We have a couple of Python load tests running on Flink in which we are
> testing the performance of ParDo, GroupByKey, CoGroupByKey and Combine
> operations.
>
> Recently, I've discovered that the runtime of all those tests rose
> significantly. It happened between the 6th and 7th of December (the tests
> are running daily). Here are the dashboards where you can see the results:
>
> https://apache-beam-testing.appspot.com/explore?dashboard=5649695233802240
> https://apache-beam-testing.appspot.com/explore?dashboard=5763764733345792
> https://apache-beam-testing.appspot.com/explore?dashboard=5698549949923328
> https://apache-beam-testing.appspot.com/explore?dashboard=5678187241537536
>
> I've seen that in that period we submitted some changes to the core,
> including the Read transform. Do you think this might have influenced the
> results?
>
> Thanks,
> Kamil
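
For reference, below is a rough, self-contained sketch of what a "ParDo | 100 byte records, N iterations | Batch" scenario could look like. I believe the real suite lives under sdks/python/apache_beam/testing/load_tests/ and is driven by a synthetic source plus pipeline options, so the names and numbers here are illustrative assumptions only, scaled down from the 2 GB case:

    import os
    import apache_beam as beam

    # Illustrative, scaled-down stand-in for the 2 GB scenario.
    NUM_RECORDS = 10_000   # far fewer records than the real test generates
    RECORD_SIZE = 100      # bytes per record, as in the dashboard label
    ITERATIONS = 10        # number of chained ParDo steps

    class PassThrough(beam.DoFn):
        """Stand-in for the measured ParDo step; just touches each element."""
        def process(self, element):
            yield element

    with beam.Pipeline() as p:
        pcoll = p | 'GenerateRecords' >> beam.Create(
            [os.urandom(RECORD_SIZE) for _ in range(NUM_RECORDS)])
        # Chain the same ParDo several times, mirroring the "iterations"
        # dimension in the benchmark name.
        for i in range(ITERATIONS):
            pcoll = pcoll | f'ParDo{i}' >> beam.ParDo(PassThrough())
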
