BjornPrime commented on PR #25965: URL: https://github.com/apache/beam/pull/25965#issuecomment-1574141112
> It's this PR not managing kms anymore, currently both in Java and Python SDK does

I meant that this PR, and GCSIO in general, won't need to manage KMS the way we had been doing previously. I'm not sure what you mean by the second half of your comment. Other parts of the Python SDK still manage KMS. Do any of those parts conflict with the changes I've made in this PR?

> How is conclusion reached? This is different than what I heard last time.

At the IO sync a couple of weeks ago, I laid out my case that the current performance tests use a large number of small records, which is not representative of typical GCSIO jobs. With many small records, the larger per-unit overhead of the GCS client implementation has an outsized impact and makes it slower. In tests with larger record sizes, the performance regression disappeared, and according to the GCS client team, larger record sizes are more representative of what GCS usually handles. I'm still working on my write-up of the case, but I can send you the raw results in a spreadsheet if you'd like.
