BjornPrime commented on PR #25965:
URL: https://github.com/apache/beam/pull/25965#issuecomment-1574141112

   > It's this PR not managing kms anymore, currently both in Java and Python SDK does
   I meant that this PR, and GCSIO in general, won't need to manage KMS the way 
we previously did. I'm not sure what you mean by the second half of the 
comment. Other parts of the Python SDK still manage KMS. Do any of those 
parts conflict with the changes I've made in this PR?
   
   > How is conclusion reached? This is different than what I heard last time.
   At the IO sync a couple of weeks ago, I made the case that the current 
performance tests use a large number of small records, which is not 
representative of typical GCSIO jobs. With records that small, the larger 
per-unit overhead of the GCS client implementation has an outsized impact and 
makes it slower. In tests with larger record sizes, the performance regression 
disappeared, and according to the GCS client team, larger record sizes are 
more representative of what GCS usually handles. I'm still working on my 
write-up of the case, but I can send you the raw results in a spreadsheet if 
you'd like.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.