nastra commented on pull request #3304:
URL: https://github.com/apache/iceberg/pull/3304#issuecomment-950593362


   > > > My biggest concern is that the benchmarks can take a while to run and the stability of the GH Action hosted runners is somewhat unknown (with regard to variability in performance runs).
   > > 
   > > 
   > > @kbendick I agree and I share the same concern. However, I think it is still better to have a hosted way of running benchmarks, rather than having people run them locally, where the variability in performance will be equally bad, since you will most likely still be doing dev work while the benchmarks run.
   > > One solution to that problem would be dedicated GH runners with better CPUs and more memory. It would be interesting to know whether Apache INFRA can provide those.
   > 
   > I agree with your point here. And at least over time, the GH Actions env will hopefully show a known, consistent level of variability (versus two different local dev envs that can differ completely in available resources and whatever else the user is doing there).
   > 
   > For that reason, and the fact that this can / should be run in a fork, I 
think this is a good idea.
   > 
   > In my opinion, if / when we include this, we should:
   > 
   > 1. Ensure we document the input parameters well - Can be done in a follow 
up.
   > 2. Document that the benchmark results shouldn’t be treated as a source of truth, but rather as something to help guide developer understanding of code changes (e.g. hopefully people don’t run these benchmarks, run the same thing somewhere more stable with a different framework, and then make production choices based on that). - Also can be a follow up.
   > 3. Look into what impact, if any, running GH Actions in forks has on the 
resource pool for the whole `Apache` GitHub account. I know all projects share 
a limited set of concurrent runners, but I don’t know whether or not a fork of 
those projects has the same concerns. - I don’t think that forks use up the 
ASF’s limited GH Action concurrency / resources, but I admittedly don’t know. I 
only bring it up as the benchmarks can take quite some time to run.
   > 4. Look into the possibility of dedicated hosted runners. I know that the Flink project has people run the test suite against Azure, and that the Azure account / resources are free - possibly that’s an avenue worth exploring. I doubt ASF infra can provide us with much in the way of better guarantees on the GH Action runners; if they can, wonderful. Otherwise, it might be worth looking at how the Flink project has devs set up free Azure accounts for running tests (maybe Azure just offers this to open-source projects?).
   > 
   > Overall, thanks for working on this. I’m always a little leery of giving the JMH tests too much prominence, but they’re much better than nothing and some power users make great use of them (i.e. you :slightly_smiling_face:). I’ve just switched jobs and don’t have a ton of free cycles, but once I do, perhaps we can collaborate on a slightly more standardized performance benchmark? I’ve had a WIP / rough version of one that I’ve been working with for a while but haven’t found time to get over the finish line.
   
   I agree with your observations and I'm planning to look at all of those as 
follow-ups. I'd like to first get this merged so that we can experiment a bit 
more and see how stable things really are.
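
   For context on the JMH benchmarks discussed above, a minimal sketch of what such a benchmark class looks like is below. The class and workload are hypothetical (not taken from the Iceberg repo); the annotations are standard JMH and show the fork/warmup/measurement structure that the workflow would execute and report on.

   ```java
   // Minimal JMH benchmark sketch (hypothetical names, not an actual Iceberg benchmark).
   // JMH forks the JVM, runs warmup iterations, then measures the annotated method and
   // reports average time per invocation in the configured time unit.
   package org.example.bench;

   import java.util.concurrent.TimeUnit;
   import org.openjdk.jmh.annotations.*;

   @Fork(1)
   @State(Scope.Benchmark)
   @Warmup(iterations = 3)
   @Measurement(iterations = 5)
   @BenchmarkMode(Mode.AverageTime)
   @OutputTimeUnit(TimeUnit.MILLISECONDS)
   public class ExampleBenchmark {

     private int[] data;

     @Setup
     public void setup() {
       // Prepare a fixed workload so every iteration measures the same work.
       data = new int[1_000_000];
       for (int i = 0; i < data.length; i++) {
         data[i] = i;
       }
     }

     @Benchmark
     public long sum() {
       long total = 0;
       for (int value : data) {
         total += value;
       }
       return total;
     }
   }
   ```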


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


