comphead commented on code in PR #12372: URL: https://github.com/apache/datafusion/pull/12372#discussion_r1748609618
########## docs/source/contributor-guide/index.md: ########## @@ -88,35 +106,61 @@ committer who approved your PR to help remind them to merge it. ## Creating Pull Requests -We recommend splitting your contributions into multiple smaller focused PRs rather than large PRs (500+ lines) because: +When possible, we recommend splitting your contributions into multiple smaller focused PRs rather than large PRs (500+ lines) because: 1. The PR is more likely to be reviewed quickly -- our reviewers struggle to find the contiguous time needed to review large PRs. 2. The PR discussions tend to be more focused and less likely to get lost among several different threads. 3. It is often easier to accept and act on feedback when it comes early on in a small change, before a particular approach has been polished too much. If you are concerned that a larger design will be lost in a string of small PRs, creating a large draft PR that shows how they all work together can help. -Note all commits in a PR are squashed when merged to the `main` branch so there is one commit per PR. +Note all commits in a PR are squashed when merged to the `main` branch so there is one commit per PR after merge. # Reviewing Pull Requests Some helpful links: -- [PRs Waiting for Review] -- [Approved PRs Waiting for Merge] +- [PRs Waiting for Review] on GitHub +- [Approved PRs Waiting for Merge] on GitHub [prs waiting for review]: https://github.com/apache/datafusion/pulls?q=is%3Apr+is%3Aopen+-review%3Aapproved+-is%3Adraft+ [approved prs waiting for merge]: https://github.com/apache/datafusion/pulls?q=is%3Apr+is%3Aopen+review%3Aapproved+-is%3Adraft -When reviewing PRs, please remember our primary goal is to improve DataFusion and its community together. PR feedback should be constructive with the aim to help improve the code as well as the understanding of the contributor. +When reviewing PRs, our primary goal is to improve DataFusion and its community together. PR feedback should be constructive with the aim to help improve the code as well as the understanding of the contributor. Please ensure any issues you raise contains a rationale and suggested alternative -- it is frustrating to be told "don't do it this way" without any clear reason or alternate provided. Some things to specifically check: -1. Is the feature or fix covered sufficiently with tests (see `Test Organization` below)? +1. Is the feature or fix covered sufficiently with tests (see the [Testing](testing.md) section)? 2. Is the code clear, and fits the style of the existing codebase? +## Performance Improvements + +Performance improvements are always welcome: performance is a key DataFusion +feature. + +In general, the performance improvement from a change should be "enough" to +justify any added code complexity. How much is "enough" is a judgement made by +the committers, but generally means that the improvement should be noticeable in +a real-world scenario and is greater than the noise of the benchmarking system. + +To help committers evaluate the potential improvement, performance PRs should +in general be accompanied by benchmark results that demonstrate the improvement. + +The best way to demonstrate a performance improvement is with the existing +benchmarks: + +- [System level SQL Benchmarks](https://github.com/apache/datafusion/tree/main/benchmarks) +- Microbenchmarks such as those in [functions/benches](https://github.com/apache/datafusion/tree/main/datafusion/functions/benches) + +If there is no suitable existing benchmark, you can create a new one. It helps Review Comment: This is a good point, very often reviewing PRs we have to ask showing the actual performance benefit with existing benches or other performance tests -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org