Hi community, in the context of HIVE-26196 <https://issues.apache.org/jira/browse/HIVE-26196> we started considering the adoption of SonarCloud <https://sonarcloud.io/features> analysis for Apache Hive to promote data-driven code quality improvements and to allow reviewers to focus on the conceptual part of the changes by helping them spot trivial code smells, security issues and bugs.
SonarCloud has already been adopted and integrated into a few top Apache projects like DolphinScheduler <https://dolphinscheduler.apache.org/> and Apache Jackrabbit FileVault <https://jackrabbit.apache.org/filevault/>. For those who don't know, Sonar is a code analysis tool, the initial adoption would aim at tracking code quality for the master branch, and making the PRs' review process easier, by allowing to compare which code/security issues a PR solved/introduced with respect to the main branch. We already have a Hive-dedicated project under the Apache foundation's SonarCloud account: https://sonarcloud.io/project/overview?id=apache_hive. In what follows I will highlight the main points of interest: 1) sonar adoption scope: For the time being a descriptive approach (just show the analysis and associated metrics) could be adopted, delaying a prescriptive one (i.e., quality gates based on the metrics for PRs' mergeability) to a later time where we have tested SonarCloud for long enough to judge that it could be a sensible move. 2) false positives: Sonar suffers from false positives, but they can be marked as such from the web UI: (source https://docs.sonarqube.org/latest/faq/#header-1) How do I get rid of issues that are False-Positives? > False-Positive and Won't Fix > You can mark individual issues False Positive or Won't Fix through the > issues interface. If you're using PR analysis provided by the Developer > Edition, issues marked False Positive or Won't Fix will retain that status > after merge. This is the preferred approach. > //NOSONAR > For most languages, SonarQube supports the use of the generic mechanism: > //NOSONAR at the end of the line of the issue. This will suppress all > issues - now and in the future - that might be raised on the line. For the time being, I think that marking false positives via the UI is more convenient than using "//NOSONAR", but this can be discussed further. 3) test code coverage: Due to the specific structure of the ptest infra (split execution and other peculiarities), we are not yet supporting test code coverage, this can be added at a later stage, in the meantime all the code quality and security metrics are available. 4) what will be analyzed: the master branch and each open PR 5) integration with github: SonarCloud integrates with GitHub in two ways, the first one is an additional item in the list of checks (where you have the spell checking, CI result etc.) that will just say Passed/Not Passed and provide a link for all the details, the second is a "summary" comment under the PR highlighting the main info (you can see an example here <https://github.com/apache/hive/pull/3254#issuecomment-1206641629>). The second integration can be disabled if we consider that the first one is enough, and that if we want to dig more we can open the associated link for the full analysis in SonarCloud. 6) analysis runtime: In CI the full analysis takes around 30 minutes, but this step is executed in parallel with the test split tasks and won't add to the total runtime. For PRs SonarCloud detects unchanged files and avoids analysing them, so the runtime there is expected to be lower. I'd like to hear your thoughts on this, and I am looking for reviewers for the PR https://github.com/apache/hive/pull/3339. Best regards, Alessandro