Hi community,
in the context of HIVE-26196
<https://issues.apache.org/jira/browse/HIVE-26196> we started considering
the adoption of SonarCloud <https://sonarcloud.io/features> analysis for
Apache Hive to promote data-driven code quality improvements and to allow
reviewers to focus on the conceptual part of the changes by helping them
spot trivial code smells, security issues and bugs.

SonarCloud has already been adopted and integrated into a few top Apache
projects like DolphinScheduler <https://dolphinscheduler.apache.org/>
and Apache
Jackrabbit FileVault <https://jackrabbit.apache.org/filevault/>.

For those who don't know, Sonar is a code analysis tool, the initial
adoption would aim at tracking code quality for the master branch, and
making the PRs' review process easier, by allowing to compare which
code/security issues a PR solved/introduced with respect to the main branch.

We already have a Hive-dedicated project under the Apache foundation's
SonarCloud account: https://sonarcloud.io/project/overview?id=apache_hive.

In what follows I will highlight the main points of interest:

1) sonar adoption scope:
For the time being a descriptive approach (just show the analysis and
associated metrics) could be adopted, delaying a prescriptive one (i.e.,
quality gates based on the metrics for PRs' mergeability) to a later time
where we have tested SonarCloud for long enough to judge that it could be a
sensible move.

2) false positives:
Sonar suffers from false positives, but they can be marked as such from the
web UI: (source https://docs.sonarqube.org/latest/faq/#header-1)

How do I get rid of issues that are False-Positives?
> False-Positive and Won't Fix
> You can mark individual issues False Positive or Won't Fix through the
> issues interface. If you're using PR analysis provided by the Developer
> Edition, issues marked False Positive or Won't Fix will retain that status
> after merge. This is the preferred approach.


> //NOSONAR
> For most languages, SonarQube supports the use of the generic mechanism:
> //NOSONAR at the end of the line of the issue. This will suppress all
> issues - now and in the future - that might be raised on the line.


For the time being, I think that marking false positives via the UI is more
convenient than using "//NOSONAR", but this can be discussed further.

3) test code coverage:

Due to the specific structure of the ptest infra (split execution and other
peculiarities), we are not yet supporting test code coverage, this can be
added at a later stage, in the meantime all the code quality and security
metrics are available.

4) what will be analyzed:

the master branch and each open PR

5) integration with github:

SonarCloud integrates with GitHub in two ways, the first one is an
additional item in the list of checks (where you have the spell checking,
CI result etc.) that will just say Passed/Not Passed and provide a link for
all the details, the second is a "summary" comment under the PR
highlighting the main info (you can see an example here
<https://github.com/apache/hive/pull/3254#issuecomment-1206641629>).

The second integration can be disabled if we consider that the first one is
enough, and that if we want to dig more we can open the associated link for
the full analysis in SonarCloud.

6) analysis runtime:

In CI the full analysis takes around 30 minutes, but this step is executed
in parallel with the test split tasks and won't add to the total runtime.
For PRs SonarCloud detects unchanged files and avoids analysing them, so
the runtime there is expected to be lower.

I'd like to hear your thoughts on this, and I am looking for reviewers for
the PR https://github.com/apache/hive/pull/3339.

Best regards,
Alessandro

Reply via email to