On Fri, 14 Jul 2017 17:19:36 +0300
Andrey Karpov <kar...@viva64.com> wrote:
Hello Carsten,
After my article was published, several unreasonable news stories appeared on
the Internet. That's why I'd like to note that in my article I
didn't write about the bad/good quality of the Tizen code, or that the
PVS-Studio analyzer is magic and the best of the best. I only gave the
numbers that I got. I can't talk about the quality of the Tizen code,
since I have insufficient data for that. I understand clearly
what you are talking about and agree with you.
Indeed, that is one reason I ask for numbers that can be compared:
when you publish some number, and that number is large (in this
case due to the fact that the codebase you extrapolated to is large) ... a
lot of people can't see the forest for the trees.
Don't get me wrong. Code quality is important. Fixing bugs is
important. Tools to point out "here is a potential issue" are very
useful. But you do need to stand back and see the possible urgency in
light of everything else, and that requires comparative numbers.
My objective was to show that, despite the techniques already in use,
the PVS-Studio analyzer can help make the Tizen code better and more
reliable. I think I managed to demonstrate this by pointing to
900 fragments of code which, in my opinion, deserve attention,
fixing and refactoring.
At this point, for me at least, the jury is out on PVS Studio. We have
internal tools and we're using those first on Tizen's codebase.
Whether PVS Studio is better than those tools or not, I'm not
going to comment, as I am not in a good position to do so.
For the upstream projects Tizen depends on that I look after, I'm VERY
familiar with Coverity Scan and, to a lesser extent, clang's static
analyser. I can say the text reports PVS Studio produces pale in
comparison to what Coverity provides in its Web UI. It provides code
flow analysis (what code path was used to get there), which is very
useful in learning the "why"; a full browsable view of the code (so I
don't keep having to match things back to the source file/line while
reading); and a collaborative environment where many developers can work
together live on the issue list and see what was triaged by whom and when,
with a log of it. BUT it offers something I have not seen so far in the
information you provided for PVS Studio in the blog posts referenced here
... the ability to say "No. False positive. Tool - you're wrong. Here's
why". Coverity lets you do this... which lets you move on and not
modify code that works fine and is correct.
My experience shows that a decent percentage of "fixes for warnings
from Coverity" lead to ADDING bugs. I've seen it happen many times. It
made things WORSE. It went from "a theoretical bug that, due to the
environment, will actually never happen" to "I wake up, update code and
suddenly apps x, y, z are broken in some way, and it was a fix for a warning".
This is also why I'm wary of having people unfamiliar with the code "fix
it" based on such warnings. Even people who know the code well mess up.
Unfortunately, the question "Include this as a bugs per 1k lines of
code or similar metric?" is not very clear to me.
It's simple. "This code has an average of 0.123 bugs per 1k lines of
code". But measure that on everything you analyse. Then you at least have a
ROUGH measuring stick. They may all be major exploitable issues, all
minor unlikely ones, or some combination, so a rating system
would ultimately score them based on that, BUT let's keep it simple. If
you quote a number, quote numbers for everything you look at and, to
make it comparable... make it proportional to size.
In my opinion, the article presents all the necessary data. I've got:
* The density of detected errors in the code (c) 2015 Samsung
Electronics: 0.41 errors per 1000 lines of code.
Your numbers say it's 0.375: 900 in 2.4 million lines of code.
900/2400 = 0.375. Does the 0.41 include false positives? Now we have a
report disagreeing with itself. It's small, but important.
OK - found it now. I missed that paragraph towards the end (it is a
long article), but these numbers disagree with others you give right at
the top and in the headline (the headline number is an extrapolation).
Why do your numbers differ?
* The density of detected errors in the third-party libraries: 0.36
errors per 1000 lines of code.
(I did not count comments as lines of code.)
Can these data be incorrect? Yes, they can. This is not scientific
research; it is a practical demonstration that the tool can be
useful.
Moreover, some errors may, in the Tizen developers' opinion, not be that
serious. Well, at least there may be no sense in fixing them. Then the
density of detected errors would go down.
This is another issue entirely - whether the error is "worth worrying about".
On the other hand, I might not have highlighted all the errors. I approached
the study of the report very carefully, but without fanaticism. For
example, I was a bit lazy about studying the V730 warnings -
https://www.viva64.com/en/w/V730/. That is very time-consuming and
thankless work when you are dealing with someone else's code. It is
always unclear whether it is dangerous or not that some class member has
been left uninitialized. It is long, tedious labour that needs to be
done carefully. So, perhaps, with a more careful review of the log,
other errors could be found.
Indeed it is time consuming to examine them all. I think you did a good
job of explaining what errors are there and why, in a general sense,
these are potential issues.
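To illustrate why these are hard to triage (again a made-up example of mine,
not a finding from the Tizen report), this is the kind of situation an
"uninitialized class member" warning points at:

  // Hypothetical example of a member left uninitialized in a constructor.
  #include <cstring>

  class Connection {
  public:
      explicit Connection(const char *host)
      {
          std::strncpy(m_host, host, sizeof(m_host) - 1);
          m_host[sizeof(m_host) - 1] = '\0';
          // m_retries is never set here - this is what the warning is about.
      }

      void set_retries(int n) { m_retries = n; }
      int  retries() const { return m_retries; }  // indeterminate value if
                                                  // set_retries() was never called
  private:
      char m_host[64];
      int  m_retries;  // harmless if every user calls set_retries() first,
                       // undefined behaviour if one forgets
  };

Whether that is a real bug depends on reading every place the class is used,
which is exactly why doing it carefully takes so long.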
About the comparison with the quality of other projects... That is a
difficult question. Please understand that when writing these articles
we do not aim to compare which code is better or
worse. Therefore, we usually stop once we have found enough
The problem is - when you publish numbers like you did, it becomes a
comparison thing. That's how people respond. Headlines like "27000
bugs found in Tizen!" are very much designed to catch attention with
numbers but when there is no context provided... the only thing people
see is a headline.
interesting bugs to write an article about. A careful analysis of all
the warnings for a large project would take a lot of time.
Please also consider that dealing with unfamiliar code is difficult
and slow. Therefore, we sometimes mention the density of errors
only for small projects, since it is not very difficult to review the
whole report. Examples:
* Notepad++: we detect about 2 errors per 1000 lines of code.
https://www.viva64.com/en/b/0511/
WOW. that's a lot. Thanks for the number.
* Far Manager for Linux: we detect about 0.464 errors per 1000 lines
of code. https://www.viva64.com/en/b/0478/
Thanks. Your numbers are in line with what Coverity indicates too.
* Tor project: we did not find anything. Density: 0.
https://www.viva64.com/en/b/0507/
Indeed that is good. I also noted that OpenSSL, last I checked on Coverity
Scan, had a density of 0 too.
As we can see, the results differ. However, it seems to me
they are not worth concentrating on. A static analyzer is a tool for
They are a tool for prioritization. As I said - bugs are bugs and
should be fixed. Knowing whether your bug count is particularly bad or not
compared to "the industry at large" etc. lets you know how much effort
or money or time to spend on these issues.
I have taken the position in upstream projects of just slowly fixing the
reported issues over a long period of time. Every release, fix some
more, so every release the bug rate goes down. So you mix feature
addition and static analysis triage, but focus on the feature
development etc. and less so on triage.
finding bugs in fresh code. Yes, old mistakes are also worth fixing,
but generally they are not as critical as new ones. Actually,
if an error has been in the code for several years, it means that it rarely
reveals itself or does not bother anyone. That's why it is more interesting to
look to the future than to the past. Sure, the PVS-Studio
analyzer can be a good assistant for a programmer.
Indeed I agree there.
About the percentage of false positives: it makes no sense to talk about
it without first configuring the analyzer. That is a lot of work,
which we are ready to take on if cooperation begins someday. Can we
deal with it? Yes, we can:
That sounds like a fair bit of work.
* https://www.unrealengine.com/en-US/blog/how-pvs-studio-team-improved-unreal-engines-code
* https://www.unrealengine.com/en-US/blog/static-analysis-as-part-of-the-process
I think that when it comes to projects as large as Tizen, it
makes sense to talk not only about product licensing, but also about
substantial support provided by our team.
P.S. On Monday I will demonstrate that the analyzer can be useful not
only for finding bugs, but also for micro-optimizations. :)
Actually ... that does interest me. That's a new use of static
analysis. I think that this would be generally well received in Tizen
too. If PVS Studio can do this too... it just bumped up in value for
me. I'd like to see what these reports are and what they point out etc.
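As a guess at the kind of thing such a report might contain (purely my
speculation until I see it - just a generic example of a micro-optimization a
static analyzer can flag, not anything PVS Studio has actually shown me):

  // Generic illustration only.
  #include <string>
  #include <vector>

  // Pattern an analyzer can flag: heavy arguments taken by value, so both the
  // vector and the string get copied on every call.
  static bool contains_by_value(std::vector<std::string> names, std::string needle)
  {
      for (const std::string &n : names)
          if (n == needle)
              return true;
      return false;
  }

  // The usual suggested rewrite: pass by const reference, no copies are made.
  static bool contains_by_ref(const std::vector<std::string> &names,
                              const std::string &needle)
  {
      for (const std::string &n : names)
          if (n == needle)
              return true;
      return false;
  }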
----
Best regards,
Andrey Karpov, Microsoft MVP,
Ph.D. in Mathematics, CTO
"Program Verification Systems" Co Ltd.
On 14.07.2017 4:35, Carsten Haitzler wrote:
On Thu, 13 Jul 2017 14:26:35 +0300
Andrey Karpov <kar...@viva64.com> wrote:
Could you:
1. Include this as a bugs per 1k lines of code or similar metric?
A total bug count is not that useful without knowing the total size of the
code looked at. At least in the summary.
2. Include metrics calculated similarly for other major projects
(Linux kernel, etc. etc.).
Why? The below is like saying "you're doing 120km/h!!!!!!" ... but
if it's on a freeway and the speed limit is 130km/h ... in context
it's very different. This here lacks context.
I haven't used PVS Studio before (it's on my list of things to try
out and see if it's good), but I do know Coverity's scan service
very well, so I'll do some back-of-a-napkin numbers:
1. In my experience about 10-15% of bugs reported by Coverity are false
positives etc.
2. Coverity says the Linux kernel gets 0.48 issues per 1k lines of
code. Applying the above false positive rate, let's call that 0.40.
Qt gets 0.72, so let's call that 0.61 adjusting for false positives.
Glib gets 0.45, so 0.38 accounting for false positives. So:
With your numbers, Tizen sees 900 issues in 2.4 million lines of
code. That comes out at 0.38.
Linux kernel = 0.40
Qt = 0.61
Glib = 0.38
Tizen = 0.38
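(Spelling that napkin math out, assuming a flat 15% false-positive discount
on the Coverity figures:

  Linux kernel: 0.48 x 0.85 ~= 0.41 -> call it 0.40
  Qt:           0.72 x 0.85 ~= 0.61
  Glib:         0.45 x 0.85 ~= 0.38
  Tizen:        900 issues / 2400 KLOC = 0.375 -> ~0.38)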
Yes, PVS Studio is a different tool from Coverity. I'm making an
assumption (much like you do too in many ways) that these two tools
are in the same ballpark and will report similar kinds of issues and
numbers, though possibly disjoint sets. I'm going with this assumption
because you didn't provide other numbers to go by, and it'd be nice
to have some.
My conclusion is that Tizen code quality is pretty decent in the
scheme of things. Its bug rate is pretty low-ish.
Now, on the other side, it's always great to have tools point out
possible errors. Another tool is another weapon in the war chest to
improve code quality. That's a good thing. Bugs should be looked
into and addressed accordingly based on actual severity and
context. Just blindly fixing issues will result in misallocation of
time and resources, because an issue may be in a debug tool that
is rarely used and only there for a developer to gather quick
information when something goes wrong... or it may be a seriously
exploitable bug in code that can always be triggered
remotely. So context is important. Knowing issues are there and
what a tool thinks they are is a great speedup vs a full code review.
PVS Studio is indeed such a tool. There are others too. We have
tools of our own we're using more and more.
Hello All,
This article will demonstrate that during the development of large
projects static analysis is not just a useful part of the
development process, but a completely necessary one. This article is the
first in a series of posts devoted to using the
PVS-Studio static analyzer to improve the quality and reliability
of the Tizen operating system. For a start, I checked a small part
of the operating system's code (3.3%) and noted down about
900 warnings pointing to real errors. If we extrapolate the
results, we see that our team would be able to detect and fix about
27000 errors in Tizen. Using the results of this study, I
made a presentation to demonstrate to the Samsung
representatives, with offers of possible cooperation. The
meeting was postponed, which is why I decided not to waste time and
turned the presentation material into an article:
https://www.viva64.com/en/b/0519/
----
Best regards,
Andrey Karpov, Microsoft MVP,
Ph.D. in Mathematics, CTO
"Program Verification Systems" Co Ltd.
_______________________________________________
Dev mailing list
Dev@lists.tizen.org
https://lists.tizen.org/listinfo/dev