Koen De Groote created KAFKA-7013:
-------------------------------------

             Summary: Do Infer analysis and investigate results
                 Key: KAFKA-7013
                 URL: https://issues.apache.org/jira/browse/KAFKA-7013
             Project: Kafka
          Issue Type: Bug
            Reporter: Koen De Groote


Infer is a tool by Facebook that performs checks on code in order to find 
potential flaws.

Specifically, it attemps to track down null dereferences, thread safety 
violations and resource leaks.

Site: [http://fbinfer.com|http://fbinfer.com/]

Github: [https://github.com/facebook/infer]

The tool supports java. I made a docker container for the kafka code and the 
infer tool and ran Infer's analysis on the `gradle clean compileJava` process.

Results:
Summary of the reports

THREAD_SAFETY_VIOLATION: 274
RESOURCE_LEAK: 263
NULL_DEREFERENCE: 151

Do take into account: the analysis is not perfect. Most resources leaks 
reported are simply because the end of a scope was reached and the resource 
wasn't closed. But there are times you don't actually want to close them.

For instance:

[https://github.com/apache/kafka/blob/2.0/streams/src/main/java/org/apache/kafka/streams/processor/internals/DefaultKafkaClientSupplier.java]

All those byte serializers/deserializers are reported as resource leaks. While 
I don't know this code, a quick gander tells me it seems like these resources 
should not be closed and the end of the scope.

688 issues is a lot and probably worth going through, even if it is only to 
conclude that all the reported instances are false positives.

Another example, actual infer output this time:
{noformat}
687. 
clients/src/main/java/org/apache/kafka/common/record/CompressionRatioEstimator.java:52:
 error: THREAD_SAFETY_VIOLATION
       Read/Write race. Non-private method `float 
CompressionRatioEstimator.updateEstimation(String,CompressionType,float)` reads 
without synchronization from `compressionRatioForTopic.[_]`. Potentially races 
with write in method `CompressionRatioEstimator.updateEstimation(...)`.
{noformat}

While technically correct in that the synchronization only happens on write and 
not on read, one can wonder if adding it would negatively affect performance 
and if the "correct" ratio would differ so much from the incorrect that it 
actually warrants adding the synchronization on read.

The results should definitely be taken with a grain of salt.

Reason for not uploading the file with the bugs listed in them: because that 
file only shows the last step in the code that would potentially trigger the 
problem. Infer has a secondary command `infer-explore` which lets you pick one 
of the issues it found and then it outputs the exact code path that leads to 
the problem, something showing surprising cases.

Because of the scope of this and my relative inexperience with this codebase, I 
don't really know how to properly fill in this ticket, so I'll be leaving the 
ticket as is.

Final word: looking into stuff like this is work I usually do outside of 
working hours. Unless someone is really really interested, I'd ask people not 
to take time away from what would be their regular tickets. Especially since 
eventual PR(s) will most likely have a discussion of whether or not a 
particular change should actually happen or not.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to