Re: [PATCH] [StaticAnalyzer] New checker Sizeof on expression

Anna Zaks Mon, 18 Nov 2013 09:42:12 -0800

On Nov 18, 2013, at 1:56 AM, Anders Rönnholm <[email protected]> 
wrote:


> Hi,
>  
> I have run the checker on llvm codebase and chrome and did not find any false 
> positives. Not any true positives either. I also ran it on some SFINAE 
> examples with no false positives. But that was expected as I have not seen 
> any SFINAE examples where they use a binary expression in the sizeof nor 
> sizeof(sizeof()), is that really common?
>  
> I saw that there is an isSFINAEContext function in semaexpr. What do you 
> think about moving the check to sema and use that function to avoid 
> triggering on SFINAE?
>  

Adding the test case referenced by Richard, moving the check to Sema, and 
resubmitting the patch sound like a good plan to me.

Anna.

> /Anders
>  
>  
> From: Anna Zaks [mailto:[email protected]] 
> Sent: den 23 oktober 2013 17:35
> To: Daniel Marjamäki
> Cc: Ted Kremenek; Anders Rönnholm; Richard Smith; [email protected]; 
> Jordan Rose
> Subject: Re: [PATCH] [StaticAnalyzer] New checker Sizeof on expression
>  
> Daniel,
>  
> I think Richard was asking about false positives/true positives ratio and I 
> do not see any false positives in the list below. It looks like all the 
> reports you found are valid. Did you run this check on large C++ codebases?
>  
> As far as I can tell there are two outstanding issues:
>  - Addressing these questions from Richard:
> sizeof(expression) is a common idiom in SFINAE contexts. Is that covered here?
> Richard, can you provide examples?
>  
> sizeof(sizeof(int)) is a "traditional" way to get sizeof(size_t). Why should 
> we warn on that?
> (I personally think that if this check goes into the analyzer, it's fine to 
> warn in the second case.)
>  
>  - Where this check should go (compiler or the analyzer)?
> I think that if you can address the questions above, this check could go into 
> the compiler.
>  
> Cheers,
> Anna.
>  
> On Oct 14, 2013, at 12:43 PM, Daniel Marjamäki <[email protected]> 
> wrote:
> 
> 
> 
> Hello!
> 
> As I understood it.. you wanted to know what the hit ratio is. I have 
> investigated the hit ratio on a number of packages in debian. Below is the 
> negative and positives I've found. I provide the package URLs so you can look 
> for yourself..
> 
> I will continue investigating and let you know if I see any false positives...
> 
> 
> NEGATIVE (macro)
> package: 
> http://ftp.us.debian.org/debian/pool/main/a/agedu/agedu_9723.orig.tar.gz
> 
> #define sresize(array, number, type) \
>     ( (void)sizeof((array)-(type *)0), \
>       (type *) srealloc ((array), (number) * sizeof (type)) )
> 
> We don't warn about this since it's in a macro. However it is interesting:
>     sizeof((array)-(type *)0)
> It seems to be written this way by intention. To get sizeof(ptrdiff_t)? Maybe 
> sizeof(ptr-ptr) should be ok.
> 
> 
> 
> POSITIVE
> package: 
> http://ftp.us.debian.org/debian/pool/main/a/argus-client_3.0.0.orig.tar.gz
> 
> file: argus-clients-3.0.0/radump/print-ldp.c
> line: 607
>             print_unknown_data(tptr+sizeof(sizeof(struct 
> ldp_msg_header)),"\n\t  ",
>                                msg_len);
> 
> file: argus-clients-3.0.0/radump/print-lmp.c
> line: 878
>             print_unknown_data(tptr+sizeof(sizeof(struct 
> lmp_object_header)),"\n\t    ",
>                                lmp_obj_len-sizeof(struct lmp_object_header));
> 
> Not sure if these are intended to be sizeof(size_t) or sizeof(struct 
> lmp_object_header) but I guess the latter.
> 
> 
> 
> POSITIVE
> package: 
> http://ftp.us.debian.org/debian/pool/main/b/babel/babel_1.4.0.dfsg.orig.tar.gz
> file: babel-1.4.0.dfsg/regression/output/libC/synch_RegOut_Impl.c
> 
> 332:    if ((res < 0) || (res > sizeof(s_result_strings)/sizeof(sizeof(char 
> *)))) {
> 
> This "sizeof(sizeof(char *))" should be "sizeof(char*)" since 
> s_result_strings is a array of char * pointers
> 
> 
> 
> POSITIVE
> package: 
> http://ftp.us.debian.org/debian/pool/main/b/bird/bird_1.3.7.orig.tar.gz
> file: bird-1.3.7/lib/mempool.c
> 
> 253:
>   return ALLOC_OVERHEAD + sizeof(struct linpool) +
>     cnt * (ALLOC_OVERHEAD + sizeof(sizeof(struct lp_chunk))) +
>     m->total + m->total_large;
> 
> Not sure if this is intended to be sizeof(size_t) or sizeof(struct lp_chunk) 
> but I guess the latter.
> 
> 
> POSITIVE:
> package: 
> http://ftp.us.debian.org/debian/pool/main/b/blender/blender_2.49.2~dfsg.orig.tar.gz
> file: blender-2.49b.orig/source/blender/blenloader/intern/readfile.c
> 6987:    if(0==strncmp(ima->id.name+2, "Viewer Node", 
> sizeof(ima->id.name+2))) {
> 6991:    if(0==strncmp(ima->id.name+2, "Render Result", 
> sizeof(ima->id.name+2))) {
> 
> These are bad, ima->id.name is an array, it should probably be changed to:
>     if(0==strncmp(ima->id.name+2, "Viewer Node", sizeof(ima->id.name) - 2)) {
> 
> 
> Best regards,
> Daniel Marjamäki
>  
> ..................................................................................................................
> Daniel Marjamäki Senior Engineer
> Evidente ES East AB  Warfvinges väg 34  SE-112 51 Stockholm  Sweden
>  
> Mobile:                 +46 (0)709 12 42 62
> E-mail:                 [email protected]                      
>  
> www.evidente.se
> Från: [email protected] [[email protected]] för 
> Anna Zaks [[email protected]]
> Skickat: den 8 oktober 2013 18:18
> Till: Ted Kremenek
> Cc: Anders Rönnholm; Richard Smith; [email protected]
> Ämne: Re: [PATCH] [StaticAnalyzer] New checker Sizeof on expression
> 
>  
> On Oct 7, 2013, at 10:05 PM, Ted Kremenek <[email protected]> wrote:
> 
> 
> On Oct 7, 2013, at 6:05 PM, Jordan Rose <[email protected]> wrote:
> 
> 
>  
> On Oct 7, 2013, at 13:58, Richard Smith <[email protected]> wrote:
> 
> 
> On Mon, Oct 7, 2013 at 9:55 AM, Jordan Rose <[email protected]> wrote:
> I'm fine with this staying in the analyzer for now, unless David, Richard, or 
> Eli feel it should be a warning right away.
>  
> Do we have evidence that we want this? Does it catch bugs? If so, what do 
> they look like? It seems like this would trigger on legitimate code; how does 
> a user suppress the warning in that case, and does that suppression make 
> their code clearer?
>  
> What is the false/true positive ratio for bug finding here?
>  
> sizeof(expression) is a common idiom in SFINAE contexts. Is that covered here?
>  
> sizeof(sizeof(int)) is a "traditional" way to get sizeof(size_t). Why should 
> we warn on that?
>  
> And more as a general question than something specific to this patch: Is 
> there a region in the space of false positive ratios where we think a 
> syntactic warning should go into the static analyzer? If so, why? And what is 
> that region? I would have thought that the static analyzer, like the clang 
> warnings, would be aimed at (eventually) having a false positive ratio of 
> near zero. If so, then should we ever put a warning in the static analyzer if 
> it doesn't require the static analyzer's technology (or have a high runtime 
> cost)?
>  
> On this last (and bringing in Ted and Anna):
>  
> I think the main difference between compiler warnings and syntactic analyzer 
> checks is that we try very hard to turn new compiler warnings on by default. 
> A second-order effect of this is that we generally avoid style warnings. The 
> analyzer can be a bit looser about this, though: because people know the 
> analyzer is stricter and more in-depth, I think they might also accept that a 
> particular check doesn't fit their project.
>  
> On the other hand, we still haven't gotten around to designing a proper bug 
> tracking and/or manual suppression system, so that's one advantage of 
> compiler warnings. And as you say, checks without a high runtime cost don't 
> really have a technical reason to be in the analyzer.
>  
> Richard’s point is correct that we want the static analyzer to also have a 
> high signal-to-noise ratio.  Otherwise it is a useless tool.  I’m also not a 
> fan of having the analyzer having a bunch of “junk” checkers that aren’t on 
> by default, but if a checker, when it is enabled, is HIGHLY useful to some 
> set of people (e.g., security experts who are more tolerant of false positive 
> rates if they want to do an aggressive code audit)  I think the analyzer is a 
> reasonable place to put them, given the current warning policy in Clang where 
> we want the warnings there to be generally useful to everybody.
>  
> To put it in more context, in the beginning the guiding principal of what 
> goes in the static analyzer was:
>  
> (a) The warning is very expensive to compute.
>  
>  OR
>  
> (b) The warning is very domain-specific.  For example, an API such as 
> CFNumberCreate() on OS X has some interesting API invariants, but we 
> generally should not be hacking those API-specific warnings into the 
> compiler.  Exceptions exist of course, e.g., printf checking, but often the 
> are grounded when such APIs are fairly standard (e.g., in the C standard 
> itself) or the checking is based on some annotation like 
> __attribute__((format)) where the compiler doesn’t know anything about a 
> specific function itself, just the annotation.
>  
> Style warnings can sometimes fall into (a) (in which case not putting them in 
> the compiler makes obvious sense), but one could argue that they are more the 
> flavor of (b) then a traditional compiler warning.  As I mentioned earlier, 
> the static analyzer can be home to some highly specialized checkers that may 
> not be generally useful for everybody but when enabled are very useful to 
> certain people.  Style warnings often fit in this category of warnings.
>  
> A related problem is that we don’t have an ontology for style warnings in 
> clang (the compiler).  Should they really be lumped into the same group of 
> all other compiler warnings?  What about their behavior with -Werror?  
> -Weverything?  Style warnings really are about personal style, and users can 
> be highly polarized about them.  For example, it would be highly useful for 
> Clang to have a -W80-columns or -Wspaces-instead-of-tabs (not a serious 
> proposal).  Both would be cheap to compute, would obviously benefit LLVM 
> developers, but they wouldn’t work for everybody, or even the majority of 
> software developers.  Where should they go?  The compiler?  The static 
> analyzer?  Both have discoverability and usability concerns in either 
> location.  I’d argue that these two warnings would be better suited in the 
> compiler because (when the user wants them) they’d get run all the time, but 
> having these warnings counteracts the general guiding principal that compiler 
> warnings should generally be useful to most people, or clear categories of 
> people (e.g, library authors), but not specific groups of people (e.g., LLVM 
> developers).
>  
> This is not a thought out proposal, but I would not be opposed to a new 
> category of warnings, say -Wstyle, and have all style warnings under -Wstyle, 
> perhaps named “-Wstyle-80-columns” or “-Wstyle-spaces-instead-of-tabs”.  None 
> of these warnings would be on by default.  If we want these kind of warnings 
> in the compiler, I think these kind of warnings are worth calling out as 
> being something “different” that people shouldn’t get hung up about if they 
> don’t think the warning applies to them.  Also, having them in a special 
> category of them own allows institutions to set broad warning polices such as 
> “-Werror -Wno-error=style” where they want regular warnings to be treated as 
> error, but not style warnings, etc.
>  
> I agree with Ted. Currently, there is no good way of adding style and highly 
> specialized warnings to clang; given the behavior of Werror and Weverything. 
> For example, we encourage users to learn about new generic warnings by 
> turning on Weverything, so the specialized warnings should not be included 
> there.

_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits

Re: [PATCH] [StaticAnalyzer] New checker Sizeof on expression

Reply via email to