On Jan 9, 2008, at 2:42 PM, Nuno Lopes wrote:

> The PHP interpreter has the following function:
> int zend_parse_parameters(int num_args, char *type_spec, ...);
>
> it is usually used like this:
> zend_parse_parameters(ZEND_NUM_ARGS(), "s|l", &str, &str_len,
> &number);

OK. This example really helps.

> The problem is that the number and type of arguments depend on the
> format string. In this case it receives a string (str + length) and
> a long (optional). No compiler is currently able (AFAIK) to check if
> the function is called correctly. Also, 'number' might not be
> initialized, while str and str_len do (if the function doesn't
> return FAILURE).

So if I understand correctly, zend_parse_parameters has the following postcondition:

  "return value" != FAILURE => str == INITIALIZED, str_len == INITIALIZED
  "return value" == FAILURE => str == UNINITIALIZED, str_len == UNINITIALIZED

What you would like to do is expand the "uninitialized values" analysis to take into account the "return value" so that you can flag possible bad uses of "str" and "str_len"?

> I implemented a simple checker with clang to verify the parameter
> types. I mentioned that I need to port it to the liveness analyzer

I think you mean the "uninitialized values" analyzer, not the "liveness analyzer." They are two completely different concepts. Liveness determines whether the value in a variable will ever be used after a given point. Uninitialized values determines whether the value in a variable is garbage, regardless of whether or not the value will be used later. Further, in an optimizing compiler both analyses are forms of dataflow analysis, except that "uninitialized values" is a forward dataflow analysis (information propagates forward in the CFG) and "liveness" is a backward dataflow analysis (information propagates backwards in the CFG). This is an implementation detail, but it does illustrate that they are two separate concepts.

> because I want to check if the parameters after the '|' are used
> before initialization

Let me see if I understand what you mean.
After a call to "zend_parse_parameters", you want to track the possible initialized/uninitialized state of the "str" and "str_len" arguments (which depends on the "return value" of zend_parse_parameters). If you use "str" or "str_len" (or whatever other variables were passed as arguments) at a point where they could be in the "uninitialized" state, you want to flag an error. Is this what you mean?

> and if the ones before are not initialized unnecessarily.

I'm not certain what you mean by "not initialized unnecessarily."

> I doubt that anytime soon compilers will be able to analyze these
> varargs functions automatically (well, you could try to do use some
> heuristics, like searching for a switch, but..), so my idea was to
> expose some kind of API to the programmers to allow them to specify
> some arbitrary function to validate the arguments.
> GCC supports the following:
> void my_printf(const char *format, ...)
> __attribute__((format(printf, 1, 2)));
>
> but GCC only supports the printf and scanf functions. My idea was to
> generalize this, by allowing the user to specify some function
> (without touching in the compiler's code).
> While the idea seems fairly acceptable, I don't have any syntax
> proposal.

There was some interesting work on ESC/Java on providing powerful, logic-based annotations to functions and classes. The annotations were injected in comments, and a hacked Java parser would read those comments (similar to parsing Javadoc comments) and use those annotations to describe pre- and postconditions for functions/classes/whatever. Some of the preconditions/postconditions one could associate with a function were extremely expressive; the downside is that they could require an expensive theorem prover to actually verify that the conditions would hold.
On the other hand, the syntax of these annotations was actually not all that gross, although adding parser support for comment-based annotations for C/C++ is much more of a challenge because these languages are far messier in their syntactic structure. Adding attributes to support such stuff might be reasonable as well, as long as the logic-based annotations were embedded in a quoted string within the attributes.

I'm not proposing, however, that we implement ESC/Java for clang, although a subset of those features might be extremely useful. It is better to encode the contract associated with a function's interface in the actual source code (e.g. header files) than to hardwire such knowledge into a specific tool. This not only allows the tool to become more extensible as more code is annotated, but also means that the knowledge is more portable, and doesn't die out when a specific tool dies out.

The other thing that I would like to mention is that the particular property you are describing is a little more than an extension of a flow-sensitive uninitialized values analysis. Because the uninitialized/initialized state of "str" and "str_len" depends on the return value of zend_parse_parameters, it almost inherently becomes a path-sensitive property if you want to check it with any real precision. We will likely extend the uninitialized values analysis to work in the new path-sensitive dataflow engine that we are building; in that case adding such information might actually be pretty easy and should give you the precision that you need to not spit out too much noise to the user.

_______________________________________________
cfe-dev mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
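
To make the path-sensitivity point concrete, here is a minimal sketch in C of the call-site pattern under discussion. Everything in it is an assumption made for illustration: parse_params, arg_t, arg_kind and use_args are hypothetical names, and parse_params is only a toy stand-in that understands 's', 'l' and '|'. It is not the real zend_parse_parameters and does not use the PHP headers. The point it tries to show is that "str" and "str_len" are safe to read only on the path where the return value is not FAILURE, while the optional "number" needs a default because the callee may never write it.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

enum { SUCCESS = 0, FAILURE = -1 };

/* Hypothetical argument value, standing in for a PHP value. */
typedef enum { ARG_STRING, ARG_LONG } arg_kind;
typedef struct {
    arg_kind kind;
    const char *s;  /* used when kind == ARG_STRING */
    long l;         /* used when kind == ARG_LONG */
} arg_t;

/*
 * Hypothetical stand-in for zend_parse_parameters (NOT the real API).
 * Spec characters: 's' = string + length, 'l' = long, '|' = the
 * arguments that follow are optional. Outputs are written only for
 * arguments that were actually supplied. On FAILURE some outputs may
 * be partially written, so callers must treat all of them as
 * uninitialized.
 */
int parse_params(const arg_t *args, int num_args, const char *spec, ...)
{
    va_list ap;
    int i = 0, optional = 0, rc = SUCCESS;

    assert(spec != NULL);
    va_start(ap, spec);
    for (; *spec != '\0'; ++spec) {
        if (*spec == '|') {
            optional = 1;
            continue;
        }
        if (i >= num_args) {
            /* Ran out of arguments: fine only in the optional part. */
            if (!optional)
                rc = FAILURE;
            break;
        }
        if (*spec == 's' && args[i].kind == ARG_STRING) {
            const char **str = va_arg(ap, const char **);
            int *len = va_arg(ap, int *);
            *str = args[i].s;
            *len = (int)strlen(args[i].s);
        } else if (*spec == 'l' && args[i].kind == ARG_LONG) {
            long *out = va_arg(ap, long *);
            *out = args[i].l;
        } else {
            rc = FAILURE;  /* type mismatch or unknown spec character */
            break;
        }
        ++i;
    }
    if (rc == SUCCESS && i < num_args)
        rc = FAILURE;      /* more arguments than the spec allows */
    va_end(ap);
    return rc;
}

/*
 * Call site in the style of the example from the thread. Returns the
 * string length plus 'number', or -1 if parsing failed.
 */
long use_args(const arg_t *args, int num_args)
{
    const char *str;   /* initialized only if the call succeeds */
    int str_len;       /* likewise */
    long number = 10;  /* optional: needs a default, callee may skip it */

    if (parse_params(args, num_args, "s|l", &str, &str_len, &number) == FAILURE)
        return -1;     /* str and str_len must not be read on this path */

    (void)str;         /* safe to read here, unused in this sketch */
    return (long)str_len + number;
}
```

With a single string argument "abc", use_args returns 13 (length 3 plus the default 10); with "abc" and the long 5 it returns 8. A purely flow-sensitive analysis that ignores the return value would have to either warn on the final read of str_len or assume it is always initialized. Tracking the FAILURE branch separately is what lets a checker accept this function and still flag a version that reads str_len without the check.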
