On Mon, 2021-07-05 at 21:45 +0530, Ankur Saini wrote:
> I forgot to send the daily report yesterday, so this one covers the
> work done on both days
>
> AIM :
>
> - make the analyzer call the function with the updated call-string
> representation ( even the ones that don’t have a superedge )
> - make the analyzer figure out the point of return from the function
> called without the superedge
> - make the analyzer figure out the correct point to return back to in
> the caller function
> - create the enode and eedge representing the return call
> - test the changes on the example I created before
> - speculate what GCC generates for a vfunc call and discuss how we can
> use it to our advantage
>
> —
>
> PROGRESS ( changes can be seen on
> "refs/users/arsenic/heads/analyzer_extension" branch of the repository
> ) :
>
> - Thanks to the new call-string representation, I was able to push
> calls which don’t have a superedge onto the call stack, and was
> able to see the calls happening via the function pointer.
>
> - To detect the returning point of the function I used the fact that
> such supernodes would contain an EXIT bb, would not have any return
> superedge and would still have a pending call-stack.
>
> - The next part was to find the destination node of the return. For
> this I again made use of the new call string and created a custom
> accessor to get the caller and callee supernodes of the return call,
> then I extracted the gcall* from the caller supernode to update the
> program state.
>
> - Now that I had the next state and the next point, it was time to put
> the final piece of the puzzle together and create the exploded node and
> edge representing the returning call.
>
> - I tested the changes on the following program, where the analyzer
> was earlier giving a false positive ( a spurious leak report ) because
> it did not detect the call via a function pointer
>
> ```
> #include <stdio.h>
> #include <stdlib.h>
>
> void fun(int *int_ptr)
> {
> free(int_ptr);
> }
>
> int test()
> {
> int *int_ptr = (int*)malloc(sizeof(int));
> void (*fun_ptr)(int *) = &fun;
> (*fun_ptr)(int_ptr);
>
> return 0;
> }
>
> void test_2()
> {
> test();
> }
> ```
> ( compiler explorer link : https://godbolt.org/z/9KfenGET9 )
>
> and the results showed success: the analyzer was now able to detect,
> call and return from the function that was called via the function
> pointer, and no longer reported the memory leak it was reporting
> before. : )
This is great; well done!
It would be good to turn the above into a regression test. I think you
can do that by simply adding it to gcc/testsuite/gcc.dg/analyzer. You
could also add a case where fun_ptr is called twice, and check that it
reports it as a double-free (and add a dg-warning directive to verify
that it correctly complains).
I wonder if your branch has already fixed:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100546
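
For the regression test, something like the following might be a
starting point. It is only a rough sketch: the names test_single and
test_double are made up, and the dg-warning regex assumes the analyzer
words the diagnostic as double-'free', so adjust it to whatever your
branch actually emits. I've put the directive on the free in "fun" on the
assumption that the second free is where the warning gets reported.

```c
/* Sketch of a possible test for gcc/testsuite/gcc.dg/analyzer.
   Function names and the warning regex are assumptions.  */
#include <stdlib.h>

void fun (int *int_ptr)
{
  free (int_ptr); /* { dg-warning "double-'free'" } */
}

/* One call through the pointer: no leak and no double-free expected.  */
int test_single (void)
{
  int *int_ptr = (int *) malloc (sizeof (int));
  void (*fun_ptr) (int *) = &fun;
  (*fun_ptr) (int_ptr);
  return 0;
}

/* Two calls through the pointer: the second free is the double-free.  */
int test_double (void)
{
  int *int_ptr = (int *) malloc (sizeof (int));
  void (*fun_ptr) (int *) = &fun;
  (*fun_ptr) (int_ptr);
  (*fun_ptr) (int_ptr);
  return 0;
}
```

Note that test_double is only meant to be analyzed, not run, since it
really does free the same pointer twice.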
>
> - I think I should point this out: in the process I created a lot of
> custom functions to access/alter some data which was not accessible
> before.
>
> - now that calls via function pointer are taken care of, it was time
> to see what exactly GCC generates when a function is dispatched
> dynamically, and as planned earlier, I went to ipa-devirt.c ( the
> devirtualizer’s implementation in GCC ) to investigate.
>
> - although I didn’t understand everything that was happening there,
> here are some of the findings I thought might be interesting for the
> project :-
> > the polymorphic call is made via an OBJ_TYPE_REF which
> contains otr_type ( the type of the class whose method is called ) and
> otr_token ( the index into the virtual table where the address is taken )
> > the devirtualizer builds a type inheritance graph to keep
> track of the entire inheritance hierarchy
> > the most interesting function I found was
> “possible_polymorphic_call_targets()” which returns the vector of all
> possible targets of a polymorphic call represented by a call edge or a
> gcall.
> > what I understood is that the devirtualizer searches
> these polymorphic calls, picks out the targets which are more
> likely to be called ( known as likely calls ) and then turns them into
> speculative calls, which are later turned into direct calls.
>
> - another thing I was curious to know was how the analyzer would behave
> when it encountered a polymorphic call, now that we are splitting
> the superedges at every call.
>
> the results were interesting: I was able to see the analyzer splitting
> supernodes for the calls right away, but this time they were not
> connected via an intraprocedural edge, making the analyzer crash at
> the callsite ( I will look more into it tomorrow )
>
> the example I used was : -
> ```
> struct A
> {
> virtual int foo (void)
> {
> return 42;
> }
> };
>
> struct B: public A
> {
> int foo (void)
> {
> return 0;
> }
> };
>
> int test()
> {
> struct B b, *bptr=&b;
> bptr->foo();
> return bptr->foo();
> }
> ```
> ( compiler explorer link : https://godbolt.org/z/d986ab7MY )
>
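On the OBJ_TYPE_REF findings above, here is a toy model in plain C of
how I read them, mirroring your A/B example. Nothing here is GCC's API:
struct class_info, derives_from and collect_possible_targets are all
invented for illustration, and the real devirtualizer is far more
involved. The idea is just that otr_type bounds the set of classes and
otr_token picks the vtable slot in each.

```c
#include <stddef.h>

typedef int (*method_fn) (void);

static int A_foo (void) { return 42; }
static int B_foo (void) { return 0; }

/* Hypothetical model of a class: a vtable plus a link to its base.  */
struct class_info
{
  const char *name;
  const struct class_info *base;
  const method_fn *vtable;
};

const method_fn A_vtable[] = { A_foo };
const method_fn B_vtable[] = { B_foo };

const struct class_info class_A = { "A", NULL, A_vtable };
const struct class_info class_B = { "B", &class_A, B_vtable };

/* Return nonzero if CLS is OTR_TYPE or derives from it.  */
int derives_from (const struct class_info *cls,
                  const struct class_info *otr_type)
{
  for (; cls != NULL; cls = cls->base)
    if (cls == otr_type)
      return 1;
  return 0;
}

/* Store slot OTR_TOKEN of every class compatible with OTR_TYPE into OUT
   and return how many were stored.  Assumes each such class has that
   slot populated and that OUT has room for N_CLASSES entries.  */
size_t collect_possible_targets (const struct class_info *const *classes,
                                 size_t n_classes,
                                 const struct class_info *otr_type,
                                 size_t otr_token, method_fn *out)
{
  size_t n = 0;
  for (size_t i = 0; i < n_classes; i++)
    if (derives_from (classes[i], otr_type))
      out[n++] = classes[i]->vtable[otr_token];
  return n;
}
```

With a static type of A, both A::foo and B::foo are candidates; with a
static type of B only B::foo remains, which is the kind of narrowing the
analyzer could take advantage of.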
I can see the crash in gdb:
In state_purge_per_ssa_name::process_point, when
if (snode->m_returning_call)
the code assumes that there will be a cgraph_edge, which isn't the case
anymore; it will need to go from the "return" supernode to the "call"
supernode (both within the caller function).
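
To illustrate what I mean by going from the "return" supernode to the
"call" supernode, here is a simplified sketch in plain C. The structs
and find_call_snode are hypothetical stand-ins, not the analyzer's real
classes; the point is only that the lookup can be done by matching the
call statement within the caller, with no call-graph edge involved.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the analyzer's types.  */
struct call_stmt { int id; };

struct snode
{
  int function_id;                         /* function owning this node */
  const struct call_stmt *last_call;       /* call ending this node, if any */
  const struct call_stmt *returning_call;  /* call this node resumes after */
};

/* Find the "call" node matching RET by scanning for a node in the same
   function that ends in the call RET is returning from.  Returns NULL
   if RET is not a return node or no match exists.  */
const struct snode *
find_call_snode (const struct snode *nodes, size_t n,
                 const struct snode *ret)
{
  if (ret->returning_call == NULL)
    return NULL;
  for (size_t i = 0; i < n; i++)
    if (nodes[i].function_id == ret->function_id
        && nodes[i].last_call == ret->returning_call)
      return &nodes[i];
  return NULL;
}
```

In your example, test() would be split at each bptr->foo() call, so each
"return" node would map back to the node ending in the same call.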
> —
>
> STATUS AT THE END OF THE DAY :-
>
> - make the analyzer call the function with the updated call-string
> representation ( even the ones that don’t have a superedge ) (done)
> - make the analyzer figure out the point of return from the function
> called without the superedge (done)
> - make the analyzer figure out the correct point to return back to in
> the caller function (done)
> - create the enode and eedge representing the return call (done)
> - test the changes on the example I created before (done)
> - speculate what GCC generates for a vfunc call and discuss how we can
> use it to our advantage (done)
>
Good work; looks promising.
Dave