Re: [klee-dev] KLEE behaviour with function call to uclibc

2017-05-28 Thread Dan Liew
Hi,

On 28 May 2017 at 00:55, Shehbaz Jaffer  wrote:
> Hi Dan,
>
>
> Thank you for your reply. However, I am struggling with how function not
> defined in the programs bitcode (like getopt_long function) gets emulated by
> KLEE. Please allow me to explain the steps and where I am facing issues in
> more detail:

getopt_long is part of the C library. In KLEE's case the definition
can be found in klee-uclibc.

If you don't link in klee-uclibc (i.e. with `-libc=uclibc`) then the
`getopt_long()` won't get linked into
the final LLVM IR that gets executed. Side note KLEE has the ability
to call external functions (i.e. those
not defined in LLVM IR) but JIT'ing calls to that function but doing
this requires that KLEE
concretize all arguments to the function.

When running KLEE take a look at the `assembly.ll` file in the klee
output directory (e.g. `klee-last`).
That file shows the LLVM IR that KLEE actually executes. This IR
contains the result of KLEE linking
in whatever libraries it decided to link in (e.g. klee-uclibc) and the
result of then applying any transformation
passes to the IR.

___
klee-dev mailing list
klee-dev@imperial.ac.uk
https://mailman.ic.ac.uk/mailman/listinfo/klee-dev


Re: [klee-dev] KLEE behaviour with function call to uclibc

2017-05-27 Thread Shehbaz Jaffer
Hi Dan,


Thank you for your reply. However, I am struggling with how function not 
defined in the programs bitcode (like getopt_long function) gets emulated by 
KLEE. Please allow me to explain the steps and where I am facing issues in more 
detail:


I extracted bc from executable of "groups" program in coreutils library, as 
explained in the official documentation [1]. when I disassembled the resultant 
bitcode (from groups.bc to groups.ll), I can see only the function declaration 
of getopt_long and not the entire definition. Since KLEE gets only the groups 
bitcode as an input (groups.bc) which does not contain the getopt_long 
definition, how is KLEE able to run all if-else inside the getopt_long 
function? What does KLEE take as an input that lets KLEE know the definition of 
getopt_long. I looked at the referenced code [2] however it explains the KLEE 
parameter initialization only. I would like to know how function calls to 
"missing functions" - or functions that do not have definitions in bitcode - 
such as those in uclibc are emulated by KLEE.


Code pertaining to disassembled groups.ll file


# call getopt_long

 %call4 = tail call i32 @getopt_long(i32 %argc, i8** %argv, i8* getelementptr 
inbounds ([1 x i8]* @.str6, i64 0, i64 0), %struct.option* getelementptr 
inbounds ([3 x %struct.option]* @longopts, i64 0, i64 0), i32* null) #9, !dbg 
!1788

#declare getopt_long

; Function Attrs: nounwind
declare i32 @getopt_long(i32, i8**, i8*, %struct.option*, i32*) #2

I could not find getopt_long definition in the .ll file corresponding to 
extracted .bc file. Please advise.


Thanks,

Shehbaz


[1] http://klee.github.io/docs/coreutils-experiments/

[2] 
https://github.com/klee/klee/blob/bec4ceafe412a08de303678581e07451a1399e1f/runtime/POSIX/klee_init_env.c#L85


From: Dan Liew <d...@su-root.co.uk>
Sent: Saturday, May 27, 2017 4:12:24 AM
To: Shehbaz Jaffer
Cc: klee-dev@imperial.ac.uk
Subject: Re: [klee-dev] KLEE behaviour with function call to uclibc

On 27 May 2017 at 01:36, Shehbaz Jaffer <shehbaz.jaf...@mail.utoronto.ca> wrote:
> Hello KLEE developers,
>
> I am trying to understand how KLEE emulates environment variables. In 
> particular, how do functions like getopt_long get executed with KLEE, when we 
> give --sym-arg as an argument. getopt_long is found in unistd.h, and is 
> available in uclibc. For a coreutils file like groups.c where the following 
> line is encountered:
>
> while ((optc = getopt_long (argc, argv, "", longopts, NULL)) != -1)
>
> how would KLEE generate different inputs? for simplicity, consider the case 
> where we use --sym-arg 1 10 as a command line argument which leads to argv[1] 
> becoming a symbolic buffer of size 10, if my understanding is correct, when 
> getopt_long is called, one of the two things can happen:
>
> 1) argv[1] is declared as a symbolic buffer, we explore all paths of 
> getopt_long code in uclibc library, so all different if's / elses in 
> getopt_long code will get forked and execute to give different optc values.
> 2) getopt_long is skipped, optc variable is made symbolic, and the main 
> function in groups.c treats optc as symbolic value. on any successive if/else 
> condition in groups.c containing optc, a new state is forked with new value 
> of optc.

Take a look at the `klee_init_env()` function [1]. When KLEE is called
with --posix-runtime the program's `main()` is rewritten to call
`klee_init_env()` at the beginning and change `argc` and `argv`.
The `klee_init_env()` function introduces symbolic arguments if
requested. From that point onwards execution of the program just
proceeds as normal so `getopt_long()` will just be interpreted by KLEE
just like any other function.

[1] 
https://github.com/klee/klee/blob/bec4ceafe412a08de303678581e07451a1399e1f/runtime/POSIX/klee_init_env.c#L85
___
klee-dev mailing list
klee-dev@imperial.ac.uk
https://mailman.ic.ac.uk/mailman/listinfo/klee-dev


Re: [klee-dev] KLEE behaviour with function call to uclibc

2017-05-27 Thread Dan Liew
On 27 May 2017 at 01:36, Shehbaz Jaffer  wrote:
> Hello KLEE developers,
>
> I am trying to understand how KLEE emulates environment variables. In 
> particular, how do functions like getopt_long get executed with KLEE, when we 
> give --sym-arg as an argument. getopt_long is found in unistd.h, and is 
> available in uclibc. For a coreutils file like groups.c where the following 
> line is encountered:
>
> while ((optc = getopt_long (argc, argv, "", longopts, NULL)) != -1)
>
> how would KLEE generate different inputs? for simplicity, consider the case 
> where we use --sym-arg 1 10 as a command line argument which leads to argv[1] 
> becoming a symbolic buffer of size 10, if my understanding is correct, when 
> getopt_long is called, one of the two things can happen:
>
> 1) argv[1] is declared as a symbolic buffer, we explore all paths of 
> getopt_long code in uclibc library, so all different if's / elses in 
> getopt_long code will get forked and execute to give different optc values.
> 2) getopt_long is skipped, optc variable is made symbolic, and the main 
> function in groups.c treats optc as symbolic value. on any successive if/else 
> condition in groups.c containing optc, a new state is forked with new value 
> of optc.

Take a look at the `klee_init_env()` function [1]. When KLEE is called
with --posix-runtime the program's `main()` is rewritten to call
`klee_init_env()` at the beginning and change `argc` and `argv`.
The `klee_init_env()` function introduces symbolic arguments if
requested. From that point onwards execution of the program just
proceeds as normal so `getopt_long()` will just be interpreted by KLEE
just like any other function.

[1] 
https://github.com/klee/klee/blob/bec4ceafe412a08de303678581e07451a1399e1f/runtime/POSIX/klee_init_env.c#L85

___
klee-dev mailing list
klee-dev@imperial.ac.uk
https://mailman.ic.ac.uk/mailman/listinfo/klee-dev