Does Cloud9 count as distributed? That one is open source, at least.

Surely some of the CGC systems count as distributed, too? We used 10k cores and
TBs of RAM on symbolic execution...

> On Sep 6, 2015, at 12:02, Halvar Flake <[email protected]> wrote:
> 
> Hey all,
>  
> while I really should not be posting here while I am on my kinda-sabbatical,
> the ocean is entirely flat today and I don't feel like doing real work - so
> posting to DD is a nice middle ground.
>  
> There was a period in my life where at each and every conference I attended,
> some bright and very motivated youngster would come up to me and excitedly
> tell me about this new reverse engineering framework he was building -
> usually in Python or Ruby - where everything was an object, and it would all
> be so great once development got a bit further.
>  
> Over the years, I must have heard about 10+ such frameworks, and each time
> the authors eventually ran into the same problem: Binaries are larger than
> people think, and your RAM is more limited than you think.
>  
> A larger real-world application will, once all dependencies are loaded and
> mapped into its address space, easily exceed 100 megs of executable code.
> With x86_64 instructions averaging a bit above 4 bytes, we are quickly
> talking about 25m+ instructions.
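A quick check of that estimate, as a Python sketch. The 100 MB figure and the roughly 4-byte average come from the text; reading "megs" as 10^6 bytes and the two larger code sizes in the loop are assumptions for illustration:

```python
# Rough instruction count for a given amount of mapped executable code.
# Assumption: "megs" = 10**6 bytes; using 2**20 would raise counts by ~5%.

def estimate_instructions(code_bytes: int, avg_insn_len: float) -> int:
    """Approximate number of instructions in `code_bytes` of machine code."""
    return int(code_bytes / avg_insn_len)

AVG_INSN_LEN = 4.0  # "a bit above 4 bytes"; 4.0 slightly overestimates the count

# 100 MB is the quoted figure; 150 MB and 200 MB are hypothetical larger apps.
for code_mb in (100, 150, 200):
    n = estimate_instructions(code_mb * 10**6, AVG_INSN_LEN)
    print(f"{code_mb} MB of code -> ~{n / 1e6:.1f}M instructions")
```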
>  
> If, for some bizarre reason, you are confined to a 32-bit process, you have
> 3GB of address space to distribute among 25m+ instructions, which means that
> in the best case you can afford to spend 128 bytes per instruction - not
> counting heap overhead.
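The 128-byte figure works out if "3GB" is read as 3 GiB (3 * 2^30 bytes); with 3 * 10^9 bytes it would be 120. A minimal sketch, assuming the 25M instruction count from above and an address space holding nothing else (the best case the text describes):

```python
# Best-case per-instruction budget in a 32-bit process's user address space.
# Assumptions: 3 GiB usable, 25M instructions, and nothing else (code,
# stacks, the interpreter itself) competing for the space.

ADDRESS_SPACE = 3 * 2**30
N_INSTRUCTIONS = 25_000_000

def budget_per_instruction(total_bytes: int, n_instructions: int) -> float:
    """Bytes available per instruction if the space is split evenly."""
    return total_bytes / n_instructions

print(f"{budget_per_instruction(ADDRESS_SPACE, N_INSTRUCTIONS):.1f} bytes/instruction")
print(f"{budget_per_instruction(3 * 10**9, N_INSTRUCTIONS):.1f} bytes/instruction (decimal GB)")
```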
>  
> On my machine, an empty Python dictionary takes 280 bytes, an empty string 37.
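Those numbers depend on the Python version and build (current Python 3 releases report different sizes), so a sketch that measures rather than assumes may be more useful. The `Instruction` class below is a hypothetical example of the "everything is an object" style described earlier; `sys.getsizeof` and `tracemalloc` are standard library:

```python
import sys
import tracemalloc

# Hypothetical instruction record in the "everything is an object" style.
class Instruction:
    def __init__(self, address, mnemonic, operands, size):
        self.address = address
        self.mnemonic = mnemonic
        self.operands = operands
        self.size = size

# Shallow sizes of empty containers on the running interpreter.
print("empty dict:", sys.getsizeof({}), "bytes")
print("empty str: ", sys.getsizeof(""), "bytes")

def bytes_per_object(n: int = 100_000) -> float:
    """Average traced allocation per Instruction: the object itself (with any
    per-instance attribute storage), its int address, its operand list, and
    its slot in the holding list. The string literals are shared between
    objects and add nothing per instance."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    insns = [Instruction(0x401000 + 4 * i, "mov", ["eax", "ebx"], 4)
             for i in range(n)]
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return (after - before) / len(insns)

BUDGET_32BIT = 128  # bytes per instruction, from the 3GB estimate above
per_obj = bytes_per_object()
print(f"~{per_obj:.0f} bytes per Instruction vs. a {BUDGET_32BIT}-byte budget")
```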
>  
> In a more realistic scenario, you have 32 GB of RAM in your machine, which
> gives you a bit more than 1k of memory per instruction. That should be
> plenty, no?
>  
> Not so much - if you want to perform any sophisticated analysis on code, you
> will want to have some approximation of the program state associated with
> program points, and the number of program points where a reasonable
> approximation of this can be done in 1k or less is not going to be large.
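To make the 32 GB case concrete: 32 GiB over 25M program points leaves about 1,374 bytes each. The sketch below compares that with a hypothetical interval domain (each tracked register or stack slot holds two 64-bit bounds, 16 bytes), stored as tightly packed values with no allocator or Python overhead, which is the most optimistic layout possible. The stack-slot counts are illustrative assumptions:

```python
# Per-point budget on a 32 GB machine vs. a hypothetical packed interval state.
RAM = 32 * 2**30           # 32 GiB
N_POINTS = 25_000_000      # one program point per instruction

budget = RAM / N_POINTS
print(f"budget: {budget:.0f} bytes per program point")

def interval_state_bytes(n_registers: int, n_stack_slots: int) -> int:
    """Packed size of a state that keeps one interval (two 64-bit bounds,
    16 bytes) per tracked register and stack slot; no other overhead."""
    return (n_registers + n_stack_slots) * 16

# 16 = x86_64 general-purpose registers; stack-slot counts are illustrative.
for regs, slots in [(16, 0), (16, 32), (16, 128)]:
    size = interval_state_bytes(regs, slots)
    verdict = "fits" if size <= budget else "does not fit"
    print(f"{regs} regs + {slots:3d} stack slots: {size:5d} bytes, {verdict}")
```

With 16 registers, tracking more than 69 stack slots per point already exceeds the budget, before any heap, memory-region, or path information is added.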
> 
> Where am I going with all this rambling?
>  
> While machine code is not "big data" in the modern,
> search-enginey-social-networky sense, real-world programs are "not small
> data" - as soon as you wish to associate extra information with parts of the
> program, you will quickly exceed the ability to keep it all in memory on a
> single machine - provided you analyse something "real" instead of Notepad.
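Once the state no longer fits, the arithmetic turns into "how many machines", and the program has to be split across them. A minimal sketch, assuming a hypothetical 4 KiB of state per program point and a naive scheme that hash-partitions functions (by entry address) onto machines; neither the numbers nor the scheme come from any existing framework:

```python
import hashlib
import math
from collections import Counter

RAM_PER_MACHINE = 32 * 2**30   # 32 GiB per machine
N_POINTS = 25_000_000
STATE_BYTES = 4096             # hypothetical state per program point

def machines_needed(n_points: int, state_bytes: int, ram: int) -> int:
    """Lower bound on machines needed to hold all per-point state in RAM."""
    return math.ceil(n_points * state_bytes / ram)

def shard_of(function_entry: int, n_shards: int) -> int:
    """Stable hash partition: maps a function's entry address to a machine."""
    key = function_entry.to_bytes(8, "little")
    digest = hashlib.blake2b(key, digest_size=8).digest()
    return int.from_bytes(digest, "little") % n_shards

n_machines = machines_needed(N_POINTS, STATE_BYTES, RAM_PER_MACHINE)
print(f"at least {n_machines} machines with 32 GB each")

# Hypothetical function entry points, 64 bytes apart.
entries = [0x400000 + 64 * i for i in range(100_000)]
load = Counter(shard_of(e, n_machines) for e in entries)
print("functions per machine:", sorted(load.values()))
```

Balancing by function count is only a first approximation: function sizes vary widely, and calls and data flow that cross machines are where the real difficulty of a distributed analysis would lie.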
>  
> It is interesting that there are no distributed static analysis frameworks
> yet - and how easy it is to conveniently forget about scale issues when
> "architecting" (i.e. dreaming about) the reverse engineering framework one
> would like to have.
>  
> Cheers,
> Halvar
> PS: It is possible that the successes of fuzzing are mainly due to the fact
> that it happens to be embarrassingly parallel.
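A toy illustration of the point: each fuzzing worker below needs nothing but its own seed, and the only coordination is merging crash sets at the end. The target and its planted bug are hypothetical, and real fuzzers usually also share interesting inputs between workers, which this sketch leaves out:

```python
import random
from concurrent.futures import ProcessPoolExecutor

def target(data: bytes) -> None:
    """Hypothetical target with a planted bug: 'crashes' on inputs starting with b'B'."""
    if data.startswith(b"B"):
        raise ValueError("crash")

def fuzz_worker(seed: int, iterations: int = 20_000) -> set:
    """One independent fuzzing campaign; shares no state with other workers."""
    rng = random.Random(seed)
    crashes = set()
    for _ in range(iterations):
        length = rng.randrange(1, 8)  # random input of 1..7 bytes
        data = bytes(rng.randrange(256) for _ in range(length))
        try:
            target(data)
        except ValueError:
            crashes.add(data)
    return crashes

def run_campaign(n_workers: int, iterations: int = 20_000) -> set:
    """Run n_workers campaigns in separate processes and merge their crashes."""
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        results = pool.map(fuzz_worker, range(n_workers), [iterations] * n_workers)
        return set().union(*results)

if __name__ == "__main__":
    crashes = run_campaign(4)
    print(f"{len(crashes)} distinct crashing inputs")
```

Because no worker ever waits on another, adding workers (given enough cores) adds throughput almost linearly, which is exactly the property a static analysis that needs a consistent global program state lacks.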
> _______________________________________________
> Dailydave mailing list
> [email protected]
> https://lists.immunityinc.com/mailman/listinfo/dailydave