does cloud9 count as distributed? that is open source at least. surely also different CGC systems count as distributed? we used 10k cores and TBs of RAM on symbolic execution...
> On Sep 6, 2015, at 12:02, Halvar Flake <[email protected]> wrote: > > Hey all, > > while I really should not be posting here while I am on my kinda-sabbatical, > the ocean > is entirely flat today and I don't feel like doing real work - so posting to > DD is a > nice middle ground. > > There was a period in my life where at each and every conference I attended, > some > bright and very motivated youngster would come up to me and excitedly tell me > about > this new reverse engineering framework he was building - usually in Python or > Ruby - where > everything was an object, and it would all be so great once development got a > bit further. > > Over the years, I must have heard about 10+ such frameworks, and each time the > authors eventually ran into the same problem: Binaries are larger than people > think, > and your RAM is more limited than you think. > > A larger real-world application will, once all dependencies are loaded and > mapped > into it's address space, easily exceed 100 megs of executable code. With > x86_64 > instructions averaging a bit above 4 bytes, we are quickly talking about 25m+ > instructions. > > If, for some bizarre reason, you are confined to a 32-bit process, you have > 3GB of > address space to distribute among 25m+ instructions, which means that in the > best > case you can afford to spend 128 bytes per instruction - not counting heap > overhead. > > On my machine, an empty Python dictionary takes 280 bytes, an empty string 37. > > In a more realistic scenario, you have 32 GB of RAM in your machine, which > gives you > a bit more than 1k of memory per instruction. That should be plenty, no? > > Not so much - if you want to perform any sophisticated analysis on code, you > will want > to have some approximation of the program state associated with program > points, and > the number of program points where a reasonable approximation of this can be > done > in 1k or less is not going to be large. > > Where am I going with all this rambling? > > While machine code is not "big data" in the modern, > search-enginey-social-networky-sense, > real-world-programs are "not small data" - as soon as you wish to associate > extra > information with parts of the program, you will quickly exceed the ability to > keep it all in > memory on a single machine - provided you analyse something "real" instead of > notepad. > > It is interesting that there are no distributed static analysis frameworks > yet - and how easy > it is to conveniently forget about scale issues when "architecting" (e.g. > dreaming about) > the reverse engineering framework one would like to have. > > Cheers, > Halvar > PS: It is possible that the successes of fuzzing are due mainly due to the > fact that it happens to be embarrassingly parallel. > _______________________________________________ > Dailydave mailing list > [email protected] > https://lists.immunityinc.com/mailman/listinfo/dailydave _______________________________________________ Dailydave mailing list [email protected] https://lists.immunityinc.com/mailman/listinfo/dailydave
