Re: [viff-devel] PRSS zero- and double-sharings
Well, there are many more or less interesting conclusions to draw from
your benchmark, Martin. Not surprisingly, matrix multiplication turns
out to be expensive. It is worth considering using a non-naive
algorithm for this multiplication, but I am not convinced there is
very much to gain.

One thing I really do find interesting about the table is the amount
of time spent in inc_pc_wrapper. Perhaps it is possible to improve
this somehow?
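For reference, the schoolbook algorithm that a naive matrix __mul__
presumably implements performs n**3 element multiplications for n x n
matrices, each of which is itself a field operation with non-trivial
cost. A minimal generic sketch of that algorithm (an illustration, not
VIFF's actual matrix.py code):

    def mat_mul(a, b):
        """Naive matrix product; a and b are lists of rows whose
        elements support + and * (integers, field elements, ...)."""
        n, m, p = len(a), len(b), len(b[0])
        assert len(a[0]) == m, "inner dimensions must match"
        result = []
        for i in range(n):
            row = []
            for j in range(p):
                acc = a[i][0] * b[0][j]
                for k in range(1, m):      # n*m*p multiplications in total
                    acc = acc + a[i][k] * b[k][j]
                row.append(acc)
            result.append(row)
        return result

Strassen-style algorithms bring the exponent below 3, but for small
matrices the constant factors usually eat the gain, which matches the
doubt expressed above.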
Re: [viff-devel] PRSS zero- and double-sharings
Mikkel Krøigård [EMAIL PROTECTED] writes:

> Well, there are many more or less interesting conclusions to draw
> from your benchmark, Martin. Not surprisingly, matrix multiplication
> turns out to be expensive.

Hmm... I did see that there were a bunch of calls to __mul__ in
matrix.py, but I thought they came from the initialization of _hyper
in ActiveRuntime. But the initialization of _hyper does not use any
matrix multiplications, so this is wrong?!

When the mul method uses prss_get_triple, then _hyper should never be
used or even initialized, and so there should be no matrix stuff going
on... I think I measured the wrong code somehow :-)

> One thing I really do find interesting about the table is the amount
> of time spent in inc_pc_wrapper. Perhaps it is possible to improve
> this somehow?

It only used 0.5 seconds of its own time -- the 21 seconds are the
total time spent in the child calls made by inc_pc_wrapper. Since it
wraps all the important functions, it is clear that the cumulative
time will be big:

    ncalls      tottime  percall  cumtime  percall
    48003/6003    0.518    0.000   21.195    0.004

But any optimization would be good -- if we can save a tiny bit on
each of the 48,000 calls it might add up :-)

-- Martin Geisler
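To make the tottime/cumtime split concrete, here is a minimal sketch
of a program-counter wrapper in the spirit of inc_pc_wrapper; the
attribute name program_counter and the bookkeeping are assumptions for
illustration, not VIFF's actual runtime.py code:

    import functools

    def inc_pc_wrapper(method):
        """Bump a per-runtime program counter, then delegate. The
        wrapper's own work (tottime) is a single statement; everything
        the wrapped method does is charged to its cumtime."""
        @functools.wraps(method)
        def wrapper(self, *args, **kwargs):
            self.program_counter += 1              # assumed bookkeeping
            return method(self, *args, **kwargs)   # child call -> cumtime
        return wrapper

With 48,000 calls, even one extra attribute increment plus one extra
function-call frame per call becomes measurable.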
Re: [viff-devel] PRSS zero- and double-sharings
>> Well, there are many more or less interesting conclusions to draw
>> from your benchmark, Martin. Not surprisingly, matrix multiplication
>> turns out to be expensive.
>
> Hmm... I did see that there were a bunch of calls to __mul__ in
> matrix.py, but I thought they came from the initialization of _hyper
> in ActiveRuntime. But the initialization of _hyper does not use any
> matrix multiplications, so this is wrong?!

No, the initialization of _hyper should definitely not use matrix
multiplications.

> When the mul method uses prss_get_triple, then _hyper should never be
> used or even initialized, and so there should be no matrix stuff
> going on... I think I measured the wrong code somehow :-)

Well, somehow matrices were multiplied. And not only that, a lot of
time was spent doing it :)

>> One thing I really do find interesting about the table is the amount
>> of time spent in inc_pc_wrapper. Perhaps it is possible to improve
>> this somehow?
>
> It only used 0.5 seconds of its own time -- the 21 seconds are the
> total time spent in the child calls made by inc_pc_wrapper. Since it
> wraps all the important functions, it is clear that the cumulative
> time will be big:
>
>     ncalls      tottime  percall  cumtime  percall
>     48003/6003    0.518    0.000   21.195    0.004
>
> But any optimization would be good -- if we can save a tiny bit on
> each of the 48,000 calls it might add up :-)

My observation was just that wrapping is somewhat expensive. I do not
quite know what to do about this. We have already discussed the
alternatives to using this method, and they were not particularly
promising.
Re: [viff-devel] PRSS zero- and double-sharings
Martin Geisler [EMAIL PROTECTED] writes:

> Strangely the time for preprocessing has not improved... It stayed
> at an average time of about *20 ms* for a multiplication triple both
> before and after the change -- I don't understand that :-(

I do now! :-) It turned out that the preprocessing was still done with
the old code. Using the new PRSS-based code brings the time down to
*5 ms* per triple, so it cuts the time by a factor of four.

Thanks to Mikkel for making me think about why there were matrix
multiplications in the profile information...

-- Martin Geisler
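For readers wondering what the PRSS-based preprocessing buys: the
point of a double-sharing is that a pair ([r]_t, [r]_2t) of the same
random r lets the players turn the local (degree-2t) product of two
degree-t sharings into a proper degree-t triple with a single opening.
Below is a runnable toy illustration: single process, and true
randomness standing in for the pseudo-random (PRSS) sharings that VIFF
derives non-interactively from shared PRF keys, so it shows the
algebra only, not VIFF's prss_get_triple API:

    import random

    P = 2**31 - 1          # a prime field
    N, T = 4, 1            # 4 players, threshold 1 (so 2*T < N)

    def share(secret, degree):
        """Shamir-share `secret` with a random polynomial of the given
        degree; returns the shares for players x = 1..N."""
        coeffs = [secret] + [random.randrange(P) for _ in range(degree)]
        return [sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
                for x in range(1, N + 1)]

    def recombine(shares, degree):
        """Interpolate the polynomial at 0 from degree+1 shares."""
        pts = list(enumerate(shares[:degree + 1], start=1))
        total = 0
        for xi, yi in pts:
            num = den = 1
            for xj, _ in pts:
                if xj != xi:
                    num = num * (-xj) % P
                    den = den * (xi - xj) % P
            total = (total + yi * num * pow(den, P - 2, P)) % P
        return total

    a, b, r = (random.randrange(P) for _ in range(3))
    a_t, b_t = share(a, T), share(b, T)
    r_t, r_2t = share(r, T), share(r, 2 * T)   # the double-sharing

    # Each player locally computes its share of a*b + r (degree 2t)...
    d_2t = [(ai * bi + ri) % P for ai, bi, ri in zip(a_t, b_t, r_2t)]
    d = recombine(d_2t, 2 * T)                 # ...and the value is opened.

    # Unmasking with [r]_t yields a degree-t sharing of c = a*b:
    c_t = [(d - ri) % P for ri in r_t]
    assert recombine(c_t, T) == a * b % P

The assert at the end confirms that c_t is a correct degree-t sharing
of a*b, i.e. that (a_t, b_t, c_t) is a multiplication triple.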
Re: [viff-devel] PRSS zero- and double-sharings
Mikkel Krøigård [EMAIL PROTECTED] writes:

>> It only used 0.5 seconds of its own time -- the 21 seconds are the
>> total time spent in the child calls made by inc_pc_wrapper. Since
>> it wraps all the important functions, it is clear that the
>> cumulative time will be big:
>>
>>     ncalls      tottime  percall  cumtime  percall
>>     48003/6003    0.518    0.000   21.195    0.004
>>
>> But any optimization would be good -- if we can save a tiny bit on
>> each of the 48,000 calls it might add up :-)
>
> My observation was just that wrapping is somewhat expensive. I do
> not quite know what to do about this. We have already discussed the
> alternatives to using this method, and they were not particularly
> promising.

Function calls are unfortunately quite expensive in Python. One idea
is to use Psyco (http://tracker.viff.dk/issue51) since it supposedly
should be able to generate optimized code on the fly, maybe even do
inlining -- I'm not sure.

-- Martin Geisler
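A quick way to put a number on the wrapping cost is a micro-benchmark
with the standard timeit module; the exact figures vary with the
Python version, but a call routed through an extra wrapper function is
typically noticeably slower than a direct call:

    import timeit

    def work(x):
        return x + 1

    def wrapped(x):        # stand-in for a call through inc_pc_wrapper
        return work(x)

    direct = timeit.timeit("work(42)", globals=globals(), number=1000000)
    indirect = timeit.timeit("wrapped(42)", globals=globals(), number=1000000)
    print("direct: %.3fs  wrapped: %.3fs" % (direct, indirect))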
Re: [viff-devel] PRSS zero- and double-sharings
Martin Geisler [EMAIL PROTECTED] writes:

> I'm thinking that there might be some unfortunate overhead in the
> preprocessing book-keeping. We should try running benchmark.py under
> a profiler to see where the time is spent.

There is now support for a --profile flag, and running benchmark.py
with it gives (1000 actively secure multiplications, 4 players):

    2706417 function calls (2113774 primitive calls) in 37.755 CPU seconds

    Ordered by: internal time, call count
    List reduced from 301 to 40 due to restriction <40>

    ncalls         tottime  percall  cumtime  percall  filename:lineno(function)
    58               6.642    0.115   34.640    0.597  selectreactor.py:82(doSelect)
    230033/17014     6.011    0.000   29.850    0.002  defer.py:306(_runCallbacks)
    305044/249023    3.099    0.000    9.937    0.000  defer.py:168(addCallbacks)
    35000            1.637    0.000    8.042    0.000  runtime.py:1088(mul)
    46504            1.338    0.000    4.218    0.000  runtime.py:184(__init__)
    105115           1.310    0.000    1.780    0.000  field.py:371(__mul__)
    188307           1.269    0.000    1.269    0.000  field.py:339(__init__)
    2000             1.053    0.001   12.017    0.006  matrix.py:128(__mul__)
    117027/21014     0.992    0.000   27.129    0.001  defer.py:283(_startRunCallbacks)
    72000            0.848    0.000    1.486    0.000  field.py:342(__add__)
    192023/158017    0.820    0.000    8.283    0.000  defer.py:185(addCallback)
    117027/21014     0.807    0.000   27.282    0.001  defer.py:229(callback)
    37000            0.799    0.000    2.382    0.000  runtime.py:137(clone)
    115512           0.762    0.000    1.368    0.000  runtime.py:70(__init__)
    15006            0.603    0.000    0.603    0.000  runtime.py:574(_expect_data)
    102008/67006     0.525    0.000    8.953    0.000  runtime.py:208(_callback_fired)
    48003/6003       0.518    0.000   21.195    0.004  runtime.py:353(inc_pc_wrapper)
    36000            0.514    0.000    6.201    0.000  runtime.py:742(add)
    9000             0.494    0.000    1.285    0.000  shamir.py:95(recombine)
    43504            0.373    0.000    5.085    0.000  runtime.py:216(gather_shares)
    19006            0.340    0.000    1.051    0.000  runtime.py:306(sendData)
    70               0.336    0.005   27.819    0.397  basic.py:345(dataReceived)
    15009            0.328    0.000   27.483    0.002  runtime.py:267(stringReceived)
    117026           0.287    0.000    0.287    0.000  defer.py:162(__init__)
    19009            0.272    0.000    0.517    0.000  abstract.py:164(write)
    6000             0.263    0.000    0.455    0.000  matrix.py:201(transpose)
    3000             0.249    0.000    2.507    0.001  runtime.py:719(exchange)
    2000             0.249    0.000    1.053    0.001  shamir.py:30(share)
    36000            0.233    0.000    1.174    0.000  runtime.py:754(lambda)
    15000/4000       0.203    0.000    3.775    0.001  runtime.py:534(schedule_callback)
    15006            0.197    0.000    1.249    0.000  runtime.py:608(_expect_share)
    19009            0.195    0.000    0.712    0.000  basic.py:357(sendString)
    34000            0.187    0.000    0.697    0.000  runtime.py:1102(lambda)
    2000             0.178    0.000    1.254    0.001  prss.py:69(convert_replicated_shamir)
    504              0.176    0.000    0.263    0.001  defer.py:458(__init__)
    88000            0.167    0.000    0.167    0.000  matrix.py:72(__getitem__)
    6000             0.160    0.000    0.160    0.000  prss.py:260(__init__)
    6000             0.155    0.000    0.155    0.000  prss.py:326(__call__)
    37000            0.150    0.000    4.225    0.000  runtime.py:144(split_result)
    34000            0.146    0.000    5.500    0.000  runtime.py:105(__rmul__)

I'm not really sure how to interpret this... If any of you want to
investigate, then here's a link to the manual:

    http://docs.python.org/lib/module-profile.html :-)

-- Martin Geisler
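For anyone who wants to dig into the numbers, the cProfile and pstats
modules from the manual linked above can reproduce a table like the
one in this mail; "main()" below is a placeholder for whatever entry
point you are profiling:

    import cProfile
    import pstats

    cProfile.run("main()", "benchmark.prof")  # replace main() with your code

    stats = pstats.Stats("benchmark.prof")
    stats.sort_stats("time", "calls")         # internal time, then call count
    stats.print_stats(40)                     # top 40 entries, as above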