I have access to a cluster if you want me to run some tests.

On Wed, Nov 4, 2009 at 12:44 AM, <[email protected]> wrote:
> Send A51 mailing list submissions to
>        [email protected]
>
> To subscribe or unsubscribe via the World Wide Web, visit
>        http://lists.lists.reflextor.com/cgi-bin/mailman/listinfo/a51
> or, via email, send a message with subject or body 'help' to
>        [email protected]
>
> You can reach the person managing the list at
>        [email protected]
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of A51 digest..."
>
>
> Today's Topics:
>
>   1. Re: error while cd obj ; make installing the program (sascha)
>   2. Re: GPU choice (sascha)
>   3. RT trade-offs (Re: Hosting the tables) (Karsten Nohl)
>   4. Re: RT trade-offs (Re: Hosting the tables) (sascha)
>   5. Re: RT trade-offs (Re: Hosting the tables) (Karsten Nohl)
>   6. Re: RT trade-offs (Re: Hosting the tables) (sascha)
>   7. Re: CPU implementation (more code) (sascha)
>   8. Re: CPU implementation (more code) (Sascha Krissler)
>   9. Re: Fast sorting & compact tables (Frank A. Stevenson)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 3 Nov 2009 15:14:25 +0100
> From: sascha <[email protected]>
> Subject: Re: [A51] error while cd obj ; make installing the program
> To: [email protected]
> Message-ID: <20091103141425.ga4...@test>
> Content-Type: text/plain; charset=us-ascii
>
> generator=increment is not enabled by default, so use
>
>   generator=lfsr2::tablesize=32
>
> Please provide a gdb backtrace for the segmentation fault:
>
>   gdb --args ./c --options --and --more --options generator --foo --bar
>
>   gdb> run
>   [ Segfaults ]
>   gdb> ba
>
> On Tue, Nov 03, 2009 at 01:49:22AM -0800, Sylv1 wrote:
>> Hello,
>> thanks for the help, the make is now working correctly. I just made a
>> stupid error in the path settings of the makefile...
>>
>> But I tried to run the test with:
>>
>>   time ./c --condition rounds:rounds=256 --implementation sharedmem
>>   --algorithm A51 --roundfunc
>>   xor:condition=distinguished_point::bits=8:generator=increment --device
>>   cuda:threads=256:blocks=4:sleep=5000:operations=32768 --logger verbose
>>   --work random:prefix=10,0 --consume print:results=16 generate
>>   --chainlength 65536 --chains 1024
>>
>
>
> ------------------------------
>
> Message: 2
> Date: Tue, 3 Nov 2009 15:18:03 +0100
> From: sascha <[email protected]>
> Subject: Re: [A51] GPU choice
> To: [email protected]
> Message-ID: <20091103141803.gb4...@test>
> Content-Type: text/plain; charset=us-ascii
>
> The performance can be estimated as number of cores * core clock.
> I only have numbers for the 9600M-GT (20 chains/sec at 500 MHz; it has
> 32 cores) and the GTX260 (140 chains/sec at 576 MHz).
> So a 9600GT with 64 cores and a slightly higher clock should be able to
> do about 42 chains/sec, for example.
>
> On Tue, Nov 03, 2009 at 02:39:36AM -0800, Sylv1 wrote:
>> Hello!
>>
>> I also want to buy new hardware to begin my computations. What is the
>> best quality/price ratio in the following set, and which card performs
>> best for the table computation:
>> Nvidia GeForce 8400GS, GeForce 9600GT, GeForce 9400GT, GeForce 9500GT,
>> GeForce 9800GX2 or GeForce GTX260?
>>
>> Thanks
>> sylv1
>>
>> --- On Mon, 11/2/09, sascha <[email protected]> wrote:
>>
>> From: sascha <[email protected]>
>> Subject: Re: [A51] GPU choice
>> To: [email protected]
>> Date: Monday, November 2, 2009, 6:25 PM
>>
>> On Mon, Nov 02, 2009 at 05:46:13PM +0000, Jonathan wrote:
>> > Hello list!
>> > Been away for the last month with no internet access in the middle of
>> > nowhere, sorry for my absence.
>> >
>> > I am now in the process of picking out some hardware...
>> > Apart from motherboard, CPU, etc., how does this GPU look?
>> >
>> > ENGTX295/2DI/1792MD3 - 1792MB Asus GTX 295 55nm, 1998MHz GDDR3, GPU
>> > 576 MHz, Shader 1242 MHz, 480 Cores
>>
>> Looks good. Performance per dollar is quite bad with a GTX295, though:
>> two GTX260s with 432 cores together cost 260 EUR, while a GTX295 with
>> 480 cores costs 370 EUR. On the other hand, the 295 only occupies 2
>> slots instead of the 4 needed for two GTX260s.
>>
>> > Would using 2 increase (double?) performance?
>>
>> It scales linearly:
>> http://en.wikipedia.org/wiki/Embarrassingly_parallel
>>
>> > What contiguous disk size should I look into?
>>
>> A single table occupies around 6 GB; you can expect to fill 12 GB per
>> month with a GTX295. For the sorting you may need another 6 or 15 GB of
>> temp storage; that depends on whether we dump stxxl, which is slow and
>> resource hungry (in the way it is used now), in favor of our own sorter
>> implementation.
>>
>> > Jon
>>
>> _______________________________________________
>> A51 mailing list
>> [email protected]
>> http://lists.lists.reflextor.com/cgi-bin/mailman/listinfo/a51
>
>
> ------------------------------
>
> Message: 3
> Date: Tue, 3 Nov 2009 18:12:00 +0100
> From: Karsten Nohl <[email protected]>
> Subject: [A51] RT trade-offs (Re: Hosting the tables)
> To: [email protected]
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
>
>
> On Nov 3, 2009, at 4:09 PM, Jonathan wrote:
>
>> 2009/11/2 Karsten Nohl <[email protected]>:
>>>> I phrased the question wrong perhaps.
>>>> What is an IDEAL amount of storage for a complete set of tables
>>>> with a decent probability?
>>>>
>>>> 99% or 96% being decent...
>>>
>>> The math works out as:
>>>
>>> 50% failure rate at <3TB
>>> Each time you halve the failure rate, the storage doubles.
>>> 75% = 6TB; ...
>>> ; 99% = ~150 TB
>
> One correction: this calculation ignored collisions and was therefore
> still too optimistic.
>
>> I remember the THC tables that were (allegedly) computed
>> were more on the order of 2.2 TB for a 6%, I think...
>> Seems like a large difference... Maybe there is room for improvement?
>
> The trade-off goes three ways: storage space vs. attack time vs.
> failure rate.
>
> The THC tables -- as I understand them -- had larger coverage in
> smaller space at the cost of much increased attack time. For those
> tables to be usable with reasonable attack times you have to have an
> FPGA cluster.
>
> Let 2^D be the available storage (in # of chains), 2^M the hash
> size, and 1/2^C the coverage; then the attack time 2^T (in # of
> computations) is derived as:
>
> Distinguished point tables:
>   T = 2M - 3C - 2D + 1
>
> Rainbow tables:
>   T = 2M - 2C - 2D - 2
>
> We use a mix of these two techniques but stay closer to the
> distinguished points scenario.
> Parameters for the tables being computed at the moment are:
>   M = 64 (A5/1)
>   D = 38 (4 TB)
>   C = 6 (50% success rate)
>   => T = 33.5 (2 minutes on a computer with a GTX260 that has all the
>   tables; less when the distributed cracking network is used)
>
>
> ------------------------------
>
> Message: 4
> Date: Tue, 3 Nov 2009 18:48:44 +0100
> From: sascha <[email protected]>
> Subject: Re: [A51] RT trade-offs (Re: Hosting the tables)
> To: [email protected]
> Message-ID: <20091103174844.ga6...@test>
> Content-Type: text/plain; charset=us-ascii
>
> On Tue, Nov 03, 2009 at 06:12:00PM +0100, Karsten Nohl wrote:
>> We use a mix of these two techniques but stay closer to the
>> distinguished points scenario.
>> Parameters for the tables being computed at the moment are:
>> M = 64 (A5/1)
>> D = 38 (4 TB)
>> C = 6 (50% success rate)
>> => T = 33.5 (2 minutes on a computer with GTX260 that has all tables;
>> less when the distributed cracking network is used)
>>
> This number is not correct.
For a lookup of a single value, you have to compute
> (1 + 2 + 3 + ... + 32) * 2^15 A5/1 rounds, which is 528 * 2^15 = 17.3
> million. To do the lookup in all 400 tables, you need 7 billion rounds.
> 7 billion links can be computed in 42 seconds on a GTX260, and you
> have to do this for 204 keystream samples, totaling 8500 seconds on
> a single node with all the tables, and 21 seconds with 400 nodes
> sharing the computation.
>
>
> ------------------------------
>
> Message: 5
> Date: Tue, 3 Nov 2009 19:09:57 +0100
> From: Karsten Nohl <[email protected]>
> Subject: Re: [A51] RT trade-offs (Re: Hosting the tables)
> To: [email protected]
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
>
>
> On Nov 3, 2009, at 6:48 PM, sascha wrote:
>
>> On Tue, Nov 03, 2009 at 06:12:00PM +0100, Karsten Nohl wrote:
>>> We use a mix of these two techniques but stay closer to the
>>> distinguished points scenario.
>>> Parameters for the tables being computed at the moment are:
>>> M = 64 (A5/1)
>>> D = 38 (4 TB)
>>> C = 6 (50% success rate)
>>> => T = 33.5 (2 minutes on a computer with GTX260 that has all tables;
>>> less when the distributed cracking network is used)
>>>
>> This number is not correct. For a lookup of a single value, you have
>> to compute
>> (1 + 2 + 3 + ... + 32) * 2^15 A5/1 rounds, which is 528 * 2^15 =
>> 17.3 million
>> To do the lookup in all 400 tables, you need 7 billion rounds.
>> 7 billion links can be computed in 42 seconds on a GTX260, and you
>> have to do this for 204 keystream samples, totaling 8500 seconds on
>> a single node with all the tables and 21 seconds with 400 nodes
>> sharing the computation.
>
> That's correct -- my bad. I was using the formula for pure
> distinguished points tables.
>
>
> ------------------------------
>
> Message: 6
> Date: Tue, 3 Nov 2009 19:16:53 +0100
> From: sascha <[email protected]>
> Subject: Re: [A51] RT trade-offs (Re: Hosting the tables)
> To: [email protected]
> Message-ID: <20091103181653.ga6...@test>
> Content-Type: text/plain; charset=us-ascii
>
> On Tue, Nov 03, 2009 at 07:09:57PM +0100, Karsten Nohl wrote:
>>
>> On Nov 3, 2009, at 6:48 PM, sascha wrote:
>>
>> > On Tue, Nov 03, 2009 at 06:12:00PM +0100, Karsten Nohl wrote:
>> >> We use a mix of these two techniques but stay closer to the
>> >> distinguished points scenario.
>> >> Parameters for the tables being computed at the moment are:
>> >> M = 64 (A5/1)
>> >> D = 38 (4 TB)
>> >> C = 6 (50% success rate)
>> >> => T = 33.5 (2 minutes on a computer with GTX260 that has all tables;
>> >> less when the distributed cracking network is used)
>> >>
>> > This number is not correct. For a lookup of a single value, you have
>> > to compute
>> > (1 + 2 + 3 + ... + 32) * 2^15 A5/1 rounds, which is 528 * 2^15 =
>> > 17.3 million
>> > To do the lookup in all 400 tables, you need 7 billion rounds.
>> > 7 billion links can be computed in 42 seconds on a GTX260, and you
>> > have to do this for 204 keystream samples, totaling 8500 seconds on
>> > a single node with all the tables and 21 seconds with 400 nodes
>> > sharing the computation.
>>
>> that's correct -- my bad. I was using the formula for pure
>> distinguished points tables.
>
> And I guess it also assumed tables with no round function, which is
> impossible to do because of the merges.
> To show the rationale for rainbow tables again: with 400 * 32 DP tables
> (having the same number of merges as 400 rainbow tables),
> you have to compute 32 * 2^20 * 400 * 204 A5/1 rounds instead
> of 528 * 2^15 * 400 * 204, which is about half as much.
>
>
> ------------------------------
>
> Message: 7
> Date: Tue, 3 Nov 2009 21:52:53 +0100
> From: sascha <[email protected]>
> Subject: Re: [A51] CPU implementation (more code)
> To: [email protected]
> Message-ID: <20091103205253.ga7...@test>
> Content-Type: text/plain; charset=us-ascii
>
>> Regarding CPU implementations:
>
> Some good news:
>
> The SSE version achieves 18 chains/second per core of my 2 GHz Core 2 Duo.
> The Cell version runs at 10 chains/second per SPU (3.2 GHz).
>
> Some polishing has to be done before the code can be released.
> Anyone here paying the rent for Roadrunner for 24 hours?
>
>
> ------------------------------
>
> Message: 8
> Date: Wed, 04 Nov 2009 00:06:28 +0100
> From: Sascha Krissler <[email protected]>
> Subject: Re: [A51] CPU implementation (more code)
> To: [email protected]
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset=iso-8859-15
>
> Forwarded to the list from a PM:
>
>> So you now use m128i registers and do instructions on 128 bit wide
>> values?
>
> yes
>
>> I assume that what you are doing is what is usually called bit
>> slicing?
>
> yes
>
>> For both Cell and SSE2, do you use interleaving of instructions to fill
>> up the pipeline (and usually improve speed by a factor of 2 or 3)? (Not
>> sure if that really helps on bit-sliced calculations, though.)
>
> I did not pay special attention to instruction scheduling.
> The heavy-load loop is unrolled, so gcc has plenty of opportunity
> to reorder the instructions.
> Also, the loop body is executed ~650M times per second on a 2 GHz core,
> effectively using 3 clocks per iteration for the 7 instructions (unrolled).
>
> On the Cell SPU it is half as fast despite the higher clock, so I should
> try to optimize there.
> (But the SPU is also not fully superscalar, so it may simply be expected
> to be half as fast.)
>
> With the SPU being in-order, explicit instruction reordering and
> scheduling may pay off.
>
> This function does 90% of the work
> (nshift == 0; length = 19, 22 or 23; RT = ssevector(uint64, uint64)):
>
>   template <int base, int length, int nshift, typename RT>
>   void lsh_reg(RT *regs, RT clock3) {
>       RT clock2 = ~clock3;
>       int i;
>       for (i = base - length + 1; i < base; ++i) {
>           // bit-sliced select between the two neighboring register slices
>           regs[i] = (regs[i + nshift] & clock2) | (regs[i + nshift + 1] & clock3);
>       }
>   }
>
> Before loop unrolling:
>
>   xmm0: clock2
>   xmm6: ~clock2
>   720(...): regs[i + nshift]
>   736(...): regs[i + nshift + 1]
>
>   L41:
>       .loc 2 110 0
>       movdqa  720(%ecx,%eax), %xmm1
>       movdqa  736(%ecx,%eax), %xmm2
>   L18:
>       .loc 2 111 0
>       pand    %xmm0, %xmm1
>       pand    %xmm6, %xmm2
>       por     %xmm2, %xmm1
>       movdqa  %xmm1, 720(%ecx,%eax)
>       addl    $16, %eax
>       .loc 2 110 0
>       cmpl    $288, %eax
>       jne     L41
>
> After loop unrolling this pattern repeats
> (xmm2, xmm6: clock2, ~clock2; 720(), 736(): regs[...]):
>
>   movdqa  %xmm2, %xmm0
>   movdqa  %xmm6, %xmm1
>   pand    720(%ebx,%eax), %xmm0
>   pand    736(%ebx,%eax), %xmm1
>   por     %xmm1, %xmm0
>   movdqa  %xmm0, 720(%ebx,%eax)
>   leal    32(%edx), %eax
>
>
> ------------------------------
>
> Message: 9
> Date: Wed, 04 Nov 2009 06:43:55 +0100
> From: "Frank A. Stevenson" <[email protected]>
> Subject: Re: [A51] Fast sorting & compact tables
> To: A51 Mailing List <[email protected]>
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Frank A. Stevenson wrote:
>> I wrote a table sorting function, ...
>>
>> Tarball is available for inspection at: http://traxme.net/a5/ (native
>> endianness is assumed when reading the tables)
>>
>> F
>>
> I have done more research on efficient table storage. My initial
> suggestion of using 1 million distinct files would simply overwhelm an
> ext2-formatted flash device with inodes, so I had to reduce the number
> of files to a more manageable 4096.
I have updated my table sorting code
> accordingly (on the website for now, will add it to svn later). I have
> also added code that does the lookup into the compressed tables (not
> finished).
>
> The interesting figures from an 8GB Kingston DataTraveler with 5 MB/sec
> write speed are:
>
> 1000 lookups in a smaller table (40 million chains) can be performed in
> 2.1 seconds from a cold device (newly inserted, with no locally cached
> inodes) - access speed is not an issue :-)
>
> The time to sort a complete table would be limited by the write speed
> of the device (~15 minutes). It is also interesting to note that a
> table of the suggested size should fit in < 4 GB in compressed format,
> so even multiples of tables can easily be stored on devices of commonly
> available sizes.
>
> The big advantage of flash-based storage is that it won't bog down
> your machine with a torrent of disk requests when doing the lookup
> work.
>
> F
>
>
> ------------------------------
>
> _______________________________________________
> A51 mailing list
> [email protected]
> http://lists.lists.reflextor.com/cgi-bin/mailman/listinfo/a51
>
>
> End of A51 Digest, Vol 6, Issue 5
> *********************************
>
--
The Elata Foundation
www.elata.org

_______________________________________________
A51 mailing list
[email protected]
http://lists.lists.reflextor.com/cgi-bin/mailman/listinfo/a51
