Nice work, Frank. Just wondering, has anyone looked into using SSE2 for faster CPU code (or maybe even tried it)? I haven't looked at A5/1 very closely, but skimming through A5Cpu.cpp (this is the relevant code, right?), the main loop seems to consist mostly of bit shifts, XORs, ORs and ANDs. That should perform all right in SSE.
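To make the idea a bit more concrete, here is a rough sketch of what I mean. It is not taken from A5Cpu.cpp: it only steps A5/1's 19-bit register R1 (feedback taps at bits 13, 16, 17 and 18) unconditionally, for four independent registers packed into one 128-bit SSE2 register, and checks the result against a plain scalar version. The function names are made up for illustration, and the majority-based clock control of the real cipher is left out; that would presumably need per-lane masks on top of this.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cassert>
#include <cstdint>

// Scalar reference: one unconditional step of A5/1's 19-bit register R1.
// The feedback bit is the XOR of bits 13, 16, 17 and 18.
static inline uint32_t r1_step_scalar(uint32_t r) {
    uint32_t fb = ((r >> 13) ^ (r >> 16) ^ (r >> 17) ^ (r >> 18)) & 1u;
    return ((r << 1) | fb) & 0x7FFFFu;
}

// SSE2 version (hypothetical helper): the same step applied to four
// independent R1 registers, one per 32-bit lane.
static inline __m128i r1_step_sse2(__m128i r) {
    __m128i fb = _mm_xor_si128(
        _mm_xor_si128(_mm_srli_epi32(r, 13), _mm_srli_epi32(r, 16)),
        _mm_xor_si128(_mm_srli_epi32(r, 17), _mm_srli_epi32(r, 18)));
    fb = _mm_and_si128(fb, _mm_set1_epi32(1));               // keep bit 0 only
    __m128i next = _mm_or_si128(_mm_slli_epi32(r, 1), fb);   // shift in feedback
    return _mm_and_si128(next, _mm_set1_epi32(0x7FFFF));     // trim to 19 bits
}

// Runs both versions for `steps` iterations from the same four seeds and
// reports whether every SSE2 lane agrees with its scalar counterpart.
static bool lanes_match_scalar(const uint32_t seed[4], int steps) {
    assert(steps >= 0);
    uint32_t s[4] = {seed[0], seed[1], seed[2], seed[3]};
    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(seed));
    for (int i = 0; i < steps; ++i) {
        for (int k = 0; k < 4; ++k) s[k] = r1_step_scalar(s[k]);
        v = r1_step_sse2(v);
    }
    uint32_t out[4];
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), v);
    for (int k = 0; k < 4; ++k) {
        if (out[k] != s[k]) return false;
    }
    return true;
}
```

Calling lanes_match_scalar with any four 19-bit seeds should return true. Whether this actually beats the scalar loop would have to be measured; the sketch only shows that the shift/XOR/AND pattern maps directly onto SSE2 integer instructions.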
Daniël

Frank A. Stevenson wrote:
> I cleaned up the ATI Brook code and made a new version of the shared
> library that uses only the CPU for generating chains. The code can be
> found here (Linux only ATM):
>
> http://traxme.net/a5/a5_cpu.tar.gz
>
> There is a small Python frontend that will generate real chains (follow
> the instructions in my previous post about using the Python script).
> The script can also be edited to set the number of threads (cores) you
> wish to use.
>
> An AMD Phenom X4 @ 3.2 GHz makes around 16 chains / second. I suppose
> there is room for assembly optimization here, but that isn't really the
> point. I am writing this code to look into efficient table lookup once
> the tables have been generated. The idea is to spread the lookup part
> to machines that may not have a GPU.
>
> On my machine the code will cause 32 lookups to disk / second, hardly a
> cause for alarm, so a bog-standard hard disk will do.
>
> But if the GPU is used for lookup, the rate will be much higher
> (320/sec), and I am currently copying sorted tables to a slow USB flash
> drive to determine if it can keep up with the pace with respect to
> lookups / reads. (It takes some hours to copy 1 million files to
> this device, so I may change my approach to sorting to something that
> is more in line with the 64 KB block size commonly found on 8 GB flash
> disks.)
>
> cheers,
> Frank

_______________________________________________
A51 mailing list
[email protected]
http://lists.lists.reflextor.com/cgi-bin/mailman/listinfo/a51
