Hi all, KMC packaging is done.
http://anonscm.debian.org/cgit/debian-med/kmc.git/

Steffen, would you be able to sponsor this package?

Regards,
Jorge

On Mon, Nov 17, 2014 at 8:50 AM, Jorge Sebastião Soares <[email protected]> wrote:
> Hi Marek,
>
> Sorry it took me so long to get back to you.
> I am at the point where the kmc package is ready to be submitted. I'm just doing final tidy-ups and automating man page creation for the two binaries (kmc and kmc_dump).
> To do this I use help2man, which has proven great for turning a program's usage output into man pages.
> I should be done with this by lunchtime today.
>
> At the moment I'm building the kmc Debian package from my GitHub repository.
> I hope that's OK with you for now.
> Once you guys are done writing the new version of kmc, we can point the package at your own VCS repository.
> Let me know if this is OK for now.
>
> On Wed, Nov 12, 2014 at 6:14 PM, Marek Kokot <[email protected]> wrote:
>
>> Hello Jorge,
>>
>> Hope all is well.
>>
>> It's fine, thanks. How are you? ;-)
>>
>
> Great!
> I'm very well too. :)
>
>> Thank you for your tests. I have run a couple of tests of my own and the results were pretty similar to yours. As I said, it was only a couple of tests, but on relatively big data (a human genome, about 100 GB gzipped).
>>
>
> Cool!
>
>> My question is: did you use an HDD or an SSD? I used an HDD; on an SSD the differences in time may be bigger. I will try to test it on an SSD.
>>
>
> I was running it on an HDD.
> Have you run it on an SSD yet? Did you get any significant improvement?
>
>> I spoke with Sebastian today and we agree that it is a good idea to make compilation without asmlib possible. I will deal with this and let you know when it is ready.
>>
>
> Awesome!
> I'm so glad to hear this.
> This will also mean that KMC can be built for chips other than Intel ones.
> Will this feature only be available in the new version of KMC?
>
>> The current version of KMC also has a "boost" dependency, but we are thinking of going native, because g++ supports C++11 threads. The other "boost" functionality we use is easy to replace with C++11. This should shorten compilation time for those who don't have "boost" installed.
>>
>
> I see. It does take a while to compile, but the dependency on Boost is definitely not an issue from where I'm sitting.
> The package builds properly with the Debian boost, zlib and libbz2 dependencies.
> I do appreciate that it takes a while to compile.
>
>> In one of your earlier e-mails you asked about Agner (the asmlib author):
>>
>> "Do you think it would make sense for me to approach him through the Debian Med team?"
>>
>> We would prefer to deal with it on our own.
>>
>
> No problem at all.
>
>> If I have omitted something or haven't answered any of your questions, please let me know.
>>
>
> I think you answered everything. I do apologise for stressing you guys out with these requests, especially at a time when you're concentrating on writing a new release.
>
> Kind regards,
>
> Jorge Soares
>
>
> ------------------------------
> *From:* Jorge Sebastião Soares [[email protected]]
> *Sent:* 11 November 2014 18:00
> *To:* Marek Kokot; Sebastian Deorowicz
> *Cc:* Debian Med Project List
> *Subject:* [KMC + asmlib] KMC Debian package progress
>
> Hi guys,
>
> I had to send this again because I originally used my Sanger email address, which is not subscribed to the Debian Med mailing list.
>
> So here it goes again (please ignore the previous one and respond to this one).
>
> Hope all is well.
> Have you given any thought to my proposal of a compile-time option that doesn't use asmlib?
>
> I have included the Debian Med team on this email, as they are aware of the KMC packaging and the whole issue with asmlib.
>
> I have been benchmarking KMC for the past couple of days.
> I have compiled KMC in three ways:
>
> kmc_original - KMC code compiled against the version of asmlib distributed with KMC (alibelf64.a)
> kmc_native - KMC code compiled against the native OS libraries
> kmc_js21 - KMC code compiled against the new version of asmlib, built on my machine with my Unix makefile (libaelf64.a)
>
> I have also included in the benchmark the executable provided on your website:
>
> kmc_exe
>
> The machine I used for this is a Debian virtual machine running under Vagrant.
>
> Here are the architecture details:
>
> vagrant@debian:~$ cat /proc/cpuinfo
> processor : 0
> vendor_id : GenuineIntel
> cpu family : 6
> model : 23
> model name : Intel(R) Core(TM)2 Duo CPU P8600 @ 2.40GHz
> stepping : 10
> microcode : 0x60b
> cpu MHz : 1426.514
> cache size : 6144 KB
> physical id : 0
> siblings : 1
> core id : 0
> cpu cores : 1
> apicid : 0
> initial apicid : 0
> fpu : yes
> fpu_exception : yes
> cpuid level : 5
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx lm constant_tsc rep_good nopl pni monitor ssse3 lahf_lm
> bogomips : 2853.02
> clflush size : 64
> cache_alignment : 64
> address sizes : 36 bits physical, 48 bits virtual
>
> For the benchmark I used a FASTQ file with a fair bit of contamination (many distinct k-mers). The file is about 227 MB in size.
>
> Here are some of the results.
>
> Average time in seconds over 10 runs for each of the differently compiled executables:
>
> [1] kmc_original - 'average_duration' => '41.400'
> [2] kmc_native - 'average_duration' => '41.675'
> [3] kmc_js21 - 'average_duration' => '41.249'
> [4] kmc_exe - 'average_duration' => '44.049'
>
> Cumulative time for 10 runs for each of the differently compiled executables:
>
> [5] kmc_original - 'time_taken' => '414 wallclock secs ( 0.26 usr 0.76 sys + 347.94 cusr 61.00 csys = 409.96 CPU) @ 0.02/s (n=10)'
> [6] kmc_native - 'time_taken' => '412 wallclock secs ( 0.15 usr 0.85 sys + 345.37 cusr 61.17 csys = 407.54 CPU) @ 0.02/s (n=10)'
> [7] kmc_js21 - 'time_taken' => '423 wallclock secs ( 0.11 usr 0.82 sys + 355.10 cusr 61.95 csys = 417.98 CPU) @ 0.02/s (n=10)'
> [8] kmc_exe - 'time_taken' => '434 wallclock secs ( 0.06 usr 0.78 sys + 368.63 cusr 60.14 csys = 429.61 CPU) @ 0.02/s (n=10)'
>
> *Note:* More detailed results are at the end of this email.
>
> From the averages, [1] runs slightly faster than [2] (41.400 s vs 41.675 s, under 1%) and about 6% faster than [4], while [3] was marginally faster than [1] (41.249 s).
>
> Looking at the cumulative times for a set of 10 runs, the differences between implementations are still small, and in this case the native implementation ([6], 412 s) was actually the fastest. The machine I ran the benchmark on wasn't dedicated to it, so there will be some run-to-run variation. I would still like to run the benchmark with 100 runs for both methods. I would like to try this overnight, just for the two main KMC implementations, [1] and [2].
> But I'm looking at 23 hours, so essentially a whole day. I'm not sure I'll do it tonight; perhaps from Friday to Saturday.
>
> Anyway, I understand that this performance gain might mean a lot to you, but our group here at the Sanger and Debian can definitely live with the native implementation.
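To put the averages above in relative terms, each one can be expressed as a percentage difference from kmc_original's average. The Python below is a minimal sketch of that arithmetic; only the four 'average_duration' values come from the benchmark, and the helper name `relative_difference` is hypothetical.

```python
# 'average_duration' values (seconds, mean of 10 runs) copied from the results above.
averages = {
    "kmc_original": 41.400,
    "kmc_native": 41.675,
    "kmc_js21": 41.249,
    "kmc_exe": 44.049,
}


def relative_difference(candidate: float, baseline: float) -> float:
    """Percentage by which `candidate` is slower (positive) or faster (negative) than `baseline`."""
    return (candidate - baseline) / baseline * 100.0


baseline = averages["kmc_original"]
for name, seconds in averages.items():
    pct = relative_difference(seconds, baseline)
    print(f"{name:<13} {seconds:7.3f} s  {pct:+6.2f}% vs kmc_original")
```

With these numbers it gives roughly +0.66% for kmc_native, -0.36% for kmc_js21 and +6.40% for kmc_exe relative to kmc_original. On a machine that is not dedicated to the benchmark and with only 10 runs, the sub-1% differences may well be within run-to-run variation.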
>
> Since the author of asmlib is taking a while to reply, our suggestion would be to package KMC in one of two ways:
>
> 1- an implementation on your side that allows KMC to be built without asmlib (preferred);
>
> 2- a build that does not use asmlib at all, which I can do on my side as a code patch applied at package creation time.
>
> This would make the packaging job easier and faster and would let me reach my goal: packaging the virus assembler written by my colleague here at the Sanger.
>
> If/when Agner (the author of asmlib) replies, we can always package asmlib and declare it as a dependency of the KMC package.
>
>
> Detailed Results:
>
> The benchmarks were run with a Perl script using two Perl modules:
>
> [9] Time::HiRes => High resolution alarm, sleep, gettimeofday, interval timers
> [10] Benchmark => benchmark running times of Perl code
>
> The options used for the KMC runs are the same as those used for the Virus Assembler (IVA) runs:
>
> kmc -k100 -m4 -ci10 -cs100000000 -fq foo.fastq kmc.res bar/
>
> Results with [9]:
>
> Average time in seconds over 10 runs, logged under the 'average_duration' attribute.
>
>   'commandline_parameters' => {
>       'module' => 'time_hires',
>       'number_of_runs' => 10,
>       'file_type' => 'fq',
>       'fastaq_filename' => '12950_1#10_1.fastq'
>   },
>   'kmc_exe' => {
>       'cmd' => '../kmc_exe/kmc -k100 -m4 -ci10 -cs100000000 -fq 12950_1#10_1.fastq ke_out.res /home/vagrant/build/kmc_bin/perl_profiler/ke_analysis/',
>       'output_filename' => 'ke_out.res',
>       'analysis_dir' => '/home/vagrant/build/kmc_bin/perl_profiler/ke_analysis/',
>       'kmc_exe' => '../kmc_exe/kmc',
>       'duration' => '46.303',
>       'average_duration' => '44.049'
>   },
>   'kmc_native' => {
>       'analysis_dir' => '/home/vagrant/build/kmc_bin/perl_profiler/kn_analysis/',
>       'output_filename' => 'kn_out.res',
>       'average_duration' => '41.675',
>       'kmc_exe' => '../kmc_native/kmc',
>       'duration' => '41.688',
>       'cmd' => '../kmc_native/kmc -k100 -m4 -ci10 -cs100000000 -fq 12950_1#10_1.fastq kn_out.res /home/vagrant/build/kmc_bin/perl_profiler/kn_analysis/'
>   },
>   'kmc_js21' => {
>       'cmd' => '../kmc_js21/kmc -k100 -m4 -ci10 -cs100000000 -fq 12950_1#10_1.fastq kj_out.res /home/vagrant/build/kmc_bin/perl_profiler/kj_analysis/',
>       'output_filename' => 'kj_out.res',
>       'analysis_dir' => '/home/vagrant/build/kmc_bin/perl_profiler/kj_analysis/',
>       'kmc_exe' => '../kmc_js21/kmc',
>       'duration' => '40.714',
>       'average_duration' => '41.249'
>   },
>   'kmc_original' => {
>       'output_filename' => 'kk_out.res',
>       'analysis_dir' => '/home/vagrant/build/kmc_bin/perl_profiler/kk_analysis/',
>       'duration' => '42.670',
>       'kmc_exe' => '../kmc_kmc/kmc',
>       'average_duration' => '41.400',
>       'cmd' => '../kmc_kmc/kmc -k100 -m4 -ci10 -cs100000000 -fq 12950_1#10_1.fastq kk_out.res /home/vagrant/build/kmc_bin/perl_profiler/kk_analysis/'
>   },
> };
>
>
> These times are consistent with those reported by the KMC instances themselves when run in verbose mode.
>
> Results with [10]:
>
> Cumulative time in seconds, logged under the 'time_taken' attribute.
>
>   'commandline_parameters' => {
>       'number_of_runs' => 10,
>       'fastaq_filename' => '12950_1#10_1.fastq',
>       'file_type' => 'fq',
>       'module' => 'bench'
>   },
>   'kmc_original' => {
>       'output_filename' => 'kk_out.res',
>       'time_taken' => '414 wallclock secs ( 0.26 usr 0.76 sys + 347.94 cusr 61.00 csys = 409.96 CPU) @ 0.02/s (n=10)',
>       'kmc_exe' => '../kmc_kmc/kmc',
>       'analysis_dir' => '/home/vagrant/build/kmc_bin/perl_profiler/kk_analysis/',
>       'cmd' => '../kmc_kmc/kmc -k100 -m4 -ci10 -cs100000000 -fq 12950_1#10_1.fastq kk_out.res /home/vagrant/build/kmc_bin/perl_profiler/kk_analysis/',
>       'duration' => ''
>   },
>   'kmc_native' => {
>       'duration' => '',
>       'cmd' => '../kmc_native/kmc -k100 -m4 -ci10 -cs100000000 -fq 12950_1#10_1.fastq kn_out.res /home/vagrant/build/kmc_bin/perl_profiler/kn_analysis/',
>       'analysis_dir' => '/home/vagrant/build/kmc_bin/perl_profiler/kn_analysis/',
>       'kmc_exe' => '../kmc_native/kmc',
>       'time_taken' => '412 wallclock secs ( 0.15 usr 0.85 sys + 345.37 cusr 61.17 csys = 407.54 CPU) @ 0.02/s (n=10)',
>       'output_filename' => 'kn_out.res'
>   },
>   'kmc_js21' => {
>       'kmc_exe' => '../kmc_js21/kmc',
>       'time_taken' => '423 wallclock secs ( 0.11 usr 0.82 sys + 355.10 cusr 61.95 csys = 417.98 CPU) @ 0.02/s (n=10)',
>       'output_filename' => 'kj_out.res',
>       'duration' => '',
>       'cmd' => '../kmc_js21/kmc -k100 -m4 -ci10 -cs100000000 -fq 12950_1#10_1.fastq kj_out.res /home/vagrant/build/kmc_bin/perl_profiler/kj_analysis/',
>       'analysis_dir' => '/home/vagrant/build/kmc_bin/perl_profiler/kj_analysis/'
>   },
>   'kmc_exe' => {
>       'duration' => '',
>       'cmd' => '../kmc_exe/kmc -k100 -m4 -ci10 -cs100000000 -fq 12950_1#10_1.fastq ke_out.res /home/vagrant/build/kmc_bin/perl_profiler/ke_analysis/',
>       'analysis_dir' => '/home/vagrant/build/kmc_bin/perl_profiler/ke_analysis/',
>       'kmc_exe' => '../kmc_exe/kmc',
>       'time_taken' => '434 wallclock secs ( 0.06 usr 0.78 sys + 368.63 cusr 60.14 csys = 429.61 CPU) @ 0.02/s (n=10)',
>       'output_filename' => 'ke_out.res'
>   }
> };
>
>
> Let me know what you think.
>
> Kind regards,
>
> Jorge
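The Perl script behind these numbers is not included in the thread. As a rough illustration of the Time::HiRes-style measurement (time each run with a high-resolution clock, then average), here is a Python sketch; it is not the script used above, and `kmc_command`, `time_runs` and `summarize` are hypothetical helper names. The kmc options are the ones quoted in the thread; the binary path, input file and directories are placeholders.

```python
import statistics
import subprocess
import time
from typing import List, Sequence


def kmc_command(kmc_bin: str, fastq: str, out_prefix: str, tmp_dir: str) -> List[str]:
    """Build the kmc argument list with the options used in the benchmark above."""
    return [kmc_bin, "-k100", "-m4", "-ci10", "-cs100000000", "-fq",
            fastq, out_prefix, tmp_dir]


def time_runs(cmd: Sequence[str], runs: int = 10) -> List[float]:
    """Run `cmd` `runs` times; return each run's wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()  # high-resolution clock, similar in spirit to Time::HiRes
        # check=True raises CalledProcessError if the command exits non-zero.
        subprocess.run(list(cmd), check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: Sequence[float]) -> dict:
    """Mean duration rounded to milliseconds (like 'average_duration') plus the run count."""
    return {
        "average_duration": round(statistics.mean(durations), 3),
        "runs": len(durations),
    }
```

For example, `summarize(time_runs(kmc_command("../kmc_native/kmc", "12950_1#10_1.fastq", "kn_out.res", "kn_analysis/")))` would time ten runs of the native build. Unlike Perl's Benchmark module, this only measures wall-clock time; the child CPU figures (cusr/csys) could be approximated by reading `os.times().children_user` and `os.times().children_system` before and after the runs.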

