Hi all,

KMC packaging is done.

http://anonscm.debian.org/cgit/debian-med/kmc.git/

Steffen, would you be able to sponsor this package?

Regards,

Jorge

On Mon, Nov 17, 2014 at 8:50 AM, Jorge Sebastião Soares <
[email protected]> wrote:

> Hi Marek,
>
> Sorry for the time it took for me to get back to you.
> I am at the point that I have the kmc package ready to be submitted.
> Just doing final tidy ups and automating man page creation for the two
> binaries (kmc and kmc_dump).
> To do this I use help2man which has been proving great for turning any
> kind of usage into man pages.
> I should be done with this by lunch time today.
>
> At the moment, for the creation of the kmc Debian package I'm using my
> github repository.
> I hope that's ok for you for the moment.
> Once you guys are done with writing the new version of kmc we can point
> the package to look at your own VCS repository.
> Let me know if this is ok for now.
>
> On Wed, Nov 12, 2014 at 6:14 PM, Marek Kokot <[email protected]> wrote:
>
>>  Hello Jorge,
>>
>>  Hope all is well.
>>
>> It's fine, thanks. How are you? ;-)
>>
>>
> Great!
> I'm very well as well. :)
>
>
>>  Thank you for your tests. I have made a couple of tests on my own and
>> the results were pretty similar to yours. As I said, it was only a couple
>> of tests, but on relatively big data (human genome, about 100 GB gzipped).
>>
>
> Cool!
>
>
>>
>>  My question is: did you use an HDD or an SSD? I used an HDD; on an SSD
>> it is possible that the differences in time may be bigger. I will try to
>> test it on an SSD.
>>
>
> I was running it on HDD.
> Did you run it on SSD already? Did you get any significant improvement?
>
>
>>
>>  I spoke with Sebastian today and we agree that it is a good option to
>> make compilation without asmlib possible. I will deal with this and I will
>> let you know when it is ready.
>>
>
> Awesome!
> I'm so glad to hear this.
> This will also mean that KMC can be built for chips other than the
> Intel ones.
> Will this feature be available only with the new version of KMC?
>
>
>>
>>  KMC in its current version also has a "boost" dependency, but we are
>> thinking of going native, because g++ supports C++11 threads. The other
>> functionality from "boost" that we use is easy to replace with C++11. It
>> should make compilation time shorter for those who don't have "boost"
>> installed.
>>
>
> I see. It does take a while to compile, but the dependency on Boost is
> definitely not an issue from where I'm sitting.
> The package builds properly with the Debian boost, zlib and libbz2
> dependencies.
> I do appreciate that it does take a while to compile.
>
>
>>
>>  In one of earlier e-mails you asked about Agner (asmlib author):
>>
>> Do you think it would make sense for me to approach him through the
>> Debian Med team?
>>
>> We would prefer to deal with it on our own.
>>
>> No problem at all.
>
>
>
>>  If there is something I omitted, or if I didn't answer any of your
>> questions, please let me know.
>>
>
> I think you answered everything. I do apologise for stressing you guys
> out with these requests, especially at a time when you're concentrating on
> writing a new release.
>
> Kind regards,
>
> Jorge Soares
>
>
> ------------------------------
> *From:* Jorge Sebastião Soares [[email protected]]
> *Sent:* 11 November 2014 18:00
> *To:* Marek Kokot; Sebastian Deorowicz
> *Cc:* Debian Med Project List
> *Subject:* [KMC + asmlib] KMC Debian package progress
>
>   Hi guys,
>
>  I had to send this again as I used my Sanger email, but that is not
> subscribed to the Debian Med Mailing list.
>
>  So here it goes again (if you can ignore the previous one and respond to
> this one).
>
> Hope all is well.
> Have you given any thought to my proposal of a compile time option that
> won't use asmlib?
>
> I have included the Debian Med team on this email as they are aware of the
> packaging of KMC and the whole issue with asmlib.
>
> I have been doing some benchmarking on KMC for the past couple of days.
> I have compiled KMC in three ways:
>
> kmc_original - kmc code compiled against the version of asmlib distributed
> with KMC- alibelf64.a
> kmc_native - kmc code compiled against the native OS libraries
> kmc_js21 - kmc code compiled against the new version of asmlib, compiled
> on my machine with my Unix makefile - libaelf64.a
>
> I have also used the executables provided on your website in the benchmark.
>
> kmc_exe
>
> The machine I used for this is a Debian Virtual Machine running on Vagrant.
>
> Here are the architecture details:
>
> vagrant@debian:~$ cat /proc/cpuinfo
> processor    : 0
> vendor_id    : GenuineIntel
> cpu family    : 6
> model        : 23
> model name    : Intel(R) Core(TM)2 Duo CPU     P8600  @ 2.40GHz
> stepping    : 10
> microcode    : 0x60b
> cpu MHz        : 1426.514
> cache size    : 6144 KB
> physical id    : 0
> siblings    : 1
> core id        : 0
> cpu cores    : 1
> apicid        : 0
> initial apicid    : 0
> fpu        : yes
> fpu_exception    : yes
> cpuid level    : 5
> wp        : yes
> flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx lm constant_tsc
> rep_good nopl pni monitor ssse3 lahf_lm
> bogomips    : 2853.02
> clflush size    : 64
> cache_alignment    : 64
> address sizes    : 36 bits physical, 48 bits virtual
>
> To do the benchmark I used a fastq file that has a fair bit of
> contamination (many different kmers). The file is about 227M in size.
>
> Here are some of the results:
>
> For an average time in seconds over 10 runs for all the differently
> compiled executables:
>
> [1] kmc_original - 'average_duration' => '41.400'
> [2] kmc_native - 'average_duration' => '41.675'
> [3] kmc_js21 - 'average_duration' => '41.249'
> [4] kmc_exe - 'average_duration' => '44.049'
>
> The cumulative time for 10 runs for all the differently compiled
> executables:
>
> [5] kmc_original - 'time_taken' => '414 wallclock secs ( 0.26 usr  0.76
> sys + 347.94 cusr 61.00 csys = 409.96 CPU) @  0.02/s (n=10)'
> [6] kmc_native - 'time_taken' => '412 wallclock secs ( 0.15 usr  0.85 sys
> + 345.37 cusr 61.17 csys = 407.54 CPU) @  0.02/s (n=10)'
> [7] kmc_js21 - 'time_taken' => '423 wallclock secs ( 0.11 usr  0.82 sys +
> 355.10 cusr 61.95 csys = 417.98 CPU) @  0.02/s (n=10)'
> [8] kmc_exe - 'time_taken' => '434 wallclock secs ( 0.06 usr  0.78 sys +
> 368.63 cusr 60.14 csys = 429.61 CPU) @  0.02/s (n=10)'
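
The cumulative lines [5] to [8] follow the summary layout printed by Perl's Benchmark module, so per-run figures can be recovered by dividing by n. Below is a minimal sketch in Python rather than the Perl used for the benchmark; the regular expression, the `per_run` helper and the `TIME_TAKEN` dictionary are hypothetical names for this illustration, and the strings are copied from the list above with the wrapped lines joined.

```python
import re

# Layout of the summary lines as they appear in items [5] to [8].
TIMESTR = re.compile(
    r"(?P<wall>\d+) wallclock secs \(\s*(?P<usr>[\d.]+) usr\s+(?P<sys>[\d.]+) sys"
    r" \+\s*(?P<cusr>[\d.]+) cusr\s+(?P<csys>[\d.]+) csys =\s*(?P<cpu>[\d.]+) CPU\)"
    r".*\(n=(?P<n>\d+)\)"
)


def per_run(time_taken):
    """Return average wall-clock and CPU seconds per run from one summary line."""
    m = TIMESTR.search(time_taken)
    if m is None:
        raise ValueError(f"unrecognised summary line: {time_taken!r}")
    n = int(m["n"])
    return {"wall": int(m["wall"]) / n, "cpu": float(m["cpu"]) / n}


# Values copied from the email (hypothetical container, not part of the original script).
TIME_TAKEN = {
    "kmc_original": "414 wallclock secs ( 0.26 usr  0.76 sys + 347.94 cusr 61.00 csys"
                    " = 409.96 CPU) @  0.02/s (n=10)",
    "kmc_native": "412 wallclock secs ( 0.15 usr  0.85 sys + 345.37 cusr 61.17 csys"
                  " = 407.54 CPU) @  0.02/s (n=10)",
    "kmc_js21": "423 wallclock secs ( 0.11 usr  0.82 sys + 355.10 cusr 61.95 csys"
                " = 417.98 CPU) @  0.02/s (n=10)",
    "kmc_exe": "434 wallclock secs ( 0.06 usr  0.78 sys + 368.63 cusr 60.14 csys"
               " = 429.61 CPU) @  0.02/s (n=10)",
}

if __name__ == "__main__":
    for name, line in TIME_TAKEN.items():
        r = per_run(line)
        print(f"{name:13s} wall {r['wall']:.1f} s/run, cpu {r['cpu']:.2f} s/run")
```

For kmc_original this gives 41.4 s of wall-clock time per run, which lines up with the 'average_duration' reported in [1].
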
>
> *Note:* More detailed results at the end of the email.
>
> From what I can see, [1] generally runs faster than [2] and [3], albeit
> by only 1 or 2%.
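
As a quick check on the averages in [1] to [4], the relative differences can be computed directly. This is only an illustrative sketch in Python (the benchmark itself was a Perl script); the `averages` dictionary and the `percent_vs_baseline` function are made-up names for this example, and the numbers are copied from the list of average durations above.

```python
# Average durations in seconds over 10 runs, copied from items [1] to [4].
averages = {
    "kmc_original": 41.400,
    "kmc_native": 41.675,
    "kmc_js21": 41.249,
    "kmc_exe": 44.049,
}


def percent_vs_baseline(averages, baseline="kmc_original"):
    """Return each build's average as a signed percent difference from the baseline."""
    base = averages[baseline]
    return {name: 100.0 * (value - base) / base for name, value in averages.items()}


if __name__ == "__main__":
    for name, diff in percent_vs_baseline(averages).items():
        print(f"{name:13s} {averages[name]:7.3f} s  {diff:+.2f}%")
```

On these averages the three locally compiled builds come out within about 1% of each other (kmc_native about +0.66% and kmc_js21 about -0.36% relative to kmc_original), while kmc_exe is about 6.4% slower.
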
>
> Looking at the cumulative times for a set of 10 runs, the difference
> between implementations is still small and in this case the native
> implementation was actually faster. The machine I was doing the benchmark
> on wasn't fully dedicated to the benchmark, so there will be slight
> variation. I would still like to run the benchmark for 100 runs for both
> methods. I would like to try this overnight, just for the 2 main kmc
> implementations: [1] and [2].
> But I'm looking at 23 hours, so essentially one whole day really. I'm not
> sure I'll do it tonight. Perhaps from Friday to Saturday.
>
> Anyway, I understand that this performance increase might mean a lot to
> you, but our group here at the Sanger and Debian can definitely live with
> the native implementation.
>
> Since the author of asmlib is taking a while to reply, our suggestion
> would be to package KMC in one of two ways:
>
> 1- an implementation on your side which allows KMC to be built without
> using asmlib; (preferred);
>
> 2- using the compilation that does not use asmlib at all (which can be
> done on my side, as a code patch, at package creation time).
>
> This would make the packaging job slightly easier and faster and would
> allow me to reach my goal: packaging the virus assembler written by my
> colleague here at the Sanger.
>
> If/when Agner (author of asmlib) replies we can always package asmlib and
> state it as a dependency for the KMC package.
>
>
> Detailed Results:
>
> The benchmarking was done with a Perl script using 2 Perl
> modules:
>
> [9] Time::HiRes => High resolution alarm, sleep, gettimeofday, interval
> timers
> [10] Benchmark => benchmark running times of Perl code
>
> The options used for the KMC runs are the same as the ones used for the
> Virus Assembler (IVA) runs:
>
> kmc -k100 -m4 -ci10 -cs100000000 -fq foo.fastq kmc.res bar/
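
The Perl script itself is not included in this email. As a rough illustration of the same idea (run a command several times and record the wall-clock time of each run), here is a minimal sketch in Python; `time_command` is a hypothetical helper, and the command in the demo is a harmless placeholder rather than the real kmc invocation.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=10):
    """Run cmd `runs` times and return the wall-clock duration of each run in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True aborts the benchmark if the command exits with an error.
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # Placeholder so the sketch runs anywhere; substitute the kmc command, e.g.
    # ["kmc", "-k100", "-m4", "-ci10", "-cs100000000", "-fq", "foo.fastq", "kmc.res", "bar/"]
    cmd = [sys.executable, "-c", "pass"]
    durations = time_command(cmd, runs=3)
    print(f"mean {statistics.mean(durations):.3f} s, "
          f"stdev {statistics.stdev(durations):.3f} s")
```

Keeping the individual durations, rather than only the mean, makes it possible to see how much of a 1 or 2% difference is run-to-run noise on a machine that is not dedicated to the benchmark.
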
>
> Results with [9] -
>
> Average time in seconds for 10 runs logged under the 'average_duration'
> attribute.
>
>           'commandline_parameters' => {
>                                         'module' => 'time_hires',
>                                         'number_of_runs' => 10,
>                                         'file_type' => 'fq',
>                                         'fastaq_filename' =>
> '12950_1#10_1.fastq'
>                                       },
>          'kmc_exe' => {
>                          'cmd' => '../kmc_exe/kmc -k100 -m4 -ci10
> -cs100000000 -fq 12950_1#10_1.fastq ke_out.res
> /home/vagrant/build/kmc_bin/perl_profiler/ke_analysis/',
>                          'output_filename' => 'ke_out.res',
>                          'analysis_dir' =>
> '/home/vagrant/build/kmc_bin/perl_profiler/ke_analysis/',
>                          'kmc_exe' => '../kmc_exe/kmc',
>                          'duration' => '46.303',
>                          'average_duration' => '44.049'
>                        },
>           'kmc_native' => {
>                                     'analysis_dir' =>
> '/home/vagrant/build/kmc_bin/perl_profiler/kn_analysis/',
>                                     'output_filename' => 'kn_out.res',
>                                     'average_duration' => '41.675',
>                                     'kmc_exe' => '../kmc_native/kmc',
>                                     'duration' => '41.688',
>                                     'cmd' => '../kmc_native/kmc -k100 -m4
> -ci10 -cs100000000 -fq 12950_1#10_1.fastq kn_out.res
> /home/vagrant/build/kmc_bin/perl_profiler/kn_analysis/'
>                                   },
>           'kmc_js21' => {
>                           'cmd' => '../kmc_js21/kmc -k100 -m4 -ci10
> -cs100000000 -fq 12950_1#10_1.fastq kj_out.res
> /home/vagrant/build/kmc_bin/perl_profiler/kj_analysis/',
>                           'output_filename' => 'kj_out.res',
>                           'analysis_dir' =>
> '/home/vagrant/build/kmc_bin/perl_profiler/kj_analysis/',
>                           'kmc_exe' => '../kmc_js21/kmc',
>                           'duration' => '40.714',
>                           'average_duration' => '41.249'
>                         },
>           'kmc_original' => {
>                               'output_filename' => 'kk_out.res',
>                               'analysis_dir' =>
> '/home/vagrant/build/kmc_bin/perl_profiler/kk_analysis/',
>                               'duration' => '42.670',
>                               'kmc_exe' => '../kmc_kmc/kmc',
>                               'average_duration' => '41.400',
>                               'cmd' => '../kmc_kmc/kmc -k100 -m4 -ci10
> -cs100000000 -fq 12950_1#10_1.fastq kk_out.res
> /home/vagrant/build/kmc_bin/perl_profiler/kk_analysis/'
>                             },
>         };
>
>
> The times are consistent with the times reported by the KMC instances when
> they run in verbose mode.
>
> Results with [10] -
>
> Cumulative time in seconds logged under the 'time_taken' attribute.
>
>           'commandline_parameters' => {
>                                         'number_of_runs' => 10,
>                                         'fastaq_filename' =>
> '12950_1#10_1.fastq',
>                                         'file_type' => 'fq',
>                                         'module' => 'bench'
>                                       },
>             'kmc_original' => {
>                               'output_filename' => 'kk_out.res',
>                               'time_taken' => '414 wallclock secs ( 0.26
> usr  0.76 sys + 347.94 cusr 61.00 csys = 409.96 CPU) @  0.02/s (n=10)',
>                               'kmc_exe' => '../kmc_kmc/kmc',
>                               'analysis_dir' =>
> '/home/vagrant/build/kmc_bin/perl_profiler/kk_analysis/',
>                               'cmd' => '../kmc_kmc/kmc -k100 -m4 -ci10
> -cs100000000 -fq 12950_1#10_1.fastq kk_out.res
> /home/vagrant/build/kmc_bin/perl_profiler/kk_analysis/',
>                               'duration' => ''
>                             },
>           'kmc_native' => {
>                                     'duration' => '',
>                                     'cmd' => '../kmc_native/kmc -k100 -m4
> -ci10 -cs100000000 -fq 12950_1#10_1.fastq kn_out.res
> /home/vagrant/build/kmc_bin/perl_profiler/kn_analysis/',
>                                     'analysis_dir' =>
> '/home/vagrant/build/kmc_bin/perl_profiler/kn_analysis/',
>                                     'kmc_exe' => '../kmc_native/kmc',
>                                     'time_taken' => '412 wallclock secs (
> 0.15 usr  0.85 sys + 345.37 cusr 61.17 csys = 407.54 CPU) @  0.02/s (n=10)',
>                                     'output_filename' => 'kn_out.res'
>                                   },
>           'kmc_js21' => {
>                           'kmc_exe' => '../kmc_js21/kmc',
>                           'time_taken' => '423 wallclock secs ( 0.11 usr
> 0.82 sys + 355.10 cusr 61.95 csys = 417.98 CPU) @  0.02/s (n=10)',
>                           'output_filename' => 'kj_out.res',
>                           'duration' => '',
>                           'cmd' => '../kmc_js21/kmc -k100 -m4 -ci10
> -cs100000000 -fq 12950_1#10_1.fastq kj_out.res
> /home/vagrant/build/kmc_bin/perl_profiler/kj_analysis/',
>                           'analysis_dir' =>
> '/home/vagrant/build/kmc_bin/perl_profiler/kj_analysis/'
>                         },
>           'kmc_exe' => {
>                          'duration' => '',
>                          'cmd' => '../kmc_exe/kmc -k100 -m4 -ci10
> -cs100000000 -fq 12950_1#10_1.fastq ke_out.res
> /home/vagrant/build/kmc_bin/perl_profiler/ke_analysis/',
>                          'analysis_dir' =>
> '/home/vagrant/build/kmc_bin/perl_profiler/ke_analysis/',
>                          'kmc_exe' => '../kmc_exe/kmc',
>                          'time_taken' => '434 wallclock secs ( 0.06 usr
> 0.78 sys + 368.63 cusr 60.14 csys = 429.61 CPU) @  0.02/s (n=10)',
>                          'output_filename' => 'ke_out.res'
>                        }
>         };
>
>
> Let me know what you think.
>
> Kind regards,
>
> Jorge
>
>>
>
