Hello Vadim,

That sounds like fine work on improving efficiency on ARM. Please feel free to send me a patch, perhaps including a patch for the README that describes the ./configure options.
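A note on the speed-up figures in Vadim's message below ("even 10% more", "overall 25%"). If each figure is read as a fractional cut in run time, successive gains compose multiplicatively rather than simply adding. The minimal C sketch here assumes that reading. The earlier 15% gain fed into it is a hypothetical value inferred from the two quoted figures, not a number stated in the thread, and both function names are illustrative only.

```c
#include <assert.h>

/* Hypothetical helper: combine two fractional run-time reductions
 * (e.g. 0.15 and 0.10). Each step leaves (1 - r) of the previous run
 * time, so the remaining time is the product of those factors and the
 * total cut is 1 minus that product. */
static double combine_reductions(double first, double second)
{
    assert(first >= 0.0 && first < 1.0);
    assert(second >= 0.0 && second < 1.0);
    return 1.0 - (1.0 - first) * (1.0 - second);
}

/* Hypothetical helper: express a fractional run-time reduction as a
 * throughput gain. Cutting run time by r lets 1 / (1 - r) times as many
 * frames be processed in the same wall-clock time. */
static double reduction_to_throughput_gain(double reduction)
{
    assert(reduction >= 0.0 && reduction < 1.0);
    return 1.0 / (1.0 - reduction) - 1.0;
}
```

Under these assumptions, combine_reductions(0.15, 0.10) gives 0.235, slightly under 25%. A 25% cut in run time corresponds to roughly a 33% throughput gain (reduction_to_throughput_gain(0.25) is about 0.333), so a quoted "speedup" is clearer when it says which of the two it means.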
I am not very good at ./configure (autoconf/configure.in to autogen.sh/configure.ac), but I am happy to follow your lead.

Thanks,

David

On Wed, 2012-11-14 at 19:28 +0400, Markovtsev Vadim wrote:
> Hi all,
>
> I managed to improve codec2 performance on ARM NEON by a further 10%. I
> replaced some math functions with those from the math-neon project (my
> libc version is 2.13). So the overall ARM speedup is now 25% in my case.
>
> Here are the oprofile reports on Exynos 4.
>
> Vanilla:
>
> samples  %        linenr info                image name             symbol name
> 16436    45.8453  kiss_fft.c:246             libcodec2.so.0.0.0     kf_work
> 3089     8.6162   s_floor.c:44               libm-2.13.so           floorl
> 2166     6.0417   nlp.c:209                  libcodec2.so.0.0.0     nlp
> 1760     4.9092   e_atan2.c:80               libm-2.13.so           __ieee754_atan2
> 1741     4.8562   fft.c:84                   libcodec2.so.0.0.0     fft
> 1306     3.6429   s_sin.c:353                libm-2.13.so           cosl
> 956      2.6666   (no location information)  no-vmlinux             /no-vmlinux
> 877      2.4462   sine.c:288                 libcodec2.so.0.0.0     hs_pitch_refinement
> 691      1.9274   lsp.c:143                  libcodec2.so.0.0.0     lpc_to_lsp
> 682      1.9023   lpc.c:75                   libcodec2.so.0.0.0     autocorrelate
> 657      1.8326   phase.c:61                 libcodec2.so.0.0.0     aks_to_H
> 626      1.7461   quantise.c:479             libcodec2.so.0.0.0     aks_to_M2
> 624      1.7405   s_sin.c:90                 libm-2.13.so           sinl
> 449      1.2524   sine.c:395                 libcodec2.so.0.0.0     est_voicing_mbe
> 326      0.9093   e_log.c:69                 libm-2.13.so           __ieee754_log
> 322      0.8982   sine.c:564                 libcodec2.so.0.0.0     synthesise
> 276      0.7699   sine.c:351                 libcodec2.so.0.0.0     estimate_amplitudes
> 263      0.7336   random.c:293               libc-2.13.so           random
> 228      0.6360   sine.c:207                 libcodec2.so.0.0.0     dft_speech
>
> math-neon:
>
> samples  %        linenr info                image name             symbol name
> 3369     49.2976  kiss_fft.c:246             libcodec2.so.0.0.0     kf_work
> 438      6.4091   nlp.c:209                  libcodec2.so.0.0.0     nlp
> 413      6.0433   sine.c:288                 libcodec2.so.0.0.0     hs_pitch_refinement
> 347      5.0776   fft.c:84                   libcodec2.so.0.0.0     fft
> 339      4.9605   math_floorf.c:39           libmath_neon.so.0.0.0  floorf_neon_hfp
> 227      3.3216   (no location information)  no-vmlinux             /no-vmlinux
> 146      2.1364   lpc.c:78                   libcodec2.so.0.0.0     autocorrelate
> 140      2.0486   s_sin.c:353                libm-2.13.so           cosl
> 133      1.9462   math_floorf.c:54           libmath_neon.so.0.0.0  floorf_neon_sfp
> 132      1.9315   lsp.c:143                  libcodec2.so.0.0.0     lpc_to_lsp
> 131      1.9169   quantise.c:479             libcodec2.so.0.0.0     aks_to_M2
> 121      1.7706   math_sinf.c:73             libmath_neon.so.0.0.0  sinf_neon_hfp
> 98       1.4340   e_log.c:69                 libm-2.13.so           __ieee754_log
> 81       1.1853   math_atan2f.c:96           libmath_neon.so.0.0.0  atan2f_neon_hfp
> 78       1.1414   phase.c:61                 libcodec2.so.0.0.0     aks_to_H
> 62       0.9072   sine.c:564                 libcodec2.so.0.0.0     synthesise
> 58       0.8487   phase.c:200                libcodec2.so.0.0.0     phase_synth_zero_order
> 43       0.6292   sine.c:206                 libcodec2.so.0.0.0     dft_speech
> 41       0.5999   random.c:293               libc-2.13.so           random
>
> math-neon + libavcodec FFT:
>
> samples  %        linenr info                image name             symbol name
> 665      36.1610  (no location information)  libavcodec.so.53.7.0   /usr/lib/libavcodec.so.53.7.0
> 225      12.2349  (no location information)  no-vmlinux             /no-vmlinux
> 131      7.1234   sine.c:288                 libcodec2.so.0.0.0     hs_pitch_refinement
> 127      6.9059   nlp.c:209                  libcodec2.so.0.0.0     nlp
> 103      5.6009   fft.c:183                  libcodec2.so.0.0.0     fft
> 85       4.6221   math_floorf.c:39           libmath_neon.so.0.0.0  floorf_neon_hfp
> 42       2.2838   lsp.c:143                  libcodec2.so.0.0.0     lpc_to_lsp
> 42       2.2838   s_sin.c:353                libm-2.13.so           cosl
> 41       2.2295   math_floorf.c:54           libmath_neon.so.0.0.0  floorf_neon_sfp
> 39       2.1207   quantise.c:479             libcodec2.so.0.0.0     aks_to_M2
> 39       2.1207   lpc.c:75                   libcodec2.so.0.0.0     autocorrelate
> 34       1.8488   math_sinf.c:73             libmath_neon.so.0.0.0  sinf_neon_hfp
> 22       1.1963   e_log.c:69                 libm-2.13.so           __ieee754_log
> 22       1.1963   math_atan2f.c:96           libmath_neon.so.0.0.0  atan2f_neon_hfp
> 18       0.9788   phase.c:200                libcodec2.so.0.0.0     phase_synth_zero_order
> 17       0.9244   interp.c:0                 libc-2.13.so           memcpy
> 16       0.8700   sine.c:206                 libcodec2.so.0.0.0     dft_speech
> 16       0.8700   math_sinf.c:114            libmath_neon.so.0.0.0  sinf_neon_sfp
> 15       0.8157   sine.c:564                 libcodec2.so.0.0.0     synthesise
>
> The code on GitHub has been updated.
>
> I wonder what would happen if one profiled speex and applied the same
> math-neon trick…
>
> Regards,
>
> Vadim Markovtsev,
> Engineer, Algorithmic Lab,
> Moscow R&D center, Samsung Electronics
>
> ------------------------------------------------------------------------------
> Monitor your physical, virtual and cloud infrastructure from a single
> web console. Get in-depth insight into apps, servers, databases, vmware,
> SAP, cloud infrastructure, etc. Download 30-day Free Trial.
> Pricing starts from $795 for 25 servers or applications!
> http://p.sf.net/sfu/zoho_dev2dev_nov
> _______________________________________________
> Freetel-codec2 mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/freetel-codec2
