Hi all,

 

I managed to improve codec2 performance on ARM NEON by a further 10%. I
replaced some math functions with those from the math-neon project (my libc
version is 2.13). The overall ARM speedup is now 25% in my case.

 

Here are the oprofile reports on Exynos 4.

 

Vanilla:

samples  %        linenr info                 image name               symbol name
16436    45.8453  kiss_fft.c:246              libcodec2.so.0.0.0       kf_work
3089      8.6162  s_floor.c:44                libm-2.13.so             floorl
2166      6.0417  nlp.c:209                   libcodec2.so.0.0.0       nlp
1760      4.9092  e_atan2.c:80                libm-2.13.so             __ieee754_atan2
1741      4.8562  fft.c:84                    libcodec2.so.0.0.0       fft
1306      3.6429  s_sin.c:353                 libm-2.13.so             cosl
956       2.6666  (no location information)   no-vmlinux               /no-vmlinux
877       2.4462  sine.c:288                  libcodec2.so.0.0.0       hs_pitch_refinement
691       1.9274  lsp.c:143                   libcodec2.so.0.0.0       lpc_to_lsp
682       1.9023  lpc.c:75                    libcodec2.so.0.0.0       autocorrelate
657       1.8326  phase.c:61                  libcodec2.so.0.0.0       aks_to_H
626       1.7461  quantise.c:479              libcodec2.so.0.0.0       aks_to_M2
624       1.7405  s_sin.c:90                  libm-2.13.so             sinl
449       1.2524  sine.c:395                  libcodec2.so.0.0.0       est_voicing_mbe
326       0.9093  e_log.c:69                  libm-2.13.so             __ieee754_log
322       0.8982  sine.c:564                  libcodec2.so.0.0.0       synthesise
276       0.7699  sine.c:351                  libcodec2.so.0.0.0       estimate_amplitudes
263       0.7336  random.c:293                libc-2.13.so             random
228       0.6360  sine.c:207                  libcodec2.so.0.0.0       dft_speech
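
To get a rough sense of how much of the vanilla run is spent inside libm, the percentage column of the report above can be summed per image. The sketch below does that for the rows shown; the struct, array and function names are illustrative and not part of oprofile or codec2, and rows below the cutoff of the report are not counted.

```c
#include <stddef.h>
#include <string.h>

/* One row of the report above (illustrative struct, not an oprofile type). */
struct profile_row {
    unsigned    samples;
    double      percent;
    const char *image;
    const char *symbol;
};

/* Rows copied from the vanilla report. */
static const struct profile_row vanilla_rows[] = {
    {16436, 45.8453, "libcodec2.so.0.0.0", "kf_work"},
    { 3089,  8.6162, "libm-2.13.so",       "floorl"},
    { 2166,  6.0417, "libcodec2.so.0.0.0", "nlp"},
    { 1760,  4.9092, "libm-2.13.so",       "__ieee754_atan2"},
    { 1741,  4.8562, "libcodec2.so.0.0.0", "fft"},
    { 1306,  3.6429, "libm-2.13.so",       "cosl"},
    {  956,  2.6666, "no-vmlinux",         "/no-vmlinux"},
    {  877,  2.4462, "libcodec2.so.0.0.0", "hs_pitch_refinement"},
    {  691,  1.9274, "libcodec2.so.0.0.0", "lpc_to_lsp"},
    {  682,  1.9023, "libcodec2.so.0.0.0", "autocorrelate"},
    {  657,  1.8326, "libcodec2.so.0.0.0", "aks_to_H"},
    {  626,  1.7461, "libcodec2.so.0.0.0", "aks_to_M2"},
    {  624,  1.7405, "libm-2.13.so",       "sinl"},
    {  449,  1.2524, "libcodec2.so.0.0.0", "est_voicing_mbe"},
    {  326,  0.9093, "libm-2.13.so",       "__ieee754_log"},
    {  322,  0.8982, "libcodec2.so.0.0.0", "synthesise"},
    {  276,  0.7699, "libcodec2.so.0.0.0", "estimate_amplitudes"},
    {  263,  0.7336, "libc-2.13.so",       "random"},
    {  228,  0.6360, "libcodec2.so.0.0.0", "dft_speech"},
};

#define VANILLA_ROW_COUNT (sizeof vanilla_rows / sizeof vanilla_rows[0])

/* Sum the percentage column over all rows that belong to one image. */
static double image_share(const struct profile_row *rows, size_t n,
                          const char *image)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(rows[i].image, image) == 0)
            total += rows[i].percent;
    }
    return total;
}
```

Calling image_share(vanilla_rows, VANILLA_ROW_COUNT, "libm-2.13.so") gives about 19.8, so the listed libm symbols alone account for roughly a fifth of all samples in the vanilla run.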

 

math-neon:

samples  %        linenr info                 image name               symbol name
3369     49.2976  kiss_fft.c:246              libcodec2.so.0.0.0       kf_work
438       6.4091  nlp.c:209                   libcodec2.so.0.0.0       nlp
413       6.0433  sine.c:288                  libcodec2.so.0.0.0       hs_pitch_refinement
347       5.0776  fft.c:84                    libcodec2.so.0.0.0       fft
339       4.9605  math_floorf.c:39            libmath_neon.so.0.0.0    floorf_neon_hfp
227       3.3216  (no location information)   no-vmlinux               /no-vmlinux
146       2.1364  lpc.c:78                    libcodec2.so.0.0.0       autocorrelate
140       2.0486  s_sin.c:353                 libm-2.13.so             cosl
133       1.9462  math_floorf.c:54            libmath_neon.so.0.0.0    floorf_neon_sfp
132       1.9315  lsp.c:143                   libcodec2.so.0.0.0       lpc_to_lsp
131       1.9169  quantise.c:479              libcodec2.so.0.0.0       aks_to_M2
121       1.7706  math_sinf.c:73              libmath_neon.so.0.0.0    sinf_neon_hfp
98        1.4340  e_log.c:69                  libm-2.13.so             __ieee754_log
81        1.1853  math_atan2f.c:96            libmath_neon.so.0.0.0    atan2f_neon_hfp
78        1.1414  phase.c:61                  libcodec2.so.0.0.0       aks_to_H
62        0.9072  sine.c:564                  libcodec2.so.0.0.0       synthesise
58        0.8487  phase.c:200                 libcodec2.so.0.0.0       phase_synth_zero_order
43        0.6292  sine.c:206                  libcodec2.so.0.0.0       dft_speech
41        0.5999  random.c:293                libc-2.13.so             random

 

math-neon+libavcodec FFT:

samples  %        linenr info                 image name               symbol name
665      36.1610  (no location information)   libavcodec.so.53.7.0     /usr/lib/libavcodec.so.53.7.0
225      12.2349  (no location information)   no-vmlinux               /no-vmlinux
131       7.1234  sine.c:288                  libcodec2.so.0.0.0       hs_pitch_refinement
127       6.9059  nlp.c:209                   libcodec2.so.0.0.0       nlp
103       5.6009  fft.c:183                   libcodec2.so.0.0.0       fft
85        4.6221  math_floorf.c:39            libmath_neon.so.0.0.0    floorf_neon_hfp
42        2.2838  lsp.c:143                   libcodec2.so.0.0.0       lpc_to_lsp
42        2.2838  s_sin.c:353                 libm-2.13.so             cosl
41        2.2295  math_floorf.c:54            libmath_neon.so.0.0.0    floorf_neon_sfp
39        2.1207  quantise.c:479              libcodec2.so.0.0.0       aks_to_M2
39        2.1207  lpc.c:75                    libcodec2.so.0.0.0       autocorrelate
34        1.8488  math_sinf.c:73              libmath_neon.so.0.0.0    sinf_neon_hfp
22        1.1963  e_log.c:69                  libm-2.13.so             __ieee754_log
22        1.1963  math_atan2f.c:96            libmath_neon.so.0.0.0    atan2f_neon_hfp
18        0.9788  phase.c:200                 libcodec2.so.0.0.0       phase_synth_zero_order
17        0.9244  interp.c:0                  libc-2.13.so             memcpy
16        0.8700  sine.c:206                  libcodec2.so.0.0.0       dft_speech
16        0.8700  math_sinf.c:114             libmath_neon.so.0.0.0    sinf_neon_sfp
15        0.8157  sine.c:564                  libcodec2.so.0.0.0       synthesise

 

The GitHub code has been updated.

 

I wonder whether one could profile speex and apply the same math-neon trick.

 

Regards,

Vadim Markovtsev,

Engineer, Algorithmic Lab,

Moscow R&D center, Samsung Electronics

 

 

_______________________________________________
Freetel-codec2 mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/freetel-codec2
