zuo bf wrote:
Hi Steve,
Replies inline.
On 5/19/06, Steve Underwood <[EMAIL PROTECTED]> wrote:
[EMAIL PROTECTED] wrote:
> Hi,
>
> The theory is not based on the music; it's based on what is given in
> ITU G.711 Appendix I (BTW: the music was converted to 8kHz/mono/16-bit
> with CoolEdit).
>
What works well for music is very different from what works well for
voice.
Yeah, but I don't think the difference is that big, unless you give me a
voice file to prove me wrong. And again, I prolong it based on the theory
given in G.711 Appendix I, which is said to be derived from
experimentation at Bell.
Just because it's derived from Bell doesn't make it the word of God. For
example, the pitch search only goes down to 66Hz. The F0 of my voice can
go well below 50Hz, and then the pitch is completely messed up. As I
said, for music a far slower decay characteristic works a lot better.
Also, windowing before the AMDF will give better temporal localisation of
the pitch estimate. This is pretty much a waste of time for voice, but it
helps stabilise the pitch for music, reducing the watery quality of the
synthetic sound at higher pitches. All good modern codecs do some form of
fractional pitch search to reduce wateriness in female (i.e. high
pitched) voices. This PLC algorithm does everything in whole samples. I
suspect the fractional pitch approach would noticeably help quality, but
at substantial computational expense.
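For anyone following along: an AMDF pitch search just picks the lag that
minimises the average magnitude difference between the recent signal and
a delayed copy of itself. A rough C sketch - the constants and names here
are illustrative, not the actual Asterisk/spandsp code; it assumes 8kHz
samples and a search range of roughly 200Hz down to 66Hz:

    #include <stdint.h>
    #include <stdlib.h>

    #define MIN_PITCH   40      /* ~200Hz at 8kHz sampling */
    #define MAX_PITCH   120     /* ~66Hz at 8kHz sampling */
    #define WINDOW_LEN  160     /* compare over 20ms of history */

    /* hist must hold at least WINDOW_LEN + MAX_PITCH samples. */
    static int amdf_pitch(const int16_t *hist, int hist_len)
    {
        int lag;
        int i;
        int best_lag = MIN_PITCH;
        int32_t best_score = INT32_MAX;

        for (lag = MIN_PITCH;  lag <= MAX_PITCH;  lag++)
        {
            int32_t acc = 0;

            /* Sum |x[n] - x[n - lag]| over the last WINDOW_LEN samples */
            for (i = 0;  i < WINDOW_LEN;  i++)
                acc += abs(hist[hist_len - WINDOW_LEN + i]
                           - hist[hist_len - WINDOW_LEN + i - lag]);
            if (acc < best_score)
            {
                best_score = acc;
                best_lag = lag;
            }
        }
        return best_lag;    /* estimated pitch period, in whole samples */
    }

Windowing the history before this search, or interpolating around the
best lag for a fractional result, would address the points raised above.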
G.711 Appendix 1 and my code fade to silence over 50ms. For music a much
longer sustain to fill in the gaps works much better. With speech, that
badly affects intelligibility.
I didn't change this. BTW, G.711 Appendix I fades to silence over 60ms,
because it doesn't fade during the first erasure, but you did. And I
think, as you can't know whether the wave is going to rise or fall, you'd
better keep the same level for the first erasure.
Ah, I forgot about this. It's something that isn't very sane in Appendix
1, and I never went back to experiment with it. Several areas of Appendix
1 are very much oriented to 10ms packets. In the real world hardly anyone
uses 10ms packets. I suspect the decay rate should be different for 20ms
or 30ms packets, and that requires investigation.
////////////////////////////////////////////////////////////////////////////////////////////////
G.711 Appendix I
I.2.4 Synthetic signal generation for first 10 ms
For the first 10 ms of the erasure, the best results are obtained by
generating the synthesized signal from the last pitch period with no
attenuation.
////////////////////////////////////////////////////////////////////////////////////////////////
I used the Appendix 1 approach without experimenting. I suspect something
other than linear attenuation would behave better.
By experimentation, I think that as long as the algorithm aims at generic
linear concealment, you probably can't find one much better than this,
unless you analyse some voice parameters from previous samples.
Actually, there are rather better concealment algorithms, but they
require greater amounts of computation. Try a Google search. Several
people have reported results using LPC analysis and synthesis which seem
better, especially for longer erasures.
> And the current plc algorithm is similar to the G.711 Appendix I
> except:
> 1. The pitch detection algorithm: G.711 Appendix I uses cross
> correlation, but Asterisk uses AMDF, which is simpler and also performs
> well
>
Correct.
> 2. The OLA window: G.711 updates the OLA window length when burst loss
> occurs, but Asterisk doesn't
>
Wrong. They both use the same OLA strategy - 1/4 pitch period overlap.
G.711 will prolong the OLA window by 4ms until it reaches 10ms, but the
Asterisk one doesn't?
////////////////////////////////////////////////////////////////////////////////////////////////
G.711 Appendix I
I.2.7 First good frame after an erasure
At the first good frame after an erasure, a smooth transition is needed
between the synthesized erasure speech and the real signal. To do this,
the synthesized speech from the pitch buffer is continued beyond the end
of the erasure, and then mixed with the real signal using an OLA. The
length of the OLA depends on both the pitch period and the length of the
erasure. For short, 10 ms erasures, a 1/4 wavelength window is used. For
longer erasures the window is increased by 4 ms per 10 ms of erasure, up
to a maximum of the frame size, 10 ms.
////////////////////////////////////////////////////////////////////////////////////////////////
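For reference, one reading of that I.2.7 rule works out to something like
the sketch below, assuming 8kHz sampling (so 1ms is 8 samples); the
function and names are illustrative, not the reference code:

    /* OLA window length, in samples, for the first good frame after an
       erasure of erasure_ms milliseconds (erasure_ms >= 10 assumed). */
    static int ola_len(int pitch_period, int erasure_ms)
    {
        int len = pitch_period/4;           /* 1/4 wavelength to start */

        /* Grow by 4ms for each 10ms of erasure beyond the first 10ms */
        len += ((erasure_ms - 10)/10)*4*8;
        if (len > 10*8)
            len = 10*8;                     /* cap at the 10ms frame size */
        return len;
    }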
> 3. The nearby field of the first erasure: G.711 delays the output for
> 3.75 ms to compensate for the probable loss, but Asterisk just uses the
> symmetrical part before the loss to do the OLA. The one G.711
> Appendix I utilizes should be better, but it's not very important, as
> human ears are quite tolerant of such artifacts.
>
That 3.75ms delay is so the Appendix 1 algorithm can do a 1/4 pitch
period of OLA when an erasure commences. However, it incurs lots of
buffer copying when there are no lost packets. What my code does is time
reverse the last 1/4 pitch period and OLA with that. It sounds nasty, but
listening tests with speech showed it was very close to the sound of the
G.711 Appendix 1 algorithm, and it improves efficiency a lot in the
common case - no packets being lost.
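A rough sketch of that trick, with illustrative names (this shows the
idea, not the actual spandsp code): instead of buffering 3.75ms of future
audio, cross-fade from the time-reversed tail of the history into the
replayed pitch period:

    #include <stdint.h>

    /* Generate the first ola = pitch/4 samples of an erasure.
       hist holds at least "pitch" samples of past audio. */
    static void start_erasure_ola(const int16_t *hist, int hist_len,
                                  int pitch, int16_t *out)
    {
        int ola = pitch/4;
        int i;

        for (i = 0;  i < ola;  i++)
        {
            /* Synthetic signal: replay from one pitch period back */
            int32_t synth = hist[hist_len - pitch + i];
            /* Time-reversed history: walk backwards from the last sample */
            int32_t rev = hist[hist_len - 1 - i];
            /* Linear cross-fade from the reversed tail into the replay */
            out[i] = (int16_t) ((rev*(ola - i) + synth*i)/ola);
        }
    }

Because the reversed tail starts at the very last good sample, the splice
is continuous without holding back any output.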
Yeah, the results are similar, but the difference is just the 3.75 ms
delay; I didn't see more buffer copying than necessary. Both algorithms
save the same history (although G.711 keeps a longer one and delays for
3.75ms).
BTW: packet loss is very common, at least in China, and burst loss can
last very long. For example, as the bandwidth between the two major
carriers is very low, two users, one on each, will experience packet loss
very often if they use the public internet rather than some softswitch
network.
There is a lot more copying in the Appendix 1 algorithm. It not only
saves a copy of the audio; it also has to rearrange the output buffer to
delay it by 30 samples. When there are no erasures the difference in
compute requirements is substantial - enough to make me rework the
algorithm to optimise the common case. If you don't think no erasures is
the common case, you have real problems. :-)
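The cost being described in the no-loss case is essentially a per-frame
shuffle of a 30-sample (3.75ms at 8kHz) delay line. A sketch of that
buffering, with illustrative names and under the assumption that every
good frame passes through the delay line:

    #include <stdint.h>
    #include <string.h>

    /* Appendix I style: each good frame of len (> 30) samples is pushed
       through a 30-sample delay line, so the PLC always has "future"
       samples available for an OLA when an erasure starts. */
    static void output_delayed(int16_t *delay, const int16_t *frame,
                               int len, int16_t *out)
    {
        memcpy(out, delay, 30*sizeof(int16_t));              /* oldest 30 */
        memcpy(out + 30, frame, (len - 30)*sizeof(int16_t)); /* rest */
        memcpy(delay, frame + len - 30, 30*sizeof(int16_t)); /* new tail */
    }

The time-reversal approach skips all of this and passes good frames
straight through, only touching the history buffer.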
In Southern China my experience has been of very very low packet loss,
and the full bandwidth of ADSL connections being available most of the
time. International comms can be more congested, but there is a lot of
local overcapacity. I don't know much about Northern China.
> I prolong the pitch period to a maximum of 3 pitch periods, but
> Asterisk only uses one, which saves memory but behaves badly on burst
> loss.
>
For prolonged erasures G.711 Appendix 1 and my code act in exactly the
same way. They linearly attenuate to zero over the first 50ms. In that
period they repeat the last 1.25 pitch periods of real speech, with a
quarter pitch period of overlap. When real speech restarts they both do a
1/4 pitch period of OLA, based on the last known pitch. The algorithms
are identical beyond the initial 1/4 pitch period of OLA. Why would
anyone want to save memory here? It only uses a small amount. The
algorithmic changes were to reduce the buffer manipulation in the common
case.
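The linear fade both sides describe is simple to state in code; a sketch,
assuming 8kHz sampling and illustrative names:

    #include <stdint.h>

    #define ATTENUATION_SPAN 400    /* fade to silence over 50ms at 8kHz */

    /* Attenuate a chunk of synthetic audio; "offset" is how many samples
       of the erasure have already been generated. */
    static void fade_chunk(int16_t *buf, int len, int offset)
    {
        int i;

        for (i = 0;  i < len;  i++)
        {
            int pos = offset + i;

            if (pos >= ATTENUATION_SPAN)
                buf[i] = 0;         /* fully faded: output silence */
            else
                buf[i] = (int16_t) (((int32_t) buf[i]
                                     *(ATTENUATION_SPAN - pos))
                                    /ATTENUATION_SPAN);
        }
    }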
> 4. Whether to prolong the pitch period during burst loss: G.711 Appendix
Not the same.
////////////////////////////////////////////////////////////////////////////////////////////////
G.711 Appendix I
I.2.5 Synthetic signal generation after 10 ms
If the next frame is also erased, the erasure will be at least 20 ms long
and further action is required. While repeating a single pitch period
works well for short erasures (e.g. 10 ms), on long erasures it
introduces unnatural harmonic artifacts (beeps). This is especially
noticeable if the erasure lands in an unvoiced region of speech, or in a
region of rapid transition such as a stop. It was discovered by
experimentation that these artifacts are significantly reduced by
increasing the number of pitch periods used to synthesize the signal as
the erasure progresses. Playing more pitch periods increases the
variation in the signal. Although the pitch periods are not played in the
order they occurred in the original signal, the resulting output still
sounds natural. At 10 ms into the erasure the number of pitch periods
used to synthesize the speech is increased to two, and at 20 ms a third
pitch period is added. For erasures longer than 20 ms no additional
modifications to the pitch buffer are made.
////////////////////////////////////////////////////////////////////////////////////////////////
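That I.2.5 schedule reduces to a small lookup; a sketch with an
illustrative function name:

    /* Number of past pitch periods replayed, per G.711 Appendix I
       I.2.5, as a function of how far into the erasure we are. */
    static int replay_periods(int erasure_ms)
    {
        if (erasure_ms < 10)
            return 1;   /* first 10ms: repeat a single pitch period */
        if (erasure_ms < 20)
            return 2;   /* 10-20ms in: use two pitch periods */
        return 3;       /* beyond 20ms: three periods, no further growth */
    }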
Actually, people complain that Appendix 1 PLC implementations also beep.
You'll find that improvements in that area are one of the main claims for
the LPC based PLC algorithms. I'd have to go back and check on this. It's
a while since I wrote the code. If I diverged from the Appendix 1
algorithm I must have done so for a good reason, like it simplified
something without noticeable impact on quality.
I think the documentation for my PLC code is missing from the Asterisk
source code, but you can find it at
http://www.soft-switch.org/spandsp-doc/plc_page.html
No, it's available in plc.h under asterisk/include. :)
Regards,
Steve
As I said before, you really have to try voice, not music. It makes a
huge difference. If you try a continuous tone the PLC algorithm behaves
terribly, but that's another case nobody cares about. :-)
Regards,
Steve