> I can't understand the difference between SOLA, PSOLA and WSOLA.

I'll attempt a partial answer:

I think PSOLA and WSOLA are clearly distinct.

PSOLA involves identifying a time-varying pitch (fundamental frequency) track for the input, then segmenting the input signal into (possibly overlapping) windowed grains that are synchronous with this fundamental frequency (e.g. grains centered on glottal pulses), and finally altering the rate at which the grains are assembled in the output stream.
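A minimal time-domain PSOLA sketch for time-stretching, in plain Python. It assumes the pitch marks (e.g. glottal pulse positions) have already been found by a separate pitch tracker, which is not shown; `td_psola_stretch` and `hann` are hypothetical names used only for illustration. Each grain is two local periods long, Hann-windowed and centred on an input pitch mark. Output marks are spaced one local period apart, so the pitch is kept while grains get repeated (`alpha > 1`) or skipped (`alpha < 1`).

```python
import math


def hann(n):
    # Periodic Hann window; two copies offset by n/2 sum to exactly 1
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def td_psola_stretch(x, marks, alpha):
    """Time-stretch x by factor alpha (>1 = longer) while keeping the pitch.

    x     : list of floats (input samples)
    marks : sorted, distinct sample indices of pitch marks (assumed given)
    alpha : time-scaling factor
    """
    assert len(marks) >= 2, "need at least two pitch marks to estimate a period"
    out_len = int(len(x) * alpha)
    y = [0.0] * out_len
    t_out = float(marks[0])  # position of the first output pitch mark
    while t_out < out_len:
        # Map the output time back to input time and take the nearest input mark
        t_in = t_out / alpha
        k = min(range(len(marks)), key=lambda i: abs(marks[i] - t_in))
        # Local period = distance to a neighbouring pitch mark
        if k + 1 < len(marks):
            period = marks[k + 1] - marks[k]
        else:
            period = marks[k] - marks[k - 1]
        # Grain: two periods of input, windowed, centred on the input mark
        w = hann(2 * period)
        start_in = marks[k] - period
        start_out = int(round(t_out)) - period
        for i in range(2 * period):
            src, dst = start_in + i, start_out + i
            if 0 <= src < len(x) and 0 <= dst < out_len:
                y[dst] += w[i] * x[src]
        # Next output mark one local period later, so the pitch is unchanged
        t_out += period
    return y
```

Changing the output mark spacing instead (e.g. `period / beta` for some pitch factor `beta`) would shift the pitch rather than the duration; that variant is not shown here.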

WSOLA involves breaking the signal into grains using some method (e.g. constant-duration grains), then overlap-adding input grains into the output stream with their relative time offset (and hence phase) adjusted according to two criteria:

1. on average, the input must be consumed at a rate that maintains the timescaling factor;
2. the source material should be mixed (with windowing) into the output stream in a way that minimizes the local error over the crossfade region (i.e. to minimize phase cancellation).

If the signal is strongly periodic and the parameters are just right, this will fairly nicely preserve the period of the source waveform, but I think it lacks sub-sample-accurate phase alignment. You can add enhancements such as trying to avoid mixing the same transient into the output stream more than once.
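A minimal WSOLA sketch in plain Python, assuming constant-duration grains, a Hann window with 50% overlap in the output, and squared error over the crossfade region as the similarity measure (cross-correlation is another common choice). `wsola_stretch`, `frame` and `tol` are hypothetical names and defaults for illustration. The analysis hop `ha` enforces criterion (1); the search over shifts within `±tol` of the nominal position enforces criterion (2) by comparing each candidate's first half with the natural continuation of the previous grain, i.e. the samples it will be crossfaded against.

```python
import math


def wsola_stretch(x, alpha, frame=256, tol=64):
    """Time-stretch x by factor alpha (>1 = longer) using a basic WSOLA scheme.

    x     : list of floats (input samples)
    frame : grain length in samples (constant-duration grains)
    tol   : largest shift (samples) allowed around the nominal input position
    """
    assert len(x) >= 2 * frame, "input too short for this frame size"
    hs = frame // 2    # synthesis hop: grains overlap by 50% in the output
    ha = hs / alpha    # analysis hop: average input consumption rate (criterion 1)
    # Periodic Hann window; copies spaced hs apart sum to 1
    win = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame) for i in range(frame)]
    out_len = int(len(x) * alpha)
    y = [0.0] * (out_len + frame)
    prev = 0           # input start of the previously used grain
    k = 0
    while True:
        nominal = int(round(k * ha))  # where the timescale factor says to read
        target = prev + hs            # natural continuation of the previous grain
        out = k * hs                  # where this grain lands in the output
        if k > 0 and (nominal + tol + frame > len(x) or target + hs > len(x)):
            break
        if out + frame > len(y):
            break
        if k == 0:
            best = 0
        else:
            # Criterion 2: choose the shift whose crossfade region (first half
            # of the grain) best matches the natural continuation
            lo = max(0, nominal - tol)
            best = min(
                range(lo, nominal + tol + 1),
                key=lambda c: sum((x[c + i] - x[target + i]) ** 2 for i in range(hs)),
            )
        for i in range(frame):
            y[out + i] += win[i] * x[best + i]
        prev = best
        k += 1
    return y[:out_len]
```

The search only moves grains by whole samples, which matches the point above about the lack of sub-sample-accurate alignment; `tol` should be at least about half the longest period you expect, or the search may not find a well-aligned shift.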

Not sure what SOLA is.

