[email protected] (Niels Möller) writes:
> Next steps:
>
> 1. Do the msb/lsb reassembly.
>
> 2. Figure out how to actually generate any output to the codec user, and
> do any needed channel permutation.
Now I've done 2 too. A couple of problems.
I'm still testing with the file "Master Audio 5.0 96khz.dts", which has
residual_encode = 0.
* Audio level is really low. When I parse the Rice-coded sample values from
the file, I get input samples with a maximum around 1000. Not sure how
the inverse prediction should affect the magnitude (but see below).
And there are almost no lsb bits, only a few of the frames have any
lsb part at all, and then only a single bit per value. And these are
supposed to be 24-bit values, so if, e.g., converted to 16-bit wav
file, all the samples are rounded to 0 or maybe -1.
* The core data is 48kHz, 512 samples per frame, while the xll data is
96kHz, 1024 samples per frame. Since residual_encode = 0, I'm not
entirely sure what I'm supposed to do with the core data. Should it be
ignored by decoders which understand xll? For now, I added some code
to check for sample rate mismatch, and then set avctx->sample_rate,
avctx->channels and frame->nb_samples based on what's in the xll
stream.
* The code in dca_parser is not aware of xll, so it claims the stream is
48 kHz, not 96 kHz. So when I let avconv convert the stream to wav
format, I get a 48 kHz file, and I think avconv down-samples the data
I generate.
* The inverse linear prediction. I use this code:

    /* Apply predictor. */
    /* NOTE: Processing samples in this order means that the
       predictor is applied to the newly reconstructed samples. */
    for (i = order; i < nsamples; i++) {
        /* FIXME: How large accumulator do we really need? */
        int64_t s;
        unsigned j;
        for (j = s = 0; j < order; j++)
            s += c[j] * samples[i-1-j];
        /* NOTE: Equations seem to imply addition, while the
         * pseudocode seems to use subtraction. */
        samples[i] -= av_clip ((s + 0x8000) >> 16, -0x1000000, 0xffffff);
    }

where c are the coefficients, and samples is the vector of (msbs of)
samples. One difference from the pseudocode is that I don't reverse the
coefficients; I follow the convention of the equation
\sum_{k=1}^{order} LP[k] s[n-k]
Looping in this order and adding the predicted value in-place implies
that any errors are accumulated. I get quite a lot of frames with
small samples at the start of the frame and large samples at the end,
which can't be right.
Thinking about it, I'd expect that the prediction errors are smaller
than the audio samples. Then for frames that use prediction, the
first few raw samples ("part0", not using prediction) should be
larger than the following values which are only prediction errors.
And these following samples should be blown up by the inverse
prediction to the same order of magnitude as the initial samples.
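
That expectation can be checked on a toy signal. The following is only a
minimal sketch, not the decoder's actual code. It assumes Q16 coefficients
(taken from the >> 16 in the loop above) and the convention
residual = sample - prediction; the stream's real sign convention may be
the opposite, which is the open question noted in the code comment. The
names predict_q16, lpc_residual and lpc_reconstruct are made up for
illustration. The cast to int64_t before the multiply is a precaution,
since the types of c and samples are not shown above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Q16 prediction for index i from the `order` samples before it:
 * sum_{k=1}^{order} c[k-1] * samples[i-k], rounded and scaled down.
 * Relies on >> of a negative value being an arithmetic shift. */
static int32_t predict_q16(const int32_t *samples, const int32_t *c,
                           size_t order, size_t i)
{
    int64_t s = 0;
    for (size_t j = 0; j < order; j++)
        s += (int64_t)c[j] * samples[i - 1 - j];
    return (int32_t)((s + 0x8000) >> 16);
}

/* Hypothetical encoder side: the first `order` samples are stored raw,
 * the rest as prediction errors (assumed: sample minus prediction). */
static void lpc_residual(const int32_t *x, int32_t *res, const int32_t *c,
                         size_t order, size_t n)
{
    assert(order <= n);
    for (size_t i = 0; i < order; i++)
        res[i] = x[i];
    for (size_t i = order; i < n; i++)
        res[i] = x[i] - predict_q16(x, c, order, i);
}

/* Decoder side, in place. Walking upwards means each prediction uses
 * samples that have already been reconstructed. */
static void lpc_reconstruct(int32_t *samples, const int32_t *c,
                            size_t order, size_t n)
{
    assert(order <= n);
    for (size_t i = order; i < n; i++)
        samples[i] += predict_q16(samples, c, order, i);
}
```

With c = {2 << 16, -(1 << 16)} (plain linear extrapolation) and the ramp
100, 200, 300, ..., every residual after the first two samples is zero, and
lpc_reconstruct restores the ramp from them. So with a matching sign
convention, small errors are blown up to the magnitude of the initial
samples rather than growing without bound.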
I'll go on debugging the inverse prediction. I'll be grateful for any
hints on that or on the other problems.
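
To put numbers on the level problem in the first item, here is a minimal
sketch. It assumes the 24-bit to 16-bit conversion is a plain arithmetic
right shift by 8; the conversion avconv actually performs may round or
dither differently. s24_to_s16 is a hypothetical helper, not a libav
function.

```c
#include <assert.h>
#include <stdint.h>

/* Drop the low 8 bits of a 24-bit sample. Assumes >> on a negative
 * value is an arithmetic shift, so the result rounds towards minus
 * infinity. */
static int16_t s24_to_s16(int32_t s24)
{
    assert(s24 >= -0x800000 && s24 <= 0x7fffff);
    return (int16_t)(s24 >> 8);
}
```

Under that assumption a peak of 1000 maps to 3, and any sample with
magnitude below 256 maps to 0 or -1, which is consistent with the
near-silent 16-bit output described above.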
Regards,
/Niels
--
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.