Takehiro's observations are really interesting, but I
think he drew the wrong conclusion. The problem is not in
the ATH masking threshold but in the distortion calculated
in calc_noise. We would have to detect the peaks he
describes and calculate the resulting distortion
differently from the usual way.
I would suggest the following:
        if, in a scalefactor band, the maximum distortion
          of a single frequency line is larger than 0.25 of
          the summed distortion of that band
        then the distortion of that band is the maximum
          distortion within that band
        else the distortion is the average distortion of
          that band (just as it is now)
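
As a rough illustration, the rule above can be sketched as a
standalone C function. The name band_distortion and its
parameters are hypothetical and do not exist in lame3.55;
err[] is assumed to hold the per-line quantization error of
one scalefactor band, and the empty-band guard is my own
addition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of lame3.55.
 * err[0..bw-1]: quantization error of each frequency line in one band.
 * Returns the band distortion under the proposed rule. */
static double band_distortion(const double *err, size_t bw)
{
    double sum = 0.0;        /* summed squared error of the band   */
    double max_noise = 0.0;  /* largest squared error of one line  */
    size_t l;

    assert(err != NULL || bw == 0);
    if (bw == 0)
        return 0.0;          /* assumption: an empty band has no noise */

    for (l = 0; l < bw; l++) {
        double e2 = err[l] * err[l];
        sum += e2;
        if (e2 > max_noise)
            max_noise = e2;
    }

    /* No single line dominates: keep the usual average.
     * Otherwise report the peak so it is not averaged away. */
    return (max_noise < 0.25 * sum) ? sum / (double)bw : max_noise;
}
```

For a band of eight lines with an error of 1 each, no line
exceeds a quarter of the sum, so the result is the average.
If one line has an error of 4 and the rest are 0, the peak
carries the whole sum and the function returns the peak value
rather than the much smaller average.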

A patch against lame3.55 follows. As far as I have tested
it so far, this patch solves the problems with
"Glockenspiel" (gspi35_1.wav).

Robert

Lame A Mpeg-audio Experience


diff -c lame3.55/quantize.c lame3.55.w4/quantize.c

*** lame3.55/quantize.c Thu Nov 11 18:24:05 1999
--- lame3.55.w4/quantize.c      Wed Nov 17 22:43:36 1999
***************
*** 1374,1379 ****
--- 1374,1381 ----
      step = pow( 2.0, (cod_info->quantizerStepSize) * 0.25 );
      for ( sfb = 0; sfb < cod_info->sfb_lmax; sfb++ )
      {
+         FLOAT8 max_sfb_noise = 0;
+         
          start = scalefac_band_long[ sfb ];
          end   = scalefac_band_long[ sfb+1 ];
          bw = end - start;
***************
*** 1382,1390 ****
          {
              FLOAT8 temp;
              temp = fabs(xr[l]) - pow43[ix[l]] * step;
!             sum += temp * temp;
          }
!         xfsf[0][sfb] = sum / bw;
        /* max -30db noise below threshold */
        noise = 10*log10(Max(.001,xfsf[0][sfb] / l3_xmin->l[gr][ch][sfb]));
          distort[0][sfb] = noise;
--- 1384,1394 ----
          {
              FLOAT8 temp;
              temp = fabs(xr[l]) - pow43[ix[l]] * step;
!             temp*= temp;
!             sum += temp;
!             max_sfb_noise = Max(max_sfb_noise,temp);
          }
!         xfsf[0][sfb] = max_sfb_noise < 0.25*sum ? sum/bw : max_sfb_noise;
        /* max -30db noise below threshold */
        noise = 10*log10(Max(.001,xfsf[0][sfb] / l3_xmin->l[gr][ch][sfb]));
          distort[0][sfb] = noise;
***************
*** 1408,1413 ****
--- 1412,1419 ----
  
          for ( sfb = cod_info->sfb_smax; sfb < 12; sfb++ )
          {
+             FLOAT8 max_sfb_noise = 0;
+             
              start = scalefac_band_short[ sfb ];
              end   = scalefac_band_short[ sfb+1 ];
              bw = end - start;
***************
*** 1416,1424 ****
              {
                  FLOAT8 temp;
                  temp = fabs((*xr_s)[l][i]) - pow43[(*ix_s)[l][i]] * step;
!                 sum += temp * temp;
              }       
!             xfsf[i+1][sfb] = sum / bw;
            /* max -30db noise below threshold */
            noise = 10*log10(Max(.001,xfsf[i+1][sfb] / l3_xmin->s[gr][ch][sfb][i] ));
              distort[i+1][sfb] = noise;
--- 1422,1432 ----
              {
                  FLOAT8 temp;
                  temp = fabs((*xr_s)[l][i]) - pow43[(*ix_s)[l][i]] * step;
!                 temp*= temp;
!                 sum += temp;
!                 max_sfb_noise = Max(max_sfb_noise,temp);
              }       
!             xfsf[i+1][sfb] = max_sfb_noise < 0.25*sum ? sum/bw : max_sfb_noise;
            /* max -30db noise below threshold */
            noise = 10*log10(Max(.001,xfsf[i+1][sfb] / l3_xmin->s[gr][ch][sfb][i] ));
              distort[i+1][sfb] = noise;
--
MP3 ENCODER mailing list ( http://geek.rcc.se/mp3encoder/ )
