Re: [MP3 ENCODER] --voice

2000-09-15 Thread Gabriel Bouvigne

 I've just been trying to help someone with re-encoding from 160/128 down to
 96 kbps for his portable player so I offered -mj -b 96 --mp3input. This
 works fine but took longer than expected (perhaps because Lame seems to
 automatically resample down to 32 kHz?). Are these the best options for
 optimal quality/filesize at 96 kbps?

I would add -h

Regards,

--

Gabriel Bouvigne - France
[EMAIL PROTECTED]
icq: 12138873

MP3' Tech: www.mp3-tech.org


--
MP3 ENCODER mailing list ( http://geek.rcc.se/mp3encoder/ )



Re: [MP3 ENCODER] --voice

2000-09-14 Thread Eric Howgate

- Original Message -
From: Mark Taylor [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Wednesday, September 13, 2000 9:30 AM
Subject: Re: [MP3 ENCODER] --voice

Another question :) Is there a preferred (or even mandatory)
sequence to command-line options for Lame? I remember how quirky
DOS could be in this respect.
 

 The only problem is if you use incompatible options - LAME
 does not check for this.  Examples would be -h and -f
 together, or -k and --lowpass together.

Thanks - I'll watch out for that.

I've just been trying to help someone with re-encoding from 160/128 down to
96 kbps for his portable player so I offered -mj -b 96 --mp3input. This
works fine but took longer than expected (perhaps because Lame seems to
automatically resample down to 32 kHz?). Are these the best options for
optimal quality/filesize at 96 kbps?

The quality sounds reasonable on my cheap PC speakers - about the same as FM
radio.

Sorry about wandering off topic - I'll cease and desist after this :)

Eric








Re: [MP3 ENCODER] --voice

2000-09-13 Thread Mark Taylor

 
 

   Thanks - I'll revisit your website.
 
   Another question :) Is there a preferred (or even mandatory)
   sequence to command-line options for Lame? I remember how quirky
   DOS could be in this respect.
 

The only problem is if you use incompatible options - LAME
does not check for this.  Examples would be -h and -f
together, or -k and --lowpass together.

Mark




Re: [MP3 ENCODER] --voice

2000-09-13 Thread Robert Hegemann

Mark Taylor schrieb am Mit, 13 Sep 2000:
  
  
 
Thanks - I'll revisit your website.
  
Another question :) Is there a preferred (or even mandatory)
sequence to command-line options for Lame? I remember how quirky
DOS could be in this respect.
  
 
 The only problem is if you use incompatible options - LAME
 does not check for this.  Examples would be -h and -f
 together, or -k and --lowpass together.
 
 Mark

As a general rule, the last option given overrides the previous ones:
with -f -h, -h would be used. The same goes if you want to override
a few preset settings. So I don't see a problem, but a feature ;-)
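
To illustrate the "last option wins" rule, here is a minimal Python sketch.
It is not LAME's actual parser. The helper name resolve_options and the idea
that -f and -h both write one shared "quality" setting are assumptions made
for the example.

```python
# Hypothetical sketch of "last option wins"; this is not LAME's parser.
# Assumption: -f and -h both write the same "quality" setting.
SETTING_FOR_FLAG = {
    "-f": ("quality", "fast"),
    "-h": ("quality", "high"),
}


def resolve_options(argv):
    """Walk the flags left to right; a later flag overwrites an earlier one."""
    settings = {}
    for flag in argv:
        if flag in SETTING_FOR_FLAG:
            key, value = SETTING_FOR_FLAG[flag]
            settings[key] = value
    return settings


print(resolve_options(["-f", "-h"]))  # {'quality': 'high'}
print(resolve_options(["-h", "-f"]))  # {'quality': 'fast'}
```

The same left-to-right overwrite would explain why an option given after a
preset can change one of the preset's values.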


Ciao Robert





Re: [MP3 ENCODER] --voice

2000-09-12 Thread Frank Klemm

On Mon, Sep 11, 2000 at 10:01:58PM -, Eric Howgate wrote:
 Could someone satisfy my curiosity about this option (described as
 'experimental' in the docs for ver 3.85, but I see that it is available
 in RazorLame)?
 
 Does it have fixed default parameters like the --preset voice option,
 and if so what are they?
 
 What is the thinking behind this option - audio books perhaps?
 
 

--voice does exactly the same as --preset voice.
You get good voice quality at about 56 kbps.
These are the best settings I found for voice at about 56 kbps.
I tested with a poem, and now I can recite the poem without any problem ;-)

The same can be done with the presets phone, voice, radio, tape, cd and studio.

-- 
Frank Klemm




Re: [MP3 ENCODER] --voice

2000-09-12 Thread Gabriel Bouvigne





  Could someone satisfy my curiosity about this
  option (described as 'experimental' in the docs for ver 3.85, but I see
  that it is available in RazorLame)?
  
  Does it have fixed default parameters like the
  --preset voice option, and if so what are they?
  
  What is the thinking behind this option - audio
  books perhaps?
  
  Many thanks
  
  Eric

--voice was made before presets. Now that we have presets and
filters in Lame, --preset voice is the same as --voice, and --voice is usable
for any sampling rate. I personally think that --voice should now be removed in
favour of --preset voice, but some people disagree.
If you're really interested in what was behind this option
when it was created, I still have a webpage about it.


Regards,

--

Gabriel Bouvigne - France
[EMAIL PROTECTED]
icq: 12138873

MP3' Tech: www.mp3-tech.org


Re: [MP3 ENCODER] --voice

2000-09-12 Thread Eric Howgate





  - Original Message -
  From: Gabriel Bouvigne
  To: [EMAIL PROTECTED]
  Sent: Tuesday, September 12, 2000 10:10 AM
  Subject: Re: [MP3 ENCODER] --voice
  
  
  
Could someone satisfy my curiosity about this
option (described as 'experimental' in the docs for ver 3.85, but I see
that it is available in RazorLame)?

Does it have fixed default parameters like the
--preset voice option, and if so what are they?

What is the thinking behind this option -
audio books perhaps?

Many thanks

Eric
  
  --voice was made before presets. Now that we have presets
  and filters in Lame, --preset voice is the same as --voice, and --voice is
  usable for any sampling rate. I personally think
  that --voice should now be removed in favour of --preset voice, but some
  people disagree.
  If you're really interested in what was behind this
  option when it was created, I still have a webpage about it.
  
  
  Regards,
  
  --
  
  Gabriel Bouvigne - France
  [EMAIL PROTECTED]
  icq: 12138873
  
  MP3' Tech: www.mp3-tech.org
  
  
  Thanks - I'll revisit your website.
  
  Another question :) Is there a preferred (or even
  mandatory) sequence to command-line options for Lame? I remember how quirky
  DOS could be in this respect.
  
  I think constructing a command line of my own is better
  than tinkering with presets.
  
  Eric


Re: [MP3 ENCODER] Voice encoding questions

2000-08-07 Thread Jaroslav Lukesh

| 3) FhG (-br 64000 -qual 9 -crc -no-is -esr 44100) sounds very good. (Man,
| is it slow, though.)  Again, without the forced MPEG-1 sampling rate, the
| mp3enc31 will attempt to use 22050.
 
...

| So my question(s) are:  Is the solution to my problem to filter/downsample
| (and use joint, when I get around to coding it up)?  That seems to be what
| is making the difference in the case of LAME; I assume that FhG is using
| some filtering as well, though there's no way to disable it and see for

Use the option -bw 22050 to set the bandwidth in Hz.



 Jaroslav Lukesh
--
 note: (Bill) Gates to Hell!




Re: [MP3 ENCODER] Voice encoding with low bit rates

2000-08-06 Thread Mark Taylor

 
 I have briefly tried the "--voice" mode and the "normal" mode when
 encoding a purely voice signal (with background noise) at 8kbps, and
 have been very impressed with the difference. I would like to compress
 the signal more... but 8 is as low as it goes.
 
 The "normal" mode renders the voice absolutely unintelligible (I assume
 the encoder tries too hard to preserve the background).
 

As others have pointed out, --voice is the same as

lame --noshort --lowpass 12

and I imagine the main difference between this and the default is
that at 8 kbps, the default lowpass value is not very good.  It is based
on a simple formula.

Alfred Weyers did some detailed tests a while ago suggesting some low
bitrate corrections, but this hasn't made it back into LAME.

If you don't mind, can you try:

lame -h --noshort --lowpass 12
lame -h --lowpass 12
lame -h --lowpass 10
lame -h --lowpass 8

and let us know which sounds the best?  I'd like to verify the best
filter level for this bitrate and also verify that --noshort really is
helpful for voice encoding.  The short block encoding is much better
now than when Gabriel first added --voice.


 
 I have read most of the past articles on "--voice" but they don't tell
 me all I wish to know. I am also starting with a 11K/samples per sec
 file (mono) and having to up-sample it to 44.1K before I can process it.
 Has anyone considered allowing different input sample rates (ie: the
 standard 16, 22.05, 24, 32, 48) as well as 44.1 ?
 

Which version are you using?  LAME can take any samplerate for input.
(if you are feeding lame raw pcm data, it will assume the sample rate
is 44.1, unless you add -s 11)


Mark












Re: Re[2]: [MP3 ENCODER] Voice encoding questions

2000-08-05 Thread Robert Hegemann

Frank Klemm schrieb am Sam, 05 Aug 2000:
 We should support an option (-ma for Mode Auto) which switches between -a -mm
 for highly correlated channels (r > 0.98 = mono), -mj for normally
 correlated signals (r = -1.00...-0.20, 0.20...0.91 = stereo) and -ms for nearly
 uncorrelated signals (dual channel audio with independent audio, e.g. movies
 with English/German audio tracks, r = -0.20...+0.20).

The joint stereo coding (-mj) in LAME switches automatically between Stereo and 
Mid-Side Stereo. Uncorrelated signals will be LR Stereo coded and correlated
parts of your waves in MS stereo. Given L=left channel and R=right channel:
 M = (L+R)/SQRT2
 S = (L-R)/SQRT2

note: to get your left right channels back:
 L = (M+S)/SQRT2
 R = (M-S)/SQRT2

As you can see, if your input signal is mono (L=R), only the mid channel
carries information; the side channel is empty. The difference from true
mono coding in this situation is that we still need some bits for the empty
side channel, bits which mono mode could spend on the mid channel instead.
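
The formulas above can be checked with a small Python sketch. The function
names to_mid_side and to_left_right are made up for this example; only the
arithmetic comes from the formulas given in this message.

```python
import math

SQRT2 = math.sqrt(2.0)


def to_mid_side(left, right):
    """M = (L+R)/sqrt(2), S = (L-R)/sqrt(2), sample by sample."""
    mid = [(l + r) / SQRT2 for l, r in zip(left, right)]
    side = [(l - r) / SQRT2 for l, r in zip(left, right)]
    return mid, side


def to_left_right(mid, side):
    """Inverse transform: L = (M+S)/sqrt(2), R = (M-S)/sqrt(2)."""
    left = [(m + s) / SQRT2 for m, s in zip(mid, side)]
    right = [(m - s) / SQRT2 for m, s in zip(mid, side)]
    return left, right


# True mono input: both channels carry identical samples.
left = [0.5, -0.25, 0.125]
right = list(left)

mid, side = to_mid_side(left, right)
print(side)  # every side sample is zero when L == R

restored_left, restored_right = to_left_right(mid, side)
print(restored_left, restored_right)  # matches the input up to rounding
```

With L equal to R the side list is all zeros, which is the "empty side
channel" case described above.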

My observation on old, mono-like recordings is that it is a bad idea to
let LAME make the side channel really empty. If this happens, you are likely
to get an audible glitch.



 There are a lot of MP3s out there with mono recordings coded with -mj and
 also -ms.
 
 -- 
 Mit freundlichen Grüßen
 Frank Klemm
 
 PS: What's the difference between '-mm' and '-mm -a' ?
  
 eMail | [EMAIL PROTECTED]   home: [EMAIL PROTECTED]
 phone | +49 (3641) 64-2721   home: +49 (3641) 390545
 sMail | R.-Breitscheid-Str. 43, 07747 Jena, Germany


Ciao Robert



Re: [MP3 ENCODER] Voice encoding questions

2000-08-05 Thread Gabriel Bouvigne

 F We should support an option (-ma for Mode Auto) which switches
 F between -a -mm for highly correlated channels (r > 0.98 =
 F mono), -mj for normally correlated signals (r = -1.00...-0.20,
 F 0.20...0.91 = stereo) and -ms for nearly not

 I am afraid most decoders can't correctly handle an mp3 file
 whose mode (stereo - mono) changes within the file.

Switching between any of the stereo modes (stereo, M/S, IS, M/S and IS) is
allowed, but switching between stereo, mono and dual channel is forbidden by
the standard.


Regards,
--

Gabriel Bouvigne - France
[EMAIL PROTECTED]
icq: 12138873

MP3' Tech: www.mp3-tech.org





Re: Re[2]: [MP3 ENCODER] Voice encoding questions

2000-08-05 Thread Frank Klemm

::  Frank Klemm schrieb am Sam, 05 Aug 2000:
::   We should support an option (-ma for Mode Auto) which switches between -a -mm
::   for highly correlated channels (r > 0.98 = mono), -mj for normally
::   correlated signals (r = -1.00...-0.20, 0.20...0.91 = stereo) and -ms for nearly
::   uncorrelated signals (dual channel audio with independent audio, e.g. movies
::   with English/German audio tracks, r = -0.20...+0.20).
::  
::  The joint stereo coding (-mj) in LAME switches automatically between Stereo and 
::  Mid-Side Stereo. Uncorrelated signals will be LR Stereo coded and correlated
::  parts of your waves in MS stereo. Given L=left channel and R=right channel:
::   M = (L+R)/SQRT2
::   S = (L-R)/SQRT2
::  
::  note: to get your left right channels back:
::   L = (M+S)/SQRT2
::   R = (M-S)/SQRT2
::
That's clear. 


::  As you can see, if your input signal is mono (L=R), only the mid channel 
::  carries information, the side channel is empty. 
::
Take some mono recordings and prove this. It's very seldom that L=R.
See also alt.binaries.sounds.mp3.* .

FM radio mono recordings (tuner set to mono):
  * differences between the channels from the MPX decoder - AD converter

FM radio mono recordings (tuner set to stereo, e.g. news):
  * differences between the channels from the MPX decoder - AD converter
  * additional noise, distortion, whistle, ... in the X signal

Mono CDs:
  * Both channels are converted by different AD converters with different
    parameters (offset, amplification).

Records:
  * a lot of noise and rumble

::  The difference to a true mono coding in this situation is, that we now
::  need some bits for our empty side channel which we could use in mono
::  mode for the mid channel too.
::
I've never seen true mono coding. The recording is mono (historic reasons),
it sounds like mono, the statistics say it's mono, but L != R.

Example:
  CD:          Jazz - Lyrik - Prosa
  No:          (Amiga 74321326192)
  Title:       My Bonnie is over the Ocean
  Performer:   Jazz-Optimisten Berlin
  Length:      11239032 samples [19114 CD frames, 4:14.64]

  Correlation: r = 0.99879

Coder:
  Lame:        3.86 alpha
  Options:     -V0 -d -q1 --cwlimit 11.5 -X6

Results:
  -mm      3642584 bytes   114.3 kbps
  -mm -a   3642584 bytes   114.3 kbps (bitwise identical to -mm)
  -mj      5637588 bytes   177.0 kbps (+55%)
  -ms      7223049 bytes   226.7 kbps (+98%)
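
The bitrates and percentages in this table can be reproduced from the file
sizes and the track length. The Python sketch below assumes a 44.1 kHz
sample rate, which fits the CD frame count quoted above (588 samples per CD
frame); the variable and function names are only illustrative.

```python
SAMPLE_RATE_HZ = 44100   # assumption: CD audio
SAMPLES = 11239032       # track length quoted above
DURATION_S = SAMPLES / SAMPLE_RATE_HZ

SIZES_BYTES = {"-mm": 3642584, "-mj": 5637588, "-ms": 7223049}


def average_kbps(size_bytes):
    """Average bitrate in kbps: bits in the file divided by the duration."""
    return size_bytes * 8 / DURATION_S / 1000


def report():
    """Return (mode, kbps, growth in % relative to -mm) for each result."""
    rows = []
    for mode, size in SIZES_BYTES.items():
        growth = (size / SIZES_BYTES["-mm"] - 1) * 100
        rows.append((mode, round(average_kbps(size), 1), round(growth)))
    return rows


for row in report():
    print(row)
# ('-mm', 114.3, 0)
# ('-mj', 177.0, 55)
# ('-ms', 226.7, 98)
```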

::  
::  My observations on old mono like sounds are, that it is a bad idea to 
::  let LAME make the side channel really empty. If this happens, it is likely
::  to get an audible glitch.
::
-mm Use Mono
-mi Use Intensity Stereo, MS-Stereo and LR-Stereo
-mj Use MS-Stereo and LR-Stereo
-ms Use LR-Stereo
-ma Analyze file before any conversion, select -mm, -mj or -ms
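
A rough Python sketch of what such an analysis pass could look like follows.
This is not an existing LAME feature. The thresholds come from the proposal
quoted earlier in this message; how to treat the unspecified band between
0.91 and 0.98 (sent to -mj here) and a silent channel (treated as r = 0) are
assumptions, as are the function names.

```python
import math


def correlation(left, right):
    """Pearson correlation coefficient r between two channels."""
    n = len(left)
    mean_l = sum(left) / n
    mean_r = sum(right) / n
    cov = sum((l - mean_l) * (r - mean_r) for l, r in zip(left, right))
    var_l = sum((l - mean_l) ** 2 for l in left)
    var_r = sum((r - mean_r) ** 2 for r in right)
    if var_l == 0 or var_r == 0:
        return 0.0  # assumption: a constant channel counts as uncorrelated
    return cov / math.sqrt(var_l * var_r)


def choose_mode(r):
    """Map a correlation value to one of the proposed mode switches."""
    if r > 0.98:
        return "-mm"   # highly correlated: encode as mono
    if abs(r) <= 0.20:
        return "-ms"   # nearly uncorrelated: plain LR stereo
    return "-mj"       # everything else: joint stereo (assumption for 0.91..0.98)


print(choose_mode(0.99879))  # the example track above -> -mm
print(choose_mode(0.5))      # -mj
print(choose_mode(0.0))      # -ms
```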


Another question:
  Is there any tool to analyze the number of SI, MS and LR frames in a MP3?

-- 
Mit freundlichen Grüßen
Frank Klemm
 
eMail | [EMAIL PROTECTED]   home: [EMAIL PROTECTED]
phone | +49 (3641) 64-2721   home: +49 (3641) 390545
sMail | R.-Breitscheid-Str. 43, 07747 Jena, Germany




Re: Re[2]: [MP3 ENCODER] Voice encoding questions

2000-08-05 Thread Robert Hegemann

Frank Klemm schrieb am Sam, 05 Aug 2000:
snip
 Another question:
   Is there any tool to analyze the number of SI, MS and LR frames in a MP3?

I don't know if there is any, but you can write a simple tool scanning
all mp3 headers and counting the type of each frame. Or you can look 
out for a tool called mp3check and let it make a dump, all you would 
have to do is count the different stereo mode extensions.
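
As a starting point for such a tool, here is a minimal Python sketch that
walks MPEG Layer III frame headers and counts the stereo mode of each frame.
It is only a sketch: it does not skip ID3 tags, does not handle free-format
streams, and does not guard against false sync words inside audio data. The
names parse_header, count_modes and fake_frame are made up for the example,
and the demo data is synthetic.

```python
from collections import Counter

# Layer III bitrates in kbps, indexed by the 4-bit bitrate field.
BITRATES_MPEG1 = [0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320]
BITRATES_MPEG2 = [0, 8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160]
# Sample rates in Hz, keyed by the 2-bit version field
# (3 = MPEG-1, 2 = MPEG-2, 0 = MPEG-2.5).
SAMPLE_RATES = {3: [44100, 48000, 32000],
                2: [22050, 24000, 16000],
                0: [11025, 12000, 8000]}


def parse_header(h):
    """Return (frame_length, label) for a 4-byte Layer III header, else None."""
    if len(h) < 4 or h[0] != 0xFF or (h[1] & 0xE0) != 0xE0:
        return None  # no sync word
    version = (h[1] >> 3) & 3
    layer = (h[1] >> 1) & 3
    bitrate_index = h[2] >> 4
    rate_index = (h[2] >> 2) & 3
    padding = (h[2] >> 1) & 1
    if version == 1 or layer != 1 or bitrate_index in (0, 15) or rate_index == 3:
        return None  # reserved values, not Layer III, or free format
    table = BITRATES_MPEG1 if version == 3 else BITRATES_MPEG2
    coeff = 144 if version == 3 else 72
    rate = SAMPLE_RATES[version][rate_index]
    length = coeff * table[bitrate_index] * 1000 // rate + padding
    mode = h[3] >> 6               # 0 stereo, 1 joint, 2 dual channel, 3 mono
    extension = (h[3] >> 4) & 3    # Layer III: bit 1 = MS on, bit 0 = IS on
    if mode == 3:
        label = "mono"
    elif mode == 2:
        label = "dual"
    elif mode == 1 and extension:
        label = {1: "IS", 2: "MS", 3: "MS+IS"}[extension]
    else:
        label = "LR"
    return length, label


def count_modes(data):
    """Count frames per stereo mode in a bytes object holding an MP3 stream."""
    counts = Counter()
    pos = 0
    while pos + 4 <= len(data):
        parsed = parse_header(data[pos:pos + 4])
        if parsed is None:
            pos += 1  # resynchronise byte by byte
            continue
        length, label = parsed
        counts[label] += 1
        pos += length
    return counts


def fake_frame(byte3):
    """Synthetic MPEG-1 Layer III frame, 128 kbps, 44.1 kHz, zero payload."""
    header = bytes([0xFF, 0xFB, 0x90, byte3])
    length, _ = parse_header(header)
    return header + bytes(length - 4)


demo = fake_frame(0x60) * 3 + fake_frame(0x00) * 2  # 3 MS frames, 2 LR frames
print(count_modes(demo))
```

For a real file, the same function could be called as
count_modes(open("some.mp3", "rb").read()), where the file name is a
placeholder.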

 
 -- 
 Mit freundlichen Grüßen
 Frank Klemm
  
 eMail | [EMAIL PROTECTED]   home: [EMAIL PROTECTED]
 phone | +49 (3641) 64-2721   home: +49 (3641) 390545
 sMail | R.-Breitscheid-Str. 43, 07747 Jena, Germany


Ciao Robert



Re: [MP3 ENCODER] --voice Modus

2000-08-05 Thread Robert Hegemann

Frank Klemm schrieb am Die, 01 Aug 2000:
 For <= 56 kbit/s the voice mode of Lame always sounds better than the normal
 mode. I've tested several kinds of music and also spoken words.
 
 So there are 3 questions:
 
   * Will voice mode become the standard for low bit rates?
   * What does voice mode do (it's not IS, it also works with mono)?
   * Is IS planned for lame?
 
 -- 
 Frank Klemm

LAME's voice modes turn off short blocks and apply a lowpass filter
around 12 kHz. In its first draft some lower MDCT coefficients were
dropped (something like a highpass filtering, we had no filter code
at that time), but not nowadays.

I can't tell you if someone is working on an intensity stereo mode for LAME,
but I don't think so.

Maybe for lower constant bitrate modes we have to rework the bit allocation
scheme.


Ciao Robert



Re: [MP3 ENCODER] Voice encoding questions

2000-08-05 Thread Mark Taylor



 Another question:
   Is there any tool to analyze the number of SI, MS and LR frames in a MP3?
 
Frank, you just need a GTK-enabled version of lame :-)
Run lame -g on the mp3 file, scroll to the end, and then
click 'show' under the 'stats' pull-down menu.
It shows the info you want, and any additional statistics
would be easy to add.  You can also use it to examine
the mid/side bit allocation frame by frame.

You could test your ideas about near mono files 
via the following:  

Modify reduce_side() function in quantize-pvt.c to
be more aggressive.  Right now it allocates at most
a 33/66 split between side channel and mid channel,
based on the side_channel_energy/total_energy ratio.
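
To make the idea concrete, here is a Python sketch of a split driven by the
side/total energy ratio. It is not copied from quantize-pvt.c. The linear
mapping and the names split_bits and max_shift are assumptions, chosen only
so that the split runs from 50/50 down to roughly 33/66.

```python
def split_bits(total_bits, side_energy_ratio, max_shift=0.33):
    """Return (mid_bits, side_bits) for one frame.

    side_energy_ratio is side_energy / total_energy, expected in 0..0.5.
    Assumption: the share moved from side to mid falls linearly from
    max_shift (empty side channel) to 0 (side as strong as mid).
    """
    ratio = min(max(side_energy_ratio, 0.0), 0.5)
    shift = max_shift * (0.5 - ratio) / 0.5
    moved = round(shift * 0.5 * total_bits)
    half = total_bits // 2
    return half + moved, total_bits - half - moved


print(split_bits(1000, 0.5))  # (500, 500): equal energy, even split
print(split_bits(1000, 0.0))  # (665, 335): empty side, about a 66/33 split
```

Raising max_shift in this sketch is the "more aggressive" variant: it leaves
fewer bits for the side channel of near-mono material.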

As Robert mentioned, a more aggressive split can
create artifacts.  I think the problem is that 
allocating just a few bits to the side channel
can produce audible glitches which will sound worse
than if 0 bits were used.  But no one has done a
detailed study of this.  



 -mm   Use Mono
 -mi   Use Intensity Stereo, MS-Stereo and LR-Stereo
 -mj   Use MS-Stereo and LR-Stereo
 -ms   Use LR-Stereo
 -ma   Analyze file before any conversion, select -mm, -mj or -ms
 
 

I think -ma would be beyond the scope of LAME. A
separate analysis program should be written, and then a
GUI front end should run the analysis and make the selection.

This is similar to automatic level adjustment.  A couple of people
have expressed interest in adding a volume adjustment to
LAME, which is fine, but the additional step of running
some analysis on the file to determine the adjustment
should be left to a separate program.

Mark









[MP3 ENCODER] Voice encoding with low bit rates

2000-08-04 Thread engdev

I have briefly tried the "--voice" mode and the "normal" mode when
encoding a purely voice signal (with background noise) at 8kbps, and
have been very impressed with the difference. I would like to compress
the signal more... but 8 is as low as it goes.

The "normal" mode renders the voice absolutely unintelligible (I assume
the encoder tries too hard to preserve the background).

The "--voice" mode actually seems to reduce the background garbage
(noise) where there is no speech, and to also concentrate on the speech
when it is present.

I have looked at the spectrogram for each, and there is a BIG
difference.

My question (after all this guff) is "does LAME perform any smarts (like
looking for particular frequency domain patterns), and if so, what?"

I have read most of the past articles on "--voice" but they don't tell
me all I wish to know. I am also starting with an 11K samples per sec
file (mono) and having to up-sample it to 44.1K before I can process it.
Has anyone considered allowing different input sample rates (i.e. the
standard 16, 22.05, 24, 32, 48) as well as 44.1?





Re: [MP3 ENCODER] Voice encoding with low bit rates

2000-08-04 Thread Gabriel Bouvigne

 I have briefly tried the "--voice" mode and the "normal" mode when
 encoding a purely voice signal (with background noise) at 8kbps, and
 have been very impressed with the difference. I would like to compress
 the signal more... but 8 is as low as it goes.

 The "normal" mode renders the voice absolutely unintelligible (I assume
 the encoder tries too hard to preserve the background).

 The "--voice" mode actually seems to reduce the background garbage
 (noise) where there is no speech, and to also concentrate on the speech
 when it is present.

 I have looked at the spectrogram for each, and there is a BIG
 difference.

 My question (after all this guff) is "does LAME perform any smarts (like
 looking for particular frequency domain patterns), and if so, what?"

 I have read most of the past articles on "--voice" but they don't tell
 me all I wish to know. I am also starting with a 11K/samples per sec
 file (mono) and having to up-sample it to 44.1K before I can process it.
 Has anyone considered allowing different input sample rates (ie: the
 standard 16, 22.05, 24, 32, 48) as well as 44.1 ?


I first wrote the voice mode, mainly by using some assumptions about the
signal and a lot of listening tests. You can read what this option originally
did here:
http://www.multimania.com/bouvigne/lame/voice.html
Because of a lack of time, and the lack of a good filtering solution in Lame
at that time, I only tuned it for 44.1 kHz files.
But now Robert has introduced presets in Lame, including --preset voice,
and Lame has got some good filters. So I think that "--preset voice" now
does the same thing as --voice, and can be used for any sampling rate, but
I'd like Robert's confirmation that the behaviour is the same
as --voice.
If it is the same, I'd suggest you stop using --voice and start
using --preset voice instead.
I personally think that the --voice switch should now be removed, as there
is --preset voice.


Regards,
--

Gabriel Bouvigne - France
[EMAIL PROTECTED]
icq: 12138873

MP3' Tech: www.mp3-tech.org





Re: [MP3 ENCODER] Voice encoding with low bit rates

2000-08-04 Thread Robert Hegemann

  I have briefly tried the "--voice" mode and the "normal" mode when
  encoding a purely voice signal (with background noise) at 8kbps, and
  have been very impressed with the difference. I would like to compress
  the signal more... but 8 is as low as it goes.
 
  The "normal" mode renders the voice absolutely unintelligible (I assume
  the encoder tries too hard to preserve the background).
 
  The "--voice" mode actually seems to reduce the background garbage
  (noise) where there is no speech, and to also concentrate on the
  speech when it is present.


the voice mode applies a lowpass filter at 12 kHz


  I have looked at the spectrogram for each, and there is a BIG
  difference.


and does not use short blocks


  My question (after all this guff) is "does LAME perform any smarts
  (like looking for particular frequency domain patterns), and if so,
  what?"


no, LAME does nothing special for voice signals automagically


  I have read most of the past articles on "--voice" but they don't tell
  me all I wish to know. I am also starting with a 11K/samples per sec
  file (mono) and having to up-sample it to 44.1K before I can process
  it. Has anyone considered allowing different input sample rates 
  (ie: the standard 16, 22.05, 24, 32, 48) as well as 44.1 ?


you don't need to up-sample your waves.

valid output samplerates are 8, 11.025, 12, 16, 22.05, 24, 32, 44.1, 48 kHz


 
 
 I first wrote the voice mode, mainly by using some supposition about the
 signal and a lot of listening tests. You can read what was done at the
 beginning by this option here:
 http://www.multimania.com/bouvigne/lame/voice.html
 Because of a lack of time, and the lack of a good filtering solution at
 this time in Lame, I only tuned it for 44.1kHz files.


One point has changed since we have *good filters* in LAME: the highpass
filtering had to be dropped from the voice mode, because the filters are
too rough.


 But now, Robert introduced the presets in Lame, including --preset
 voice, and Lame got some good filters. 
 So I think that "--preset voice" is now doing the same thing as --voice,

 and can be used for any sampling rate,
 but I'd like Robert confirmation to know if the behaviour is the same
 as --voice.


No, the voice preset and the voice option are not 100% identical.
By the way, the --voice option is a shortcut for "--lowpass 12 --noshort".
I would suggest looking at "lame --preset help" and experimenting with
--preset phone / --preset sw etc. 


 If it's doing the same, I'd suggest you to stop using --voice and start
 using --preset voice instead.
 I personally think that now the --voice switch should be removed, as
 there is --preset voice.

I think we should keep it :-)

 
 
 Regards,
 --
 
 Gabriel Bouvigne - France
 [EMAIL PROTECTED]
 icq: 12138873
 
 MP3' Tech: www.mp3-tech.org


Ciao Robert




[MP3 ENCODER] Voice encoding questions

2000-08-04 Thread alex . broadhead

Howdy All,

In testing my (comparatively naive) hack of the dist10 encoder, I have
discovered that, while it does OK for music, it has real problems with
speech signals.  (Caveat:  at our lowest overall bitrate of 300kbps for
combined video/audio, we run the audio at 32kbit mono - though we go way up
to 64kbps mono for higher overall bitrate signals, and are aiming to default
at 64kbps stereo [not joint].)  In particular, the broadband noise bursts
associated with fricatives really wreak havoc.

My test signal here is spfe49_1 from the AAC SQAM test suite, which is a
female English speaker going on about giving pills to animals.  I ran it
through 1) my encoder, 2) LAME (3.85 w/ frame analyzer), 3) mp3enc31, and 4)
our current Layer-II encoder.

1) With my encoder (64kbps stereo CRC), every fricative is almost painful to
listen to, as the pink noise bursts end up being narrow band filtered (due
to lack of bits - only the MDCT coeffs closest to the pole are making it
into the bitstream), and there are occasional weird high frequency blips and
arpeggiation which are very annoying.

2) LAME (-m s -h -b 64 -p --resample 44.1) (we use CRC and I haven't enabled
LSF yet) sounds pretty good.  There are occasional minor glitches, but
that's to be expected at this bitrate.  However, LAME (as above plus -k to
turn off the filters) sounds pretty similar to what I'm getting.  I note
that without the forced resampling, LAME will attempt to downsample to
22050.

3) FhG (-br 64000 -qual 9 -crc -no-is -esr 44100) sounds very good.  (Man,
is it slow, though.)  Again, without the forced MPEG-1 sampling rate, the
mp3enc31 will attempt to use 22050.

4) Layer-II (64 kbps stereo CRC) sounds good.

So my question(s) are:  Is the solution to my problem to filter/downsample
(and use joint, when I get around to coding it up)?  That seems to be what
is making the difference in the case of LAME; I assume that FhG is using
some filtering as well, though there's no way to disable it and see for
sure.  Are there really just not enough bits for this type of signal at this
bitrate?  Why does Layer-II do so much better a job with this type of
signal?  Do other codecs (AAC/MPEG-4) handle this kind of signal better as
well?  And what is the capital of Assyria?

Inquiring minds wanna know,
Alex




Re: [MP3 ENCODER] Voice encoding questions

2000-08-04 Thread Gabriel Bouvigne


- Original Message -
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Friday, August 04, 2000 4:14 PM
Subject: [MP3 ENCODER] Voice encoding questions


 Howdy All,

 In testing my (comparatively naive) hack of the dist10 encoder, I have
 discovered that, while it does OK for music, it has real problems with
 speech signals.  (Caveat:  at our lowest overall bitrate of 300kbps for
 combined video/audio, we run the audio at 32kbit mono - though we go way up
 to 64kbps mono for higher overall bitrate signals, and are aiming to default
 at 64kbps stereo [not joint].)  In particular, the broadband noise bursts
 associated with fricatives really wreak havoc.

 My test signal here is spfe49_1 from the AAC SQAM test suite, which is a
 female English speaker going on about giving pills to animals.  I ran it
 through 1) my encoder, 2) LAME (3.85 w/ frame analyzer), 3) mp3enc31, and 4)
 our current Layer-II encoder.

 1) With my encoder (64kbps stereo CRC), every fricative is almost painful to
 listen to, as the pink noise bursts end up being narrow band filtered (due
 to lack of bits - only the MDCT coeffs closest to the pole are making it
 into the bitstream), and there are occasional weird high frequency blips and
 arpeggiation which are very annoying.

 2) LAME (-m s -h -b 64 -p --resample 44.1) (we use CRC and I haven't enabled
 LSF yet) sounds pretty good.  There are occasional minor glitches, but
 that's to be expected at this bitrate.  However, LAME (as above plus -k to
 turn off the filters) sounds pretty similar to what I'm getting.  I note
 that without the forced resampling, LAME will attempt to downsample to
 22050.

If you want to encode voice signals, I'd suggest you to use --voice
or --preset voice


 3) FhG (-br 64000 -qual 9 -crc -no-is -esr 44100) sounds very good.  (Man,
 is it slow, though.)  Again, without the forced MPEG-1 sampling rate, the
 mp3enc31 will attempt to use 22050.

You're disabling intensity stereo, but not joint stereo. With those
settings, mp3enc is using m/s stereo. This gives it an advantage over Lame,
which you forced to use plain stereo.


 4) Layer-II (64 kbps stereo CRC) sounds good.

The layer II encoder is probably using joint stereo. In Layer II, joint
stereo is quite similar to the intensity stereo of layer III.



 And what is the capital of Assyria?
The first Assyrian capital was Assur, and it was later replaced by Kalah.

--

Gabriel Bouvigne - France
[EMAIL PROTECTED]
icq: 12138873

MP3' Tech: www.mp3-tech.org





Re: [MP3 ENCODER] Voice encoding questions

2000-08-04 Thread Gabriel Bouvigne


 So my question(s) are:  Is the solution to my problem to filter/downsample
 (and use joint, when I get around to coding it up)?  That seems to be what
 is making the difference in the case of LAME; I assume that FhG is using
 some filtering as well, though there's no way to disable it and see for
 sure.  Are there really just not enough bits for this type of signal at this
 bitrate?  Why does Layer-II do so much better a job with this type of
 signal?  Do other codecs (AAC/MPEG-4) handle this kind of signal better as
 well?

I forgot something: the sample you're using is very close to mono, so joint
stereo helps a lot.

For your problem, there are mainly 2 solutions:
a: downsampling
b: using joint stereo. For voice signals, the best joint mode would probably
be intensity stereo. But it's not implemented in Lame.

You mentioned that you use CRC. Are you aware that the ISO CRC code is
broken?

Regards,


--

Gabriel Bouvigne - France
[EMAIL PROTECTED]
icq: 12138873

MP3' Tech: www.mp3-tech.org





Re: [MP3 ENCODER] Voice encoding questions

2000-08-04 Thread Mark Taylor


 1) With my encoder (64kbps stereo CRC), every fricative is almost painful to
 listen to, as the pink noise bursts end up being narrow band filtered (due
 to lack of bits - only the MDCT coeffs closest to the pole are making it
 into the bitstream), and there are occasional weird high frequency blips and
 arpeggiation which are very annoying.
 
 2) LAME (-m s -h -b 64 -p --resample 44.1) (we use CRC and I haven't enabled
 LSF yet) sounds pretty good.  There are occasional minor glitches, but
 that's to be expected at this bitrate.  However, LAME (as above plus -k to
 turn off the filters) sounds pretty similar to what I'm getting.  I note
 that without the forced resampling, LAME will attempt to downsample to
 22050.
 
 3) FhG (-br 64000 -qual 9 -crc -no-is -esr 44100) sounds very good.  (Man,
 is it slow, though.)  Again, without the forced MPEG-1 sampling rate, the
 mp3enc31 will attempt to use 22050.
 

The main difference between FhG and LAME is probably the lowpass
filters.  Try different values of --lowpass.  The compression ratio
you are using (about 22x) is not commonly used, and LAME's
default guess at a lowpass setting won't be very good.
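
For reference, the "about 22x" figure follows from 16-bit stereo PCM at
44.1 kHz being squeezed into 64 kbps. A minimal sketch of that arithmetic
(the function name is made up for illustration, and 16-bit input is an
assumption, since the thread does not state the sample width):

```python
def compression_ratio(sample_rate_hz, bits_per_sample, channels, target_bps):
    """Ratio of the uncompressed PCM bitrate to the encoded bitrate."""
    pcm_bps = sample_rate_hz * bits_per_sample * channels
    return pcm_bps / target_bps

# 16-bit stereo input (assumed) encoded at 64 kbps.
ratio_44k = compression_ratio(44100, 16, 2, 64000)
ratio_22k = compression_ratio(22050, 16, 2, 64000)

print(f"44.1 kHz  -> 64 kbps: {ratio_44k:.2f}x")
print(f"22.05 kHz -> 64 kbps: {ratio_22k:.2f}x")
```

Halving the sampling rate halves the ratio, which is the idea behind the
automatic downsampling to 22050.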

Why do you disable the 22050 downsampling?  This is done based on the
idea that encoding at 22 kHz is better than encoding at 44 kHz and
removing half the spectrum with filters.

FhG is probably using joint stereo?  This will increase the
bandwidth by 10-20%.  

The main difference between LAME and ISO is that the ISO
code has serious flaws in several major components.  jstereo, 
filtering and other advanced features help, but you gotta fix
the bugs first!


 some filtering as well, though there's no way to disable it and see for
 sure.  Are there really just not enough bits for this type of signal at this
 bitrate?  Why does Layer-II do so much better a job with this type of
 signal?  Do other codecs (AAC/MPEG-4) handle this kind of signal better as

You rate FhG as 'very good', and Layer II as 'good'.  So I'm assuming
layer III beats layer II.  The thing layer III adds to layer II is: 1)
MDCT transform (lossless to roundoff), 2) entropy coding (lossless),
3) bitreservoir (prevents wasting of bits) and 4) the ability to do
more advanced noise shaping.  #1,2 and 3 can only improve the
quality. The only way I can see layer II out-perform layer III is if
#4 is not tuned properly for the desired compression.


 well?  And what is the capital of Assyria?
 
during which century?

Mark



RE: [MP3 ENCODER] Voice encoding questions

2000-08-04 Thread alex . broadhead

Howdy All,

Thanks for the quick replies!

Gabriel Bouvigne wrote:

 If you want to encode voice signals, I'd suggest you to use --voice
 or --preset voice

Actually, I want to encode general signals (mostly TV and movies), many of
which have significant voice components, and, unfortunately, many of which
do not.  My codec is doing OK on music, and sucking at voice, so what I'm
really trying to do is figure out why _voice_ signals are a problem for
_general purpose_ encoders.  Otherwise I would just bandpass 300-3000 Hz.

  3) FhG (-br 64000 -qual 9 -crc -no-is -esr 44100) sounds very good.  (Man,
  is it slow, though.)  Again, without the forced MPEG-1 sampling rate, the
  mp3enc31 will attempt to use 22050.

 You're disabling intensity stereo, but not joint stereo. With those
 settings, mp3enc is using m/s stereo. This is an advantage over Lame that
 you forced to use plain stereo.

Yeah, I noticed that.  As I'm sure you have already discovered, there is no
way to disable M/S in mp3enc, so the comparison is bad.

 I forgot something: the sample you're using is very close to mono, so joint
 stereo helps a lot.

A very good point.  I would hate to give FhG more credit than they deserve.

  4) Layer-II (64 kbps stereo CRC) sounds good.

 The layer II encoder is probably using joint stereo. In Layer II, joint
 stereo is quite similar to the intensity stereo of layer III

Actually, there is no joint stereo code in our Layer-II encoder, so I'm sure
it's not using it.

I should probably qualify my rating of 'good' to say that there are no
obvious and distracting high frequency artifacts.  Of course, the whole
thing sounds like AM radio, but, in my experience, that is the difference
between Layer-II and Layer-III degradation.  Layer-II has an initial series
of 'non-linear' (to pervert a term) distortions at a relatively low
compression ratio, after which it just starts evenly raising the noise floor
('linear' distortion).  Distortions in Layer-III are almost always
'non-linear' (wateriness, blips, missing frequencies, lowpass), though the
noise floor stays consistently low.  At low bitrates, I find 'linear'
distortion infinitely preferable to the 'non-linear', though this is, of
course, purely a matter of taste.

 For your problem, there are mainly 2 solutions:
 a: downsampling
 b: using joint stereo. For voice signal, the best joint mode would probably
 be intensity stereo. But it's not implemented in Lame.

This was my suspicion, I was really just looking for confirmation.  Thanks.

 You mentioned that you use crc. Are you aware that the ISO crc code is
 broken?

It may well have been broken (though I seem to remember that it was simply
not present for Layer-III) - I wouldn't know, since I removed it and wrote
my own, which is not.  (For realtime multicast, it was a feature we had to
have.)

Greg Maxwell wrote:

 The dist10 encoder has a bug in the short block code which makes it stink
 on fricatives in speech.

Does anyone have any more info on this?  The frame analyzer doesn't indicate
that I'm using short blocks on the fricatives in question - or is that the
bug?

Mark Taylor wrote:

 Why do you disable the 22050 downsampling?  This is done based on the
 idea that encoding at 22 kHz is better than encoding at 44 kHz and
 removing half the spectrum with filters.

Because I was trying to compare apples to apples (MPEG-1 to MPEG-1) and my
encoder doesn't use LSF yet.

 FhG is probably using joint stereo?  This will increase the
 bandwidth by 10-20%.

Yes, as discussed above, this is definitely cooking the books.

 The main difference between LAME and ISO is that the ISO
 code has serious flaws in several major components.  jstereo,
 filtering and other advanced features help, but you gotta fix
 the bugs first!

I like to think that I have fixed at least a few.  Now that I've finished a
first pass clean, rewrite, overhaul, and verify, I'm taking a closer look at
algorithmic (as opposed to purely implementational) problems, starting with
the main loop, and probably ending with the #^@% psych model.  Of course,
if advanced features are going to make a bigger difference, though, they may
gain a higher priority.

 You rate FhG as 'very good', and Layer II as 'good'.  So I'm assuming
 layer III beats layer II.  The thing layer III adds to layer II is: 1)
 MDCT transform (lossless to roundoff), 2) entropy coding (lossless),
 3) bitreservoir (prevents wasting of bits) and 4) the ability to do
 more advanced noise shaping.  #1,2 and 3 can only improve the
 quality. The only way I can see layer II out-perform layer III is if
 #4 is not tuned properly for the desired compression.

Your assumption is correct.  And, based on my observations about distortion
above, I would concur with your analysis; the noise shaping seems to be
breaking down pretty badly at this (ridiculously high, I am aware)
compression ratio.

-

I'd just like to say that I really appreciate the feedback that this list
provides - I don't 

Re: [MP3 ENCODER] Voice encoding questions

2000-08-04 Thread Gabriel Bouvigne


 I like to think that I have fixed at least a few.  Now that I've finished a
 first pass clean, rewrite, overhaul, and verify, I'm taking a closer look at
 algorithmic (as opposed to purely implementational) problems, starting with
 the main loop, and probably ending with the #^@% psych model.  Of course,
 if advanced features are going to make a bigger difference, though, they may
 gain a higher priority.


I'd suggest you look at the archives of this list, and at Lame 3.00. Its
code was probably a lot simpler, and it was mainly bug-fixed ISO code
with the addition of joint stereo.

Regards,

--

Gabriel Bouvigne - France
[EMAIL PROTECTED]
icq: 12138873

MP3' Tech: www.mp3-tech.org





Re[2]: [MP3 ENCODER] Voice encoding questions

2000-08-04 Thread Roel VdB

Hello alex,

abcc I feel guilty using a list mainly devoted to an open source codec (LAME) to
abcc further the development of ClearBand's 'proprietary' codec.  (Is a standards
abcc based codec implementation proprietary?  We don't sell the codec - we sell a
abcc multicast system, mostly to ISPs and corporations, and the proprietary part
abcc is the multicast part.  My superiors just didn't want to license FhG's
abcc source, I guess...)

I don't know if that would help.  If you look at
http://www.mp3licensing.com , you would see it costs $1M to bring any
mp3 encoder on the market.  FhG+Thomson have patents on mp3, not only
their code.

http://www.mp3licensing.com/royalty/swenc.html
http://www.mp3licensing.com/royalty/broadcast.html
We do not charge royalties for mp3 streaming or mp3 broadcasting
(e.g. Internet Radio) until the end of the year 2000. Beyond this
date we anticipate to charge a small annual minimum and a percentage
of revenue. However, this model is not yet fully developed because we
cannot yet oversee where this new market is going.

best inform your superiors to rip off Ogg Vorbis, which is not patented
;-)

-- 
Best regards,
 Roel                            mailto:[EMAIL PROTECTED]





Re: Re[2]: [MP3 ENCODER] Voice encoding questions

2000-08-04 Thread Frank Klemm

We should support an option (-ma for Mode Auto) which switches between -a -mm
for highly correlated channels (r > 0.98 = mono), -mj for normally
correlated signals (r = -1.00...-0.20, 0.20...0.91 = stereo) and -ms for nearly
uncorrelated signals (dual channel audio with independent content, e.g. movies
with English/German audio tracks, r = -0.20...+0.20).
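
A rough sketch of how such a selection could look, assuming r is the
Pearson correlation between the left and right channels over some
analysis window. The function names are hypothetical and this is not
existing LAME code. The thresholds are the ones quoted above; the range
0.91...0.98 is not covered by them, so treating it as joint stereo is an
assumption made here.

```python
import math

def channel_correlation(left, right):
    """Pearson correlation coefficient between two equal-length channels."""
    n = len(left)
    mean_l = sum(left) / n
    mean_r = sum(right) / n
    cov = sum((l - mean_l) * (r - mean_r) for l, r in zip(left, right))
    var_l = sum((l - mean_l) ** 2 for l in left)
    var_r = sum((r - mean_r) ** 2 for r in right)
    if var_l == 0 or var_r == 0:
        # Assumption: a constant (e.g. silent) channel counts as uncorrelated.
        return 0.0
    return cov / math.sqrt(var_l * var_r)

def pick_mode(r):
    """Map a correlation value to the switches named in the proposal."""
    if r > 0.98:
        return "-a -mm"   # highly correlated: downmix to mono
    if abs(r) <= 0.20:
        return "-ms"      # nearly uncorrelated: plain stereo
    return "-mj"          # everything else, including the 0.91...0.98 gap

tone = [math.sin(2 * math.pi * k / 50) for k in range(1000)]
print(pick_mode(channel_correlation(tone, tone)))                # same signal on both sides
print(pick_mode(channel_correlation(tone, [-v for v in tone])))  # phase-inverted copy
```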

There are a lot of MP3s out there with mono recordings coded with -mj and
also -ms.

-- 
Mit freundlichen Grüßen
Frank Klemm

PS: What's the difference between '-mm' and '-mm -a' ?
 
eMail | [EMAIL PROTECTED]   home: [EMAIL PROTECTED]
phone | +49 (3641) 64-2721    home: +49 (3641) 390545
sMail | R.-Breitscheid-Str. 43, 07747 Jena, Germany




[MP3 ENCODER] --voice Modus

2000-07-31 Thread Frank Klemm

For <= 56 kbit/s, the voice mode of Lame always sounds better than the normal
mode. I've tested several kinds of music and also spoken word.

So there are 3 questions:

  * Will voice mode become the standard for low bit rates?
  * What does voice mode actually do (it's not IS, since it also works with mono)?
  * Is IS planned for Lame?

-- 
Frank Klemm




Re: [MP3 ENCODER] Voice mode

1999-10-15 Thread mikecheng

Hi all
On 15-Oct-99 Greg Maxwell wrote:
 Are you aware you can use lame for image compression? :)
 compression.. Check out http://linuxpower.cx/~greg/mp3crap/ for some
 examples of this kind of perversion..
This is so perverse it's actually neat. :)
Anyone tried the reverse? Image compression on sound files?

later
mike
(I vote for a switch to the LGPL)




Re: [MP3 ENCODER] Voice mode

1999-10-15 Thread Gabriel Bouvigne

 Duh.. I see, M/S is define with addition/subtraction not multiplication.

 So, is there a downloadably doc that describes intensity mode, are
 decoders a good source of info?

There are a few lines (only a few) about it in the ISO docs. If you don't
have them yet, they are available on mp3tech.org


  Unfortunately, it's unlikely that I'll have the time to work on this. As it
  does not seems to be much difficult to make (at least it's easier than m/s
  stereo), perhaps Patrick could allow some of his students to work on this. I
  can't do it during my own student project, as mine must be about image
  processing or image synthesis.

 Are you aware you can use lame for image compression? :)

 First create an greyscale image ((576*x)*n, or (1152*x)*n for best
 results) save it as ascii ppm.
 Cut off the header.
 Use sox to go from 8bit unsigned to a 16 bit mono wav file set at 44100.
 Encode, reverse..

 Looks preety good, perhaps if you do some kind of transform on the input
 first to make it less 'transiant' (some kind of reversiable convolution)
 you might get better compression then jpg (it's not too far as is).

 Kinda gives you respect for how tough audio compression is VS image
 compression.. Check out http://linuxpower.cx/~greg/mp3crap/ for some
 examples of this kind of perversion..

 (Why did I do this? I was hoping nasty artifacts might be more easily
 found in a picture rather then listening to a sample)

That's strange... I'll have a look at it


Regards,

Gabriel Bouvigne - France
[EMAIL PROTECTED]
icq: 12138873

MP3' Tech: www.mp3tech.org





Re: [MP3 ENCODER] Voice mode

1999-10-15 Thread Greg Maxwell

On Fri, 15 Oct 1999 [EMAIL PROTECTED] wrote:

 Hi all
 On 15-Oct-99 Greg Maxwell wrote:
  Are you aware you can use lame for image compression? :)
  compression.. Check out http://linuxpower.cx/~greg/mp3crap/ for some
  examples of this kind of perversion..
 This is so perverse it's actually neat. :)
 Anyone tried the reverse? Image compression on sound files?

I did, a long time ago. It doesn't work too well (using JPEG at least),
mostly because it only works on 8-bit data and 2D quantization doesn't go
well with sound.

 
 later
 mike
 (I vote for a switch to the LGPL)





Re: [MP3 ENCODER] Voice mode

1999-10-14 Thread Greg Maxwell

On Thu, 14 Oct 1999, Gabriel Bouvigne wrote:

 
  Panned stereo mode!
 
  For each frame you examine a spectrally weighed (ignore low freqs) energy
  ratio of left/right to pick the stereo panning at two points in time
  (middle, and right) then interpoate from old_left to middle then right,
  enforcing a maximum rate of change.
 
  Then mix to mono, encode as mid, and the position as side, you only have
  to encode lower scalfactors because it should consist of low freqs only
  (because of your slow interpolation). I suppose you could shape your side
  wav to MDCT well too..
 
  Am I missing something?
 
  I'd think that this would allow you to get mono quality at about the same
  bitrate, but still preserve panning which would be help at differentiating
  between people speaking.
 
 
 According to me, this can't be done using m/s stereo, because in the case of
 someone speaking on side, the side channel would be as high as the middle
 channel.
 What you're describing here looks lot like the intensity stereo mode, where
 the signal is encoded as mono on the left channel, and location is encoded
 on the right one. This would help a lot voice encoding, but also music at
 low bitrates. To my mind, it's something missing to Lame in order to be able
 to compete FhG at low bitrates.

Duh.. I see, M/S is defined with addition/subtraction, not multiplication.
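
To illustrate the point, here is a small sketch assuming the usual M/S
definition, M = (L+R)/sqrt(2) and S = (L-R)/sqrt(2). The helper names are
invented for this example. A centred voice leaves the side channel empty,
while a voice panned hard to one side makes the side channel as strong as
the mid channel.

```python
import math

def to_mid_side(left, right):
    """Sum/difference transform, scaled by 1/sqrt(2) to preserve energy."""
    s = 1.0 / math.sqrt(2.0)
    mid = [(l + r) * s for l, r in zip(left, right)]
    side = [(l - r) * s for l, r in zip(left, right)]
    return mid, side

def energy(samples):
    return sum(v * v for v in samples)

voice = [math.sin(2 * math.pi * k / 32) for k in range(256)]
silence = [0.0] * len(voice)

mid_c, side_c = to_mid_side(voice, voice)    # speaker in the centre
mid_p, side_p = to_mid_side(voice, silence)  # speaker hard left

print("centred:   mid", energy(mid_c), "side", energy(side_c))
print("hard left: mid", energy(mid_p), "side", energy(side_p))
```

In the hard-left case M/S saves nothing over plain stereo, which is why
a separate position signal (as in intensity stereo) is attractive for
panned speech.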

So, is there a downloadable doc that describes intensity mode? Are
decoders a good source of info?

 
 Unfortunately, it's unlikely that I'll have the time to work on this. As it
 does not seems to be much difficult to make (at least it's easier than m/s
 stereo), perhaps Patrick could allow some of his students to work on this. I
 can't do it during my own student project, as mine must be about image
 processing or image synthesis.

Are you aware you can use lame for image compression? :)

First create a greyscale image ((576*x)*n, or (1152*x)*n for best
results) and save it as ASCII PPM.
Cut off the header.
Use sox to go from 8-bit unsigned to a 16-bit mono wav file set at 44100.
Encode, then reverse the steps.
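
The conversion step can also be sketched without sox. The following is
only an illustration using Python's standard wave module; the function
names are made up, the pixel list stands in for the header-less image
data, and the encode/decode pass through the MP3 encoder is left out.

```python
import io
import struct
import wave

def pixels_to_wav(pixels, sample_rate=44100):
    """Pack 8-bit unsigned grey values as 16-bit signed mono PCM in a WAV."""
    samples = [(p - 128) * 256 for p in pixels]   # 0..255 -> -32768..32512
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buf.getvalue()

def wav_to_pixels(wav_bytes):
    """Reverse step: 16-bit samples back to 8-bit grey values, clamped."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        raw = wav.readframes(wav.getnframes())
    samples = struct.unpack(f"<{len(raw) // 2}h", raw)
    return [max(0, min(255, round(s / 256) + 128)) for s in samples]

# Two rows of a 576-pixel-wide horizontal gradient (576 = one granule).
row = [(x * 255) // 575 for x in range(576)]
image = row + row
restored = wav_to_pixels(pixels_to_wav(image))
print(restored == image)
```

Without the lossy encoder in between the round trip is exact; the
clamping only matters once decoded samples overshoot the original range.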

Looks pretty good; perhaps if you do some kind of transform on the input
first to make it less 'transient' (some kind of reversible convolution)
you might get better compression than JPEG (it's not too far off as is).

Kinda gives you respect for how tough audio compression is VS image
compression.. Check out http://linuxpower.cx/~greg/mp3crap/ for some
examples of this kind of perversion..

(Why did I do this? I was hoping nasty artifacts might be more easily
found in a picture than by listening to a sample)




Re: [MP3 ENCODER] Voice mode

1999-10-13 Thread Greg Maxwell

On Wed, 13 Oct 1999, Gabriel Bouvigne wrote:

 The voice mode is made using 3 tricks:
 *using only long blocks
 *limiting bitrate when vbr
 *using a band-pass filter

For a more sophisticated hack:

Panned stereo mode!

For each frame you examine a spectrally weighted (ignore low freqs) energy
ratio of left/right to pick the stereo panning at two points in time
(middle, and right), then interpolate from old_left to middle then right,
enforcing a maximum rate of change.

Then mix to mono, encode as mid, and the position as side; you only have
to encode the lower scalefactors because it should consist of low freqs only
(because of your slow interpolation). I suppose you could shape your side
wav to MDCT well too..

Am I missing something? 

I'd think that this would allow you to get mono quality at about the same
bitrate, but still preserve panning, which would help in differentiating
between people speaking.
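
A very rough sketch of the analysis side of this idea, under several
simplifying assumptions: one pan value per frame instead of two, plain
energy instead of a spectrally weighted one, and no encoding of the
position signal at all. All names and the 0.1 step limit are
hypothetical.

```python
def frame_pan(left, right):
    """Pan position from channel energies: 0.0 = left, 1.0 = right."""
    e_l = sum(v * v for v in left)
    e_r = sum(v * v for v in right)
    total = e_l + e_r
    return 0.5 if total == 0 else e_r / total

def limit_rate(pans, max_step=0.1, start=0.5):
    """Enforce a maximum change of the pan position between frames."""
    out, prev = [], start
    for p in pans:
        prev += max(-max_step, min(max_step, p - prev))
        out.append(prev)
    return out

def panned_mono(left, right, frame_len=576, max_step=0.1):
    """Return the mono downmix plus one slowly varying pan value per frame."""
    mono = [(l + r) / 2 for l, r in zip(left, right)]
    pans = [frame_pan(left[i:i + frame_len], right[i:i + frame_len])
            for i in range(0, len(left), frame_len)]
    return mono, limit_rate(pans, max_step)

# Three frames of a signal that sits entirely in the left channel.
left = [1.0] * (3 * 576)
right = [0.0] * (3 * 576)
mono, pans = panned_mono(left, right)
print(pans)  # drifts towards 0.0 (left) by at most max_step per frame
```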
