Re: [MP3 ENCODER] --voice
> I've just been trying to help someone with re-encoding from 160/128 down to 96 kbps for his portable player so I offered -mj -b 96 --mp3input. This works fine but took longer than expected (perhaps because Lame seems to automatically resample down to 32 kHz?). Are these the best options for optimal quality/filesize at 96 kbps?

I would add -h.

Regards,
--
Gabriel Bouvigne - France
[EMAIL PROTECTED]
icq: 12138873
MP3' Tech: www.mp3-tech.org
--
MP3 ENCODER mailing list ( http://geek.rcc.se/mp3encoder/ )
Re: [MP3 ENCODER] --voice
- Original Message -
From: Mark Taylor [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Wednesday, September 13, 2000 9:30 AM
Subject: Re: [MP3 ENCODER] --voice

> > Another question :) Is there a preferred (or even mandatory) sequence to command-line options for Lame? I remember how quirky DOS could be in this respect.
>
> The only problem is if you use incompatible options - LAME does not check for this. Examples would be -h and -f together, or -k and --lowpass together.

Thanks - I'll watch out for that.

I've just been trying to help someone with re-encoding from 160/128 down to 96 kbps for his portable player so I offered -mj -b 96 --mp3input. This works fine but took longer than expected (perhaps because Lame seems to automatically resample down to 32 kHz?). Are these the best options for optimal quality/filesize at 96 kbps?

The quality sounds reasonable on my cheap PC speakers - about the same as FM radio.

Sorry about wandering off topic - I'll cease and desist after this :)

Eric
Re: [MP3 ENCODER] --voice
> Thanks - I'll revisit your website.
>
> Another question :) Is there a preferred (or even mandatory) sequence to command-line options for Lame? I remember how quirky DOS could be in this respect.

The only problem is if you use incompatible options - LAME does not check for this. Examples would be -h and -f together, or -k and --lowpass together.

Mark
Re: [MP3 ENCODER] --voice
Mark Taylor wrote on Wed, 13 Sep 2000:

> > Thanks - I'll revisit your website.
> >
> > Another question :) Is there a preferred (or even mandatory) sequence to command-line options for Lame? I remember how quirky DOS could be in this respect.
>
> The only problem is if you use incompatible options - LAME does not check for this. Examples would be -h and -f together, or -k and --lowpass together.
>
> Mark

As a general rule, the last given option overrides the previous ones: with -f -h, -h would be used. The same goes if you want to override a few preset settings. So I don't see a problem, but a feature ;-)

Ciao
Robert
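
The "last given option wins" rule Robert describes can be sketched in a few lines of Python. This is only an illustration and not LAME's actual option parser: the helper name `resolve_quality` and the idea of modelling -f and -h as two values of one setting are assumptions made for the example.

```python
def resolve_quality(argv):
    """Return the quality switch in effect when -f and -h are mixed.

    Hypothetical helper: -f and -h are treated as two values of a single
    setting, so a later switch simply overwrites an earlier one.
    """
    quality = None  # None means neither switch was given
    for arg in argv:
        if arg in ("-f", "-h"):
            quality = arg  # a later occurrence replaces the earlier one
    return quality


print(resolve_quality(["-f", "-h"]))
print(resolve_quality(["-b", "96", "-h", "-f"]))
```

Under this model the order on the command line matters only between options that touch the same setting; unrelated options such as -b 96 are unaffected.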
Re: [MP3 ENCODER] --voice
On Mon, Sep 11, 2000 at 10:01:58PM -, Eric Howgate wrote:

> Could someone satisfy my curiosity about this option (described as 'experimental' in the docs for ver 3.85, but I see that it is available in RazorLame)? Does it have fixed default parameters like the --preset voice option, and if so what are they? What is the thinking behind this option - audio books perhaps?

--voice does exactly the same as --preset voice. You get good voice quality at about 56 kbps. These are the best settings I found for voice at about 56 kbps. I took a poem and now I can recite the poem without any problem ;-)

The same can be done with: phone, voice, radio, tape, cd and studio.

--
Frank Klemm
Re: [MP3 ENCODER] --voice
> Could someone satisfy my curiosity about this option (described as 'experimental' in the docs for ver 3.85, but I see that it is available in RazorLame)? Does it have fixed default parameters like the --preset voice option, and if so what are they? What is the thinking behind this option - audio books perhaps?
>
> Many thanks
> Eric

--voice was made before presets. Now that we have presets and filters in Lame, --preset voice is the same as --voice, and --voice is usable for any sampling rate. I personally think that --voice should now be removed in favour of --preset voice, but some people disagree.

If you're really interested in what was behind this option when it was created, I still have a webpage about it.

Regards,
--
Gabriel Bouvigne - France
[EMAIL PROTECTED]
icq: 12138873
MP3' Tech: www.mp3-tech.org
Re: [MP3 ENCODER] --voice
- Original Message -
From: Gabriel Bouvigne
To: [EMAIL PROTECTED]
Sent: Tuesday, September 12, 2000 10:10 AM
Subject: Re: [MP3 ENCODER] --voice

> > Could someone satisfy my curiosity about this option (described as 'experimental' in the docs for ver 3.85, but I see that it is available in RazorLame)? Does it have fixed default parameters like the --preset voice option, and if so what are they? What is the thinking behind this option - audio books perhaps?
> >
> > Many thanks
> > Eric
>
> --voice was made before presets. Now that we have presets and filters in Lame, --preset voice is the same as --voice, and --voice is usable for any sampling rate. I personally think that --voice should now be removed in favour of --preset voice, but some people disagree.
>
> If you're really interested in what was behind this option when it was created, I still have a webpage about it.
>
> Regards,
> --
> Gabriel Bouvigne - France
> [EMAIL PROTECTED]
> icq: 12138873
> MP3' Tech: www.mp3-tech.org

Thanks - I'll revisit your website.

Another question :) Is there a preferred (or even mandatory) sequence to command-line options for Lame? I remember how quirky DOS could be in this respect. I think better than tinkering with presets is to construct a command line of my own.

Eric
Re: [MP3 ENCODER] Voice encoding questions
| 3) FhG (-br 64000 -qual 9 -crc -no-is -esr 44100) sounds very good. (Man,
| is it slow, though.) Again, without the forced MPEG-1 sampling rate, the
| mp3enc31 will attempt to use 22050.
...
| So my question(s) are: Is the solution to my problem to filter/downsample
| (and use joint, when I get around to coding it up)? That seems to be what
| is making the difference in the case of LAME; I assume that FhG is using
| some filtering as well, though there's no way to disable it and see for

Use option -bw 22050 as bandwidth in Hz.

Jaroslav Lukesh
--
note: (Bill) Gates to Hell!
Re: [MP3 ENCODER] Voice encoding with low bit rates
> I have briefly tried the "--voice" mode and the "normal" mode when encoding a purely voice signal (with background noise) at 8kbps, and have been very impressed with the difference. I would like to compress the signal more... but 8 is as low as it goes. The "normal" mode renders the voice absolutely unintelligible (I assume the encoder tries too hard to preserve the background).

As others have pointed out, --voice is the same as

lame --noshort --lowpass 12

and I imagine the main difference between this and the default is that at 8 kbps, the default lowpass value is not very good. It is based on a simple formula. Alfred Weyers did some detailed tests a while ago suggesting some low bitrate corrections, but this hasn't made it back into LAME.

If you don't mind, can you try:

lame -h --noshort --lowpass 12
lame -h --lowpass 12
lame -h --lowpass 10
lame -h --lowpass 8

and let us know which sounds the best? I'd like to verify the best filter level for this bitrate and also verify that --noshort really is helpful for voice encoding. The short block encoding is much better now than when Gabriel first added --voice.

> I have read most of the past articles on "--voice" but they don't tell me all I wish to know. I am also starting with a 11K/samples per sec file (mono) and having to up-sample it to 44.1K before I can process it. Has anyone considered allowing different input sample rates (ie: the standard 16, 22.05, 24, 32, 48) as well as 44.1?

Which version are you using? LAME can take any samplerate for input. (If you are feeding lame raw pcm data, it will assume the sample rate is 44.1, unless you add -s 11.)

Mark
Re: Re[2]: [MP3 ENCODER] Voice encoding questions
Frank Klemm wrote on Sat, 05 Aug 2000:

> We should support an option (-ma for Mode Auto) which switches between -a -mm for highly correlated channels (r 0.98 = mono), -mj for normally correlated signals (r = -1.00...-0.20, 0.20...0.91 = stereo) and -ms for nearly uncorrelated signals (dual channel audio with independent audio, i.e. movies with english/german audio track, r=-0.20...+0.20).

The joint stereo coding (-mj) in LAME switches automatically between Stereo and Mid-Side Stereo. Uncorrelated signals will be LR Stereo coded and correlated parts of your waves in MS stereo. Given L=left channel and R=right channel:

M = (L+R)/SQRT2
S = (L-R)/SQRT2

Note: to get your left and right channels back:

L = (M+S)/SQRT2
R = (M-S)/SQRT2

As you can see, if your input signal is mono (L=R), only the mid channel carries information and the side channel is empty. The difference from true mono coding in this situation is that we now need some bits for our empty side channel, bits which in mono mode we could also use for the mid channel.

My observation on old mono-like sounds is that it is a bad idea to let LAME make the side channel really empty. If this happens, you are likely to get an audible glitch.

> There are a lot of MP3s out there with mono recordings coded with -mj and also -ms.
>
> --
> Kind regards
> Frank Klemm
>
> PS: What's the difference between '-mm' and '-mm -a'?
>
> eMail | [EMAIL PROTECTED] home: [EMAIL PROTECTED]
> phone | +49 (3641) 64-2721 home: +49 (3641) 390545
> sMail | R.-Breitscheid-Str. 43, 07747 Jena, Germany

Ciao
Robert
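
The mid/side formulas quoted in this message can be checked numerically with a short Python sketch. It only restates the four equations above; the function names `lr_to_ms` and `ms_to_lr` are made up for the example and say nothing about how LAME implements the transform internally.

```python
import math

SQRT2 = math.sqrt(2.0)


def lr_to_ms(left, right):
    """Convert one left/right sample pair to mid/side."""
    return (left + right) / SQRT2, (left - right) / SQRT2


def ms_to_lr(mid, side):
    """Convert one mid/side sample pair back to left/right."""
    return (mid + side) / SQRT2, (mid - side) / SQRT2


# Mono input (L == R): the side channel carries nothing.
mid, side = lr_to_ms(0.5, 0.5)
print(mid, side)

# Round trip on a stereo pair returns the original values (up to rounding).
print(ms_to_lr(*lr_to_ms(0.25, -0.75)))
```

Dividing by SQRT2 in both directions makes the transform its own inverse, which is why the same expression shape appears in both pairs of equations.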
Re: [MP3 ENCODER] Voice encoding questions
> > We should support an option (-ma for Mode Auto) which switches between -a -mm for highly correlated channels (r 0.98 = mono), -mj for a normal correlated signals (r = -1.00...-0.20, 0.20...0.91 = stereo) and -ms for nearly not
>
> I am afraid most decoders can't correctly handle an mp3 file whose mode (stereo - mono) changes within one file.

Switching between any stereo modes (stereo, m/s, is, ms and is) is allowed, but switching between stereo, mono and dual is forbidden by the standard.

Regards,
--
Gabriel Bouvigne - France
[EMAIL PROTECTED]
icq: 12138873
MP3' Tech: www.mp3-tech.org
Re: Re[2]: [MP3 ENCODER] Voice encoding questions
:: Frank Klemm wrote on Sat, 05 Aug 2000:
:: We should support an option (-ma for Mode Auto) which switches between -a -mm
:: for highly correlated channels (r 0.98 = mono), -mj for a normal
:: correlated signals (r = -1.00...-0.20, 0.20...0.91 = stereo) and -ms for nearly not
:: correlated signals (dual channel audio with independent audio, i.e. movies
:: with english/german audio track, r=-0.20...+0.20).
::
:: The joint stereo coding (-mj) in LAME switches automatically between Stereo and
:: Mid-Side Stereo. Uncorrelated signals will be LR Stereo coded and correlated
:: parts of your waves in MS stereo. Given L=left channel and R=right channel:
:: M = (L+R)/SQRT2
:: S = (L-R)/SQRT2
::
:: note: to get your left right channels back:
:: L = (M+S)/SQRT2
:: R = (M-S)/SQRT2
::

That's clear.

:: As you can see, if your input signal is mono (L=R), only the mid channel
:: carries information, the side channel is empty.
::

Take some mono recordings and prove this. It's very seldom that L=R. See also alt.binaries.sounds.mp3.* .

FM radio mono recordings (tuner set to mono):
* differences between the channels from the MPX decoder - AD converter

FM radio mono recordings (tuner set to stereo, e.g. news):
* differences between the channels from the MPX decoder - AD converter
* additional noise, distortion, whistle, ... in the X signal

Mono CDs:
* Both channels are converted by different AD converters with different parameters (offset, amplification).

Records:
* a lot of noise and rumble

:: The difference to a true mono coding in this situation is, that we now
:: need some bits for our empty side channel which we could use in mono
:: mode for the mid channel too.
::

I've never seen a True Mono Coding. It is mono (historic reasons), it sounds like mono, statistics say it's mono, but L != R.

Example:
  CD:          Jazz - Lyrik - Prosa
  No:          (Amiga 74321326192)
  Title:       My Bonnie is over the Ocean
  Artist:      Jazz-Optimisten Berlin
  Length:      11239032 samples [19114 CD frames, 4:14.64]
  Correlation: r = 0.99879
  Coder:       Lame: 3.86 alpha
  Options:     -V0 -d -q1 --cwlimit 11.5 -X6
  Results:
    -mm      3642584 bytes   114.3 kbps
    -mm -a   3642584 bytes   114.3 kbps   (bitwise identical to -mm)
    -mj      5637588 bytes   177.0 kbps   (+55%)
    -ms      7223049 bytes   226.7 kbps   (+98%)

::
:: My observations on old mono like sounds are, that it is a bad idea to
:: let LAME make the side channel really empty. If this happens, it is likely
:: to get an audible glitch.
::

  -mm   Use Mono
  -mi   Use Intensity Stereo, MS-Stereo and LR-Stereo
  -mj   Use MS-Stereo and LR-Stereo
  -ms   Use LR-Stereo
  -ma   Analyze file before any converting, select -mm, -mj or -ms

Another question: Is there any tool to analyze the number of SI, MS and LR frames in a MP3?

--
Kind regards
Frank Klemm

eMail | [EMAIL PROTECTED] home: [EMAIL PROTECTED]
phone | +49 (3641) 64-2721 home: +49 (3641) 390545
sMail | R.-Breitscheid-Str. 43, 07747 Jena, Germany
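
The bitrates and percentages in Frank's results can be reproduced from the file sizes and the track length. The sketch below assumes the track is CD audio at 44100 samples per second (the message gives the length in samples and CD frames but does not state the rate explicitly); the function name `average_kbps` is made up for the example.

```python
SAMPLE_RATE_HZ = 44100     # assumption: CD audio
TOTAL_SAMPLES = 11239032   # "Length" from the example above


def average_kbps(size_bytes, samples=TOTAL_SAMPLES, rate=SAMPLE_RATE_HZ):
    """Average bitrate in kbps (1 kbps = 1000 bit/s) of an encoded file."""
    duration_s = samples / rate
    return size_bytes * 8 / duration_s / 1000.0


sizes = {"-mm": 3642584, "-mj": 5637588, "-ms": 7223049}
for mode, size in sizes.items():
    growth = (size / sizes["-mm"] - 1.0) * 100.0
    print(f"{mode}: {average_kbps(size):.1f} kbps ({growth:+.0f}% vs -mm)")
```

The size ratio, not the absolute bitrate, is the interesting number here: at the same VBR quality setting, the joint and plain stereo encodings of this nearly mono recording are roughly one and a half and two times the size of the mono encoding.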
Re: Re[2]: [MP3 ENCODER] Voice encoding questions
Frank Klemm wrote on Sat, 05 Aug 2000:

> [snip]
>
> Another question: Is there any tool to analyze the number of SI, MS and LR frames in a MP3?

I don't know if there is any, but you can write a simple tool that scans all mp3 headers and counts the type of each frame. Or you can look for a tool called mp3check and let it make a dump; all you would have to do is count the different stereo mode extensions.

> --
> Kind regards
> Frank Klemm
>
> eMail | [EMAIL PROTECTED] home: [EMAIL PROTECTED]
> phone | +49 (3641) 64-2721 home: +49 (3641) 390545
> sMail | R.-Breitscheid-Str. 43, 07747 Jena, Germany

Ciao
Robert
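
The core of the "simple tool" Robert suggests might look like the Python sketch below. It classifies a single 4-byte MPEG audio frame header by its channel mode and, for joint stereo, its Layer III mode extension bits. It is a sketch under assumptions: the names `classify_header` and `count_modes` are hypothetical, and locating frame boundaries in a real file (frame length calculation, skipping tags) is deliberately left out.

```python
from collections import Counter


def classify_header(header):
    """Label the stereo coding signalled by one 4-byte Layer III header."""
    if len(header) != 4 or header[0] != 0xFF or (header[1] & 0xE0) != 0xE0:
        raise ValueError("no frame sync found")
    if (header[1] >> 1) & 0x3 != 0b01:  # layer bits: 01 means Layer III
        raise ValueError("not a Layer III header")
    mode = (header[3] >> 6) & 0x3  # 0 stereo, 1 joint, 2 dual, 3 mono
    ext = (header[3] >> 4) & 0x3   # bit 1: MS on, bit 0: intensity on
    if mode == 3:
        return "mono"
    if mode == 2:
        return "dual"
    if mode == 0 or ext == 0:
        return "LR"  # plain stereo, or joint stereo with both tools off
    return {1: "IS", 2: "MS", 3: "MS+IS"}[ext]


def count_modes(headers):
    """Count frame types over an iterable of 4-byte headers."""
    return Counter(classify_header(h) for h in headers)


sample = [bytes([0xFF, 0xFB, 0x90, 0x64]),   # joint stereo, MS on
          bytes([0xFF, 0xFB, 0x90, 0x44]),   # joint stereo, extensions off
          bytes([0xFF, 0xFB, 0x90, 0xC4])]   # mono
print(count_modes(sample))
```

A complete tool would feed `count_modes` with the header of every frame in the file; the counts per label are then the statistic Frank asked for.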
Re: [MP3 ENCODER] --voice Modus
Frank Klemm wrote on Tue, 01 Aug 2000:

> For <= 56 kbit/s, the voice mode of Lame always sounds better than the normal mode. I've tested several kinds of music and also spoken words.
>
> So there are some questions:
> * Does voice mode become standard for low bit rates?
> * What makes voice mode work (it's not IS, it also works with mono)?
> * Is IS planned for lame?
>
> --
> Frank Klemm

LAME's voice modes turn off short blocks and apply a lowpass filter around 12 kHz. In its first draft some lower MDCT coefficients were dropped (something like a highpass filter; we had no filter code at that time), but that is no longer done.

I can't tell you whether someone is working on an intensity stereo mode for LAME, but I don't think so. Maybe for lower constant bitrate modes we have to rework the bit allocation scheme.

Ciao
Robert
Re: [MP3 ENCODER] Voice encoding questions
> Another question: Is there any tool to analyze the number of SI, MS and LR frames in a MP3?

Frank, you just need a GTK enabled version of lame :-) Run lame -g on the mp3 file, scroll to the end, and then click 'show' under the 'stats' pull down menu. It shows the info you want, and any additional statistics would be easy to add. You can also use it to examine the mid/side bit allocation frame by frame.

You could test your ideas about near mono files via the following: modify the reduce_side() function in quantize-pvt.c to be more aggressive. Right now it allocates at most a 33/66 split between side channel and mid channel, based on the side_channel_energy/total_energy ratio. As Robert mentioned, a more aggressive split can create artifacts. I think the problem is that allocating just a few bits to the side channel can produce audible glitches which will sound worse than if 0 bits were used. But no one has done a detailed study of this.

>   -mm   Use Mono
>   -mi   Use Intensity Stereo, MS-Stereo and LR-Stereo
>   -mj   Use MS-Stereo and LR-Stereo
>   -ms   Use LR-Stereo
>   -ma   Analyze file before any converting, select -mm, -mj or -ms

I think -ma would be beyond the scope of LAME. A separate analysis program should be written, and then a GUI front end should run the analysis and make the selection.

This is similar to automatic level adjustment. A couple of people have expressed interest in adding a volume adjustment to LAME, which is fine, but the additional step of running some analysis on the file to determine the adjustment should be left to a separate program.

Mark
[MP3 ENCODER] Voice encoding with low bit rates
I have briefly tried the "--voice" mode and the "normal" mode when encoding a purely voice signal (with background noise) at 8kbps, and have been very impressed with the difference. I would like to compress the signal more... but 8 is as low as it goes. The "normal" mode renders the voice absolutely unintelligible (I assume the encoder tries too hard to preserve the background). The "--voice" mode actually seems to reduce the background garbage (noise) where there is no speech, and to also concentrate on the speech when it is present.

I have looked at the spectrogram for each, and there is a BIG difference.

My question (after all this guff) is "does LAME perform any smarts (like looking for particular frequency domain patterns), and if so, what?"

I have read most of the past articles on "--voice" but they don't tell me all I wish to know. I am also starting with a 11K/samples per sec file (mono) and having to up-sample it to 44.1K before I can process it. Has anyone considered allowing different input sample rates (ie: the standard 16, 22.05, 24, 32, 48) as well as 44.1?
Re: [MP3 ENCODER] Voice encoding with low bit rates
> I have briefly tried the "--voice" mode and the "normal" mode when encoding a purely voice signal (with background noise) at 8kbps, and have been very impressed with the difference. I would like to compress the signal more... but 8 is as low as it goes. The "normal" mode renders the voice absolutely unintelligible (I assume the encoder tries too hard to preserve the background). The "--voice" mode actually seems to reduce the background garbage (noise) where there is no speech, and to also concentrate on the speech when it is present.
>
> I have looked at the spectrogram for each, and there is a BIG difference.
>
> My question (after all this guff) is "does LAME perform any smarts (like looking for particular frequency domain patterns), and if so, what?"
>
> I have read most of the past articles on "--voice" but they don't tell me all I wish to know. I am also starting with a 11K/samples per sec file (mono) and having to up-sample it to 44.1K before I can process it. Has anyone considered allowing different input sample rates (ie: the standard 16, 22.05, 24, 32, 48) as well as 44.1?

I first wrote the voice mode, mainly by using some suppositions about the signal and a lot of listening tests. You can read what this option originally did here:
http://www.multimania.com/bouvigne/lame/voice.html

Because of a lack of time, and the lack of a good filtering solution in Lame at that time, I only tuned it for 44.1kHz files.

But now Robert has introduced presets in Lame, including --preset voice, and Lame has got some good filters. So I think that "--preset voice" is now doing the same thing as --voice, and can be used for any sampling rate, but I'd like Robert's confirmation that the behaviour is the same as --voice.

If it's doing the same, I'd suggest you stop using --voice and start using --preset voice instead. I personally think that the --voice switch should now be removed, as there is --preset voice.

Regards,
--
Gabriel Bouvigne - France
[EMAIL PROTECTED]
icq: 12138873
MP3' Tech: www.mp3-tech.org
Re: [MP3 ENCODER] Voice encoding with low bit rates
> > I have briefly tried the "--voice" mode and the "normal" mode when encoding a purely voice signal (with background noise) at 8kbps, and have been very impressed with the difference. I would like to compress the signal more... but 8 is as low as it goes. The "normal" mode renders the voice absolutely unintelligible (I assume the encoder tries too hard to preserve the background). The "--voice" mode actually seems to reduce the background garbage (noise) where there is no speech, and to also concentrate on the speech when it is present.

the voice mode applies a lowpass filter at 12 kHz

> > I have looked at the spectrogram for each, and there is a BIG difference.

and does not use short blocks

> > My question (after all this guff) is "does LAME perform any smarts (like looking for particular frequency domain patterns), and if so, what?"

no, LAME does nothing special for voice signals automagically

> > I have read most of the past articles on "--voice" but they don't tell me all I wish to know. I am also starting with a 11K/samples per sec file (mono) and having to up-sample it to 44.1K before I can process it. Has anyone considered allowing different input sample rates (ie: the standard 16, 22.05, 24, 32, 48) as well as 44.1?

you don't need to up-sample your waves. valid output samplerates are 8, 11.025, 12, 16, 22.05, 24, 32, 44.1, 48 kHz

> I first wrote the voice mode, mainly by using some suppositions about the signal and a lot of listening tests. You can read what this option originally did here:
> http://www.multimania.com/bouvigne/lame/voice.html
>
> Because of a lack of time, and the lack of a good filtering solution in Lame at that time, I only tuned it for 44.1kHz files.

One point has changed since we have *good filters* in LAME: the highpass filtering had to be dropped for the voice mode, because the filters are too rough.

> But now Robert has introduced presets in Lame, including --preset voice, and Lame has got some good filters. So I think that "--preset voice" is now doing the same thing as --voice, and can be used for any sampling rate, but I'd like Robert's confirmation that the behaviour is the same as --voice.

No, the voice preset and the voice option are not 100% identical. By the way, the --voice option is a shortcut for "--lowpass 12 --noshort". I would suggest looking at "lame --preset help" and experimenting with --preset phone / --preset sw etc.

> If it's doing the same, I'd suggest you stop using --voice and start using --preset voice instead. I personally think that the --voice switch should now be removed, as there is --preset voice.

I think we should keep it :-)

> Regards,
> --
> Gabriel Bouvigne - France
> [EMAIL PROTECTED]
> icq: 12138873
> MP3' Tech: www.mp3-tech.org

Ciao
Robert
[MP3 ENCODER] Voice encoding questions
Howdy All,

In testing my (comparatively naive) hack of the dist10 encoder, I have discovered that, while it does OK for music, it has real problems with speech signals. (Caveat: at our lowest overall bitrate of 300kbps for combined video/audio, we run the audio at 32kbit mono - though we go way up to 64kbps mono for higher overall bitrate signals, and are aiming to default at 64kbps stereo [not joint].) In particular, the broadband noise bursts associated with fricatives really wreak havoc.

My test signal here is spfe49_1 from the AAC SQAM test suite, which is a female English speaker going on about giving pills to animals. I ran it through 1) my encoder, 2) LAME (3.85 w/ frame analyzer), 3) mp3enc31, and 4) our current Layer-II encoder.

1) With my encoder (64kbps stereo CRC), every fricative is almost painful to listen to, as the pink noise bursts end up being narrow band filtered (due to lack of bits - only the MDCT coeffs closest to the pole are making it into the bitstream), and there are occasional weird high frequency blips and arpeggiation which are very annoying.

2) LAME (-m s -h -b 64 -p --resample 44.1) (we use CRC and I haven't enabled LSF yet) sounds pretty good. There are occasional minor glitches, but that's to be expected at this bitrate. However, LAME (as above plus -k to turn off the filters) sounds pretty similar to what I'm getting. I note that without the forced resampling, LAME will attempt to downsample to 22050.

3) FhG (-br 64000 -qual 9 -crc -no-is -esr 44100) sounds very good. (Man, is it slow, though.) Again, without the forced MPEG-1 sampling rate, the mp3enc31 will attempt to use 22050.

4) Layer-II (64 kbps stereo CRC) sounds good.

So my question(s) are: Is the solution to my problem to filter/downsample (and use joint, when I get around to coding it up)? That seems to be what is making the difference in the case of LAME; I assume that FhG is using some filtering as well, though there's no way to disable it and see for sure. Are there really just not enough bits for this type of signal at this bitrate? Why does Layer-II do so much better a job with this type of signal? Do other codecs (AAC/MPEG-4) handle this kind of signal better as well? And what is the capital of Assyria?

Inquiring minds wanna know,
Alex
Re: [MP3 ENCODER] Voice encoding questions
- Original Message -
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Friday, August 04, 2000 4:14 PM
Subject: [MP3 ENCODER] Voice encoding questions

> Howdy All,
>
> In testing my (comparatively naive) hack of the dist10 encoder, I have discovered that, while it does OK for music, it has real problems with speech signals. (Caveat: at our lowest overall bitrate of 300kbps for combined video/audio, we run the audio at 32kbit mono - though we go way up to 64kbps mono for higher overall bitrate signals, and are aiming to default at 64kbps stereo [not joint].) In particular, the broadband noise bursts associated with fricatives really wreak havoc.
>
> My test signal here is spfe49_1 from the AAC SQAM test suite, which is a female English speaker going on about giving pills to animals. I ran it through 1) my encoder, 2) LAME (3.85 w/ frame analyzer), 3) mp3enc31, and 4) our current Layer-II encoder.
>
> 1) With my encoder (64kbps stereo CRC), every fricative is almost painful to listen to, as the pink noise bursts end up being narrow band filtered (due to lack of bits - only the MDCT coeffs closest to the pole are making it into the bitstream), and there are occasional weird high frequency blips and arpeggiation which are very annoying.
>
> 2) LAME (-m s -h -b 64 -p --resample 44.1) (we use CRC and I haven't enabled LSF yet) sounds pretty good. There are occasional minor glitches, but that's to be expected at this bitrate. However, LAME (as above plus -k to turn off the filters) sounds pretty similar to what I'm getting. I note that without the forced resampling, LAME will attempt to downsample to 22050.

If you want to encode voice signals, I'd suggest you use --voice or --preset voice.

> 3) FhG (-br 64000 -qual 9 -crc -no-is -esr 44100) sounds very good. (Man, is it slow, though.) Again, without the forced MPEG-1 sampling rate, the mp3enc31 will attempt to use 22050.

You're disabling intensity stereo, but not joint stereo. With those settings, mp3enc is using m/s stereo. This is an advantage over Lame, which you forced to use plain stereo.

> 4) Layer-II (64 kbps stereo CRC) sounds good.

The layer II encoder is probably using joint stereo. In Layer II, joint stereo is quite similar to the intensity stereo of layer III.

> And what is the capital of Assyria?

The first Assyrian capital was Assur, and it was later replaced by Kalah.

--
Gabriel Bouvigne - France
[EMAIL PROTECTED]
icq: 12138873
MP3' Tech: www.mp3-tech.org
Re: [MP3 ENCODER] Voice encoding questions
> So my question(s) are: Is the solution to my problem to filter/downsample (and use joint, when I get around to coding it up)? That seems to be what is making the difference in the case of LAME; I assume that FhG is using some filtering as well, though there's no way to disable it and see for sure. Are there really just not enough bits for this type of signal at this bitrate? Why does Layer-II do so much better a job with this type of signal? Do other codecs (AAC/MPEG-4) handle this kind of signal better as well?

I forgot something: the sample you're using is very close to mono, so joint stereo helps a lot.

For your problem, there are mainly 2 solutions:
a: downsampling
b: using joint stereo. For voice signals, the best joint mode would probably be intensity stereo. But it's not implemented in Lame.

You mentioned that you use crc. Are you aware that the ISO crc code is broken?

Regards,
--
Gabriel Bouvigne - France
[EMAIL PROTECTED]
icq: 12138873
MP3' Tech: www.mp3-tech.org
Re: [MP3 ENCODER] Voice encoding questions
1) With my encoder (64kbps stereo CRC), every fricative is almost painful to listen to, as the pink noise bursts end up being narrow band filtered (due to lack of bits - only the MDCT coeffs closest to the pole are making it into the bitstream), and there are occasional weird high frequency blips and arpeggiation which are very annoying. 2) LAME (-m s -h -b 64 -p --resample 44.1) (we use CRC and I haven't enabled LSF yet) sounds pretty good. There are occasional minor glitches, but that's to be expected at this bitrate. However, LAME (as above plus -k to turn off the filters) sounds pretty similar to what I'm getting. I note that without the forced resampling, LAME will attempt to downsample to 22050. 3) FhG (-br 64000 -qual 9 -crc -no-is -esr 44100) sounds very good. (Man, is it slow, though.) Again, without the forced MPEG-1 sampling rate, the mp3enc31 will attempt to use 22050. The main difference between FhG and LAME is probably the lowpass filters. Try different values of --lowpass. The compression ratio you are using (about 22x) is not commonly used, and the LAME's default guess at a lowpass setting wont be very good. Why do you disable the 22050 downsampling? This is done based on the idea that encoding at 22khz is better than encoding at 44khz and removing have the specturm with filters. FhG is probably using joint stereo? This will increase the bandwidth by 10-20%. The main difference between LAME and ISO is that the ISO code has serious flaws in several major components. jstereo, filtering and other advanced features help, but you gotta fix the bugs first! some filtering as well, though there's no way to disable it and see for sure. Are there really just not enough bits for this type of signal at this bitrate? Why does Layer-II do so much better a job with this type of signal? Do other codecs (AAC/MPEG-4) hand this kind of signal better as You rate FhG as 'very good', and Layer II as 'good'. So I'm assuming layer III beats layer II. 
The things layer III adds to layer II are: 1) MDCT transform (lossless to roundoff), 2) entropy coding (lossless), 3) bit reservoir (prevents wasting of bits) and 4) the ability to do more advanced noise shaping. #1, 2 and 3 can only improve the quality. The only way I can see layer II outperforming layer III is if #4 is not tuned properly for the desired compression. well? And what is the capital of Assyria? During which century? Mark -- MP3 ENCODER mailing list ( http://geek.rcc.se/mp3encoder/ )
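The "about 22x" figure quoted above can be checked with a little arithmetic. The sketch below assumes the source is 16-bit stereo PCM at 44.1 kHz (the thread gives the sampling rate but not the sample width, so the 16 bits are an assumption), and the function names are made up for illustration.

```python
def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels):
    """Bitrate of uncompressed PCM audio in kbit/s."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0


def compression_ratio(sample_rate_hz, bits_per_sample, channels, target_kbps):
    """How many times smaller the encoded stream is than the PCM input."""
    return pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels) / target_kbps


# Assumed input: 16-bit stereo PCM, encoded at 64 kbps as in the thread.
ratio_44k = compression_ratio(44100, 16, 2, 64)  # MPEG-1 rate, forced by --resample 44.1
ratio_22k = compression_ratio(22050, 16, 2, 64)  # rate LAME would pick by default

print(f"44.1 kHz input: {ratio_44k:.2f}x")
print(f"22.05 kHz input: {ratio_22k:.2f}x")
```

Halving the sampling rate halves the ratio the encoder has to achieve, which is the reasoning behind the default downsampling to 22050 Hz.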
RE: [MP3 ENCODER] Voice encoding questions
Howdy All, Thanks for the quick replies! Gabriel Bouvigne wrote: If you want to encode voice signals, I'd suggest you use --voice or --preset voice. Actually, I want to encode general signals (mostly TV and movies), many of which have significant voice components, and, unfortunately, many of which do not. My codec is doing OK on music, and sucking at voice, so what I'm really trying to do is figure out why _voice_ signals are a problem for _general purpose_ encoders. Otherwise I would just bandpass 300-3000 Hz. 3) FhG (-br 64000 -qual 9 -crc -no-is -esr 44100) sounds very good. (Man, is it slow, though.) Again, without the forced MPEG-1 sampling rate, the mp3enc31 will attempt to use 22050. You're disabling intensity stereo, but not joint stereo. With those settings, mp3enc is using m/s stereo. This is an advantage over Lame, which you forced to use plain stereo. Yeah, I noticed that. As I'm sure you have already discovered, there is no way to disable M/S in mp3enc, so the comparison is bad. I forgot something: the sample you're using is very close to mono, so joint stereo helps a lot. A very good point. I would hate to give FhG more credit than they deserve. 4) Layer-II (64 kbps stereo CRC) sounds good. The layer II encoder is probably using joint stereo. In Layer II, joint stereo is quite similar to the intensity stereo of layer III. Actually, there is no joint stereo code in our Layer-II encoder, so I'm sure it's not using it. I should probably qualify my rating of 'good' to say that there are no obvious and distracting high frequency artifacts. Of course, the whole thing sounds like AM radio, but, in my experience, that is the difference between Layer-II and Layer-III degradation. Layer-II has an initial series of 'non-linear' (to pervert a term) distortions at a relatively low compression ratio, after which it just starts evenly raising the noise floor ('linear' distortion). 
Distortions in Layer-III are almost always 'non-linear' (wateriness, blips, missing frequencies, lowpass), though the noise floor stays consistently low. At low bitrates, I find 'linear' distortion infinitely preferable to the 'non-linear', though this is, of course, purely a matter of taste. For your problem, there are mainly 2 solutions: a: downsampling b: using joint stereo. For voice signals, the best joint mode would probably be intensity stereo. But it's not implemented in Lame. This was my suspicion, I was really just looking for confirmation. Thanks. You mentioned that you use crc. Are you aware that the ISO crc code is broken? It may well have been broken (though I seem to remember that it was simply not present for Layer-III) - I wouldn't know, since I removed it and wrote my own, which is not. (For realtime multicast, it was a feature we had to have.) Greg Maxwell wrote: The dist10 encoder has a bug in the short block code which makes it stink on fricatives in speech. Does anyone have any more info on this? The frame analyzer doesn't indicate that I'm using short blocks on the fricatives in question - or is that the bug? Mark Taylor wrote: Why do you disable the 22050 downsampling? This is done based on the idea that encoding at 22 kHz is better than encoding at 44 kHz and removing half the spectrum with filters. Because I was trying to compare apples to apples (MPEG-1 to MPEG-1) and my encoder doesn't use LSF yet. FhG is probably using joint stereo? This will increase the bandwidth by 10-20%. Yes, as discussed above, this is definitely cooking the books. The main difference between LAME and ISO is that the ISO code has serious flaws in several major components. jstereo, filtering and other advanced features help, but you gotta fix the bugs first! I like to think that I have fixed at least a few. 
Now that I've finished a first pass clean, rewrite, overhaul, and verify, I'm taking a closer look at algorithmic (as opposed to purely implementational) problems, starting with the main loop, and probably ending with the #^@% psych model. Of course, if advanced features are going to make a bigger difference, they may gain a higher priority. You rate FhG as 'very good', and Layer II as 'good'. So I'm assuming layer III beats layer II. The things layer III adds to layer II are: 1) MDCT transform (lossless to roundoff), 2) entropy coding (lossless), 3) bit reservoir (prevents wasting of bits) and 4) the ability to do more advanced noise shaping. #1, 2 and 3 can only improve the quality. The only way I can see layer II outperforming layer III is if #4 is not tuned properly for the desired compression. Your assumption is correct. And, based on my observations about distortion above, I would concur with your analysis; the noise shaping seems to be breaking down pretty badly at this (ridiculously high, I am aware) compression ratio. - I'd just like to say that I really appreciate the feedback that this list provides - I don't
Re: [MP3 ENCODER] Voice encoding questions
I like to think that I have fixed at least a few. Now that I've finished a first pass clean, rewrite, overhaul, and verify, I'm taking a closer look at algorithmic (as opposed to purely implementational) problems, starting with the main loop, and probably ending with the #^@% psych model. Of course, if advanced features are going to make a bigger difference, they may gain a higher priority. I'd suggest you look at the archives of this list, and at Lame 3.00. Its code was probably a lot easier to follow, and it was mainly bug-fixed ISO code with the addition of joint stereo. Regards, -- Gabriel Bouvigne - France [EMAIL PROTECTED] icq: 12138873 MP3' Tech: www.mp3-tech.org
Re[2]: [MP3 ENCODER] Voice encoding questions
Hello alex, I feel guilty using a list mainly devoted to an open source codec (LAME) to further the development of ClearBand's 'proprietary' codec. (Is a standards based codec implementation proprietary? We don't sell the codec - we sell a multicast system, mostly to ISPs and corporations, and the proprietary part is the multicast part. My superiors just didn't want to license FhG's source, I guess...) I don't know if that would help. If you look at http://www.mp3licensing.com , you would see it costs $1M to bring any mp3 encoder to the market. FhG+Thomson have patents on mp3, not only their code. http://www.mp3licensing.com/royalty/swenc.html http://www.mp3licensing.com/royalty/broadcast.html We do not charge royalties for mp3 streaming or mp3 broadcasting (e.g. Internet Radio) until the end of the year 2000. Beyond this date we anticipate to charge a small annual minimum and a percentage of revenue. However, this model is not yet fully developed because we cannot yet oversee where this new market is going. Best inform your superiors to rip off Ogg Vorbis, which is not patented ;-) -- Best regards, Roel mailto:[EMAIL PROTECTED]
Re: Re[2]: [MP3 ENCODER] Voice encoding questions
We should support an option (-ma for Mode Auto) which switches between -a -mm for highly correlated channels (r > 0.98 = mono), -mj for normally correlated signals (r = -1.00...-0.20, 0.20...0.91 = stereo) and -ms for nearly uncorrelated signals (dual channel audio with independent audio, e.g. movies with English/German audio tracks, r = -0.20...+0.20). There are a lot of MP3s out there with mono recordings coded with -mj and also -ms. -- Best regards, Frank Klemm PS: What's the difference between '-mm' and '-mm -a' ? eMail | [EMAIL PROTECTED] home: [EMAIL PROTECTED] phone | +49 (3641) 64-2721 home: +49 (3641) 390545 sMail | R.-Breitscheid-Str. 43, 07747 Jena, Germany
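The proposed -ma switch could be prototyped roughly as follows. This is only a sketch of the idea, not anything LAME implements: the function names are hypothetical, the first threshold is read as r > 0.98 (an assumption, since the comparison sign is missing in the message), and the range between 0.91 and 0.98 that the message leaves unassigned is treated here as joint stereo.

```python
import math


def channel_correlation(left, right):
    """Pearson correlation coefficient r between two equal-length channels."""
    n = len(left)
    if n == 0 or n != len(right):
        raise ValueError("channels must be non-empty and of equal length")
    mean_l = sum(left) / n
    mean_r = sum(right) / n
    cov = sum((l - mean_l) * (r - mean_r) for l, r in zip(left, right))
    var_l = sum((l - mean_l) ** 2 for l in left)
    var_r = sum((r - mean_r) ** 2 for r in right)
    if var_l == 0 or var_r == 0:
        # A constant (e.g. silent) channel has no defined correlation;
        # returning 0.0 is an arbitrary choice made for this sketch.
        return 0.0
    return cov / math.sqrt(var_l * var_r)


def pick_mode(r, mono_threshold=0.98, uncorrelated_limit=0.20):
    """Map a correlation value to the LAME switches named in the proposal."""
    if r > mono_threshold:
        return "-a -mm"  # practically mono: downmix and encode as mono
    if abs(r) <= uncorrelated_limit:
        return "-ms"     # independent channels, e.g. two language tracks
    return "-mj"         # everything else: joint stereo (assumption for 0.91..0.98)


# Tiny demonstration with made-up sample values.
tone = [math.sin(0.05 * i) for i in range(1000)]
print(pick_mode(channel_correlation(tone, tone)))                        # identical channels
print(pick_mode(channel_correlation([1, -1, 1, -1], [1, 1, -1, -1])))    # uncorrelated channels
```

A real implementation would probably measure r over short windows and smooth the decision, rather than use a single value for the whole file.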
[MP3 ENCODER] --voice mode
For <= 56 kbit/s, the voice mode of Lame always sounds better than the normal mode. I've tested several kinds of music and also spoken word. So there are 3 questions: * Will voice mode become the standard for low bit rates? * What does voice mode consist of (it's not IS, since it also works with mono)? * Is IS planned for Lame? -- Frank Klemm
Re: [MP3 ENCODER] Voice mode
Hi all On 15-Oct-99 Greg Maxwell wrote: Are you aware you can use lame for image compression? :) [...] Check out http://linuxpower.cx/~greg/mp3crap/ for some examples of this kind of perversion.. This is so perverse it's actually neat. :) Anyone tried the reverse? Image compression on sound files? later mike (I vote for a switch to the LGPL)
Re: [MP3 ENCODER] Voice mode
Duh.. I see, M/S is defined with addition/subtraction not multiplication. So, is there a downloadable doc that describes intensity mode, or are decoders a good source of info? There are a few lines (only a few) about it in the iso docs. If you don't have them yet, they are available on mp3tech.org Unfortunately, it's unlikely that I'll have the time to work on this. As it does not seem to be very difficult to implement (at least it's easier than m/s stereo), perhaps Patrick could allow some of his students to work on this. I can't do it during my own student project, as mine must be about image processing or image synthesis. Are you aware you can use lame for image compression? :) First create a greyscale image ((576*x)*n, or (1152*x)*n for best results) save it as ascii ppm. Cut off the header. Use sox to go from 8bit unsigned to a 16 bit mono wav file set at 44100. Encode, reverse.. Looks pretty good, perhaps if you do some kind of transform on the input first to make it less 'transient' (some kind of reversible convolution) you might get better compression than jpg (it's not too far as is). Kinda gives you respect for how tough audio compression is VS image compression.. Check out http://linuxpower.cx/~greg/mp3crap/ for some examples of this kind of perversion.. (Why did I do this? I was hoping nasty artifacts might be more easily found in a picture rather than listening to a sample) That's strange... I'll have a look at it Regards, Gabriel Bouvigne - France [EMAIL PROTECTED] icq: 12138873 MP3' Tech: www.mp3tech.org
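The point that M/S is built from addition and subtraction can be illustrated with a few lines of Python. This is a time-domain sketch with hypothetical function names; Layer III applies the same sum/difference (with a 1/sqrt(2) scale) to frequency lines rather than to samples. It also shows why M/S does not help when a voice is panned hard to one side: the side channel then carries as much energy as the mid channel.

```python
import math

INV_SQRT2 = 1.0 / math.sqrt(2.0)


def ms_encode(left, right):
    """Mid = (L + R) / sqrt(2), Side = (L - R) / sqrt(2)."""
    mid = [(l + r) * INV_SQRT2 for l, r in zip(left, right)]
    side = [(l - r) * INV_SQRT2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Inverse transform: L = (M + S) / sqrt(2), R = (M - S) / sqrt(2)."""
    left = [(m + s) * INV_SQRT2 for m, s in zip(mid, side)]
    right = [(m - s) * INV_SQRT2 for m, s in zip(mid, side)]
    return left, right


def energy(signal):
    """Sum of squared sample values."""
    return sum(x * x for x in signal)


voice = [math.sin(0.1 * i) for i in range(576)]
silence = [0.0] * len(voice)

# Centered voice: identical channels, so the side channel is empty.
mid_c, side_c = ms_encode(voice, voice)
print("centered  side/mid energy:", energy(side_c) / energy(mid_c))

# Voice panned hard left: mid and side carry the same energy.
mid_p, side_p = ms_encode(voice, silence)
print("hard-left side/mid energy:", energy(side_p) / energy(mid_p))
```

Intensity stereo avoids this by coding one signal plus a position, which is why it comes up in the thread as the better fit for panned speech.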
Re: [MP3 ENCODER] Voice mode
On Fri, 15 Oct 1999 [EMAIL PROTECTED] wrote: Hi all On 15-Oct-99 Greg Maxwell wrote: Are you aware you can use lame for image compression? :) [...] Check out http://linuxpower.cx/~greg/mp3crap/ for some examples of this kind of perversion.. This is so perverse it's actually neat. :) Anyone tried the reverse? Image compression on sound files? I did, a long time ago. It doesn't work too well (using jpg at least), mostly because it only works on 8-bit data and 2D quantization doesn't go well with sound. later mike (I vote for a switch to the LGPL)
Re: [MP3 ENCODER] Voice mode
On Thu, 14 Oct 1999, Gabriel Bouvigne wrote: Panned stereo mode! For each frame you examine a spectrally weighted (ignore low freqs) energy ratio of left/right to pick the stereo panning at two points in time (middle, and right) then interpolate from old_left to middle then right, enforcing a maximum rate of change. Then mix to mono, encode as mid, and the position as side, you only have to encode lower scalefactors because it should consist of low freqs only (because of your slow interpolation). I suppose you could shape your side wav to MDCT well too.. Am I missing something? I'd think that this would allow you to get mono quality at about the same bitrate, but still preserve panning, which would help in differentiating between people speaking. In my view, this can't be done using m/s stereo, because in the case of someone speaking on one side, the side channel would be as high as the middle channel. What you're describing here looks a lot like the intensity stereo mode, where the signal is encoded as mono on the left channel, and location is encoded on the right one. This would help voice encoding a lot, but also music at low bitrates. To my mind, it's something Lame is missing in order to compete with FhG at low bitrates. Duh.. I see, M/S is defined with addition/subtraction not multiplication. So, is there a downloadable doc that describes intensity mode, or are decoders a good source of info? Unfortunately, it's unlikely that I'll have the time to work on this. As it does not seem to be very difficult to implement (at least it's easier than m/s stereo), perhaps Patrick could allow some of his students to work on this. I can't do it during my own student project, as mine must be about image processing or image synthesis. Are you aware you can use lame for image compression? :) First create a greyscale image ((576*x)*n, or (1152*x)*n for best results) save it as ascii ppm. Cut off the header. Use sox to go from 8bit unsigned to a 16 bit mono wav file set at 44100. 
Encode, reverse.. Looks pretty good, perhaps if you do some kind of transform on the input first to make it less 'transient' (some kind of reversible convolution) you might get better compression than jpg (it's not too far as is). Kinda gives you respect for how tough audio compression is VS image compression.. Check out http://linuxpower.cx/~greg/mp3crap/ for some examples of this kind of perversion.. (Why did I do this? I was hoping nasty artifacts might be more easily found in a picture rather than listening to a sample)
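The image-to-audio step of that recipe (the part done with sox in the message) can also be sketched with Python's standard `wave` module. The function names and the pixel-to-sample mapping below are assumptions made for illustration; the message only says to go from 8-bit unsigned to 16-bit mono at 44100 Hz. The encode and decode through LAME would sit between the two functions and is not shown.

```python
import io
import struct
import wave


def pixels_to_wav_bytes(pixels, sample_rate=44100):
    """Pack 8-bit unsigned grey values into a 16-bit mono WAV file image."""
    # Assumed mapping: centre on zero, then scale 8-bit up to 16-bit range.
    samples = [(p - 128) * 256 for p in pixels]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)       # mono
        wav.setsampwidth(2)       # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack("<%dh" % len(samples), *samples))
    return buf.getvalue()


def wav_bytes_to_pixels(data):
    """Reverse step: read 16-bit mono samples back into 8-bit grey values."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        frames = wav.readframes(wav.getnframes())
    samples = struct.unpack("<%dh" % (len(frames) // 2), frames)
    # Round and clamp, since a lossy codec will not return exact multiples of 256.
    return [max(0, min(255, round(s / 256) + 128)) for s in samples]


# One row of a synthetic gradient, 576 pixels wide (one granule of samples).
row = [i * 255 // 575 for i in range(576)]
wav_data = pixels_to_wav_bytes(row)
print(len(wav_data), "bytes of WAV for", len(row), "pixels")
print("lossless round trip:", wav_bytes_to_pixels(wav_data) == row)
```

The suggested image widths (multiples of 576 or 1152) match the Layer III granule and frame lengths in samples, so presumably the intent is for each image row to line up with whole granules.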
Re: [MP3 ENCODER] Voice mode
On Wed, 13 Oct 1999, Gabriel Bouvigne wrote: The voice mode is made using 3 tricks: * using only long blocks * limiting bitrate when vbr * using a band-pass filter For a more sophisticated hack: Panned stereo mode! For each frame you examine a spectrally weighted (ignore low freqs) energy ratio of left/right to pick the stereo panning at two points in time (middle, and right) then interpolate from old_left to middle then right, enforcing a maximum rate of change. Then mix to mono, encode as mid, and the position as side, you only have to encode lower scalefactors because it should consist of low freqs only (because of your slow interpolation). I suppose you could shape your side wav to MDCT well too.. Am I missing something? I'd think that this would allow you to get mono quality at about the same bitrate, but still preserve panning, which would help in differentiating between people speaking.
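The analysis half of the panned stereo idea might look something like the sketch below. It is a simplification, not the proposal itself: it takes one pan estimate per frame instead of two points with interpolation, uses plain energy without the spectral weighting that ignores low frequencies, and every name, the frame size, and the step limit are assumptions.

```python
def frame_pan(left, right):
    """Pan position in [0, 1]: 0 = all left, 1 = all right, from channel energies."""
    energy_l = sum(x * x for x in left)
    energy_r = sum(x * x for x in right)
    total = energy_l + energy_r
    if total == 0:
        return 0.5  # silent frame: treat as centered (arbitrary choice)
    return energy_r / total


def slew_limit(previous, target, max_step):
    """Move from previous toward target, changing by at most max_step."""
    delta = target - previous
    if delta > max_step:
        return previous + max_step
    if delta < -max_step:
        return previous - max_step
    return target


def panned_stereo_analysis(left, right, frame_size=576, max_step=0.1):
    """Return the mono downmix and one slowly varying pan value per frame."""
    mono = [(l + r) / 2.0 for l, r in zip(left, right)]
    pans = []
    pan = 0.5  # start centered
    for start in range(0, len(left), frame_size):
        target = frame_pan(left[start:start + frame_size],
                           right[start:start + frame_size])
        pan = slew_limit(pan, target, max_step)
        pans.append(pan)
    return mono, pans


# A source that sits entirely in the left channel for two frames.
left = [1.0] * 1152
right = [0.0] * 1152
mono, pans = panned_stereo_analysis(left, right)
print(pans)  # drifts from the center toward the left, one step per frame
```

Because the pan track changes slowly, it contains only low frequencies, which is the property the message relies on to make the position signal cheap to encode.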