I don't think it's going to work. AudioDevice operates at a lower level than AudioUnit: the device sees only the raw hardware stream, so in principle there is no way for it to access the filtered audio coming out of the voice-processing AudioUnit. Why VAD is part of the low-level AudioDevice API at all is a good question that only Apple can answer.

You'll have to use your own VAD algorithm and feed it the audio input from the voice processing unit.
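Something like the sketch below is what I have in mind: a crude energy-based detector driven from the voice unit's input callback, so it only ever sees the echo-cancelled signal. The SimpleVAD struct and the threshold/hangover constants are names I've made up for illustration, and the code is untested; for production you'd want a real VAD algorithm (WebRTC ships one, for instance) in place of the RMS threshold.

// Crude energy-based VAD sketch. SimpleVAD, kVADThreshold and
// kVADHangoverBuffers are illustrative names, not CoreAudio API.
// Assumes mono Float32 post-AEC audio pulled from bus 1 of the
// voice-processing unit.
#include <AudioToolbox/AudioToolbox.h>
#include <math.h>
#include <stdbool.h>

#define kVADThreshold       0.02f // RMS level treated as speech; tune per mic
#define kVADHangoverBuffers 20    // silent callbacks before "speech stopped"

typedef struct {
    AudioUnit voiceUnit; // the kAudioUnitSubType_VoiceProcessingIO instance
    bool      speaking;
    int       hangover;
} SimpleVAD;

static OSStatus InputCallback(void *inRefCon,
                              AudioUnitRenderActionFlags *ioActionFlags,
                              const AudioTimeStamp *inTimeStamp,
                              UInt32 inBusNumber,
                              UInt32 inNumberFrames,
                              AudioBufferList *ioData)
{
    SimpleVAD *vad = (SimpleVAD *)inRefCon;

    if (inNumberFrames > 4096) return kAudioUnitErr_TooManyFramesToProcess;
    Float32 samples[4096];
    AudioBufferList abl = {
        .mNumberBuffers = 1,
        .mBuffers[0] = { .mNumberChannels = 1,
                         .mDataByteSize   = inNumberFrames * sizeof(Float32),
                         .mData           = samples },
    };

    // Pull the echo-cancelled mic signal from the voice unit's input bus.
    OSStatus err = AudioUnitRender(vad->voiceUnit, ioActionFlags, inTimeStamp,
                                   1 /* input bus */, inNumberFrames, &abl);
    if (err != noErr) return err;

    // RMS over this buffer.
    float sumSq = 0.0f;
    for (UInt32 i = 0; i < inNumberFrames; i++)
        sumSq += samples[i] * samples[i];
    float rms = sqrtf(sumSq / (float)inNumberFrames);

    // Threshold with hangover so short pauses don't flap the state.
    if (rms > kVADThreshold) {
        if (!vad->speaking) { vad->speaking = true; /* speech started */ }
        vad->hangover = kVADHangoverBuffers;
    } else if (vad->speaking && --vad->hangover <= 0) {
        vad->speaking = false; /* speech stopped */
    }
    return noErr;
}

The hangover counter is only there so that short gaps between words don't immediately flip the state back to silence.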
Regards,
Tamás Zahola

> On 15 Oct 2024, at 18:08, π via Coreaudio-api <coreaudio-api@lists.apple.com> wrote:
>
> Dear Audio Engineers,
>
> I'm writing an app to interact with OpenAI's 'realtime' API (bidirectional realtime audio over websocket with AI serverside).
>
> To do this, I need to be careful that the AI-speak doesn't make its way out of the speakers, back in through the mic, and back to their server (else it starts to talk to itself, and gets very confused).
>
> So I need AEC, which I've actually got working, using kAudioUnitSubType_VoiceProcessingIO and AudioUnitSetProperty(kAUVoiceIOProperty_BypassVoiceProcessing, setting it to false).
>
> Now I also wish to detect when the speaker (me) is speaking or not speaking, which I've also managed to do via kAudioDevicePropertyVoiceActivityDetectionEnable.
>
> But getting them to play together is another matter, and I'm struggling hard here.
>
> I've rigged up a simple test (https://gist.github.com/p-i-/d262e492073d20338e8fcf9273a355b4), where a 440Hz sinewave is generated in the render-callback, and mic-input is recorded to file in the input-callback.
>
> So the AEC works delightfully, subtracting the sinewave and recording my voice. And if I turn the sine-wave amplitude down to 0, the VAD correctly triggers the speech-started and speech-stopped events.
>
> But if I turn up the sine-wave, it messes up the VAD.
>
> Presumably the VAD is working on the pre-echo-cancelled audio, which is most undesirable.
>
> How can I progress here?
>
> My thought was to create an audio pipeline, using AUGraph, but my efforts have thus far been unsuccessful, and I lack confidence that I'm even pushing in the right direction.
>
> My thought was to have an IO unit that interfaces with the hardware (mic/spkr), which plugs into an AEC unit, which plugs into a VAD unit. But I can't see how to set this up.
>
> On iOS there's a RemoteIO unit to deal with the hardware, but I can't see any such unit on macOS. It seems the VoiceProcessing unit wants to do that itself.
>
> And then I wonder: could I make a second VoiceProcessing unit, and have vp1_aec send its bus[1(mic)].outputScope to vp2_vad.bus[1].inputScope?
>
> Can I do this kind of work by routing audio, or do I need to get my hands dirty with input/render callbacks? It feels like I'm going hard against the grain if I'm faffing with these callbacks.
>
> If there's anyone out there who would care to offer me some guidance here, I'd be most grateful!
>
> π
>
> PS Is it not a serious problem that VAD can't operate on post-AEC input?
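PS To make the wiring concrete as well: creating the voice-processing unit with echo cancellation left active (as described above) and attaching the callback from my sketch would look roughly like this. Again untested and from memory, with all error checking omitted:

// Untested wiring sketch: a VoiceProcessingIO unit with AEC left active,
// feeding the InputCallback/SimpleVAD from the previous sketch.
#include <AudioToolbox/AudioToolbox.h>

void SetUpVoiceUnit(SimpleVAD *vad)
{
    AudioComponentDescription desc = {
        .componentType         = kAudioUnitType_Output,
        .componentSubType      = kAudioUnitSubType_VoiceProcessingIO,
        .componentManufacturer = kAudioUnitManufacturer_Apple,
    };
    AudioComponent comp = AudioComponentFindNext(NULL, &desc);
    AudioComponentInstanceNew(comp, &vad->voiceUnit);

    // Make sure the input side (bus 1) is enabled.
    UInt32 one = 1;
    AudioUnitSetProperty(vad->voiceUnit, kAudioOutputUnitProperty_EnableIO,
                         kAudioUnitScope_Input, 1, &one, sizeof(one));

    // Leave echo cancellation on (bypass = 0), as in the original post.
    UInt32 bypass = 0;
    AudioUnitSetProperty(vad->voiceUnit, kAUVoiceIOProperty_BypassVoiceProcessing,
                         kAudioUnitScope_Global, 0, &bypass, sizeof(bypass));

    // Deliver post-AEC mic audio to the custom VAD callback.
    AURenderCallbackStruct cb = {
        .inputProc       = InputCallback,
        .inputProcRefCon = vad,
    };
    AudioUnitSetProperty(vad->voiceUnit, kAudioOutputUnitProperty_SetInputCallback,
                         kAudioUnitScope_Global, 0, &cb, sizeof(cb));

    AudioUnitInitialize(vad->voiceUnit);
    AudioOutputUnitStart(vad->voiceUnit);
}

Once the unit is started, the post-AEC input stream drives the callback directly, so no second VoiceProcessing unit and no AUGraph should be needed for this part.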