On 2015-08-31, [email protected] wrote:
Real-time auralization of dynamic sound sources in Virtual Environments would be one application. Coming more from the graphics/interaction side myself, I see our fellow acoustician colleagues do this in order to couple arbitrarily moving sound sources with a potentially dynamic room acoustics simulation.
That is a good point. But then, in the past many spatial audio folks have also optimized such computations by relying on head-related transfer functions, which are precomputed into the Fourier domain, and then even relied on linear interpolation between them. That's been feasible since e.g. the KEMAR dummy-head sets of HRTFs are pretty dense, which brings the interpolation error down considerably.
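Roughly, in toy Python (my own made-up names, and glossing over the phase trouble you get when linearly interpolating complex spectra), the scheme looks like this:

import numpy as np

def interpolated_hrtf(azimuths, H, az):
    # azimuths: sorted measured azimuths in degrees; H: one row of rfft
    # bins per measured direction, for one ear -- stand-ins for a dense
    # KEMAR-style set
    az = az % 360.0
    hi = np.searchsorted(azimuths, az) % len(azimuths)
    lo = (hi - 1) % len(azimuths)
    span = (azimuths[hi] - azimuths[lo]) % 360.0 or 360.0
    w = ((az - azimuths[lo]) % 360.0) / span
    # linear interpolation between the two nearest measured directions
    return (1.0 - w) * H[lo] + w * H[hi]

The interpolated spectrum then just multiplies the source block's FFT, so the per-block cost stays constant no matter how the source moves.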
As such, I'm not too sure using brute-force FIR is really necessary here. But since I haven't gone through the papers yet, do correct me if I'm wrong. Why precisely did they choose straight time-domain FIR, instead of Fourier-mediated convolution?
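Just to make the trade-off I'm asking about concrete, here's a quick numpy check (mine, not from their papers): brute-force time-domain FIR and FFT-mediated convolution give the same output, the FFT route just costs O(N log N) instead of O(N*M) for an M-tap filter.

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(4096)    # dry signal block
h = rng.standard_normal(512)     # stand-in for an HRIR or room response

y_fir = np.convolve(x, h)        # straight time-domain FIR

nfft = len(x) + len(h) - 1       # full linear-convolution length
y_fft = np.fft.irfft(np.fft.rfft(x, nfft) * np.fft.rfft(h, nfft), nfft)

assert np.allclose(y_fir, y_fft)

In a real-time setting you'd do that blockwise with overlap-add or a partitioned scheme, but the arithmetic advantage stays the same.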
If I have a position-tracked audience wearing headphones and able to move during a musical performance, or e.g. the moving actors in a theatre performance as virtual sound sources, and I want to place them in arbitrary virtual acoustic settings, the techniques cited above would probably apply as well.
Now headtracking, that's interesting to me. Besides this list, I'm a long time participant on the sursound list, and as such a bit of an ambisonic freak.
One of my best, novel ideas there was how to do zero-delay directional head tracking. Of course at considerable computational cost, but I think it's quite possible on modern multicore hardware, thanks to ambisonics' inherently parallel arithmetic. Nobody's implemented that idea as of now, but...
If you want to try it out, conceptualize the binaural ambisonic framework a bit differently from how you normally do it. Typically you'd have a number of spherical harmonics pulsing around, which you sample with a conceptual (virtual) speaker array, and then you'd project the speaker feeds down onto two static headphone channels. Instead of doing that, go with the original formulation:
What you have is a number of rotationally symmetrical fields which add up to a whole soundfield over a sphere. Now sample the whole thing at once with a KEMAR set, and you're left with a simple binaural rendering of the field, now with time structure: the convolution we've been talking about.
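Computationally that sampling is the familiar virtual-speaker route; in toy Python, first order and horizontal-only, with made-up names:

import numpy as np

def binaural_from_foa(W, X, Y, speaker_az, hrir_l, hrir_r):
    # W, X, Y: first-order horizontal ambisonic signals; speaker_az:
    # azimuths (radians) of the conceptual speaker ring; hrir_l/hrir_r:
    # per-direction HRIRs from a KEMAR-style set
    left = right = 0.0
    for k, az in enumerate(speaker_az):
        # one common basic first-order decode; gain conventions vary
        feed = (np.sqrt(2.0) * W + np.cos(az) * X + np.sin(az) * Y) / len(speaker_az)
        left = left + np.convolve(feed, hrir_l[k])
        right = right + np.convolve(feed, hrir_r[k])
    return left, right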
But then that problem is very much over-determined when you rotate your head around. Because of the rotational symmetry of any ambisonic system of a given order, no matter how many directional samples you have in the set, all rotations of the sample set will at most give you the same number of independent degrees of freedom. So, in fact, you can reduce the whole KEMAR set to whatever degree of ambisonic representation you want by just integrating the response over the sphere of possible rotations.
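That reduction is just a projection of the measured set onto the harmonic basis. A least-squares version in toy Python, horizontal-only circular harmonics to keep it short (the full spherical case swaps in the Y_lm):

import numpy as np

def hrirs_to_harmonic_filters(meas_az, hrirs, order):
    # meas_az: measured azimuths in radians; hrirs: (n_dirs, taps) for one
    # ear; returns (2*order+1, taps) harmonic-domain filters
    cols = [np.ones_like(meas_az)]
    for m in range(1, order + 1):
        cols += [np.cos(m * meas_az), np.sin(m * meas_az)]
    B = np.stack(cols, axis=1)
    # least-squares projection: solve B @ F ~= hrirs for F
    F, *_ = np.linalg.lstsq(B, hrirs, rcond=None)
    return F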
After that, you can do the funniest thing: because of the linearity of the spherical Fourier integral of the ambisonic system, and because of the linearity of sampling it via the HRTF set (KEMAR in this case), it's legitimate to exchange the order of the two operations. Even if the HRTF set has time structure, because that structure is separable from direction.
If you do that, you no longer 1) rotate a sound source into position, 2) convolve with two or more HRTFs, and 3) reduce into binaural sound. Instead you 1) render onto ambisonics of a given order at a given angle of arrival, 2) apply an invariant many-by-many convolution which transmits the sound from your source to the sphere on which the listener's ears can lie, and 3) then you just sample that sphere at two points.
Sure, it's a heavier calculation. But within the ambisonic framework it's guaranteed to be perfect, with constant computational load, and when you calculate it in that order, it's absolutely zero-delay. Absent Doppler products of your head turning really fast -- which too can be mimicked at low extra cost -- this sort of thing ought to be a gamer's *dream*. 8)
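To make the ordering concrete, here's a horizontal-only toy of the whole chain in Python; it's my sketch of the idea, not a reference implementation, and it reuses the harmonic-domain ear filters F from the projection above (sign conventions glossed over). Nothing in steps 1 and 2 depends on head orientation; the tracked head angle only enters as per-sample gains at the very end, which is exactly where the zero delay comes from.

import numpy as np

def render_channels(src, src_az, F, order):
    # steps 1+2: encode the source at its azimuth and run the invariant,
    # head-independent convolutions (F row 0 is the m=0 filter, then
    # cos/sin pairs per order m)
    a = [np.convolve(src, F[0])]
    b = [np.zeros_like(a[0])]
    for m in range(1, order + 1):
        X = np.cos(m * src_az) * src                        # step 1: encode
        Y = np.sin(m * src_az) * src
        fc, fs = F[2 * m - 1], F[2 * m]
        a.append(np.convolve(X, fc) + np.convolve(Y, fs))   # step 2
        b.append(np.convolve(Y, fc) - np.convolve(X, fs))
    return a, b

def sample_ear(a, b, head_az):
    # step 3: "sample the sphere" at the ear; head rotation is gains only,
    # so updating head_az adds no latency at all
    out = a[0].copy()
    for m in range(1, len(a)):
        out += np.cos(m * head_az) * a[m] + np.sin(m * head_az) * b[m]
    return out

You'd run render_channels once per ear's filter set and call sample_ear with the freshly tracked head angle on every block, or every sample if you like.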
Whether the same degree of realism is really required for a musical performance is probably debatable, but if I want to accurately reproduce the room-acoustic properties of dynamic scenes, convolving (probably very many) sound sources with very long, dynamically generated impulse responses is definitely something you would do in the context of musical DSP.
Yet, can you really hear that difference? This goes rapidly into psychoacoustical territory, I know. Obviously if you want to do everything perfectly, you have to utilize a bunch of nasty, expensive methods to do so. At worst the kinds which call for supercomputers, weeks and petabytes, running a high-end solver for the wave equation. (In my favourite techno, the *nonlinear* wave equation as well, because of what happens with high-level bass.)
I'm not too sure that is the relevant margin we should be thinking about in musical DSP, though. Isn't the definition of music something we acutely hear, and find pleasing? If so, maybe we should actually speak more about how we hear and feel, with respect to our algorithms? And not so much about how to make an acoustical simulation just right? ;)
(And yes, sorry again, I have a tendency to get carried away a bit. No harm, no foul, right...)
--
Sampo Syreeni, aka decoy - [email protected], http://decoy.iki.fi/front
+358-40-3255353, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
_______________________________________________
dupswapdrop: music-dsp mailing list
[email protected]
https://lists.columbia.edu/mailman/listinfo/music-dsp
