I can see how one's mind could become wrapped up in this. However, accuracy is not the issue with peakfiles. Their level of resolution is so crude relative to the actual audio that their only purpose is to provide a general overview to assist in approximately locating audio events. At this level of detail, the issues you are looking at become non-issues. Detailed edits must be made at a zoom level that does not use peakfiles.

How general can peakfiles be and still be useful? Very general. Consider that a 10 minute file may be represented on screen as being 400 pixels wide at a particular zoom level. Most of your peakfile samples are not even being used.

Digidesign uses a method where a particular zoom level is identified as the maximum level that can use peakfiles. Zooming in past this level requires that every displayed waveform be calculated from the soundfile, but only the segment of the waveform that is being displayed gets computed. This is no big deal, because the zoom level that requires this results in only a few seconds of audio taking up the whole display. If a region is dragged at this zoom level by an amount greater than the width of the display, all other displayed waveforms are computed on the fly, creating a stuttering kind of scroll. Waveform scrolling during playback is only permitted when the zoom level is low enough to permit the use of peakfiles.

> might occur at sample 3786). So how can we possibly decide what
> max/min values to use for the 2nd chunk of 2048 samples in the audio
> stream? It's presumably based on both files, but can we determine it
> without reading the actual audio data for that part of the audio
> stream? And it gets worse: what happens if the inserted material is
> not aligned with the first sample of the second file, but is offset?
> Now, every precomputed max/min pair for this file is essentially
> useless, because they are "out of phase" with the way the audio is
> actually being used.
I suggest the simplest method possible for constructing peakfiles and using them in waveform displays. Each soundfile has its own peakfile. Using your example, peakfile sample 0 represents soundfile samples 0 - 2047, peakfile sample 1 represents soundfile samples 2048 - 4095, and so on.

When using peakfiles to produce waveform displays, any segment of a region that begins on sample 0 - 2047 will be represented using peakfile sample 0 as the first sample in the waveform display. Now your issue is that the region may begin on sample 2000 while the peak is located at sample 1234, so the displayed waveform includes a sample that is not even in the region being used. This simple technique guarantees that, 50% of the time (when the beginning of a region is not the beginning of the soundfile, when a junction of two files does not fall exactly at a multiple of 2048 samples from t=0, or when the end of a region is not the end of the soundfile), the first and/or last sample in the waveform display will be computed from a peakfile sample that is not actually included in the region being displayed. This error only occurs in the first and last pixel of the waveform display. All others will be correct.

If you synchronize the zoom resolution steps with the peakfile sample rate, so that waveform display and peakfile samples stay in phase with each other, then when you zoom out so that one pixel covers more than one peakfile sample, you can use a method that chooses the highest-value peakfile sample to be represented by that pixel, and that peakfile sample may not be the first one. In this case the error disappears.

Regarding the display of audio at a junction, simply treat the two files as separate regions with their own separate, adjacent waveform displays. The worst case occurs at the zoom level where 1 pixel = 2048 samples (since at greater zoom levels you will have to compute the display from the soundfile instead of the peakfile). In your example, pixel 0 displays peakfile 1, sample 0. Pixel 1 displays either peakfile 1, sample 1 or peakfile 2, sample 0, whichever is greater (or peakfile 1, sample 1 automatically, to keep things simple). Pixel 2 displays peakfile 2, sample 0 (because this segment begins with a sample between 0 and 2047). Pixel 3 displays peakfile 2, sample 1, and so on.

Regarding "out of phase": the shift will result in a waveform that is positioned incorrectly by a maximum of one pixel to the left or right, and this is simply not a big deal for a generalized waveform display that only shows the envelope of the sound and not the actual wave. None of these errors are audible, since they only affect the display, and they disappear as soon as you zoom in enough to cause a precise recalculation of the display based on the soundfile. All of my final edits are done at this level of zoom, so I am completely unconcerned with the accuracy of the wave overview display as long as it is good enough to get me within 5 seconds of the edit that I want to make. If form follows function, I can't see why you would try to design a waveform display system that offers higher resolution than this from peakfile information.
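To make this concrete (both the chunk mapping just described and the zoom threshold from the Digidesign approach above), here is a minimal C++ sketch. All the names (kPeakChunk, buildPeakfile, firstPeakIndex, canUsePeakfile) and the mono float sample format are my own illustrative assumptions, not anybody's actual implementation:

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // Soundfile samples summarized by one peakfile sample.
    static const std::size_t kPeakChunk = 2048;

    struct PeakSample { float min; float max; };

    // One peakfile per soundfile: peakfile sample i summarizes
    // soundfile samples [i*2048, i*2048 + 2047].
    std::vector<PeakSample> buildPeakfile(const std::vector<float>& audio)
    {
        std::vector<PeakSample> peaks;
        for (std::size_t i = 0; i < audio.size(); i += kPeakChunk) {
            const std::size_t end = std::min(audio.size(), i + kPeakChunk);
            PeakSample p = { audio[i], audio[i] };
            for (std::size_t j = i; j < end; ++j) {
                p.min = std::min(p.min, audio[j]);
                p.max = std::max(p.max, audio[j]);
            }
            peaks.push_back(p);
        }
        return peaks;
    }

    // A region starting anywhere in samples 0 - 2047 is drawn starting
    // from peakfile sample 0, a start in 2048 - 4095 from peakfile
    // sample 1, and so on. This integer division is the whole mapping,
    // and also the source of the one-pixel edge error: the first peak
    // drawn may summarize samples that precede the region itself.
    std::size_t firstPeakIndex(std::size_t regionStart)
    {
        return regionStart / kPeakChunk;
    }

    // The zoom threshold: while one pixel covers at least one peak
    // chunk, draw from the peakfile; zoomed in past that, compute only
    // the visible portion of the waveform directly from the soundfile.
    bool canUsePeakfile(std::size_t samplesPerPixel)
    {
        return samplesPerPixel >= kPeakChunk;
    }

At 1 pixel = 2048 samples, pixel p of a region is then drawn from peaks[firstPeakIndex(regionStart) + p], which exhibits exactly the worst-case edge behavior described above.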
Stick with one peakfile per soundfile and live with the phase error, unless I am completely missing something and the phase error actually does more than shift the display by a maximum of one pixel (in which case, please tell me what I am missing).

> instances. Do you think that resampling the peak would hurt? It is hard

Well, it depends. The model at this point is that each raw audio file has one corresponding peakfile. Since each raw audio file can be used many times, with different potential "peak phase" choices, that would mean generating (potentially) N-1 different peakfiles (where N is the number of samples per peak). This seems like a bad idea.
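To spell out the arithmetic behind that N-1 figure, here is a hypothetical sketch (computeMaxAtOffset is an invented name, and I am assuming 2048 samples per peak and a simplified single-value peak rather than a max/min pair):

    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <vector>

    // For a "peak phase" offset o (0 <= o < 2048), the k-th peak must
    // summarize soundfile samples [o + k*2048, o + (k+1)*2048 - 1].
    // Only o == 0 matches the precomputed peakfile, so covering every
    // phase in advance would mean 2047 extra peakfiles per soundfile.
    float computeMaxAtOffset(const std::vector<float>& audio,
                             std::size_t o, std::size_t k)
    {
        const std::size_t chunk = 2048;
        const std::size_t begin = o + k * chunk;
        const std::size_t end = std::min(audio.size(), begin + chunk);
        float peak = 0.0f;
        for (std::size_t j = begin; j < end; ++j)
            peak = std::max(peak, std::fabs(audio[j]));
        return peak;
    }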
I don't know about PT, but in Session this is not true. The peakfile is header info in the SDII file format.

I have a very slow computer, so I always turn off the "calculate waveform overview" feature. Regions are then displayed as empty boxes, unless I zoom in to the point where the actual waveform is calculated. If for some reason I decide that I want overview info for a particular file, I select it and choose "calculate selected waveform overview".

If I start session A and record soundfiles 1 and 2, then calculate the waveform overview for only soundfile 1, then close session A, start a new session B, and import soundfiles 1 and 2 into session B, then when I drag regions 1 and 2 into tracks of session B, region 1 has waveform info and region 2 does not. This means that Session writes the waveform overview data as header data on the soundfile, to be read by any compatible app, and uses only this data when computing waveform displays. It does not compute new peak data for new regions.

My slow computer confirms this. If I record a 10 minute soundfile, the peakfile takes about 1 minute to compute. If I create a new region by trimming the first and last 30 seconds from the complete 10 minute region, the new region's waveform display is available immediately. It is also possible to change a region without generating a new one, if it is the only instance of a region that was created by modifying a pre-existing region.

However, in thinking about the "objects" (Samplitude) / "regions" (ProTools) model, I can see how they get this to work: you compute the peak data for each object/region, and it *never* changes, because the object/region is atomic (you can't subdivide it without creating a new object/region). Hmm. There are other reasons for moving toward this model, but this might be the killer.
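If I understand the atomic-region idea, the bookkeeping might look something like this sketch (RegionKey and peakCache are invented names, not Session's or Samplitude's actual data structures):

    #include <cstdint>
    #include <map>
    #include <tuple>
    #include <vector>

    struct PeakSample { float min; float max; };

    // A region is identified by its source file plus the exact span of
    // samples it uses. Because regions are atomic -- any edit creates a
    // new region instead of mutating an old one -- peak data computed
    // for a given key can never go stale.
    struct RegionKey {
        std::int64_t fileId;
        std::int64_t start;   // first soundfile sample in the region
        std::int64_t length;  // region length in samples

        bool operator<(const RegionKey& o) const {
            return std::tie(fileId, start, length)
                 < std::tie(o.fileId, o.start, o.length);
        }
    };

    // Computed once per region, reused for every display of that region.
    std::map<RegionKey, std::vector<PeakSample>> peakCache;

Since the map is keyed on the exact sample span, an edit always produces a new key, and the only cost is computing peaks once per new region.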
Tom