Hi Dani,
Thanks for your response, very informative! I guess it's reasonable not to base a block implementation on an assumption of “it works because I looked at the source code”. And you’re right, exotic buffer implementations are free to hand off output buffers however they want to (they would need very specific reasons to do so, I think, but it still would not break any contract).
My block is actually not (just) an FFT; it’s a bit more complicated. I only mentioned that to give a simple example use case, but your insight is valuable.
Thanks again,
Wael

> On Jun 4, 2026, at 11:43 PM, Daniel Estévez <[email protected]> wrote:
>
> Hi Wael,
>
> This is a great question. My understanding is basically the same as the
> conclusion you've arrived at. In practice this works correctly for GR3
> regular CPU buffers, because the buffer is a ring buffer and the write
> pointer is not advanced if you return zero as the number of items that your
> general_work() call has produced. So the next general_work() call sees the
> same data that the previous call wrote in the output buffer. If I remember
> correctly, this also works in the same way for GR4 regular CPU buffers.
>
> However, I think this is going into undocumented assumptions of how the
> buffer system should work. I don't think this is solidly written down
> anywhere, but for me the contract that general_work() has with the output
> buffer is that it should never read data from the output buffer, and it needs
> to write exactly as many items at the beginning of the buffer as the number
> of items that are being produced. Anything else might break under an exotic
> custom buffer implementation. For instance, I could imagine an
> implementation which hands off an output buffer taken from a common buffer
> pool, with no guarantee that the buffer is the same in consecutive calls even
> if no data was produced.
>
> Since your use case is accumulating many FFT frames, I would say that storing
> the accumulator as a member in the block class and copying the result to the
> output buffer whenever the accumulation has finished is quite acceptable. The
> output data rate is going to be quite low, so the memory copies add very
> little overhead.
>
> Another comment is that a very long integration can be realized by cascading
> multiple shorter integrations, which can be realized with the in-tree
> Integrate block. For instance you could integrate 1e9 FFTs by cascading three
> Integrate blocks, each set to an integration of 1e3. When using floating point
> numbers this approach is also better numerically, because otherwise you are
> sequentially adding numbers to an accumulator that ends up being
> approximately 1e9 times larger than the input numbers, which doesn't really
> work because the float32 machine epsilon is about 1e-7.
>
> Best,
> Dani.
>
> On 05/06/2026 03:27, Wael Farah wrote:
>> Hi all,
>> I have a block whose output items are running averages over a long
>> integration (for the sake of simplicity say a power spectrum accumulated
>> over millions of FFT frames, way too many to hold as one input buffer).
>> The implementation that fell out naturally is: in general_work(), add the
>> next batch of partial contributions directly into
>> output_items[0][n_emitted], return 0 while the integration is still
>> incomplete, and only return n_emitted > 0 when one or more integrations
>> are done. It's pretty tempting to do, as this would avoid an extra memory
>> allocation for an internal buffer and a memcpy back to output_items when
>> accumulation is ready.
>> However, this relies on the assumption that returning 0 leaves the write
>> pointer unadvanced, so the next general_work() call sees the same memory at
>> output_items[0][n_emitted] and I can keep adding into it.
>> As far as I can tell, this works on current GR (no-op when produce_each(0)
>> <https://github.com/gnuradio/gnuradio/blob/main/gnuradio-runtime/lib/block_detail.cc#L123-L132>
>> -> write pointer not updated), but it’s more of an implementation detail
>> than something documented as part of the scheduler API.
>> Two questions:
>> 1) Is the "accumulate into output_items[0] across calls" pattern supported,
>> or am I in undocumented/unidentified scheduler/buffer behavior territory?
>> 2) If it's not supported, is there any reason beyond "no API guarantee";
>> e.g. would it break under certain custom buffers or, in the future, GR4?
>> If the answer is just "use an internal buffer," happy to refactor.
>> Thanks!
>> Wael
>
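
For reference, a minimal sketch of the approach suggested in the quoted reply above: the accumulator is a member of the block, and the output buffer is only written when an integration completes, so nothing is ever read back from it. This is plain Python that only mimics the shape of general_work(); it does not use the gnuradio package. The class AccumulatingBlock, the parameters vlen and n_integrate, and the last_consumed attribute are hypothetical names chosen for illustration. A real block would call consume() and work on the scheduler's buffers instead of Python lists.

```python
class AccumulatingBlock:
    """Hypothetical stand-in for a GNU Radio block (no gnuradio import)."""

    def __init__(self, vlen, n_integrate):
        self.vlen = vlen                # items per input frame
        self.n_integrate = n_integrate  # frames per integration
        self._acc = [0.0] * vlen        # accumulator owned by the block
        self._count = 0                 # frames accumulated so far
        self.last_consumed = 0          # stand-in for calling consume()

    def general_work(self, input_items, output_items):
        """Accumulate frames; return how many output items were written."""
        out = output_items[0]
        produced = 0
        consumed = 0
        for frame in input_items[0]:
            completes = (self._count + 1 == self.n_integrate)
            if completes and produced == len(out):
                break  # no output space left: keep this frame for the next call
            for i, x in enumerate(frame):
                self._acc[i] += x
            self._count += 1
            consumed += 1
            if completes:
                # Write the finished average at the start of the free output space.
                out[produced] = [a / self.n_integrate for a in self._acc]
                produced += 1
                self._acc = [0.0] * self.vlen
                self._count = 0
        self.last_consumed = consumed
        return produced


# Usage: four frames delivered in two calls, each call with a fresh output list.
blk = AccumulatingBlock(vlen=2, n_integrate=4)
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
results = []
for start in range(0, len(frames), 2):
    out = [None]  # new buffer on every call, as a pooled buffer might be
    n = blk.general_work([frames[start:start + 2]], [out])
    results.extend(out[:n])
```

Each call is handed a fresh output list on purpose, to imitate a buffer pool that gives no continuity between calls. The result is still correct because the partial sums never live in the output buffer.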

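
The numerical remark about cascading integrations can also be illustrated with a small sketch. It emulates float32 arithmetic with the standard struct module (an assumption made to avoid extra dependencies; it is not how the Integrate block is implemented), and the helper names f32 and f32_sum are hypothetical. The starting value 2**24 stands in for an accumulator that has already absorbed many unit-sized inputs.

```python
import struct


def f32(x):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def f32_sum(values, start=0.0):
    """Sequentially add values, rounding to float32 after every addition."""
    acc = f32(start)
    for v in values:
        acc = f32(acc + f32(v))
    return acc


big = 2.0 ** 24  # 16777216: float32 spacing here is 2, so adding 1 is lost

# One long accumulator: each unit-sized input is rounded away.
sequential = f32_sum([1.0] * 1000, start=big)

# Cascaded: sum the 1000 inputs in a short stage first, then add the partial sum.
partial = f32_sum([1.0] * 1000)
cascaded = f32_sum([partial], start=big)

print(sequential)  # 16777216.0
print(cascaded)    # 16778216.0
```

In the sequential case the accumulator stalls once it is large relative to the inputs, while the staged sum keeps every operand within a comparable range, which is the benefit of cascading shorter integrations described in the thread.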