Ah, the fact that Deep samples within a given pixel can change the input source definitely complicates any row-wise optimization. With that in mind, I would agree with Steve that a Tile makes the most sense, as long as asking for a Tile over the same image area from multiple worker threads results in buffer sharing between them (which is my somewhat-limited understanding).
-Nathan

From: Steve Booth
Sent: Monday, November 04, 2013 9:57 PM
To: 'Nuke plug-in development discussion'
Subject: RE: [Nuke-dev] Best way to keep Rows from multiple inputs around?

Ivan,

I just saw this. Let me make another suggestion. Issuing millions of interest/tile requests is going to be really inefficient, as you point out. I would do one giant Tile request and get the entire frame for each channel you need (if there is a high probability you will need them all) into a set of memory-resident buffers, then simply pick what you need, when you need it, directly from memory, without ever doing another interest call.

Also... this seems like a perfect application for GPU/CUDA processing.

Steve

From: [email protected] [mailto:[email protected]] On Behalf Of Ivan Busquets
Sent: Monday, November 04, 2013 9:11 PM
To: Nuke plug-in development discussion
Subject: Re: [Nuke-dev] Best way to keep Rows from multiple inputs around?

Hey,

Thanks Nathan. That certainly gives me some ideas to think about. In hindsight, though, I see I have presented an overly simplified case as an example.

To expand a little bit more, there is one additional loop per pixel (think of a DeepPixel, for example, and looping through all samples in that DeepPixel). So, the flow of loops would look more like this:

    for each y:
        for each x:
            for each sample:
                inputNumber = figureOutInput();
                sampleout = input(inputNumber)->at(x,y);

That's why, considering how deeply buried those at() calls are, I was thinking that it might be more optimal to fetch full Rows for each input.

Some of the other optimizations you suggest are great ideas, but in this case I don't think I'll get much of a benefit from a) knowing which inputs are really needed beforehand; and b) optimizing the X and R bounds of each row. In this scenario, I would say ALL inputs will be used most of the time, and almost ALL pixels from each input will be needed as well.

Thanks for pointing me towards Interests, though.
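For what it's worth, here is a minimal sketch of how the three-level loop above might read from buffers that were fetched once up front, along the lines Steve suggests. It does not use the Nuke API: `FrameBuffer`, `Sample`, and `gatherSamples` are hypothetical names invented for illustration, and how the buffers actually get filled (a Tile request or otherwise) is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one input's pixels, fetched once before the loops.
// In a real plug-in this might be filled from a single large request per input.
struct FrameBuffer {
    int width;
    int height;
    std::vector<float> pixels;  // row-major, single channel

    float at(int x, int y) const {
        assert(x >= 0 && x < width && y >= 0 && y < height);
        return pixels[static_cast<std::size_t>(y) * width + x];
    }
};

// Hypothetical deep sample: here it only records which input it points at.
struct Sample {
    int inputNumber;
};

// deep[y * width + x] holds the samples for pixel (x, y).
// Returns one value per sample, in scan order.
std::vector<float> gatherSamples(const std::vector<FrameBuffer>& inputs,
                                 const std::vector<std::vector<Sample>>& deep,
                                 int width, int height) {
    std::vector<float> out;
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const std::vector<Sample>& samples =
                deep[static_cast<std::size_t>(y) * width + x];
            for (const Sample& s : samples) {
                // Plain memory read: no per-sample request to the input.
                out.push_back(
                    inputs[static_cast<std::size_t>(s.inputNumber)].at(x, y));
            }
        }
    }
    return out;
}
```

The trade-off is the one already discussed in the thread: every input is held in memory for the whole frame, in exchange for the innermost loop doing nothing but array lookups.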
Looking a bit closer, I see there's also an InterestRatchet class which might be exactly what I need. From the class declaration in Interest.h:

    /** InterestRatchet
    ** If you create one of these, and pass it to 'Interest' then it will remember
    ** which Iops it has previously called addInterest on, and not do re-do these,
    ** thus saving time as addInterest involves contention
    **
    ** Interests are removed again when the InterestRatchet is destroyed.
    **/

Thanks again,
Ivan

On Mon, Nov 4, 2013 at 7:24 PM, Nathan Rusch <[email protected]> wrote:

Hey Ivan,

> How would I go about keeping multiple Rows around, especially when the number
> of inputs is not predefined?

I'm certainly not the most qualified person to answer all of your questions (especially about relative performance), but I'm going to take a crack at this one anyway for fun.

To keep things really simple, you could just loop through and create threaded Interests on all input rows, store them in a vector, and then as you loop through your mask row, use Interest::at() to get data from the one you need based on the required input index. The header documentation seems a little undecided on whether this would actually be faster than Iop::at(), though.

A slightly more involved approach would be to pre-determine which inputs you actually need data from. Loop through your "mask" row first, determine which input each pixel indicates, and if you haven't already added a row for that input, add one. Then go through your array of rows in a second loop and pass each one to a call to get() on its corresponding input. For some reason this still feels like a somewhat primitive way of doing things, but it would at least prevent you from pulling in data from inputs that you don't need.

You could also use threaded Interests instead of Rows, which may be more efficient for concurrently fetching data from multiple inputs in preparation for generating your output (I'll leave that to someone else to answer).
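The "pre-determine which inputs you need" approach above could be sketched roughly as follows. This is plain C++ rather than Nuke API code: `RowData`, `FetchRow`, and `processRow` are hypothetical names, and the `fetchRow` callback is only a stand-in for whatever actually pulls a row from input n (a get() call on a Row, for instance).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <vector>

using RowData = std::vector<float>;

// Hypothetical callback standing in for "fetch this row from input n".
using FetchRow = std::function<RowData(int inputNumber)>;

// mask[x] says which input pixel x should be read from.
RowData processRow(const std::vector<int>& mask, int numInputs,
                   const FetchRow& fetchRow) {
    // Pass 1: walk the mask row and note which inputs are referenced at all.
    std::vector<bool> needed(static_cast<std::size_t>(numInputs), false);
    for (int n : mask) {
        assert(n >= 0 && n < numInputs);
        needed[static_cast<std::size_t>(n)] = true;
    }

    // Pass 2: fetch a row only from the inputs that were referenced.
    std::map<int, RowData> rows;
    for (int n = 0; n < numInputs; ++n) {
        if (needed[static_cast<std::size_t>(n)]) {
            rows.emplace(n, fetchRow(n));
        }
    }

    // Per-pixel loop: read from the row that was kept for the chosen input.
    RowData out(mask.size());
    for (std::size_t x = 0; x < mask.size(); ++x) {
        out[x] = rows.at(mask[x])[x];
    }
    return out;
}
```

Keying the rows by input index in a map is just one way to cope with a number of inputs that isn't known ahead of time; a vector indexed by input number would work equally well.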
Finally, another variation of this that seems like it could be more efficient would be to loop through your "mask" row as before, but instead of creating Rows or Interests outright, just store X and R bounding coordinates for each input you're going to need first. In other words, start with some sort of signal value for each input saying "I don't need this", but as you encounter mask pixels that say otherwise, remember the first X position for each input, and then keep a running max R for each as you encounter subsequent required positions. Then, in a second loop, you can create your Rows or Interests with better bounds to save memory/calculation time.

I have the same hunch you do about fetching whole rows being more efficient than piece-wise calls to Iop::at(), but if you can predetermine the row positions of the pixels you need from a given input, you could potentially make some sort of decision about how to get them, and in some cases, it may actually be faster to use Iop::at() than to ask for a whole row (consider a case where you only need 2 pixels from input N: the first and last in the row).

Anyway, sorry for the brain dump. Looking forward to what others have to say as well.

-Nathan

--------------------------------------------------------------------------------
Date: Mon, 4 Nov 2013 18:13:09 -0800
From: [email protected]
To: [email protected]
Subject: [Nuke-dev] Best way to keep Rows from multiple inputs around?

Hi,

I'm hoping someone with more experience with the Nuke API can help me out with the following:

- I have an algorithm that wants to use data from multiple inputs.
- I only know what input I need to pull data from once I'm in a per-pixel loop. (aka, for each pixel, the data from one of the inputs drives which one of the other inputs is needed)
- To avoid having to do Iop::at() or Iop::sample() calls, I would rather fill a Row for each one of the inputs BEFORE the per-pixel loop, and then access that Row later on.
My assumption is that this will be more memory hungry, but faster than calling at() or sample().

However, I have questions:

- Is this assumption correct? Does this kind of optimization make sense?
- How would I go about keeping multiple Rows around, especially when the number of inputs is not predefined?
- I've thought of creating my own "uber" float array and just appending all rows to it, then figuring out the offset within that array during the per-pixel loop. Is there a more straightforward approach that someone could recommend?

For the sake of clarity, I'm looking to turn something like this (pseudocode):

    for each y:
        for each x:
            inputNumber = figureOutInput();
            out = input(inputNumber)->at(x,y);

Into something like this:

    for each y:
        for each inputNumber in inputs:
            // need to keep a row for each input
            row(inputNumber) = input(inputNumber)->get(row);
        for each x:
            inputNumber = figureOutInput();
            out = row(inputNumber)[x]

Hope that makes sense? Help?

Thanks,
Ivan

_______________________________________________
Nuke-dev mailing list
[email protected], http://forums.thefoundry.co.uk/
http://support.thefoundry.co.uk/cgi-bin/mailman/listinfo/nuke-dev
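As a rough illustration of the "uber" float array idea from Ivan's original question, here is a minimal sketch that packs one row per input into a single contiguous buffer and computes the offset during the per-pixel lookup. It assumes every row has the same length, and `PackedRows` and `packRows` are hypothetical names, not part of the Nuke API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One contiguous buffer holding a row from each input, back to back.
struct PackedRows {
    std::vector<float> data;
    std::size_t rowLength;

    // The offset of input n's row is n * rowLength (equal-length rows assumed).
    float at(std::size_t inputNumber, std::size_t x) const {
        assert(x < rowLength);
        assert(inputNumber * rowLength + x < data.size());
        return data[inputNumber * rowLength + x];
    }
};

// Append every input's row to the shared buffer, in input order.
PackedRows packRows(const std::vector<std::vector<float>>& rows) {
    PackedRows packed;
    packed.rowLength = rows.empty() ? 0 : rows[0].size();
    packed.data.reserve(rows.size() * packed.rowLength);
    for (const std::vector<float>& row : rows) {
        assert(row.size() == packed.rowLength);  // sketch assumes equal lengths
        packed.data.insert(packed.data.end(), row.begin(), row.end());
    }
    return packed;
}
```

In the per-pixel loop, `packed.at(inputNumber, x)` then plays the role of `row(inputNumber)[x]` in the pseudocode. If rows could differ in length, a per-input offset table would be needed instead of the multiplication.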
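Nathan's suggestion earlier in the thread, tracking X and R bounds per input before fetching anything, might look something like the sketch below. It is plain C++ with hypothetical names (`Bounds`, `computeBounds`), and it assumes R means "one past the last needed pixel"; an input whose X equals its R is treated as the "I don't need this" signal value.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Horizontal extent needed from one input: pixels x .. r-1.
// x == r is the signal value meaning the input is not needed (an assumption
// of this sketch, not something taken from the Nuke headers).
struct Bounds {
    int x;
    int r;
    bool needed() const { return r > x; }
};

// mask[x] says which input pixel x should be read from.
std::vector<Bounds> computeBounds(const std::vector<int>& mask, int numInputs) {
    std::vector<Bounds> bounds(static_cast<std::size_t>(numInputs), Bounds{0, 0});
    for (int x = 0; x < static_cast<int>(mask.size()); ++x) {
        const int n = mask[static_cast<std::size_t>(x)];
        assert(n >= 0 && n < numInputs);
        Bounds& b = bounds[static_cast<std::size_t>(n)];
        if (!b.needed()) {
            b.x = x;    // first position that references this input
        }
        b.r = x + 1;    // x only increases, so this is the running max
    }
    return bounds;
}
```

A second loop could then skip inputs where `needed()` is false and request only the x..r span from the rest, or fall back to individual at() calls when just a couple of pixels are required.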
