Hi Colin,
It does make sense, thanks.

My output is not a Row in this case, but a DeepPixel, so the increments of
the output and input are not in sync.
But either way, I am managing the float* of that deep pixel already (which
I've found is waaay faster than constructing a DeepOutput pixel and pushing
back values to it, but that's another story).

Your suggestion of using the const float* of each row is indeed where I was
trying to get to.
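
A minimal sketch of that idea, assuming the pointers have already been obtained once per row. The function name pick_per_pixel and the selector array are hypothetical, and std::vector stands in for the real row storage; no DDImage calls are used here.

```cpp
#include <cstddef>
#include <vector>

// rowPtrs[i] is a read-only pointer to one channel of input i's row,
// obtained once per row (hypothetical: in a plugin it would come from the Row).
// selector[x] says which input pixel x should read from.
inline std::vector<float> pick_per_pixel(const std::vector<const float*>& rowPtrs,
                                         const std::vector<int>& selector)
{
    std::vector<float> out(selector.size());
    for (std::size_t x = 0; x < selector.size(); ++x) {
        const float* in = rowPtrs[static_cast<std::size_t>(selector[x])];
        out[x] = in[x];   // plain array indexing; no per-pixel fetch call
    }
    return out;
}
```

The inner loop only indexes into memory that is already there, so the cost of fetching is paid once per row per input rather than once per pixel.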

I guess what I was struggling with was that, by the time I'm ready to fill
the Row (and get the const float pointer from it), I am already within the
inner loop, so I imagined it wouldn't be very efficient to have that many
"get" calls.
Ideally, I would want to fill a Row for each input once, and have them
stored somewhere (my own array?) for later access. But then I would need to
handle things like different lengths for each row, different number of
channels, etc, right?

Keeping a vector of rows would be nice(r), but I don't know what
performance implications that might have.
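
For the "own array" option, one possible layout is a single flat buffer plus a small record per input, which takes care of differing row lengths and channel counts. This is only a sketch: RowStore, Entry, add, channel and at are hypothetical names, not DDImage types, and the per-input data would still have to be copied in from the fetched rows.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical container, not a DDImage type: one flat float buffer plus
// a small record per input describing where its data lives.
struct RowStore {
    struct Entry {
        int x;               // first pixel covered (inclusive)
        int r;               // one past the last pixel covered
        int channels;        // number of channels stored for this input
        std::size_t offset;  // start of this input's data in `data`
    };

    std::vector<Entry> entries;
    std::vector<float> data;

    // Reserves zero-filled space for one input and returns its index.
    // May reallocate `data`, so take pointers only after all adds are done.
    std::size_t add(int x, int r, int channels)
    {
        const std::size_t count =
            static_cast<std::size_t>(r - x) * static_cast<std::size_t>(channels);
        entries.push_back(Entry{x, r, channels, data.size()});
        data.resize(data.size() + count, 0.0f);
        return entries.size() - 1;
    }

    // Pointer to the first covered pixel of channel `c` of input `i`.
    float* channel(std::size_t i, int c)
    {
        const Entry& e = entries[i];
        const std::size_t width = static_cast<std::size_t>(e.r - e.x);
        return data.data() + e.offset + static_cast<std::size_t>(c) * width;
    }

    // Value at absolute pixel position `x` (must lie in [x, r) of input `i`).
    float& at(std::size_t i, int c, int x)
    {
        return channel(i, c)[x - entries[i].x];
    }
};
```

A std::vector of per-input vectors would be simpler to write; the flat buffer mainly saves allocations when the number of inputs is large.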

Currently, I'm going with an InterestRatchet and adding Tiles to it. If I
read the API docs correctly, this should make creating a new Tile
extremely fast when its data is already in the InterestRatchet. This
seems to be giving me some nice speedups so far.
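
As an illustration of why that helps, here is a small model of the "remember what was already locked" idea. It is not the DDImage InterestRatchet and makes no claim about how that class is implemented; LockOnce, ensure and the lock callback are hypothetical stand-ins.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical model of the "ratchet" idea: remember which inputs have
// already been locked so the expensive step runs at most once per input.
class LockOnce {
public:
    LockOnce(std::size_t inputCount, std::function<void(std::size_t)> lockFn)
        : locked_(inputCount, false), lockFn_(std::move(lockFn)) {}

    // Runs the expensive lock only the first time `input` is requested.
    // Returns true if this call actually performed the lock.
    bool ensure(std::size_t input)
    {
        if (locked_[input]) return false;
        lockFn_(input);
        locked_[input] = true;
        return true;
    }

private:
    std::vector<bool> locked_;
    std::function<void(std::size_t)> lockFn_;
};
```

Every request after the first for a given input reduces to a flag check, which is where the speedup would come from if the first request is the contended part.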

Thanks,
Ivan

On Tue, Nov 5, 2013 at 12:26 PM, Colin Doncaster
<[email protected]> wrote:

> Hi Ivan,
>
> You’re discussing inputs, but not channels - are you assuming you’ll have
> the same number and named channels per input you might be accessing?  Once
> requested you should be safe to access the raw const float* which I’d
> recommend.
>
> Keep a float* of your output using the row.writeable(channel) call, with
> the const float* pointers using the row.readable(channel) call.
>
> Then do a
>
> foreach(z, a_channels)
> {
>   // output and input point at pixel a_x of channel z
>   for ( int x = a_x; x < a_r; ++x )
>   {
>     *output = *output + *input; // or whatever shenanigans you want to pull
>     output++;
>     input++;
>   }
> }
>
> Doing this in a convolve (i.e. an inner loop) should compile down to some
> decent vectorized code (-msse3).  I believe Nuke guarantees that the
> channel arrays are aligned nicely, making this possible.  You could also eke
> out some more speed using intrinsics (though generally the compiler knows
> best).
>
> So - with that all said, it’s going to be more advantageous thinking about
> the input as rows as you’re also going to make the caching system happier
> when dealing with large graphs where caching whole tiles won’t be as
> efficient.
>
> I hope at least some of that made sense.  :)
>
> Cheers
>
> On Nov 5, 2013, at 12:10 AM, Ivan Busquets <[email protected]> wrote:
>
> Hey,
>
> Thanks Nathan. That certainly gives me some ideas to think about.
>
> In hindsight, though, I see I have presented an overly-simplified case as
> an example.
>
> To expand a little bit more, there is one additional loop per pixel (think
> of a DeepPixel, for example, and looping through all samples in that
> DeepPixel)
>
> So, the flow of loops would look more like this:
>
> for each y:
>   for each x:
>     for each sample:
>        inputNumber = figureOutInput();
>        sampleout = input(inputNumber)->at(x,y);
>
> That's why, considering how deeply buried those at() calls are, I was
> thinking that it might be more optimal to cull full Rows for each input.
>
> Some of the other optimizations you suggest are great ideas, but in this
> case I don't think I'll get much of a benefit from a) knowing which inputs
> are really needed beforehand; and b) optimizing the X and R bounds of each
> row.
> In this scenario, I would say ALL inputs will be used most of the
> time, and almost ALL pixels from each input will be needed as well.
>
> Thanks for pointing me towards Interests, though. Looking a bit closer, I
> see there's also an InterestRatchet class which might be exactly what I
> need. From the class declaration in Interest.h:
>
>  /** InterestRatchet
>     **
>     ** If you create one of these, and pass it to 'Interest' then it will remember
>     ** which Iops it has previously called addInterest on, and not do re-do these,
>     ** thus saving time as addInterest involves contention
>     **
>     ** Interests are removed again when the InterestRatchet is destroyed.
>     **/
>
>
> Thanks again,
> Ivan
>
>
>
>
> On Mon, Nov 4, 2013 at 7:24 PM, Nathan Rusch <[email protected]> wrote:
>
>> Hey Ivan,
>>
>> > How would I go about keeping multiple Rows around, especially when the
>> number of inputs is not predefined?
>>
>> I'm certainly not the most qualified person to answer all of your
>> questions (especially about relative performance), but I'm going to take a
>> crack at this one anyway for fun.
>>
>> To keep things really simple, you could just loop through and create
>> threaded Interests on all input rows, store them in a vector, and then as
>> you loop through your mask row, use Interest::at() to get data from the one
>> you need based on the required input index. The header documentation seems
>> a little undecided on whether this would actually be faster than Iop::at(),
>> though.
>>
>>
>> A slightly more involved approach would be to pre-determine which inputs
>> you actually need data from. Loop through your "mask" row first, determine
>> which input each pixel indicates, and if you haven't already added a row
>> for that input, add one. Then go through your array of rows in a second
>> loop and pass each one to a call to get() on its corresponding input.
>>
>> For some reason this still feels like a somewhat primitive way of doing
>> things, but it would at least prevent you from pulling in data from inputs
>> that you don't need. You could also use threaded Interests instead of
>> Rows, which may be more efficient for concurrently fetching data from
>> multiple inputs in preparation for generating your output (I'll leave that
>> to someone else to answer).
>>
>>
>> Finally, another variation of this that seems like it could be more
>> efficient would be to loop through your "mask" row as before, but instead
>> of creating Rows or Interests outright, just store X and R bounding
>> coordinates for each input you're going to need first. In other words,
>> start with some sort of signal value for each input saying "I don't need
>> this", but as you encounter mask pixels that say otherwise, remember the
>> first X position for each input, and then keep a running max R for each as
>> you encounter subsequent required positions. Then, in a second loop, you
>> can create your Rows or Interests with better bounds to save
>> memory/calculation time.
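
The first pass described above can be sketched as follows, assuming the mask row has already been decoded into one input index per pixel. Bounds, collect_bounds and the mask vector are hypothetical names; creating the Rows or Interests from the resulting ranges is left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Per-input pixel range, half-open [x, r). x == r means "not needed".
struct Bounds {
    int x = 0;
    int r = 0;
    bool needed() const { return r > x; }
};

// mask[i] holds the input index wanted by pixel (rowX + i).
inline std::vector<Bounds> collect_bounds(const std::vector<int>& mask,
                                          int rowX, std::size_t inputCount)
{
    std::vector<Bounds> bounds(inputCount);
    for (std::size_t i = 0; i < mask.size(); ++i) {
        const int px = rowX + static_cast<int>(i);
        Bounds& b = bounds[static_cast<std::size_t>(mask[i])];
        if (!b.needed()) {
            b.x = px;                      // first pixel that asks for this input
            b.r = px + 1;
        } else {
            b.r = std::max(b.r, px + 1);   // running max of the right edge
        }
    }
    return bounds;
}
```

Inputs whose range stays empty can be skipped entirely in the second loop.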
>>
>> I have the same hunch you do about fetching whole rows being more
>> efficient than piece-wise calls to Iop::at(), but if you can predetermine
>> the row positions of the pixels you need from a given input, you could
>> potentially make some sort of decision about how to get them, and in some
>> cases, it may actually be faster to use Iop::at() than to ask for a whole
>> row (consider a case where you only need 2 pixels from input N: the first
>> and last in the row).
>>
>>
>> Anyway, sorry for the brain dump. Looking forward to what others have to
>> say as well.
>>
>>
>> -Nathan
>>
>> ------------------------------
>> Date: Mon, 4 Nov 2013 18:13:09 -0800
>> From: [email protected]
>> To: [email protected]
>> Subject: [Nuke-dev] Best way to keep Rows from multiple inputs around?
>>
>>
>> Hi,
>>
>> I'm hoping someone with more experience with the Nuke API can help me out
>> with the following:
>>
>> - I have an algorithm that wants to use data from multiple inputs.
>> - I only know what input I need to pull data from once I'm in a per-pixel
>> loop (i.e., for each pixel, the data from one of the inputs drives which
>> one of the other inputs is needed).
>> - To avoid having to do Iop::at() or Iop::sample() calls, I would rather
>> fill a Row for each one of the inputs BEFORE the per-pixel loop, and then
>> access that Row later on.
>>
>> My assumption is that this will be more memory hungry, but faster than
>> calling at() or sample().
>> However, I have questions:
>>
>> - Is this assumption correct? Does this kind of optimization make sense?
>> - How would I go about keeping multiple Rows around, especially when the
>> number of inputs is not predefined?
>> - I've thought of creating my own "uber" float array and just appending
>> all rows to it, then figuring out the offset within that array during the
>> per-pixel loop. Is there a more straightforward approach that someone
>> could recommend?
>>
>>
>> For the sake of clarity, I'm looking to turn something like this
>> (pseudocode)
>>
>> for each y:
>>   for each x:
>>      inputNumber = figureOutInput();
>>      out = input(inputNumber)->at(x,y);
>>
>>
>>
>> Into something like this:
>>
>> for each y:
>>   for each inputNumber in inputs:
>>     // need to keep a row for each input
>>     row(inputNumber) = input(inputNumber)->get(row);
>>
>>   for each x:
>>     inputNumber = figureOutInput();
>>     out = row(inputNumber)[x]
>>
>>
>> Hope that makes sense? Help?
>>
>> Thanks,
>> Ivan
>>
>>
>>
>> _______________________________________________ Nuke-dev mailing list
>> [email protected], http://forums.thefoundry.co.uk/
>> http://support.thefoundry.co.uk/cgi-bin/mailman/listinfo/nuke-dev
