Hi Simon,

Thanks for taking a look.
Yes, your assessment is correct. What is a strategy that would avoid this, though? More eyes on the optimizers are greatly welcome!

Thanks,
Matt

On Fri, Mar 28, 2014 at 4:57 PM, Simon Alexander <[email protected]> wrote:
> There is a lot going on here, and I'm not certain that I've got all the
> moving pieces straight in my mind yet, but I've had a quick look at the
> implementation now. I believe the Mattes v4 implementation is similar to
> other metrics in its approach.
>
> As I suggested earlier in the thread, I believe accumulations like this:
>
>> for( ThreadIdType threadID = 1; threadID < this->GetNumberOfThreadsUsed(); threadID++ )
>>   {
>>   this->m_ThreaderJointPDFSum[0] += this->m_ThreaderJointPDFSum[threadID];
>>   }
>
> will guarantee that we don't have absolutely consistent results between
> different thread counts, due to lack of associativity.
>
> When I perform only transform initialization and a single evaluation of the
> metric (i.e. outside of the registration routines), I get results consistent
> with this. For example, results for a center-of-mass initialization between
> two MR image volumes give me (double precision):
>
> 1 thread : -0.396771472451519
> 2 threads: -0.396771472450998
> 8 threads: -0.396771472451149
>
> for the metric evaluation (i.e. via GetValue() of the metric).
>
> AFAICS, this is a consistent magnitude of delta from the above. It means
> there is no chance of binary equivalence between different thread
> counts/partitionings, but you can do this accumulation quite a few times
> before the accumulated divergence gets into digits to worry about. This
> sort of thing is avoidable, but at some space/speed cost.
>
> However, in the registration for this case it takes only about twenty steps
> for divergence in the third significant digit between metric estimates! (via
> registration->GetOptimizer()->GetCurrentMetricValue() )
>
> Clearly the optimizer is not following the same path, so I think something
> else must be going on.
>
> So at this point I don't think the data partitioning of the metric is the
> root cause, but I will have a more careful look later.
>
> Any holes in this analysis you can see so far?
>
> When I have time to get back into this, I plan to have a look at the
> optimizer next, unless you have better suggestions of where to look next.
>
> cheers,
> Simon
>
> On Wed, Mar 19, 2014 at 12:56 PM, Simon Alexander <[email protected]> wrote:
>>
>> Brian, my apologies for the typo.
>>
>> I assume you all are at least as busy as I am; I just didn't want to leave
>> the impression that I would definitely be able to pursue this, but I will
>> try.
>>
>> On Wed, Mar 19, 2014 at 12:45 PM, brian avants <[email protected]> wrote:
>>>
>>> it's brian - and, yes, we all have "copious free time" of course.
>>>
>>> brian
>>>
>>> On Wed, Mar 19, 2014 at 12:43 PM, Simon Alexander <[email protected]> wrote:
>>>>
>>>> Thanks for the summary Brain.
>>>>
>>>> A lot of partitioning issues fundamentally come down to the lack of
>>>> associativity & distributivity of fp operations. Not sure I can do
>>>> anything practical to improve it, but I will have a look if I can find
>>>> a bit of my "copious free time".
>>>>
>>>> On Wed, Mar 19, 2014 at 12:29 PM, brian avants <[email protected]> wrote:
>>>>>
>>>>> yes - i understand.
>>>>>
>>>>> * matt mccormick implemented compensated summation to address this - it
>>>>> helps but is not a full fix
>>>>>
>>>>> * truncating floating point precision greatly reduces the effect you
>>>>> are talking about but is unsatisfactory to most people ... not sure if
>>>>> the functionality for that truncation was taken out of the v4 metrics,
>>>>> but it was in there at one point.
>>>>>
>>>>> * there may be a small and undiscovered bug that contributes to this in
>>>>> mattes specifically, but i don't think that's the issue. we saw this
>>>>> effect even in mean squares. if there is a bug it may be beyond just
>>>>> mattes.
>>>>> we cannot disprove that there is a bug. if anyone knows of a way to do
>>>>> that, let me know.
>>>>>
>>>>> * any help is appreciated
>>>>>
>>>>> brian
>>>>>
>>>>> On Wed, Mar 19, 2014 at 12:24 PM, Simon Alexander <[email protected]> wrote:
>>>>>>
>>>>>> Brain,
>>>>>>
>>>>>> I could have sworn I had initially added a follow-up email clarifying
>>>>>> this, but since I can't find it in the current quoted exchange, let me
>>>>>> reiterate:
>>>>>>
>>>>>> This is not a case of different results on different systems.
>>>>>> This is a case of different results on the same system if you use a
>>>>>> different number of threads.
>>>>>>
>>>>>> So while that could possibly be some odd intrinsics issue, for
>>>>>> example, the far more likely thing is that data partitioning is not
>>>>>> being handled in a way that ensures consistency.
>>>>>>
>>>>>> Originally I was also seeing intra-system differences due to internal
>>>>>> precision, but that was a separate issue and has been solved.
>>>>>>
>>>>>> Hope that is more clear!
>>>>>>
>>>>>> On Wed, Mar 19, 2014 at 12:13 PM, Simon Alexander <[email protected]> wrote:
>>>>>>>
>>>>>>> Brian,
>>>>>>>
>>>>>>> Do you mean the generality of my AVX internal precision problem?
>>>>>>>
>>>>>>> I agree that is a very common issue; the surprising thing there was
>>>>>>> that we were already constraining the code generation in a way that
>>>>>>> worked over the different processor generations and types we used,
>>>>>>> up until we hit the first Haswell CPUs with AVX2 support (even though
>>>>>>> no AVX2 instructions were generated). Perhaps it shouldn't have
>>>>>>> surprised me, but it took me a few tests to work that out because the
>>>>>>> problem was confounded with the problem I discuss in this thread
>>>>>>> (which is unrelated). Once I separated them it was easy to spot.
>>>>>>>
>>>>>>> So that is a solved issue for now, but I am still interested in the
>>>>>>> partitioning issue in the image metric, as I only have a workaround
>>>>>>> for now.
>>>>>>>
>>>>>>> On Wed, Mar 19, 2014 at 11:24 AM, brian avants <[email protected]> wrote:
>>>>>>>>
>>>>>>>> http://software.intel.com/en-us/articles/consistency-of-floating-point-results-using-the-intel-compiler
>>>>>>>>
>>>>>>>> just as an example of the generality of this problem
>>>>>>>>
>>>>>>>> brian
>>>>>>>>
>>>>>>>> On Wed, Mar 19, 2014 at 11:22 AM, Simon Alexander <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> Brian, Luis,
>>>>>>>>>
>>>>>>>>> Thanks. I have been using Mattes, as you suspect.
>>>>>>>>>
>>>>>>>>> I don't quite understand how precision is specifically the issue
>>>>>>>>> with # of cores. There are all kinds of issues with precision and
>>>>>>>>> order of operations in numerical analysis, but often data
>>>>>>>>> partitioning (i.e. for concurrency) schemes can be set up so that
>>>>>>>>> the actual sums are done the same way regardless of the number of
>>>>>>>>> workers, which keeps your final results identical. Is there some
>>>>>>>>> reason this can't be done for the Mattes metric? I really should
>>>>>>>>> look at the implementation to answer that, of course.
>>>>>>>>>
>>>>>>>>> Do you have a pointer to earlier discussions? If I can find the
>>>>>>>>> time I'd like to dig into this a bit, but I'm not sure when I'll
>>>>>>>>> have the bandwidth. I've "solved" this currently by constraining
>>>>>>>>> the core count.
>>>>>>>>>
>>>>>>>>> Perhaps interestingly, my earlier experiments were confounded a bit
>>>>>>>>> by a precision issue, but that had to do with intrinsics generation
>>>>>>>>> on my compiler behaving differently on systems with AVX2 (even
>>>>>>>>> though only AVX intrinsics were being generated).
>>>>>>>>> So that made things confusing at first until I separated the
>>>>>>>>> issues.
>>>>>>>>>
>>>>>>>>> On Wed, Mar 19, 2014 at 9:49 AM, brian avants <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>> yes - we had several discussions about this during v4 development.
>>>>>>>>>>
>>>>>>>>>> experiments showed that differences are due to precision.
>>>>>>>>>>
>>>>>>>>>> one solution was to truncate precision to the point that is
>>>>>>>>>> reliable.
>>>>>>>>>>
>>>>>>>>>> but there are problems with that too. last i checked, this was an
>>>>>>>>>> open problem, in general, in computer science.
>>>>>>>>>>
>>>>>>>>>> brian
>>>>>>>>>>
>>>>>>>>>> On Wed, Mar 19, 2014 at 9:16 AM, Luis Ibanez <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Simon,
>>>>>>>>>>>
>>>>>>>>>>> We are aware of some multi-threading related issues in
>>>>>>>>>>> the registration process that result in metric values changing
>>>>>>>>>>> depending on the number of cores used.
>>>>>>>>>>>
>>>>>>>>>>> Are you using the MattesMutualInformationMetric ?
>>>>>>>>>>>
>>>>>>>>>>> At some point it was suspected that the problem was the
>>>>>>>>>>> result of accumulative rounding in the contributions that
>>>>>>>>>>> each pixel makes to the metric value... this may or may
>>>>>>>>>>> not be related to what you are observing.
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>>
>>>>>>>>>>> Luis
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Feb 20, 2014 at 3:27 PM, Simon Alexander <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> I've been finding some regressions in registration results when
>>>>>>>>>>>> using systems with different numbers of cores (so the thread
>>>>>>>>>>>> count is different). This is resolved by fixing the global max
>>>>>>>>>>>> thread count.
>>>>>>>>>>>>
>>>>>>>>>>>> It's difficult for me to run the identical code against 4.4.2,
>>>>>>>>>>>> but similar experiments were run in that timeframe without these
>>>>>>>>>>>> regressions.
>>>>>>>>>>>>
>>>>>>>>>>>> I recall that there were changes affecting multithreading in the
>>>>>>>>>>>> v4 registration in the 4.5.0 release, so I thought this might be
>>>>>>>>>>>> a side effect.
>>>>>>>>>>>>
>>>>>>>>>>>> So a few questions:
>>>>>>>>>>>>
>>>>>>>>>>>> Is this behaviour expected?
>>>>>>>>>>>>
>>>>>>>>>>>> Am I correct that this was not the behaviour in 4.4.x?
>>>>>>>>>>>>
>>>>>>>>>>>> Does anyone who has a feel for the recent changes 4.4.2 ->
>>>>>>>>>>>> 4.5.[0,1] have a good idea where to start looking? I haven't yet
>>>>>>>>>>>> dug into the multithreading architecture, but this "smells" like
>>>>>>>>>>>> a data partitioning issue to me.
>>>>>>>>>>>>
>>>>>>>>>>>> Any other thoughts?
>>>>>>>>>>>>
>>>>>>>>>>>> cheers,
>>>>>>>>>>>> Simon
>>>>>>>>>>>>
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> Powered by www.kitware.com
>>>>>>>>>>>>
>>>>>>>>>>>> Visit other Kitware open-source projects at
>>>>>>>>>>>> http://www.kitware.com/opensource/opensource.html
>>>>>>>>>>>>
>>>>>>>>>>>> Kitware offers ITK Training Courses, for more information visit:
>>>>>>>>>>>> http://kitware.com/products/protraining.php
>>>>>>>>>>>>
>>>>>>>>>>>> Please keep messages on-topic and check the ITK FAQ at:
>>>>>>>>>>>> http://www.itk.org/Wiki/ITK_FAQ
>>>>>>>>>>>>
>>>>>>>>>>>> Follow this link to subscribe/unsubscribe:
>>>>>>>>>>>> http://www.itk.org/mailman/listinfo/insight-developers
>>>>>>>>>>>>
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> Community mailing list
>>>>>>>>>>>> [email protected]
>>>>>>>>>>>> http://public.kitware.com/cgi-bin/mailman/listinfo/community
