Hi Simon,

Thanks for taking a look.
Yes, your assessment is correct. What is a strategy that would avoid this, though? More eyes on the optimizers are greatly welcome!

Thanks,
Matt

On Fri, Mar 28, 2014 at 4:57 PM, Simon Alexander <[email protected]> wrote:
> There is a lot going on here, and I'm not certain that I've got all the
> moving pieces straight in my mind yet, but I've had a quick look at the
> implementation now. I believe the Mattes v4 implementation is similar to
> other metrics in its approach.
>
> As I suggested earlier in the thread, I believe accumulations like this:
>
>> for( ThreadIdType threadID = 1; threadID < this->GetNumberOfThreadsUsed(); threadID++ )
>>   {
>>   this->m_ThreaderJointPDFSum[0] += this->m_ThreaderJointPDFSum[threadID];
>>   }
>
> will guarantee that we don't have absolutely consistent results between
> different thread counts, due to lack of associativity.
>
> When I perform only transform initialization and a single evaluation of the
> metric (i.e. outside of the registration routines), I get results consistent
> with this. For example, results for a center-of-mass initialization between
> two MR image volumes give me (double precision):
>
> 1 thread : -0.396771472451519
> 2 threads: -0.396771472450998
> 8 threads: -0.396771472451149
>
> for the metric evaluation (i.e. via GetValue() of the metric).
>
> AFAICS, this is a consistent magnitude of delta from the above. It means
> there is no chance of binary equivalence between different thread
> counts/partitionings, but you can do this accumulation quite a few times
> before the accumulated divergence gets into digits to worry about. This
> sort of thing is avoidable, but at some space/speed cost.
>
> However, in the registration for this case it takes only about twenty steps
> for divergence in the third significant digit between metric estimates! (via
> registration->GetOptimizer()->GetCurrentMetricValue() )
>
> Clearly the optimizer is not following the same path, so I think something
> else must be going on.
>
> So at this point I don't think the data partitioning of the metric is the
> root cause, but I will have a more careful look later.
>
> Any holes in this analysis you can see so far?
>
> When I have time to get back into this, I plan to have a look at the
> optimizer next, unless you have better suggestions of where to look next.
>
> cheers,
> Simon
>
> On Wed, Mar 19, 2014 at 12:56 PM, Simon Alexander <[email protected]> wrote:
>>
>> Brian, my apologies for the typo.
>>
>> I assume you all are at least as busy as I am; I just didn't want to leave
>> the impression that I would definitely be able to pursue this, but I will
>> try.
>>
>> On Wed, Mar 19, 2014 at 12:45 PM, brian avants <[email protected]> wrote:
>>>
>>> it's brian - and, yes, we all have "copious free time" of course.
>>>
>>> brian
>>>
>>> On Wed, Mar 19, 2014 at 12:43 PM, Simon Alexander <[email protected]> wrote:
>>>>
>>>> Thanks for the summary Brain.
>>>>
>>>> A lot of partitioning issues fundamentally come down to the lack of
>>>> associativity & distributivity of fp operations. Not sure I can do
>>>> anything practical to improve it, but I will have a look if I can find
>>>> a bit of my "copious free time".
>>>>
>>>> On Wed, Mar 19, 2014 at 12:29 PM, brian avants <[email protected]> wrote:
>>>>>
>>>>> yes - i understand.
>>>>>
>>>>> * matt mccormick implemented compensated summation to address this - it
>>>>> helps but is not a full fix
>>>>>
>>>>> * truncating floating point precision greatly reduces the effect you
>>>>> are talking about but is unsatisfactory to most people ... not sure if
>>>>> the functionality for that truncation was taken out of the v4 metrics,
>>>>> but it was in there at one point.
>>>>>
>>>>> * there may be a small and undiscovered bug that contributes to this in
>>>>> mattes specifically, but i don't think that's the issue. we saw this
>>>>> effect even in mean squares. if there is a bug it may be beyond just
>>>>> mattes.
>>>>> we cannot disprove that there is a bug. if anyone knows of a way to do
>>>>> that, let me know.
>>>>>
>>>>> * any help is appreciated
>>>>>
>>>>> brian
>>>>>
>>>>> On Wed, Mar 19, 2014 at 12:24 PM, Simon Alexander <[email protected]> wrote:
>>>>>>
>>>>>> Brain,
>>>>>>
>>>>>> I could have sworn I had initially added a follow-up email clarifying
>>>>>> this, but since I can't find it in the current quoted exchange, let me
>>>>>> reiterate:
>>>>>>
>>>>>> This is not a case of different results on different systems.
>>>>>> This is a case of different results on the same system if you use a
>>>>>> different number of threads.
>>>>>>
>>>>>> So while that could possibly be some odd intrinsics issue, for
>>>>>> example, the far more likely thing is that data partitioning is not
>>>>>> being handled in a way that ensures consistency.
>>>>>>
>>>>>> Originally I was also seeing intra-system differences due to internal
>>>>>> precision, but that was a separate issue and has been solved.
>>>>>>
>>>>>> Hope that is more clear!
>>>>>>
>>>>>> On Wed, Mar 19, 2014 at 12:13 PM, Simon Alexander <[email protected]> wrote:
>>>>>>>
>>>>>>> Brian,
>>>>>>>
>>>>>>> Do you mean the generality of my AVX internal precision problem?
>>>>>>>
>>>>>>> I agree that is a very common issue; the surprising thing there was
>>>>>>> that we were already constraining the code generation in a way that
>>>>>>> worked over the different processor generations and types we used,
>>>>>>> up until we hit the first Haswell CPUs with AVX2 support (even though
>>>>>>> no AVX2 instructions were generated). Perhaps it shouldn't have
>>>>>>> surprised me, but it took me a few tests to work that out because the
>>>>>>> problem was confounded with the problem I discuss in this thread
>>>>>>> (which is unrelated). Once I separated them it was easy to spot.
>>>>>>>
>>>>>>> So that is a solved issue for now, but I am still interested in the
>>>>>>> partitioning issue in the image metric, as I only have a workaround
>>>>>>> for now.
>>>>>>>
>>>>>>> On Wed, Mar 19, 2014 at 11:24 AM, brian avants <[email protected]> wrote:
>>>>>>>>
>>>>>>>> http://software.intel.com/en-us/articles/consistency-of-floating-point-results-using-the-intel-compiler
>>>>>>>>
>>>>>>>> just as an example of the generality of this problem
>>>>>>>>
>>>>>>>> brian
>>>>>>>>
>>>>>>>> On Wed, Mar 19, 2014 at 11:22 AM, Simon Alexander <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> Brian, Luis,
>>>>>>>>>
>>>>>>>>> Thanks. I have been using Mattes, as you suspect.
>>>>>>>>>
>>>>>>>>> I don't quite understand how precision is specifically the issue
>>>>>>>>> with # of cores. There are all kinds of issues with precision and
>>>>>>>>> order of operations in numerical analysis, but often data
>>>>>>>>> partitioning (i.e. for concurrency) schemes can be set up so that
>>>>>>>>> the actual sums are done the same way regardless of the number of
>>>>>>>>> workers, which keeps your final results identical. Is there some
>>>>>>>>> reason this can't be done for the Mattes metric? I really should
>>>>>>>>> look at the implementation to answer that, of course.
>>>>>>>>>
>>>>>>>>> Do you have a pointer to earlier discussions? If I can find the
>>>>>>>>> time I'd like to dig into this a bit, but I'm not sure when I'll
>>>>>>>>> have the bandwidth. I've "solved" this currently by constraining
>>>>>>>>> the core count.
>>>>>>>>>
>>>>>>>>> Perhaps interestingly, my earlier experiments were confounded a bit
>>>>>>>>> by a precision issue, but that had to do with intrinsics generation
>>>>>>>>> on my compiler behaving differently on systems with AVX2 (even
>>>>>>>>> though only AVX intrinsics were being generated).
>>>>>>>>> So that made things confusing at first until I separated the
>>>>>>>>> issues.
>>>>>>>>>
>>>>>>>>> On Wed, Mar 19, 2014 at 9:49 AM, brian avants <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>> yes - we had several discussions about this during v4 development.
>>>>>>>>>>
>>>>>>>>>> experiments showed that differences are due to precision.
>>>>>>>>>>
>>>>>>>>>> one solution was to truncate precision to the point that is
>>>>>>>>>> reliable.
>>>>>>>>>>
>>>>>>>>>> but there are problems with that too. last i checked, this was an
>>>>>>>>>> open problem, in general, in computer science.
>>>>>>>>>>
>>>>>>>>>> brian
>>>>>>>>>>
>>>>>>>>>> On Wed, Mar 19, 2014 at 9:16 AM, Luis Ibanez <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Simon,
>>>>>>>>>>>
>>>>>>>>>>> We are aware of some multi-threading related issues in
>>>>>>>>>>> the registration process that result in metric values changing
>>>>>>>>>>> depending on the number of cores used.
>>>>>>>>>>>
>>>>>>>>>>> Are you using the MattesMutualInformationMetric ?
>>>>>>>>>>>
>>>>>>>>>>> At some point it was suspected that the problem was the
>>>>>>>>>>> result of accumulative rounding in the contributions that
>>>>>>>>>>> each pixel makes to the metric value... this may or may
>>>>>>>>>>> not be related to what you are observing.
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>>
>>>>>>>>>>> Luis
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Feb 20, 2014 at 3:27 PM, Simon Alexander <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> I've been finding some regressions in registration results when
>>>>>>>>>>>> using systems with different numbers of cores (so the thread
>>>>>>>>>>>> count is different). This is resolved by fixing the global max
>>>>>>>>>>>> thread count.
>>>>>>>>>>>>
>>>>>>>>>>>> It's difficult for me to run the identical code against 4.4.2,
>>>>>>>>>>>> but similar experiments were run in that timeframe without these
>>>>>>>>>>>> regressions.
>>>>>>>>>>>>
>>>>>>>>>>>> I recall that there were changes affecting multithreading in the
>>>>>>>>>>>> v4 registration in the 4.5.0 release, so I thought this might be
>>>>>>>>>>>> a side effect.
>>>>>>>>>>>>
>>>>>>>>>>>> So a few questions:
>>>>>>>>>>>>
>>>>>>>>>>>> Is this behaviour expected?
>>>>>>>>>>>>
>>>>>>>>>>>> Am I correct that this was not the behaviour in 4.4.x?
>>>>>>>>>>>>
>>>>>>>>>>>> Does anyone who has a feel for the recent changes 4.4.2 ->
>>>>>>>>>>>> 4.5.[0,1] have a good idea where to start looking? I haven't yet
>>>>>>>>>>>> dug into the multithreading architecture, but this "smells" like
>>>>>>>>>>>> a data partitioning issue to me.
>>>>>>>>>>>>
>>>>>>>>>>>> Any other thoughts?
>>>>>>>>>>>>
>>>>>>>>>>>> cheers,
>>>>>>>>>>>> Simon
>>>>>>>>>>>>
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> Powered by www.kitware.com
>>>>>>>>>>>>
>>>>>>>>>>>> Visit other Kitware open-source projects at
>>>>>>>>>>>> http://www.kitware.com/opensource/opensource.html
>>>>>>>>>>>>
>>>>>>>>>>>> Kitware offers ITK Training Courses, for more information visit:
>>>>>>>>>>>> http://kitware.com/products/protraining.php
>>>>>>>>>>>>
>>>>>>>>>>>> Please keep messages on-topic and check the ITK FAQ at:
>>>>>>>>>>>> http://www.itk.org/Wiki/ITK_FAQ
>>>>>>>>>>>>
>>>>>>>>>>>> Follow this link to subscribe/unsubscribe:
>>>>>>>>>>>> http://www.itk.org/mailman/listinfo/insight-developers
>>>>>>>>>>>>
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> Community mailing list
>>>>>>>>>>>> [email protected]
>>>>>>>>>>>> http://public.kitware.com/cgi-bin/mailman/listinfo/community
