Hello, I've pushed a patch that illustrate a way to have better performances with VariableLengthVectors (VLV). http://review.source.kitware.com/#/c/20389/ I suspect, this patch will need some discussions.
You'll find a descriptions of where I wish to go, of the problems encountered on the way, and of possible solutions. The biggest issue is how to have thread-safe cached variables in functions that were pure functions until now, and still keep the code as simple as possible, and minimize the number of modifications required to improve the code base (regarding performances on VLV) Regards --Luc Hermitte This is the final step of the performances enhancements to be made on VLV in order to have performances on par with monoband itk::Image<>. This patch won't pass tests. This is known. I still have to reorganize the new code and to document it. It's meant to illustrate different possibilities and issues to be solved. Here is a short summary. Let discuss on the mailing list. The *purpose* of this patch (and the previous ones as well) is to eliminate allocations made of each pixel. This will permit (on a Linear Interpolation test case to reduce the computation time, on a VectorImage, from 85s to 27s -- it was around 2 minutes before the first patches, and on the same image read as an itk::Image it takes ~15-17s) First problem: PixelType Evalutate(...); Currently, when we apply an ImageFunction, we call Evaluate() function which requires the creation of a new VLV. Indeed this function return a new pixel value by value. The only way to get rid of the variable creation is to return the new value as an output reference parameter. This means all classes that inherit from ImageFunction will need to change the signature of their EvaluateXxxx() functions. And that all client classes must take this into account. In other words, many modifications will be required if we want to have better performances on VLV. Note however, it's quite simple to migrate gradually the source code to the new interface : we can have all client code explicitly use the new (and badly named) EvaluateNoCopy(), which in turn call the old Evaluate() -- or the other way around (which has a drawback regarding cached variables). Second Problem: Temporary pixel values Some computations will produce new pixel values. For instance p = p0 + (p1 - p0) * d; produces 3 new pixels values. This has already been taken care of in the previous patches thanks to Expression Templates. Third Problem: Local Variables 3.1- Auxiliary variables Some algorithms (e.g. BSplines interpolations) require dynamic data structures like matrices. This implies allocation at each iteration. This particular problem isn't addressed in this patch. The solution to 3.2 will help to provide an solution. 3.2- Factorized computations Sometimes, a complex computation will take advantage of code factorisation (easier to maintain, and better performances). e.g. Pixel p4 = p0 + (p1 - p0) * d0 Pixel p5 = p2 + (p3 - p2) * d1 output = p4 + (p5 - p4) * d2 This implies new pixel values, hence dynamic allocations. In c++11, we could (almost) have written auto p4 = p0 + (p1 - p0) * d0 Except that p4 is an expression template. Its type is quite complex to express in plain C++98/03. But the problem will be that the current implementation of expression templates uses references. Here p4 will be made of dangling references, the code would be bugged. Workarounds would be possible in C++11 as it will permit us to distinguish lvalue references from rvalue references in order to copy the latter, and keep references to the former. We are not in C++11 yet. Moreover, this would not factorize the computations. We need a C++98 solution. The only way I see is to rely on cached variables. This means that the ImageFunctions are no longer [[pure]] functions : they'll mutate some variables that we want to allocate once per ImageFunction and ... per thread. Indeed the ImageFunctions shall be reentrant. I was first hoping to use thread_local variables. Alas they cannot be non-static attributes. This is a problem: if I'm not wrong, a same ImageFunction could be used multiple times in a same processing chain. We still have a simple solution, but it requires a second modification to Evaluate() signature: we need to pass the current ThreadId. The new drawback: code migration requires more work. We can not write Evaluate() in terms of EvaluateNoCopy() as the latter requires the threadId that the former doesn't know. This is the reason why I expect the tests to fail. Instead we need to convert all client code (like the WarpImageFilter) and have the default implementation of EvaluateNoCopy() to call the already written and not yet converted Evalutate(). Fourth Problem: heterogeneous data types (i.e. castings) The input image type, the output image type, and the internal pixel type may all be different. That's why the code contain so many static_casts. Alas a static_cast<> on pixels will induce the creation of new pixel values. This is contrary to the purpose of this patch. Copy constructors and assignment operators already support on-the-fly implicit static_casting. We still need to be able to require casting only when it's required. That's where the new function CastInto(), MoveInto() and As() come into rescue. Alas, all ImageFunction codes will need to be converted to use these functions. _______________________________________________ Powered by www.kitware.com Visit other Kitware open-source projects at http://www.kitware.com/opensource/opensource.html Kitware offers ITK Training Courses, for more information visit: http://kitware.com/products/protraining.php Please keep messages on-topic and check the ITK FAQ at: http://www.itk.org/Wiki/ITK_FAQ Follow this link to subscribe/unsubscribe: http://public.kitware.com/mailman/listinfo/insight-developers