> -----Original Message-----
> From: Andrew W. Steiner [mailto:[email protected]]
> Sent: March-22-12 1:09 AM
> To: Ted Byers
> Cc: [email protected]
> Subject: Re: [Help-gsl] Just getting started - I have some 'newbie' questions.
>
> Hello all,

Hi Andrew,
Thanks.

> An attempt to answer at least some of your questions:
>
> On Wed, Mar 21, 2012 at 9:07 PM, Ted Byers <[email protected]> wrote:
> > // I find the following distastefully wasteful. I would have
> > // preferred to have the GSL functions work directly on std::vectors
> > // instead of having to allocate new memory and bear the cost of the
> > // loop and copying all the data. Indeed, I'd have hoped for expression
> > // templates, to make these functions all the more efficient.
>
> GSL was designed with C and never attempted to work well with
> C++ vector types. In the case of STL vectors, you can use
> gsl_vector_view_array(double *base, size_t n) and just set base to the
> address of the first std::vector element. It might be possible to make
> that work with matrices also, but in the code above it's likely easier
> (and not that much slower) just to work with the GSL vector types from
> the beginning, so you can make the call to the SVD directly.

I suspected as much. I have seen various people recommend different ways of
getting a pointer to the first element of a std::vector, but they all miss a
couple of things. The last time I checked, how memory is allocated and
managed inside an STL container was an implementation detail, so there was
no guarantee that a vector's contents were stored in contiguous memory.
Unless the standard has changed since I last looked to provide such a
guarantee, I wouldn't want to rely on that assumption. Further, all bets are
off when it comes to member functions that can change the contents of the
vector: one of the frequent issues I see junior programmers having is that
they forget that iterators are invalidated once you start adding or removing
elements.

Alas, this guarantees some inefficiencies, especially when there is a need
to use multiple libraries. For example, I have written a library that makes
quite efficient use of std::vector in computing various statistics,
especially moving statistics.
But it depends on the capabilities of std::vector, and especially its
forward and reverse iterators, for much of its efficiency. To use both that
code and GSL in the same program, I guess there is little option but to use
GSL's vector and matrix views over pointers to double arrays (and at least
with those I can guarantee exception safety by having the memory those
arrays use managed by a smart pointer), and bear the cost of one copy of
the input data each time the analysis is required. At least, when doing
this for a time series with a moving window, I can avoid the cost of
reallocating the memory used for the data.

> Even boost+STL is not immune from difficulties of this kind. The way to
> do SVD within boost is to use ublas, but ublas vectors aren't trivially
> compatible with STL vectors either (unless they have the same size). This
> is why I always write my linear algebra functions as template functions
> which accept any reasonably formed vector and matrix types (but I don't
> have SVD yet).

I guess some of the issue is due to our use of libraries that sometimes
date back decades and were written in either C or Fortran; libraries that
have stood the test of time in terms of being fast and reliable, at least
in the hands of someone who knows how to use them. I know I would not be
eager to write new code for SVD when good code that does it already exists,
even if I could make the case that, by using the features of STL containers
and C++-specific techniques such as expression templates, I could make the
code faster still (my best C++ code is faster than my best Fortran code,
largely because of template metaprogramming).
It can be cheaper to buy a faster machine than to develop such new code de
novo, but I guess in due course it will be necessary if we're to use, e.g.,
the capabilities of something like Intel's Threading Building Blocks
library (which doesn't work on Cygwin, BTW): a library that scales
performance well by using threading to take advantage of the growing number
of cores in modern processors.

> > I assume from your allocate and free family of functions that I can't
> > create the vectors and matrices using operator new, and free them
> > using operator delete, and thus give management of them to a smart
> > pointer. How, then, do you manage to maintain exception safety?
>
> There's no need for a smart pointer, you just have to check the
> allocation of each vector separately and free up all of the previously
> successful allocations in the case that one fails. Alternatively, you
> can embed the vectors into a class, ensuring that the destructor takes
> care of the deallocation in the usual way.

This is fine if we assume memory allocation won't generate an exception.
But it is more challenging when you're allocating more than one block of
memory (or array) on the heap and the first few succeed while the last
fails. If you're not extra careful, you can end up with a slow memory leak
in a long-running program when such allocation failures happen apparently
randomly over time. I am not looking at writing code that will be used once
for an analysis, after which the program ends. Rather, I am looking at
something that assumes the system is non-autonomous, so the analysis of a
real-time data feed will have to be performed, the results used, and then
the whole thing repeated every half hour, every hour, or even once a week
(depending on how quickly the system parameters change). While smart
pointers are not essential to ensuring fault tolerance, they do make it so
much easier.

Anyway, thanks for your time and insights.

Cheers

Ted
