> -----Original Message-----
> From: Andrew W. Steiner [mailto:[email protected]]
> Sent: March-22-12 1:09 AM
> To: Ted Byers
> Cc: [email protected]
> Subject: Re: [Help-gsl] Just getting started - I have some 'newbie' questions.
>
> Hello all,

Hi Andrew,
Thanks.

> An attempt to answer at least some of your questions:
>
> On Wed, Mar 21, 2012 at 9:07 PM, Ted Byers <[email protected]> wrote:
> > // I find the following distastefully wasteful. I would have
> > // preferred to have the GSL functions work directly on std::vectors
> > // instead of having to allocate new memory and bear the cost of the
> > // loop and copying all the data. Indeed, I'd have hoped for expression
> > // templates, to make these functions all the more efficient.
>
> GSL was designed with C and never attempted to work well with
> C++ vector types. In the case of STL vectors, you can use
> gsl_vector_view_array(double *base, size_t n) and just set base to the
> address of the first std::vector element. It might be possible to make
> that work with matrices also, but in the code above it's likely easier
> (and not that much slower) just to work with the GSL vector types from
> the beginning, so you can make the call to the SVD directly.

I suspected as much. I have seen various people recommend different ways of
getting a pointer to the first element of a std::vector, but they all miss a
couple of things. The last time I checked, how memory is allocated and
managed inside an STL container was an implementation detail, so there was
no guarantee that a vector's contents were stored in contiguous memory.
Unless the standard has changed since I last looked to provide such a
guarantee, I wouldn't want to rely on that assumption. Further, all bets are
off when it comes to member functions that can change the contents of the
vector: one of the frequent issues I see junior programmers having is that
they forget that iterators are invalidated once you start adding or removing
elements.

Alas, this guarantees some inefficiencies, especially when there is a need
to use multiple libraries. For example, I have written a library that makes
quite efficient use of std::vector in computing various statistics,
especially moving statistics.
But it depends on the capabilities of std::vector, and especially its
forward and reverse iterators, for much of its efficiency. To use both that
code and GSL in the same program, I guess there is little option but to use
GSL's vector and matrix views over pointers to double arrays (and at least
with those I can guarantee exception safety by having the memory those
arrays use managed by a smart pointer), and bear the cost of one copy of
the input data each time the analysis is required. At least, when doing
this for a time series with a moving window, I can avoid the cost of
reallocating the memory used for the data.

> Even boost+STL is not immune from difficulties of this kind. The way to
> do SVD within boost is to use ublas, but ublas vectors aren't trivially
> compatible with STL vectors either (unless they have the same size). This
> is why I always write my linear algebra functions as template functions
> which accept any reasonably formed vector and matrix types (but I don't
> have SVD yet).

I guess some of the issue is due to our use of libraries that sometimes
date back decades and were written in either C or Fortran; libraries that
have stood the test of time in terms of being fast and reliable, at least
in the hands of someone who knows how to use them. I know I would not be
eager to write new code for SVD when good code that does it already exists,
even if I could make the case that, by using the features of STL containers
and C++-specific techniques such as expression templates, I could make the
code faster still (my best C++ code is faster than my best Fortran code,
largely because of template metaprogramming).
It can be cheaper to buy a faster machine than to develop such new code de
novo, but I guess in due course it will be necessary if we're to use, e.g.,
the capabilities of something like Intel's Threading Building Blocks
library (which doesn't work on Cygwin, BTW): a library that scales
performance well by using threading to take advantage of the growing number
of cores in modern processors.

> > I assume from your allocate and free family of functions that I can't
> > create the vectors and matrices using operator new, and free them
> > using operator delete, and thus give management of them to a smart
> > pointer. How, then, do you manage to maintain exception safety?
>
> There's no need for a smart pointer, you just have to check the
> allocation of each vector separately and free up all of the previously
> successful allocations in the case that one fails. Alternatively, you
> can embed the vectors into a class, ensuring that the destructor takes
> care of the deallocation in the usual way.

This is fine if we assume memory allocation won't generate an exception.
But it is more challenging when you're allocating more than one block of
memory (or array) on the heap and the first few succeed while the last
fails. If you're not extra careful, you can end up with a slow memory leak
in a long-running program when such allocation failures happen apparently
randomly over time. I am not looking at writing code that will be used once
for an analysis, after which the program ends. Rather, I am looking at
something that assumes the system is non-autonomous, so the analysis of a
real-time data feed will have to be performed, the results used, and then
the whole thing repeated every half hour, every hour, or even once a week
(depending on how quickly the system parameters change). While smart
pointers are not essential to ensuring fault tolerance, they do make it so
much easier.

Anyway, thanks for your time and insights.

Cheers

Ted
