Re: Code speed (and back to the memory leaks...)

Joseph Wakeling Fri, 16 Apr 2010 04:01:01 -0700

bearophile wrote:
> You are right, sorry.

No need to apologise!  You helped me significantly improve my code,
helped me understand D a lot better and left me feeling generally very
positive about developing further in D.  I'd call that a result. :-)


>> So I think it's probably just compiler difference that's to blame for speed 
>> differences<
> 
> This can be true, but such differences are not magic, they can be found, and 
> eventually they can even become precise enhancement requests for llvm 
> developers, they are open to such requests.

I hope my questions here have been useful in this respect ...

>> After all, D in general and DMD in particular is in development.<
> 
> DMD has a quite old back-end that is not in active development. Maybe it will 
> become 64 bit.

The DMD backend seems a generally problematic piece of code given the
proprietary restrictions that apply to it.  I have the impression that
it's being used simply because it's the backend that is known and tested
and that over the longer term there may be a switch ... ?

The one compiler-related concern I have is whether there will be an
effective free-as-in-freedom compiler for D2 in the near future.  Work
is going on GDC but it's moving slowly -- and I don't see a clear LDC
roadmap to D2 support.

> In this program I have seen that a good percentage of the speed difference 
> between ldc/dmd is caused by foreach loops. When you iterate over structs 
> longer than size_t use ref (or in D2 better ref const(...)).
> foreach (Rating r; ratings)
> ==>
> foreach (ref const(Rating) r; ratings)
> 
> There is nothing algorithmically strange going on here, but to be sure you 
> can take a look at the produced asm.

Interesting ... !  I'm afraid the asm is over my head, but I will read
up and try to understand it.

> To find why the C++ code is faster you can show me the equivalent C++
> code that I can compile myself, or you can compile your C++ code and
> show me the asm of the two functions that are hot spots.

Sure -- attached.  The key functions are the ones in tref.cpp (the
smalltref class) and trefsim.cpp, and the treftest.cpp file that runs a
roughly-equivalent simulation to test.d.

It's a little embarrassing to share because it's fairly badly-designed
code -- my first serious C++ attempt -- it served its purpose some time
ago and was done with.  It started to become vaguely relevant again so I
thought I'd try and redo it with a better design, and rather than do it
in C++ (which I could do much better by now) this seemed like a nice
excuse to build my first D project ... :-)

I already made a couple of tweaks based on your improvements to the D
code, but I think g++ probably already has such optimizations during the
compile process.

I also realised that I wasn't comparing like with like -- the code was
using the RanLux random number generator, and with mt19937 it shaves
quite a bit of time off.  So C++ still has a bit of an edge right now.

There's surely a lot of other stuff that can be done to improve this
C++, but please don't put lots of effort in unless it helps with
improving D (the language and/or frontend, not my code, which is not so
important:-).

Thanks & best wishes,

    -- Joe

#ifndef __TREF_HPP__
#define __TREF_HPP__

#include <vector>

typedef long unsigned int objectID;
typedef long unsigned int userID;

namespace tref
{

class rating {
public:
	objectID o;
	userID u;
	double value;
	rating(objectID _o, userID _u, double _value);
};

// class __tref
// Base class for all iterative refinement algorithms.
// Contains only minimally needed info and functions.
class __tref {
protected:
	bool verbose;
	std::vector<tref::rating> ratings;
	std::vector<double> quality;
	std::vector<double> weight;
	double beta;
	double epsilon;
	double convergence;
	virtual void addLink(objectID o, userID u);
	virtual void iterationInit();
	virtual double qualityUpdate();
	virtual void weightUpdate();
public:
	__tref(long unsigned int _objects, long unsigned int _users, long unsigned int _links,
	       double _beta, double _epsilon, double _convergence,
	       bool _verbose);
	void addRating(objectID o, userID u, double value);
	unsigned int iterationCycle();
	long unsigned int objects();
	long unsigned int users();
	long unsigned int links();
	double objectQuality(objectID o);
	double userWeight(userID u);
	void printRatings();
};

// class smalltref : public __tref
// Memory-efficient but slow algorithm.
class smalltref : public __tref {
protected:
	std::vector<double> qualityLast;
	std::vector<double> weightSum;
	std::vector<unsigned long int> userLinks;
	virtual void addLink(objectID o, userID u);
	virtual double qualityUpdate();
	virtual void weightUpdate();
public:
	smalltref(long unsigned int _objects, long unsigned int _users, long unsigned int _links,
	          double _beta, double _epsilon, double _convergence,
	          bool _verbose);
};

// class bigtref : public __tref
// Supposedly faster algorithm, but needs more memory.
// Extra allocation required may actually make it slower
// in many cases.
class bigtref : public __tref {
protected:
	std::vector< std::vector<tref::rating *> > objectRatings;
	std::vector< std::vector<tref::rating *> > userRatings;
	std::vector<unsigned long int> objectLinks;
	std::vector<unsigned long int> userLinks;
	virtual void iterationInit();
	virtual double qualityUpdate();
	virtual void weightUpdate();
public:
	bigtref(long unsigned int _objects, long unsigned int _users, long unsigned int _links,
	        double _beta, double _epsilon, double _convergence,
	        bool _verbose);
};

}

#endif /* __TREF_HPP__ */

#include <cmath>
#include <iostream>
#include <vector>
#include "tref.hpp"

namespace tref
{

// class rating
// public:
rating::rating(objectID _o, userID _u, double _value)
{
	o = _o;
	u = _u;
	value = _value;
}


// class __tref
// protected:
void __tref::addLink(objectID o, userID u) {}
void __tref::iterationInit() {}
double __tref::qualityUpdate() { return 0; }
void __tref::weightUpdate() {}

// class __tref
// public:
__tref::__tref(long unsigned int _objects, long unsigned int _users, long unsigned int _links,
               double _beta, double _epsilon, double _convergence,
               bool _verbose)
{
	verbose = _verbose;
	ratings.reserve(_links);
	if(verbose) {
		std::cout << "Reserved space for " << ratings.capacity() << " links." << std::endl;
		std::cout << "Maximum possible: " << ratings.max_size() << std::endl;
	}
	quality.resize(_objects,0);
	weight.resize(_users,1);
	beta = _beta;
	epsilon = _epsilon;
	convergence = _convergence;
}

void __tref::addRating(objectID o, userID u, double value)
{
	ratings.push_back(tref::rating(o,u,value));
	if(o>=quality.size())
		quality.resize(o+1,0);
	if(u>=weight.size())
		weight.resize(u+1,1);
	addLink(o,u);
}

unsigned int __tref::iterationCycle()
{
	iterationInit();
	
	if(verbose) {
		std::cout << "Performing iteration cycle with " << quality.size() << " objects, ";
		std::cout << weight.size() << " users" << std::endl;
		std::cout << "and " << ratings.size() << " links, ";
		std::cout << "beta = " << beta << ", epsilon = " << epsilon << std::endl;
		std::cout << "and convergence condition = " << convergence << std::endl;
	}
	
	unsigned int iter = 0;
	double qdiff = qualityUpdate();
	
	do {
		++iter;
		weightUpdate();
		qdiff = qualityUpdate();
	} while(qdiff > convergence);
	
	if(verbose)
		std::cout << "Exited in " << iter << " iterations with qdiff = " << qdiff << std::endl;
	
	return iter;
}

long unsigned int __tref::objects() { return quality.size(); }
long unsigned int __tref::users() { return weight.size(); }
long unsigned int __tref::links() { return ratings.size(); }
double __tref::objectQuality(objectID o) { return quality[o]; }
double __tref::userWeight(userID u) { return weight[u]; }

void __tref::printRatings()
{
	for(std::vector<tref::rating>::iterator r=ratings.begin();r!=ratings.end();++r)
		std::cout << r->o << "\t" << r->u << "\t" << r->value << std::endl;
}

// class smalltref : public __tref
// protected:
void smalltref::addLink(objectID o, userID u)
{
	if(o>=qualityLast.size()) {
		qualityLast.resize(o+1);
		weightSum.resize(o+1);
	}
	if(u>=userLinks.size())
		userLinks.resize(u+1,0);
	++(userLinks[u]);
}

double smalltref::qualityUpdate()
{
	double qdiff = 0;
	
	qualityLast.assign(quality.begin(),quality.end());
	quality.assign(quality.size(),0);
	weightSum.assign(weightSum.size(),0);
	
	for(std::vector<tref::rating>::iterator r=ratings.begin();r!=ratings.end();++r) {
		quality[r->o] += weight[r->u] * r->value;
		weightSum[r->o] += weight[r->u];
	}
	
	for(objectID o=0;o!=quality.size();++o) {
		if(weightSum[o]>0)
			quality[o] /= weightSum[o];
		double aux = quality[o]-qualityLast[o];
		qdiff += aux*aux;
	}
	
	return sqrt(qdiff);
}

void smalltref::weightUpdate()
{
	weight.assign(weight.size(),0);
	for(std::vector<tref::rating>::iterator r=ratings.begin();r!=ratings.end();++r) {
		double aux = r->value - quality[r->o];
		weight[r->u] += aux*aux;
	}
	for(userID u=0;u!=weight.size();++u) {
		if(userLinks[u]!=0)
			weight[u] = pow( ((weight[u]/userLinks[u]) + epsilon), -beta);
	}
}

// class smalltref : public __tref
// public:
smalltref::smalltref(long unsigned int _objects, long unsigned int _users, long unsigned int _links,
                     double _beta, double _epsilon, double _convergence,
                     bool _verbose) : __tref(_objects,_users,_links,
                                             _beta,_epsilon,_convergence,
                                             _verbose)
{
	qualityLast.resize(_objects);
	weightSum.resize(_objects);
	userLinks.resize(_users,0);
}


// class bigtref : public __tref
// protected:
void bigtref::iterationInit()
{
	for(objectID o=0;o!=objectRatings.size();++o)
		objectRatings[o].reserve(objectLinks[o]);
	for(userID u=0;u!=userRatings.size();++u)
		userRatings[u].reserve(userLinks[u]);
	for(std::vector<tref::rating>::iterator r=ratings.begin();r!=ratings.end();++r) {
		objectRatings[r->o].push_back(&(*r));
		userRatings[r->u].push_back(&(*r));
	}
}

double bigtref::qualityUpdate()
{
	double qdiff = 0;
	for(objectID o=0;o!=objectRatings.size();++o) {
		double qold = quality[o];
		double weightSum = quality[o] = 0;
		for(std::vector<tref::rating *>::iterator r=objectRatings[o].begin();r!=objectRatings[o].end();++r) {
			quality[o] += weight[(*r)->u] * (*r)->value;
			weightSum += weight[(*r)->u];
		}
		if(weightSum>0)
			quality[o] /= weightSum;
		qdiff += pow((quality[o]-qold),2);
	}
	return sqrt(qdiff);
}

void bigtref::weightUpdate()
{
	for(userID u=0;u!=userRatings.size();++u) {
		if(userRatings[u].size()>0) {
			weight[u] = 0;
			for(std::vector<tref::rating *>::iterator r=userRatings[u].begin();r!=userRatings[u].end();++r)
				weight[u] += pow( ((*r)->value - quality[(*r)->o]), 2);
			if(weight[u]==0)
				weight[u] = epsilon;
			weight[u] = pow( (weight[u]/userRatings[u].size()), -beta);
		}
	}
}

// class bigtref : public __tref
// public:
bigtref::bigtref(long unsigned int _objects, long unsigned int _users, long unsigned int _links,
                 double _beta, double _epsilon, double _convergence,
                 bool _verbose) : __tref(_objects,_users,_links,
                                         _beta,_epsilon,_convergence,
                                         _verbose)
{
	objectRatings.resize(_objects);
	userRatings.resize(_users);
	objectLinks.resize(_objects,0);
	userLinks.resize(_users,0);
}

}

#ifndef __TREFSIM_HPP__
#define __TREFSIM_HPP__

#include <cmath>
#include <set>
#include <utility>
#include <gsl/gsl_randist.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_sort.h>
#include <gsl/gsl_statistics.h>
#include "tref.hpp"

namespace tref
{

template <class TREF> class sim {
protected:
	TREF *tref;
	gsl_rng *r;
	bool verbose;
	std::vector<double> objectQuality;
	std::vector<double> userAbility;
	unsigned long int objects;
	unsigned long int users;
	double links;
	double linksMax;
	double qualityScale;
	double abilityScale;
	double ratingScale;
	long unsigned int iter;

	virtual void generateObjectQuality()
	{
		for(std::vector<double>::iterator Q=objectQuality.begin();Q!=objectQuality.end();++Q)
			*Q = qualityScale * gsl_rng_uniform_pos(r);
	}

	virtual void generateUserAbility()
	{
		for(std::vector<double>::iterator sigma2=userAbility.begin();sigma2!=userAbility.end();++sigma2)
			*sigma2 = abilityScale * gsl_rng_uniform_pos(r);
	}

	virtual double rateObject(objectID o, userID u)
	{
		return ( ( ratingScale * userAbility[u] * ((2*gsl_rng_uniform(r))-1) ) + objectQuality[o] );
	}

	void generateRatings()
	{
		if(links!=linksMax) {
			std::vector<long unsigned int> objectLinks(objects,0);
			std::vector<long unsigned int> userLinks(users,0);
			std::set< std::pair<objectID,userID> > L;
			unsigned long int unlinkedObjects = 0, unlinkedUsers = 0;
			for(long unsigned int l=0;l!=links;++l) {
				std::pair<std::set< std::pair<objectID,userID> >::iterator,bool> ret;
				objectID o;
				userID u;
				do {
					o = gsl_rng_uniform_int(r,objects);
					u = gsl_rng_uniform_int(r,users);
					ret = L.insert(std::pair<objectID,userID>(o,u));
				} while(!(ret.second));
				tref->addRating(o,u,rateObject(o,u));
				++(objectLinks[o]);
				++(userLinks[u]);
				//std::cout << "Added a link between object " << o << " and user " << u << std::endl;
			}
			for(objectID o=0;o!=objects;++o) {
				if(objectLinks[o]==0) {
					//std::cout << "Object " << o << " has no links!" << std::endl;
					++unlinkedObjects;
				}
			}
			for(userID u=0;u!=users;++u) {
				if(userLinks[u]==0) {
					//std::cout << "User " << u << " has no links!" << std::endl;
					++unlinkedUsers;
				}
			}
			if(verbose && ((unlinkedObjects==0) || (unlinkedUsers==0))) {
				std::cout << unlinkedObjects << " objects and " << unlinkedUsers << " users ";
				std::cout << "have no links!" << std::endl;
			}
		} else {
			for(userID u=0;u!=users;++u)
				for(objectID o=0;o!=objects;++o)
					tref->addRating(o,u,rateObject(o,u));
		}
	}
public:
	sim(unsigned long int _objects, unsigned long int _users,
	    double sparsity, double beta, double epsilon, double convergence,
	    double _qualityScale, double _abilityScale, double _ratingScale,
	    gsl_rng *_r, bool _verbose)
	{
		verbose = _verbose;
		objects = _objects;
		users = _users;
		qualityScale = _qualityScale;
		abilityScale = _abilityScale;
		ratingScale = _ratingScale;
		objectQuality.resize(objects);
		userAbility.resize(users);
		r = _r;
		linksMax = ((double) objects)*((double) users);
		if(verbose)
			std::cout << "Maximum possible links: " << linksMax << std::endl;
		links = sparsity*linksMax;
		if(verbose)
			std::cout << "Actual number of links: " << links << std::endl;
		tref = new TREF(objects,users,links,beta,epsilon,convergence,verbose);
	}

	~sim()
	{
		delete tref;
	}

	void simulation()
	{
		generateObjectQuality();
		generateUserAbility();
		generateRatings();
		iter = tref->iterationCycle();
	}

	double qualityDelta()
	{
		double Delta = 0;
		for(objectID o=0;o!=objects;++o)
			Delta += pow((tref->objectQuality(o)-objectQuality[o]),2);
		return sqrt(Delta/objects);
	}

	long unsigned int iterations() { return iter; }
};


template <class TREF> class paretosim : public sim<TREF> {
	virtual double rateObject(objectID o, userID u)
	{
		double Cf = 2;
		if(this->ratingScale > 3)
			Cf = (this->ratingScale-2)*(this->ratingScale-3);
		double value = gsl_ran_pareto(this->r,this->ratingScale-1,sqrt(0.5*Cf*this->userAbility[u]));
		return (((gsl_rng_uniform(this->r)<0.5)?-value:value) + this->objectQuality[o]);
	}
public:
	paretosim(unsigned long int _objects, unsigned long int _users,
	          double sparsity, double beta, double epsilon, double convergence,
	          double _qualityScale, double _abilityScale, double _ratingScale,
	          gsl_rng *_r, bool _verbose) : sim<TREF> (_objects, _users,
                                                           sparsity, beta, epsilon, convergence,
                                                           _qualityScale, _abilityScale, _ratingScale,
                                                           _r, _verbose)
	{
	}
};


template <class TREF> class llsim : public sim<TREF> {
protected:
	virtual double rateObject(objectID o, userID u)
	{
		double value;
		do {
			value = sim<TREF>::rateObject(o,u);
		} while (value<0 || value > this->qualityScale);
		return value;
	}
public:
	llsim(unsigned long int _objects, unsigned long int _users,
	      double sparsity, double beta, double epsilon, double convergence,
	      double _qualityScale, double _abilityScale, double _ratingScale,
	      gsl_rng *_r, bool _verbose) : sim<TREF> (_objects, _users,
	                                               sparsity, beta, epsilon, convergence,
	                                               _qualityScale, _abilityScale, _ratingScale,
	                                               _r, _verbose)
	{
	}
};


template <class TREF> class intsim : public sim<TREF> {
protected:
	virtual double rateObject(objectID o, userID u)
	{
		double value = sim<TREF>::rateObject(o,u);
		value = floor(value+0.5);
		return value;
	}
public:
	intsim(unsigned long int _objects, unsigned long int _users,
	       double sparsity, double beta, double epsilon, double convergence,
	       double _qualityScale, double _abilityScale, double _ratingScale,
	       gsl_rng *_r, bool _verbose) : sim<TREF> (_objects, _users,
	                                                sparsity, beta, epsilon, convergence,
	                                                _qualityScale, _abilityScale, _ratingScale,
	                                                _r, _verbose)
	{
	}
};


template <template <class T> class TREFSIM, class TREF> class multisim {
protected:
	std::vector<double> qualityDelta;
	std::vector<double> iterations;
	double mean(std::vector<double> &v) { return gsl_stats_mean(&(v[0]),1,v.size()); }
	double median(std::vector<double> &v)
	{
		std::vector<double> w(v.begin(),v.end());
		gsl_sort(&(w[0]),1,w.size());
		return gsl_stats_median_from_sorted_data(&(w[0]),1,w.size());
	}
	double sd(std::vector<double> &v) { return gsl_stats_sd(&(v[0]),1,v.size()); }
	double min(std::vector<double> &v) { return gsl_stats_min(&(v[0]),1,v.size()); }
	double max(std::vector<double> &v) { return gsl_stats_max(&(v[0]),1,v.size()); }
public:
	multisim(unsigned long int sims,
	         unsigned long int objects, unsigned long int users,
	         double sparsity, double beta, double epsilon, double convergence,
	         double qualityScale, double abilityScale, double ratingScale,
	         gsl_rng *r, bool verbose)
	{
		long unsigned int iter_total = 0;
		qualityDelta.resize(sims);
		iterations.resize(sims);

		for(unsigned long int s=0;s!=sims;++s) {
			TREFSIM<TREF> sim(objects,users,
			                  sparsity,beta,epsilon,convergence,
			                  qualityScale,abilityScale,ratingScale,
			                  r,verbose);
			sim.simulation();
			qualityDelta[s] = sim.qualityDelta();
			iter_total += iterations[s] = sim.iterations();
			if(verbose)
				std::cout << "[" << s << "]\t" << qualityDelta[s] << "\t" << iterations[s] << std::endl;
		}
		if(verbose)
			std::cout << "Total iterations: " << iter_total << std::endl;
	}

	double qualityDeltaMean() { return mean(qualityDelta); }
	double qualityDeltaMedian() { return median(qualityDelta); }
	double qualityDeltaSD() { return sd(qualityDelta); }
	double qualityDeltaMin() { return min(qualityDelta); }
	double qualityDeltaMax() { return max(qualityDelta); }
	double iterationsMean() { return mean(iterations); }
	double iterationsMedian() { return median(iterations); }
	double iterationsSD() { return sd(iterations); }
	double iterationsMin() { return min(iterations); }
	double iterationsMax() { return max(iterations); }

	void stats()
	{
		std::cout << "\t" << qualityDeltaMean();
		std::cout << "\t" << qualityDeltaMin();
		std::cout << "\t" << qualityDeltaMax();
		std::cout << "\t" << iterationsMean();
		std::cout << "\t" << iterationsMin();
		std::cout << "\t" << iterationsMax() << std::endl;
	}
};

}

#endif /* __TREFSIM_HPP__ */

#include <iostream>
#include <gsl/gsl_rng.h>
#include "tref.hpp"
#include "trefsim.hpp"

int main()
{
	long unsigned int i, l;
	tref::smalltref sys(10,5,50,0.5,0,1e-12,true);
	unsigned int M = sys.objects(), N = sys.users();
	gsl_rng *r = gsl_rng_alloc(gsl_rng_mt19937);
	for(i=0;i<N;++i)
		for(l=0;l<M;++l)
			sys.addRating(l,i,gsl_rng_uniform_pos(r));
	std::cout << "Raw ratings: " << std::endl;
	sys.printRatings();
	std::cout << std::endl;
	i = sys.iterationCycle();
	std::cout << "Calculated optimum quality in " << i << " iterations..." << std::endl;
	for(l=0;l<M;++l)
		std::cout << "Object " << l << " weighted quality estimate: " << sys.objectQuality(l) << std::endl;
	std::cout << std::endl;

	gsl_rng_set(r,1001);
	tref::multisim<tref::sim,tref::smalltref>
		sim(/* sims */ 100,
		    /*objects*/ 1000, /*users*/ 1000,
		    /*sparsity*/ 1, /*beta*/ 0.8, /*epsilon*/ 1e-36, /*convergence*/ 1e-12,
		    /*qualityScale*/ 10, /*abilityScale*/ 1, /*ratingScale*/ 1,
		    /*rng*/ r, /*verbose*/ true);
	std::cout << "Delta = " << sim.qualityDeltaMean() << " +/- " << sim.qualityDeltaSD();
	std::cout << " calculated in " << sim.iterationsMean() << " +/- " << sim.iterationsSD() << " iterations." << std::endl;
	gsl_rng_free(r);
	return 0;
}

Re: Code speed (and back to the memory leaks...)

Reply via email to