Second installment, again looking for suggestions and additions. The
whole document is expected to be compiled and uploaded to some wiki.

Part III will cover the analysis of the data. Note that for host
lifetimes, the example used throughout, the data combine known values
(or narrowly interval-censored ones) with purely right-censored values.
A host that is still contacting the server has only a lower bound on its
lifetime. Recent hosts make up a larger share of the right-censored part
than old hosts do. Note also that reliability theory points to the
Weibull and gamma distributions as the standard candidate models for
this kind of data.
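Here is a minimal Python sketch of how censoring enters a Weibull fit.

It assumes, hypothetically, that a host counts as still alive (right-censored) when its last RPC falls within 30 days of the newest RPC in the data set. This is the same 30-day cut used for `truncated` in the R session. Exactly observed lifetimes contribute the log-density log f(t). Censored ones contribute only the log-survival function log S(t) = -(t/scale)^shape.

The function names, the 30-day threshold and the crude grid search are illustrative choices, not an established BOINC procedure.

```python
import math

DAY = 86400.0              # seconds per day
CENSOR_WINDOW = 30 * DAY   # hypothetical cut: last RPC within 30 days => maybe alive


def split_lifetimes(rows):
    """Turn (create_time, rpc_time) pairs, in seconds, into lifetimes in days
    plus a right-censoring flag (True when the host may still be alive)."""
    last_rpc = max(rpc for _, rpc in rows)
    times, censored = [], []
    for create, rpc in rows:
        times.append((rpc - create) / DAY)
        censored.append(rpc >= last_rpc - CENSOR_WINDOW)
    return times, censored


def weibull_loglik(times, censored, shape, scale):
    """Right-censored Weibull log-likelihood; every lifetime must be > 0.
    Observed t adds log f(t); censored t adds log S(t) = -(t/scale)**shape."""
    ll = 0.0
    for t, cens in zip(times, censored):
        z = (t / scale) ** shape
        if cens:
            ll -= z
        else:
            ll += math.log(shape / scale) + (shape - 1) * math.log(t / scale) - z
    return ll


def crude_weibull_fit(times, censored, shapes, scales):
    """Return the (shape, scale) grid point with the highest log-likelihood."""
    return max(((k, lam) for k in shapes for lam in scales),
               key=lambda p: weibull_loglik(times, censored, p[0], p[1]))
```

Dropping the censored hosts, or treating their current age as a final lifetime, would bias the fit toward short lifetimes. Recent hosts are over-represented among the censored ones, which is exactly why they need the separate treatment.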

=================================================================================
                    BOINC STATISTICAL RESEARCH HOW-TO, part II
=================================================================================


II  PREVIEWING/PLOTTING THE DATA

II.0 --- Preconditioning the data

In the typical session, the example data file, createrpcsin0.dat, holds
all of the s...@home public data, or a random selection of it, as two
columns: create_time and rpc_time, in seconds. Every row whose rpc_time
or create_time equals 0 has been removed.

On Linux and Mac, from a bash shell, you can rely on awk or sed, as well
as cat, nl, cut, paste, grep -o, grep -v, etc.

On Windows? No idea :-(
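Where the bash tools are not available (Windows, for instance), a short Python script is a portable alternative.

The sketch below assumes the input is already reduced to two whitespace-separated columns, create_time then rpc_time, as in createrpcsin0.dat. The `fraction` and `seed` parameters for random sampling are illustrative, not part of any BOINC tool.

```python
import random


def precondition(lines, fraction=1.0, seed=None):
    """Keep 'create_time rpc_time' rows where neither value is 0, optionally
    keeping only a random fraction of them (fraction=1.0 keeps every row)."""
    rng = random.Random(seed)  # seeded generator, so a sample can be reproduced
    kept = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or short lines
        try:
            create_time, rpc_time = float(fields[0]), float(fields[1])
        except ValueError:
            continue  # skip header or otherwise non-numeric lines
        if create_time == 0 or rpc_time == 0:
            continue  # drop rows with a zero timestamp
        if fraction >= 1.0 or rng.random() < fraction:
            kept.append(line.rstrip("\n"))
    return kept
```

For example, `precondition(open("raw.dat"), fraction=0.1, seed=1)` keeps roughly a tenth of the valid rows; raw.dat is a hypothetical file name. Joining the result with "\n" and writing it out gives a createrpcsin0.dat-style file.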

II.1 --- Gnuplot Recipes

II.1.a -- A typical session

plot 'createrpcsin0.dat' using (int(($2-$1)/3600/24)):(1) smooth frequency
plot 'createrpcsin0.dat' using ($2-$1):(1):(3600*24) smooth kdensity

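# gnuplot "histograms" do not support logscale; you need some external
# ad hoc utility :-(
# Let's get the data straight from MySQL instead (mydatabase and hosttable
# stand for your own database and table names)...
plot '<mysql mydatabase --skip-column-names -e "select \
truncate((rpc_time-create_time)/3600/24,0) as lf, count(*) as c from \
hosttable where rpc_time>create_time and create_time>0 group by lf" ' \
using 1:2
set logscale y
replot
# Weibull-shaped model (N absorbs the normalization) and its logarithm
wei(x) = N*(x**(alpha-1))*exp(-(x/delta)**alpha)
lwei(x)= logN + (alpha-1)* log(x) - (x/delta)**alpha
# starting values are rough guesses; undefined parameters would start at 1
logN = 10; alpha = 1; delta = 100
# fit the log of the daily counts; the range starts at 1 because log(0) is undefined
fit [1:1000] lwei(x) '<mysql mydatabase --skip-column-names -e "select \
truncate((rpc_time-create_time)/3600/24,0) as lf, count(*) as c from \
hosttable where rpc_time>create_time and create_time>0 group by lf" ' \
using 1:(log($2+0.01)) via logN,alpha,delta
replot exp(lwei(x))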

plot 'createrpcsin0.dat' using 1:2 with dots
# Open question: can gnuplot draw a density map directly from 2D scatter data?
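The SQL query in the gnuplot session above bins lifetimes into whole days and counts the hosts per bin. For anyone without MySQL access, here is a minimal Python sketch of the same binning, working from createrpcsin0.dat-style (create_time, rpc_time) pairs. It produces "day count" lines that gnuplot can read with `using 1:2`, and that the session's `fit` command can work on. The function names are illustrative.

```python
from collections import Counter

DAY = 86400  # seconds per day


def daily_lifetime_counts(rows):
    """Hosts per whole-day lifetime, mirroring the session's SQL:
    truncate((rpc_time-create_time)/3600/24,0) AS lf, count(*) ... GROUP BY lf.
    Rows failing rpc_time > create_time and create_time > 0 are skipped."""
    counts = Counter()
    for create_time, rpc_time in rows:
        if rpc_time > create_time and create_time > 0:
            counts[int((rpc_time - create_time) / DAY)] += 1  # int() truncates
    return sorted(counts.items())


def gnuplot_table(pairs):
    """Format (day, count) pairs as 'day count' lines for gnuplot's using 1:2."""
    return "".join(f"{day} {count}\n" for day, count in pairs)
```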

II.1.b -- Generic tricks

Recent versions of gnuplot provide the terminal type "canvas", for
HTML5-compatible output.

At
http://pelican.rsvs.ulaval.ca/mediawiki/index.php/Making_density_maps_using_Gnuplot
there is a Python script you can copy and paste to build density maps.
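A 2D histogram is often enough as a density map: count the scatter points that fall in each cell of a regular grid, then plot the counts.

The sketch below is not the script from the page above. It is a minimal stdlib-only Python version, and the grid sizes are up to you. It writes "x y count" triplets with a blank line after each x column, the layout gnuplot uses for gridded data (for instance with `splot ... with pm3d`).

```python
def bin_2d(points, nx, ny):
    """Count (x, y) points on an nx-by-ny grid spanning the data range.
    Returns (x_centers, y_centers, counts), counts[i][j] for x cell i, y cell j."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0, x1 = min(xs), max(xs)
    y0, y1 = min(ys), max(ys)
    dx = (x1 - x0) / nx or 1.0  # avoid zero-width cells when all x are equal
    dy = (y1 - y0) / ny or 1.0
    counts = [[0] * ny for _ in range(nx)]
    for x, y in points:
        i = min(int((x - x0) / dx), nx - 1)  # the maximum goes into the last cell
        j = min(int((y - y0) / dy), ny - 1)
        counts[i][j] += 1
    x_centers = [x0 + (i + 0.5) * dx for i in range(nx)]
    y_centers = [y0 + (j + 0.5) * dy for j in range(ny)]
    return x_centers, y_centers, counts


def to_gnuplot_grid(x_centers, y_centers, counts):
    """Format as 'x y count' lines, one block per x value, blocks separated by blank lines."""
    lines = []
    for i, x in enumerate(x_centers):
        for j, y in enumerate(y_centers):
            lines.append(f"{x} {y} {counts[i][j]}")
        lines.append("")
    return "\n".join(lines) + "\n"
```

With create_time on x and lifetime on y, the counts span several orders of magnitude. Plotting the log of the counts (log(count+1), to keep empty cells defined) is likely to be more readable.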

II.1.c -- Tricks when working with BOINC data

II.2 --- R Recipes

II.2.a -- A typical session

datos <- read.table("createrpcsin0.dat", col.names=c("createtime", "rpctime"),
                    strip.white=TRUE)
datos$lifetimes <- with(datos, rpctime - createtime)   # lifetimes in seconds
summary(datos)
summary(datos$lifetimes/3600/24)                        # lifetimes in days
week <- 3600*24*7                                       # one week, in seconds
hist(datos$lifetimes)
# plot=FALSE computes the histogram without drawing it; then plot the counts
plot(hist(datos$lifetimes, plot=FALSE)$counts, log="y", type="h")
plot(hist(datos$lifetimes, breaks=seq(0, max(datos$lifetimes)+week, week),
          plot=FALSE)$counts, log="y", type="h")
plot(density(datos$lifetimes))
plot(density(datos$lifetimes), log="y")
plot(density(datos$lifetimes/week), log="y")            # lifetimes in weeks
ds <- density(datos$lifetimes)
plot(ds$x, ds$y/(max(ds$x)-ds$x), log="y", type="h")
# keep hosts whose last RPC is more than 30 days older than the newest RPC,
# i.e. hosts that are presumably no longer active
truncated <- datos$lifetimes[datos$rpctime < max(datos$rpctime) - 3600*24*30]
ds2 <- density(truncated)
plot(ds2$x, ds2$y/(max(ds$x)-ds2$x), log="y", type="h")
# weekly histograms of creation times and of last-RPC times
plot(hist(datos$createtime, plot=FALSE,
          breaks=seq(min(datos$createtime), max(datos$createtime)+week, week))$counts,
     log="y", type="h")
plot(hist(datos$rpctime, plot=FALSE,
          breaks=seq(min(datos$rpctime), max(datos$rpctime)+week, week))$counts,
     log="y", type="h")

#
# A 2D density (shown as a smoothed surface or contour plot) can be estimated
# with two different functions: sm.density() from the sm package, or kde2d()
# from the MASS package. The former is not packaged in every distribution;
# the latter needs a lot of memory.
#
library(MASS)   # alternative: sm.density(), from library(sm)
crelifes.density <- kde2d(datos$createtime, datos$lifetimes)
#### Error: cannot allocate vector of size 521.0 Mb
# kde2d's memory use grows with the grid size n (default 25): use a coarser grid
crelifes.density <- kde2d(datos$createtime, datos$lifetimes, n=15)
contour(crelifes.density)
filled.contour(crelifes.density)
with(crelifes.density, contour(x, y, log(z)))
with(crelifes.density, contour(x, y, log(z), nlevels=150))
filled.contour(crelifes.density, nlevels=250)
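The ds$y/(max(ds$x)-ds$x) plots above divide the density by the distance to the end of the range.

One possible reading, offered as an interpretation rather than something stated in the session, is an exposure correction. Suppose hosts were created at a roughly constant rate over an observation span T. Then only hosts created at least x before the end of the data can show a lifetime of x, and their number is proportional to T - x.

A minimal Python sketch of the same correction applied to binned lifetime counts follows; the span and the bin width are assumptions supplied by the caller.

```python
def exposure_corrected(counts, bin_width, span):
    """Divide each lifetime-bin count by (span - bin centre), the length of the
    creation window that can produce that lifetime under a constant creation rate.
    Bins whose centre is at or beyond the span are returned as None."""
    corrected = []
    for k, c in enumerate(counts):
        centre = (k + 0.5) * bin_width
        window = span - centre
        corrected.append(c / window if window > 0 else None)
    return corrected
```

Near the end of the range the window shrinks toward zero, so the corrected values there rest on very few hosts and are noisy.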

II.2.b -- Generic tricks

In many R results, the points to be plotted are stored as two vectors,
x and y, inside the returned structure, and you can access them
directly: density(...)$x and density(...)$y, for instance. Use
summary(...) or str(...) to get an overview of such content.
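To make the x/y structure concrete, the sketch below computes a Gaussian kernel density estimate laid out the way R's density() lays out its default result:

- 512 evaluation points in x, running from three bandwidths below the minimum to three above the maximum;
- the bandwidth from Silverman's rule of thumb, as in R's default bw.nrd0.

It is a direct-sum approximation in plain Python, not R's FFT-based algorithm, so values may differ slightly. It is also far too slow for millions of hosts; the function names are illustrative.

```python
import math
import statistics


def nrd0_bandwidth(data):
    """Silverman's rule of thumb, following R's bw.nrd0 (needs at least 2 points)."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    spread = min(statistics.stdev(data), (q3 - q1) / 1.34)
    if spread == 0:
        spread = statistics.stdev(data) or abs(data[0]) or 1.0
    return 0.9 * spread * len(data) ** -0.2


def density(data, n=512, cut=3):
    """Gaussian KDE evaluated on an n-point grid; returns a dict with 'x', 'y', 'bw'."""
    bw = nrd0_bandwidth(data)
    lo, hi = min(data) - cut * bw, max(data) + cut * bw
    step = (hi - lo) / (n - 1)
    xs = [lo + i * step for i in range(n)]
    norm = 1.0 / (len(data) * bw * math.sqrt(2 * math.pi))
    ys = [norm * sum(math.exp(-0.5 * ((x - d) / bw) ** 2) for d in data) for x in xs]
    return {"x": xs, "y": ys, "bw": bw}
```

This layout explains why max(ds$x) in the R session lies about three bandwidths beyond the longest observed lifetime.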

II.2.c -- Tricks when working with BOINC data

II.3 --- Octave recipes?
_______________________________________________
boinc_dev mailing list
[email protected]
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev