Sorry, I was previously trying to explain this from a smartphone!
Here's a toy example that shows a roughly 7x slowdown when using the V()
accessor inside a loop. I know that using a loop in this way is pretty
nonsensical, but with my real data a loop is required, as I make multiple
logical comparisons between several V() attributes and other external
data. In the second example below, the speed increase comes at the expense
of first creating a new vector, vx. I want to avoid this if at all
possible, as it seems inefficient to create copies of all the necessary
V(g) attributes in R memory:
library(igraph)
n <- 100000
edges <- as.data.frame(cbind(from = (1:n)[order(runif(n))], to =
(1:n)[order(runif(n))]))
g <- graph.data.frame(edges, directed = TRUE)
V(g)$x <- floor(runif(length(V(g)), 1, 4))
## extract V(g)$x before loop (fast)
make.vector <- system.time(vx <- V(g)$x)
> make.vector
user system elapsed
0.007 0.000 0.00
y <- floor(runif(5000, 1, 4))
## directly query x in the loop using the V(g) accessor (slow)
res1 <- integer(length(y))
out1 <- system.time(for(i in 1:length(y))
{
res1[i] <- which(V(g)$x == y[i])[1]
})
> out1
user system elapsed
43.893 0.736 44.587
## use previously vectorized V(g)$x instead (fast)
res2 <- integer(length(y))
out2 <- system.time(for(i in 1:length(y))
{
res2[i] <- which(vx == y[i])[1]
})
> out2
user system elapsed
6.412 0.000 6.407
> all(res1 == res2)
TRUE
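As a rough sanity check on those timings, the arithmetic below uses only
the numbers printed above. It is a sketch, not a profile: the variable
names are made up for illustration, and the 0.007 s figure for a single
extraction is reported to only one significant digit.

```r
# Figures copied from the system.time() output above (user seconds).
slow_user    <- 43.893   # loop calling V(g)$x on every iteration
fast_user    <- 6.412    # loop using the pre-extracted vector vx
extract_user <- 0.007    # the one-off vx <- V(g)$x
n_iter       <- 5000     # length(y)

slowdown  <- slow_user / fast_user     # ratio of the two loops
extra     <- slow_user - fast_user     # extra time in the slow loop
per_call  <- extra / n_iter            # extra seconds per iteration
predicted <- n_iter * extract_user     # cost if every call matched the one-off

round(c(slowdown = slowdown, extra = extra,
        per_call = per_call, predicted = predicted), 4)
```

The extra cost per iteration comes out at about 7.5 ms, close to the 7 ms
measured for a single extraction, so most of the slowdown seems to come
from re-evaluating V(g)$x on each pass rather than from the comparison.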
> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Fedora 20 (Heisenbug)
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] CAMERA_1.26.0 igraph_1.0.1 xcms_1.46.0
[4] Biobase_2.30.0 ProtGenerics_1.2.1 BiocGenerics_0.16.1
[7] mzR_2.4.0 Rcpp_0.12.2
loaded via a namespace (and not attached):
[1] graph_1.48.0 Formula_1.2-1 cluster_2.0.3
[4] magrittr_1.5 MASS_7.3-45 splines_3.2.2
[7] munsell_0.4.2 colorspace_1.2-6 lattice_0.20-33
[10] stringr_1.0.0 plyr_1.8.3 tools_3.2.2
[13] nnet_7.3-11 grid_3.2.2 gtable_0.1.2
[16] latticeExtra_0.6-26 survival_2.38-3 RBGL_1.46.0
[19] digest_0.6.8 gridExtra_2.0.0 RColorBrewer_1.1-2
[22] reshape2_1.4.1 ggplot2_1.0.1 acepack_1.3-3.3
[25] codetools_0.2-14 rpart_4.1-10 stringi_1.0-1
[28] scales_0.3.0 Hmisc_3.17-0 stats4_3.2.2
[31] foreign_0.8-66 proto_0.3-10
## simpler variant: the same comparison against a constant
vx <- V(g)$x
out <- system.time(for(i in 1:5000)
{
res <- which(V(g)$x == 2)
})
out2 <- system.time(for(i in 1:5000)
{
res <- which(vx == 2)
})
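For the toy loop specifically, a vectorised lookup would avoid both the
loop and the repeated accessor call. The sketch below uses base R's
match(), which returns the position of the first match for each element.
It assumes a plain numeric vector standing in for V(g)$x so that it runs
without igraph, and uses a smaller n to keep it quick. Whether this
carries over to the real multi-attribute comparisons is an open question.

```r
set.seed(1)                        # only to make the sketch repeatable
n  <- 10000                        # smaller than the example above
vx <- floor(runif(n, 1, 4))        # stand-in for V(g)$x (assumption)
y  <- floor(runif(5000, 1, 4))

# Loop version from the example above: first index where vx equals y[i]
res_loop <- integer(length(y))
for (i in seq_along(y)) {
  res_loop[i] <- which(vx == y[i])[1]
}

# Vectorised version: match() gives the first matching position per y value
res_match <- match(y, vx)

identical(res_loop, res_match)
```

Like which(...)[1], match() returns NA when a value of y does not occur in
vx, so the two versions should agree in that case as well.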
On 15 February 2016 at 19:34, Gábor Csárdi <[email protected]> wrote:
> Hi, can you send a reproducible example? See
>
> http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
>
> Gabor
>
> On Mon, Feb 15, 2016 at 6:28 PM, Tony Larson <[email protected]> wrote:
> >
> > Hi,
> > I'm accessing a vertex attribute in R using V(g)$x, where x is a named
> > numeric attribute. If I do this for the whole graph (about 10e5
> > vertices), it takes a few ms to get a vector of x values,
> >
> > vx <- V(g)$x
> >
> > If I then use vx as a target vector in an R loop to search through about
> > 10e3 candidate y values for x, it takes maybe 100 ms,
> >
> > for(i in 1:length(y))
> > {
> > z <- which(vx > y[i])
> > }
> > However, if I substitute V(g)$x for vx INSIDE the loop, it takes
> > about 5s - more than 50x slower. Why is this?
> >
> > Thanks
> > Tony
> >
> > Dr. Tony R. Larson
> > CNAP
> > Department of Biology, Area 7
> > University of York
> > Wentworth Way
> > Heslington
> > York YO10 5DD
> > UK
> >
> > Tel: +44(0)1904 328 826 (office)
> > Tel: +44(0)7833 471 685 (mobile)
> >
> > [email protected]
> >
> > http://scholar.google.com/citations?user=9hLFka4AAAAJ
> >
> >
> >
> >
> > _______________________________________________
> > igraph-help mailing list
> > [email protected]
> > https://lists.nongnu.org/mailman/listinfo/igraph-help
> >
--
Dr. Tony R. Larson
CNAP
Department of Biology, Area 7
University of York
Wentworth Way
Heslington
York YO10 5DD
UK
Tel: +44(0)1904 328 826 (office)
Tel: +44(0)7833 471 685 (mobile)
[email protected]
http://scholar.google.com/citations?user=9hLFka4AAAAJ