Re: [R] Improving data processing efficiency

2008-06-06 Thread Daniel Folkinshteyn
Anybody have any thoughts on this? Please? :) on 06/05/2008 02:09 PM Daniel Folkinshteyn said the following: Hi everyone! I have a question about data processing efficiency. My data are as follows: I have a data set on quarterly institutional ownership of equities; some of them have had

Re: [R] Improving data processing efficiency

2008-06-06 Thread Patrick Burns
One thing that is likely to speed the code significantly is if you create 'result' to be its final size and then subscript into it. Something like: result[i, ] <- bestpeer (though I'm not sure if 'i' is the proper index). Patrick Burns [EMAIL PROTECTED] +44 (0)20 8525 0696
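Patrick's preallocation idea can be sketched as follows. This is an illustration only: `n`, `k`, and the `bestpeer` computation are hypothetical stand-ins for the poster's actual objects, which were not shown in full here.

```r
# Sketch of the preallocation pattern: create 'result' at its final
# size once, then fill rows by subscript assignment. This avoids the
# repeated copying that growing an object with rbind() causes.
n <- 1000                         # assumed (or upper-bound) number of result rows
k <- 5                            # assumed number of columns per row

result <- matrix(NA_real_, nrow = n, ncol = k)
for (i in seq_len(n)) {
    bestpeer <- runif(k)          # placeholder for the real per-quarter computation
    result[i, ] <- bestpeer       # subscript into the preallocated matrix
}
```

Because `result` never changes size inside the loop, each iteration costs the same regardless of how many rows have already been filled.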

Re: [R] Improving data processing efficiency

2008-06-06 Thread Gabor Grothendieck
Try reading the posting guide before posting. On Fri, Jun 6, 2008 at 11:12 AM, Daniel Folkinshteyn [EMAIL PROTECTED] wrote: Anybody have any thoughts on this? Please? :) on 06/05/2008 02:09 PM Daniel Folkinshteyn said the following: Hi everyone! I have a question about data processing

Re: [R] Improving data processing efficiency

2008-06-06 Thread Daniel Folkinshteyn
i did! what did i miss? on 06/06/2008 11:45 AM Gabor Grothendieck said the following: Try reading the posting guide before posting. On Fri, Jun 6, 2008 at 11:12 AM, Daniel Folkinshteyn [EMAIL PROTECTED] wrote: Anybody have any thoughts on this? Please? :) on 06/05/2008 02:09 PM Daniel

Re: [R] Improving data processing efficiency

2008-06-06 Thread Gabor Grothendieck
Its summarized in the last line to r-help. Note reproducible and minimal. On Fri, Jun 6, 2008 at 12:03 PM, Daniel Folkinshteyn [EMAIL PROTECTED] wrote: i did! what did i miss? on 06/06/2008 11:45 AM Gabor Grothendieck said the following: Try reading the posting guide before posting. On

Re: [R] Improving data processing efficiency

2008-06-06 Thread Gabor Grothendieck
That is the last line of every message to r-help. On Fri, Jun 6, 2008 at 12:05 PM, Gabor Grothendieck [EMAIL PROTECTED] wrote: Its summarized in the last line to r-help. Note reproducible and minimal. On Fri, Jun 6, 2008 at 12:03 PM, Daniel Folkinshteyn [EMAIL PROTECTED] wrote: i did!

Re: [R] Improving data processing efficiency

2008-06-06 Thread Daniel Folkinshteyn
i thought since the function code (which i provided in full) was pretty short, it would be reasonably easy to just read the code and see what it's doing. but ok, so... i am attaching a zip file with a small sample of the data set (tab delimited) and the function code (posting

Re: [R] Improving data processing efficiency

2008-06-06 Thread Daniel Folkinshteyn
thanks for the tip! i'll try that and see how big of a difference that makes... if i am not sure what exactly the size will be, am i better off making it larger, and then later stripping off the blank rows, or making it smaller, and appending the missing rows? on 06/06/2008 11:44 AM Patrick

Re: [R] Improving data processing efficiency

2008-06-06 Thread Gabor Grothendieck
I think the posting guide may not be clear enough and have suggested that it be clarified. Hopefully this better communicates what is required and why in a shorter amount of space: https://stat.ethz.ch/pipermail/r-devel/2008-June/049891.html On Fri, Jun 6, 2008 at 1:25 PM, Daniel Folkinshteyn

Re: [R] Improving data processing efficiency

2008-06-06 Thread Daniel Folkinshteyn
just in case, uploaded it to the server, you can get the zip file i mentioned here: http://astro.temple.edu/~dfolkins/helplistfiles.zip on 06/06/2008 01:25 PM Daniel Folkinshteyn said the following: i thought since the function code (which i provided in full) was pretty short, it would be

Re: [R] Improving data processing efficiency

2008-06-06 Thread Daniel Folkinshteyn
Ok, sorry about the zip, then. :) Thanks for taking the trouble to clue me in as to the best posting procedure! well, here's a dput-ed version of the small data subset you can use for testing. below that, an updated version of the function, with extra explanatory comments, and producing an

Re: [R] Improving data processing efficiency

2008-06-06 Thread Patrick Burns
That is going to be situation dependent, but if you have a reasonable upper bound, then that will be much easier and not far from optimal. If you pick the possibly-too-small route, then increasing the size in largish chunks is much better than adding a row at a time. Pat Daniel Folkinshteyn
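The grow-in-chunks strategy Pat describes can be sketched like this; the chunk size, column count, and the 2500 incoming rows are all illustrative numbers, not from the thread.

```r
# Sketch: final row count unknown, so extend 'result' by a largish
# chunk whenever it fills up, instead of rbind-ing one row at a time.
chunk  <- 1000
result <- matrix(NA_real_, nrow = chunk, ncol = 3)
used   <- 0

for (i in 1:2500) {                         # suppose 2500 rows actually arrive
    if (used == nrow(result)) {             # out of room: add another chunk
        result <- rbind(result, matrix(NA_real_, nrow = chunk, ncol = 3))
    }
    used <- used + 1
    result[used, ] <- c(i, i^2, i %% 7)     # placeholder row contents
}

result <- result[seq_len(used), , drop = FALSE]   # strip unused rows at the end
```

Extending happens only ~(total rows / chunk) times instead of once per row, which is what keeps this close to the preallocated case in cost.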

Re: [R] Improving data processing efficiency

2008-06-06 Thread Daniel Folkinshteyn
Cool, I do have an upper bound, so I'll try it and see how much of a speed boost it gives me. Thanks for the suggestion! on 06/06/2008 02:03 PM Patrick Burns said the following: That is going to be situation dependent, but if you have a reasonable upper bound, then that will be much easier and not

Re: [R] Improving data processing efficiency

2008-06-06 Thread Greg Snow
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Patrick Burns Sent: Friday, June 06, 2008 12:04 PM To: Daniel Folkinshteyn Cc: r-help@r-project.org Subject: Re: [R] Improving data processing efficiency That is going to be situation dependent

Re: [R] Improving data processing efficiency

2008-06-06 Thread Gabor Grothendieck
On Fri, Jun 6, 2008 at 2:28 PM, Greg Snow [EMAIL PROTECTED] wrote: -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Patrick Burns Sent: Friday, June 06, 2008 12:04 PM To: Daniel Folkinshteyn Cc: r-help@r-project.org Subject: Re: [R] Improving data

Re: [R] Improving data processing efficiency

2008-06-06 Thread Patrick Burns
] On Behalf Of Patrick Burns Sent: Friday, June 06, 2008 12:04 PM To: Daniel Folkinshteyn Cc: r-help@r-project.org Subject: Re: [R] Improving data processing efficiency That is going to be situation dependent, but if you have a reasonable upper bound, then that will be much easier and not far from

Re: [R] Improving data processing efficiency

2008-06-06 Thread Greg Snow
-Original Message- From: Gabor Grothendieck [mailto:[EMAIL PROTECTED] Sent: Friday, June 06, 2008 12:33 PM To: Greg Snow Cc: Patrick Burns; Daniel Folkinshteyn; r-help@r-project.org Subject: Re: [R] Improving data processing efficiency On Fri, Jun 6, 2008 at 2:28 PM, Greg Snow

Re: [R] Improving data processing efficiency

2008-06-06 Thread Daniel Folkinshteyn
Hmm... ok... so i ran the code twice - once with a preallocated result, assigning rows to it, and once with a nrow=0 result, rbinding rows to it, for the first 20 quarters. There was no speedup. In fact, running with a preallocated result matrix was slower than rbinding to the matrix: for

Re: [R] Improving data processing efficiency

2008-06-06 Thread hadley wickham
On Fri, Jun 6, 2008 at 5:10 PM, Daniel Folkinshteyn [EMAIL PROTECTED] wrote: Hmm... ok... so i ran the code twice - once with a preallocated result, assigning rows to it, and once with a nrow=0 result, rbinding rows to it, for the first 20 quarters. There was no speedup. In fact, running with a

Re: [R] Improving data processing efficiency

2008-06-06 Thread Daniel Folkinshteyn
thanks for the suggestions! I'll play with this over the weekend and see what comes out. :) on 06/06/2008 06:48 PM Don MacQueen said the following: In a case like this, if you can possibly work with matrices instead of data frames, you might get significant speedup. (More accurately, I have
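Don's point about matrices versus data frames can be illustrated with a rough timing sketch (synthetic data, not the poster's): row-by-row access on a matrix skips the considerable overhead of the `[.data.frame` method.

```r
# Rough illustration: per-row subscripting is far cheaper on a matrix
# than on a data frame, because matrix '[' is a fast primitive while
# '[.data.frame' is a complex R-level function.
df <- data.frame(a = rnorm(2000), b = rnorm(2000))
m  <- as.matrix(df)                 # all-numeric, so nothing is coerced

t_df <- system.time(for (i in 1:2000) invisible(df[i, ]))["elapsed"]
t_m  <- system.time(for (i in 1:2000) invisible(m[i, ]))["elapsed"]
# On most systems t_m is a small fraction of t_df.
```

The caveat is that a matrix must hold a single type, so this only applies when the columns used in the inner loop are all numeric (or can be split off into an all-numeric matrix).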

Re: [R] Improving data processing efficiency

2008-06-06 Thread Daniel Folkinshteyn
on 06/06/2008 06:55 PM hadley wickham said the following: Why not try profiling? The profr package provides an alternative display that I find more helpful than the default tools: install.packages("profr") library(profr) p <- profr(fcn_create_nonissuing_match_by_quarterssinceissue(...)) plot(p)

Re: [R] Improving data processing efficiency

2008-06-06 Thread Daniel Folkinshteyn
install.packages("profr") library(profr) p <- profr(fcn_create_nonissuing_match_by_quarterssinceissue(...)) plot(p) That should at least help you see where the slow bits are. Hadley so profiling reveals that '[.data.frame' and '[[.data.frame' and '[' are the biggest timesuckers... i suppose

Re: [R] Improving data processing efficiency

2008-06-06 Thread Horace Tso
. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Daniel Folkinshteyn Sent: Friday, June 06, 2008 4:35 PM To: hadley wickham Cc: r-help@r-project.org; Patrick Burns Subject: Re: [R] Improving data processing efficiency install.packages(profr) library(profr

Re: [R] Improving data processing efficiency

2008-06-06 Thread Esmail Bonakdarian
hadley wickham wrote: Hi, I tried this suggestion as I am curious about bottlenecks in my own R code ... Why not try profiling? The profr package provides an alternative display that I find more helpful than the default tools: install.packages("profr") install.packages("profr") Warning

Re: [R] Improving data processing efficiency

2008-06-06 Thread Esmail Bonakdarian
Esmail Bonakdarian wrote: hadley wickham wrote: Hi, I tried this suggestion as I am curious about bottlenecks in my own R code ... Why not try profiling? The profr package provides an alternative display that I find more helpful than the default tools: install.packages("profr")

Re: [R] Improving data processing efficiency

2008-06-06 Thread Charles C. Berry
On Fri, 6 Jun 2008, Daniel Folkinshteyn wrote: install.packages("profr") library(profr) p <- profr(fcn_create_nonissuing_match_by_quarterssinceissue(...)) plot(p) That should at least help you see where the slow bits are. Hadley so profiling reveals that '[.data.frame' and '[[.data.frame'
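Once profiling points at `[.data.frame` as the timesink, one common fix (sketched here on synthetic data, not the poster's) is to hoist the needed columns out of the data frame before the loop, so the inner loop indexes plain vectors instead of calling `[.data.frame` on every iteration.

```r
# Hoisting columns out of a data frame before a tight loop.
df <- data.frame(a = rnorm(10000), b = rnorm(10000))

# slow: data-frame subscripting inside the loop
s1 <- 0
for (i in seq_len(nrow(df))) s1 <- s1 + df[i, "a"] * df[i, "b"]

# faster: extract once, then index plain vectors
a <- df$a
b <- df$b
s2 <- 0
for (i in seq_along(a)) s2 <- s2 + a[i] * b[i]

stopifnot(all.equal(s1, s2))   # same answer either way
```

For this particular computation `sum(df$a * df$b)` would of course vectorize the loop away entirely; the hoisting pattern matters when the per-row work can't be vectorized.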

Re: [R] Improving data processing efficiency

2008-06-06 Thread hadley wickham
install.packages("profr") Warning message: package 'profr' is not available I selected a different mirror in place of the Iowa one and it worked. Odd, I just assumed all the same packages are available on all mirrors. The Iowa mirror is rather out of date as the guy who was looking after

[R] Improving data processing efficiency

2008-06-05 Thread Daniel Folkinshteyn
Hi everyone! I have a question about data processing efficiency. My data are as follows: I have a data set on quarterly institutional ownership of equities; some of them have had recent IPOs, some have not (I have a binary flag set). The total dataset size is 700k+ rows. My goal is this:

Re: [R] Improving data processing efficiency

2008-06-05 Thread bartjoosen
Maybe you should provide minimal, working code with data, so that we all can give it a try. In the meantime: take a look at the Rprof function to see where your code can be improved. Good luck Bart Daniel Folkinshteyn-2 wrote: Hi everyone! I have a question about data processing
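A minimal `Rprof` workflow along the lines Bart suggests looks like this; the profiled loop is just a stand-in workload, and the output file name is arbitrary.

```r
# Base-R profiling: sample the call stack while the code runs,
# then summarize where the time went.
Rprof("profile.out")                        # start the sampling profiler
for (j in 1:200) {                          # stand-in for the real workload
    d <- data.frame(a = rnorm(10000), b = rnorm(10000))
    s <- sum(d[d$a > 0, "b"])
}
Rprof(NULL)                                 # stop profiling
head(summaryRprof("profile.out")$by.self)   # top self-time consumers
```

The `$by.self` table is usually the first thing to look at: it ranks functions by time spent in their own code, which is where expensive primitives like `[.data.frame` show up.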

Re: [R] Improving data processing efficiency

2008-06-05 Thread Daniel Folkinshteyn
Thanks, I'll take a look at Rprof... but I think what i'm missing is facility with R idiom to get around the looping, and no amount of profiling will help me with that :) also, full working code is provided in my original post (see toward the bottom). on 06/05/2008 03:43 PM bartjoosen said