Anybody have any thoughts on this? Please? :)
on 06/05/2008 02:09 PM Daniel Folkinshteyn said the following:
Hi everyone!
I have a question about data processing efficiency.
My data are as follows: I have a data set on quarterly institutional
ownership of equities; some of them have had
One thing that is likely to speed the code significantly
is if you create 'result' to be its final size and then
subscript into it. Something like:
result[i, ] <- bestpeer
(though I'm not sure if 'i' is the proper index).
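A minimal sketch of this preallocation idea (the sizes and the `bestpeer` computation below are invented placeholders, not Daniel's actual function):

```r
# Sketch: create 'result' at its final size up front, then assign into
# row i by subscripting, instead of growing the object inside the loop.
n_quarters <- 100          # hypothetical number of iterations
n_cols     <- 5            # hypothetical number of columns per peer row

result <- matrix(NA_real_, nrow = n_quarters, ncol = n_cols)
for (i in seq_len(n_quarters)) {
  bestpeer <- rnorm(n_cols)   # stand-in for the real matching computation
  result[i, ] <- bestpeer     # subscript assignment: no copy-and-grow
}
stopifnot(nrow(result) == n_quarters, !anyNA(result))
```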
Patrick Burns
[EMAIL PROTECTED]
+44 (0)20 8525 0696
Try reading the posting guide before posting.
i did! what did i miss?
It's summarized in the last line to r-help. Note reproducible and
minimal.
That is the last line of every message to r-help.
i thought since the function code (which i provided in full) was pretty
short, it would be reasonably easy to just read the code and see what
it's doing.
but ok, so... i am attaching a zip file with a small sample of the data
set (tab delimited), and the function code.
thanks for the tip! i'll try that and see how big of a difference that
makes... if i am not sure what exactly the size will be, am i better off
making it larger, and then later stripping off the blank rows, or making
it smaller, and appending the missing rows?
on 06/06/2008 11:44 AM Patrick Burns said the following:
I think the posting guide may not be clear enough and have suggested that
it be clarified. Hopefully this better communicates what is required and why
in a shorter amount of space:
https://stat.ethz.ch/pipermail/r-devel/2008-June/049891.html
On Fri, Jun 6, 2008 at 1:25 PM, Daniel Folkinshteyn wrote:
just in case, uploaded it to the server, you can get the zip file i
mentioned here:
http://astro.temple.edu/~dfolkins/helplistfiles.zip
Ok, sorry about the zip, then. :) Thanks for taking the trouble to clue
me in as to the best posting procedure!
well, here's a dput-ed version of the small data subset you can use for
testing. below that, an updated version of the function, with extra
explanatory comments, and producing an
That is going to be situation dependent, but if you
have a reasonable upper bound, then that will be
much easier and not far from optimal.
If you pick the possibly too small route, then increasing
the size in largish chunks is much better than adding
a row at a time.
Pat
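If the final size really is unknown, the "grow in largish chunks" advice might look like this sketch (chunk size, column count, and the `add_row` helper are all invented for illustration):

```r
# Sketch: grow 'result' a whole chunk at a time rather than one row at
# a time, then strip the unused rows at the end.
chunk_size <- 1000
n_cols <- 5
result <- matrix(NA_real_, nrow = chunk_size, ncol = n_cols)
used <- 0

add_row <- function(result, used, row) {
  if (used == nrow(result)) {   # out of room: append an entire chunk
    result <- rbind(result,
                    matrix(NA_real_, nrow = chunk_size, ncol = ncol(result)))
  }
  used <- used + 1
  result[used, ] <- row
  list(result = result, used = used)
}

for (i in 1:2500) {
  state  <- add_row(result, used, rnorm(n_cols))
  result <- state$result
  used   <- state$used
}
result <- result[seq_len(used), , drop = FALSE]   # trim the blank rows
stopifnot(nrow(result) == 2500)
```

This way the number of expensive `rbind()` calls is proportional to the number of chunks, not the number of rows.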
Daniel Folkinshteyn
Cool, I do have an upper bound, so I'll try it and see how much of a
speed boost it gives me. Thanks for the suggestion!
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Patrick Burns
Sent: Friday, June 06, 2008 12:04 PM
To: Daniel Folkinshteyn
Cc: r-help@r-project.org
Subject: Re: [R] Improving data processing efficiency
Hmm... ok... so i ran the code twice - once with a preallocated result,
assigning rows to it, and once with a nrow=0 result, rbinding rows to
it, for the first 20 quarters. There was no speedup. In fact, running
with a preallocated result matrix was slower than rbinding to the matrix.
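For what it's worth, the two growth strategies can be timed in isolation with `system.time()`; this toy comparison (sizes invented, independent of Daniel's actual function) usually favors preallocation once the row count is large:

```r
n <- 5000; p <- 4

# Strategy 1: start empty and rbind one row per iteration.
t_rbind <- system.time({
  res <- matrix(nrow = 0, ncol = p)
  for (i in seq_len(n)) res <- rbind(res, rnorm(p))
})

# Strategy 2: preallocate to the final size and assign by subscript.
t_prealloc <- system.time({
  res <- matrix(NA_real_, nrow = n, ncol = p)
  for (i in seq_len(n)) res[i, ] <- rnorm(p)
})

print(rbind(rbind_grow = t_rbind, preallocated = t_prealloc))
```

If preallocation shows no benefit on the real function, the loop body itself (rather than the result-growing) is likely where the time goes, which is what profiling will reveal.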
thanks for the suggestions! I'll play with this over the weekend and see
what comes out. :)
on 06/06/2008 06:48 PM Don MacQueen said the following:
In a case like this, if you can possibly work with matrices instead of
data frames, you might get significant speedup.
(More accurately, I have
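The matrices-versus-data-frames point is easy to demonstrate: row indexing a data frame dispatches the comparatively expensive `[.data.frame` method, while matrix indexing is handled directly in C. A toy illustration (sizes invented):

```r
n <- 2000
m  <- matrix(rnorm(n * 5), ncol = 5)
df <- as.data.frame(m)

# Same logical operation; very different cost per iteration.
t_mat <- system.time(for (i in seq_len(n)) x <- m[i, ])
t_df  <- system.time(for (i in seq_len(n)) x <- df[i, ])
print(rbind(matrix = t_mat, data.frame = t_df))
```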
on 06/06/2008 06:55 PM hadley wickham said the following:
Why not try profiling? The profr package provides an alternative
display that I find more helpful than the default tools:
install.packages("profr")
library(profr)
p <- profr(fcn_create_nonissuing_match_by_quarterssinceissue(...))
plot(p)
That should at least help you see where the slow bits are.
Hadley
so profiling reveals that '[.data.frame' and '[[.data.frame' and '[' are
the biggest timesuckers...
i suppose
hadley wickham wrote:
Hi,
I tried this suggestion as I am curious about bottlenecks in my own
R code ...
Why not try profiling? The profr package provides an alternative
display that I find more helpful than the default tools:
install.packages("profr")
install.packages("profr")
Warning message:
package 'profr' is not available
I selected a different mirror in place of the Iowa one and it
worked. Odd, I just assumed all the same packages are available
on all mirrors.
The Iowa mirror is rather out of date as the guy who was looking after
Hi everyone!
I have a question about data processing efficiency.
My data are as follows: I have a data set on quarterly institutional
ownership of equities; some of them have had recent IPOs, some have not
(I have a binary flag set). The total dataset size is 700k+ rows.
My goal is this:
Maybe you should provide a minimal, working code with data, so that we all
can give it a try.
In the meantime: take a look at the Rprof function to see where your code
can be improved.
Good luck
Bart
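Base R's Rprof needs no extra packages; a minimal usage sketch (the profiled expression below is just a stand-in workload, not the real function):

```r
Rprof("profile.out")            # start the sampling profiler
x <- replicate(500, {           # placeholder for the real computation
  d <- as.data.frame(matrix(rnorm(1e4), ncol = 10))
  sum(d[d[, 1] > 0, 2])
})
Rprof(NULL)                     # stop profiling

# Time attributed to each function, sorted by self time.
print(summaryRprof("profile.out")$by.self)
unlink("profile.out")           # clean up the profile file
```

The `by.self` table is usually the quickest way to spot which calls, such as `[.data.frame`, dominate the run time.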
Thanks, I'll take a look at Rprof... but I think what i'm missing is
facility with R idiom to get around the looping, and no amount of
profiling will help me with that :)
also, full working code is provided in my original post (see toward the
bottom).
on 06/05/2008 03:43 PM bartjoosen said the following:
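On the "R idiom to get around the looping" point: the usual trick is to replace a per-row loop with whole-column logical indexing, so the work happens in vectorized C code instead of repeated `[.data.frame` calls. A hypothetical sketch of one match step (the column names and the size-matching rule are invented for illustration):

```r
# Hypothetical data: firms with a market cap and an IPO flag.
set.seed(1)
firms <- data.frame(
  mktcap = exp(rnorm(1000, 10)),
  ipo    = sample(c(TRUE, FALSE), 1000, replace = TRUE)
)
target <- firms$mktcap[1]

# Instead of looping over candidate rows, compute all distances at once
# and pick the closest non-IPO firm with which.min().
candidates <- !firms$ipo
dist <- abs(firms$mktcap - target)
dist[!candidates] <- Inf        # rule out IPO firms in one step
best <- which.min(dist)
stopifnot(!firms$ipo[best])
```

The same pattern generalizes: build logical masks and distance vectors over whole columns, then select with `which.min()`/`which()` rather than testing row by row.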