On Fri, 6 Jun 2008, Daniel Folkinshteyn wrote:

 install.packages("profr")
 library(profr)
 p <- profr(fcn_create_nonissuing_match_by_quarterssinceissue(...))
 plot(p)

 That should at least help you see where the slow bits are.

 Hadley

so profiling reveals that '[.data.frame' and '[[.data.frame' and '[' are the biggest timesuckers...

i suppose i'll try using matrices and see how that stacks up (since all my cols are numeric, should be a problem-free approach).

but i'm really wondering if there isn't some neat vectorized approach i could use to avoid at least one of the nested loops...



As far as a vectorized solution goes, I'll bet you could do ALL the lookups of non-issuers for all issuers with a single call to findInterval() (modulo some cleanup afterwards), but the trickery needed to do that would make your code a bit opaque.

And in the end I doubt it would beat mapply() (read on...) by enough to make it worthwhile.

---

What you are doing is conditional on industry group and quarter.

So using

        indus.quarter <- with(tfdat,
                paste(as.character(DATE), as.character(HSICIG), sep="."))

and then calls like this:

        split( <various> , indus.quarter[ relevant.subset ] )

you can create:

        a list of all issuer market caps according to quarter and group,

        a list of all non-issuer caps (that satisfy your 'since quarter'
        restriction) according to quarter and group,

        a list of all non-issuer indexes (i.e. row numbers) that satisfy
        that restriction according to quarter and group
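
As a rough sketch of the three split() calls described above, on toy data: the columns IS_ISSUER and ELIGIBLE are hypothetical stand-ins for however you flag issuers and the 'since quarter' restriction in your real tfdat, and the values are made up.

```r
# Toy data. IS_ISSUER and ELIGIBLE are hypothetical columns, not from the
# original data; ELIGIBLE stands in for the 'since quarter' restriction.
tfdat <- data.frame(
  DATE           = c(1, 1, 1, 1, 2, 2, 2),
  HSICIG         = c(10, 10, 10, 10, 10, 10, 10),
  Market.Cap.13f = c(50, 40, 60, 55, 70, 65, 90),
  IS_ISSUER      = c(TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE),
  ELIGIBLE       = c(FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE)
)

# One key per quarter-by-industry-group combination
indus.quarter <- with(tfdat,
        paste(as.character(DATE), as.character(HSICIG), sep = "."))

iss <- tfdat$IS_ISSUER                     # rows that are issuers
non <- !tfdat$IS_ISSUER & tfdat$ELIGIBLE   # eligible non-issuer rows

issuer.caps    <- split(tfdat$Market.Cap.13f[iss], indus.quarter[iss])
nonissuer.caps <- split(tfdat$Market.Cap.13f[non], indus.quarter[non])
nonissuer.idx  <- split(which(non), indus.quarter[non])  # row numbers in tfdat
```

Each list is named by the quarter-group key, so the pieces for one group can be pulled out by name, e.g. nonissuer.idx[["1.10"]].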

Then you write a function that takes the elements of each list for a given quarter-industry group, looks up the matching non-issuers for each issuer, and returns their indexes.

findInterval() will allow you to do this lookup for all issuers in one industry group in a given quarter simultaneously and greatly speed up this process (but you will need to deal with the possible non-uniqueness of the non-issuer caps - perhaps by adding a tiny jitter() to the values).
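
To illustrate the findInterval() lookup for a single group, here is a minimal sketch. It assumes the match you want is the non-issuer with the smallest cap that is still >= the issuer's cap (my reading of the >= comparison in your code), and it uses the left.open argument, which only exists in more recent versions of R. The numbers are invented.

```r
non.caps <- c(40, 60, 55)   # non-issuer caps in one quarter-industry group
iss.caps <- c(50, 61, 55)   # issuer caps in the same group

# findInterval() requires its breakpoints sorted in increasing order
ord <- order(non.caps)
sorted.caps <- non.caps[ord]

# With left.open = TRUE the result counts the non-issuer caps strictly below
# each issuer cap, so adding 1 gives the first cap that is >= the issuer cap.
pos <- findInterval(iss.caps, sorted.caps, left.open = TRUE) + 1L
pos[pos > length(sorted.caps)] <- NA   # no non-issuer is large enough

matched.caps <- sorted.caps[pos]       # 55, NA, 55 for the values above
```

With tied non-issuer caps this sketch simply picks the first of the ties in sorted order, rather than jittering.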

Then you feed the function and the lists to mapply().

The result is a list of indexes on the original data.frame. You can unsplit() this if you like, then use those indexes to build your final "result" data.frame.
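
Putting the pieces together, something along these lines should work. match_peers is a hypothetical helper name, the three lists are small hand-made stand-ins for the split() results, and the matching rule (smallest non-issuer cap >= issuer cap) is again my assumption about what you want.

```r
# Hypothetical helper: for each issuer cap in one group, return the row
# number of the non-issuer with the smallest cap >= that issuer's cap,
# or NA when no such non-issuer exists.
match_peers <- function(iss.caps, non.caps, non.idx) {
  ord <- order(non.caps)
  pos <- findInterval(iss.caps, non.caps[ord], left.open = TRUE) + 1L
  pos[pos > length(non.caps)] <- NA
  non.idx[ord][pos]
}

# Hand-made stand-ins for the lists built with split()
issuer.caps    <- list("1.10" = c(50, 61), "2.10" = 70)
nonissuer.caps <- list("1.10" = c(40, 60, 55), "2.10" = c(65, 90))
nonissuer.idx  <- list("1.10" = c(2L, 3L, 4L), "2.10" = c(6L, 7L))

# Only groups present in both lists can be matched; this also keeps the
# lists aligned element by element for mapply()
keys <- intersect(names(issuer.caps), names(nonissuer.caps))

matches <- mapply(match_peers,
                  issuer.caps[keys], nonissuer.caps[keys], nonissuer.idx[keys],
                  SIMPLIFY = FALSE)

# Flatten to one vector of row numbers on the original data.frame
matched.rows <- unlist(matches, use.names = FALSE)
```

unlist() returns the matches grouped by key; unsplit() with the issuers' grouping vector would put them back in the original issuer row order instead.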

HTH,

Chuck


p.s. and if this all seems like too much work, you should at least avoid needlessly creating data.frames. Specifically, reorder things so that

           industrypeers = <etc>

is only done ONCE for each industry group by quarter combination and change stuff like

nrow(industrypeers[industrypeers$Market.Cap.13f >= arow$Market.Cap.13f, ]) > 0

to

any( industrypeers$Market.Cap.13f >= arow$Market.Cap.13f )
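
A quick check on made-up data that the two forms agree. The id column and the values are only for illustration.

```r
# Made-up peers and a single issuer row; 'id' is an illustrative column
industrypeers <- data.frame(id = 1:3, Market.Cap.13f = c(40, 60, 55))
arow <- data.frame(id = 9L, Market.Cap.13f = 50)

# Original form: builds a whole new data.frame just to count its rows
slow <- nrow(industrypeers[industrypeers$Market.Cap.13f >= arow$Market.Cap.13f, ]) > 0

# Suggested form: only evaluates the logical vector
fast <- any(industrypeers$Market.Cap.13f >= arow$Market.Cap.13f)
```

One caveat: if the caps can contain NA the two forms can differ, and any(..., na.rm = TRUE) is probably what you want.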





______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:[EMAIL PROTECTED]                  UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
