On a related note, there's one other amazingly stupid thing that Excel
(2002 SP3) does - it exports numbers to CSV as they are displayed, not
as they were entered/imported in the first place.
For example, 1.2345678 will be exported to CSV/tab-delimited as 1.23
if that column is
Here's one way,
lapply(split(DF, your.vector), function(x) {apply(x, 2, sum)})
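The same idea, sketched with made-up data standing in for the poster's DF and grouping vector (colSums and rowsum are hypothetical substitutions, not the original suggestion):

```r
# Hypothetical data standing in for DF and your.vector
DF <- data.frame(a = 1:6, b = 7:12)
your.vector <- c("x", "x", "y", "y", "y", "x")
# colSums is a shorter equivalent of apply(x, 2, sum)
lapply(split(DF, your.vector), colSums)
# or, in one step, sum within each group:
rowsum(as.matrix(DF), your.vector)
```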
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Daniel O'Shea
Sent: Tuesday, August 21, 2007 3:53 PM
To: r-help@stat.math.ethz.ch
Subject: [R] summing columns of data
Don't rush to buy new hardware yet (other than perhaps more RAM for
your existing desktop). First of all you should make sure that your R
code can't be made any faster. (I've seen cases where careful
re-writes increased speed by a factor of 10 or more.) There are some
rules (such as pre-allocate
With regards to your concern - export the R object to a MySQL table
(the RMySQL documentation tells you how), then run an inner join. Or
if the table to query isn't that big, pull it in R and subset it with
%in%. You could use system.time() to see which runs faster.
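A sketch of the second option, with hypothetical `users` and `lookup` objects (not from the original post):

```r
# Subset a data frame to the ids present in a lookup vector,
# timing the operation with system.time()
users <- data.frame(id = 1:100000, value = rnorm(100000))
lookup <- sample(users$id, 1000)            # ids to keep
system.time(kept <- users[users$id %in% lookup, ])
nrow(kept)
```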
-Original Message-
I find it easier to install all the packages again:
#---run in previous version
packages <- installed.packages()[,"Package"]
save(packages, file="Rpackages")
#---run in new version
load("Rpackages")
for (p in setdiff(packages, installed.packages()[,"Package"]))
install.packages(p)
-Original
(1)Institutions (not only academia) using R
http://www.r-project.org/useR-2006/participants.html
(2)Hardware requirements, possibly benchmarks
Since you mention huge data sets, GNU/Linux running on 64-bit machines
with as much RAM as your budget allows.
(3)R clusters, R multiple CPU
This is a bad idea as it can greatly slow things down (the details
were discussed several times on this list). What you want to do is
define from the start the length of your vector/list, then grow it (by
a large margin) only if it becomes full.
lst <- vector(mode="list", length=10) #assuming
See ?cut for continuous variables, and ?factor, ?levels for the others.
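A minimal sketch of both (made-up data):

```r
# cut() bins a continuous variable into a factor
x <- c(1, 7, 12, 25, 40)
bins <- cut(x, breaks = c(0, 10, 20, 50), labels = c("low", "mid", "high"))
table(bins)
# factor()/levels() set and reorder the categories of a discrete variable
sex <- factor(c("M", "F", "F"), levels = c("F", "M"))
levels(sex)
```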
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of lamack lamack
Sent: Tuesday, March 06, 2007 12:49 PM
To: R-help@stat.math.ethz.ch
Subject: [R] R and SAS proc format
Dear all,
The problem with your code is that it doesn't check for errors. See
?try, ?tryCatch. For example:
my.download <- function(forloop) {
notok <- vector()
for (i in forloop) {
cdaily <- try(blpGetData(...))
if (class(cdaily) == "try-error") {
notok <- c(notok, i)
} else {
days <- seq(as.Date("1970/1/1"), as.Date("2003/12/31"), "days")
temp <- rnorm(length(days), mean=10, sd=8)
tapply(temp, format(days,"%Y-%m"), mean)
tapply(temp, format(days,"%b"), mean)
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Majid Iravani
Sent:
One option for processing very large files with R is split:
## split a large file into pieces
#--parameters: the folder, file and number of parts
FLD=/home/user/data
F=very_large_file.dat
parts=50
#---split
cd $FLD
fn=`echo $F | awk -F\. '{print $1}'` #file name without extension
Hello, I don't understand the behavior of apply() on the data frame below.
test <-
structure(list(Date = structure(c(13361, 13361, 13361, 13361,
13361, 13361, 13361, 13361, 13362, 13362, 13362, 13362, 13362,
13362, 13362, 13362, 13363, 13363, 13363, 13363, 13363, 13363,
13363, 13363, 13364, 13364,
Not sure about R, but for a Perl example check
http://yosucker.sourceforge.net/ .
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Tudor Bodea
Sent: Monday, January 08, 2007 11:53 AM
To: r-help@stat.math.ethz.ch
Cc: Tudor Bodea
Subject: [R] Access,
Dear useRs,
I have a few hundred plots that I'd like to export to one document.
pdf() isn't an option, because the file created is prohibitively huge
(due to scatter plots with many points). So I have to use png()
instead, but then I end up with a lot of files (would prefer just
one).
1. Is
Never mind the CPU usage; the likely problem is that your queries are
inefficient in one or more ways (e.g., you don't use indexes when you
really should - it's impossible to guess without knowing what the data
and the queries look like, which somehow you've decided are not
important enough to
If you're on Windows switch to
http://www.copernic.com/en/products/desktop-search/index.html ,
last time I looked it was quite a lot better than Google Desktop Search.
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Farrel
Buchinsky
Sent: Wednesday,
Read up on the discrete Fourier transform:
http://en.wikipedia.org/wiki/Discrete_Fourier_transform
http://en.wikipedia.org/wiki/Frequency_spectrum#Spectrum_analysis
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Randy Zelick
Sent: Tuesday, December
Does anyone know of comparisons of the Pentium 9x0, Pentium(r)
Extreme/Core 2 Duo, AMD(r) Athlon(r) 64, AMD(r) Athlon(r) 64
FX/Dual Core AM2 and similar chips when used for this kind of work?
I think your best option, by far, is to answer the question on your
own. Put R and your programs on
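A benchmark of that kind boils down to timing a representative chunk of your own workload on each candidate box; a minimal sketch (the workload here is made up):

```r
# Time a stand-in workload; compare the elapsed number across machines
t.fit <- system.time({
  x <- matrix(rnorm(2e5), ncol = 20)          # 10000 x 20 design matrix
  fit <- lm.fit(x, rowSums(x) + rnorm(1e4))   # a linear fit as sample work
})
t.fit["elapsed"]
```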
This was asked before. Collapse the data frame into a vector, e.g.
v <- apply(DF,1,function(x) {paste(x,collapse="_")})
then work with the values of that vector (table, unique etc). If your
data frame is really large run this in a DBMS.
-Original Message-
From: [EMAIL PROTECTED]
What is it that you don't know how to do? Loop over the matrices from
the 2 lists and merge them two by two, for example
AB <- list() ; id <- 1
for (i in 1:length(A)) for (j in 1:length(B)) {
AB[[id]] <- merge(A[[i]],B[[j]],...)
id <- id + 1
}
To better keep track of who's who, you may want to
Forget about assign() Co. Search R-help for 'assign', read the
documentation on lists, and realize that it's quite a lot better to
use lists for this kind of stuff.
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Scionforbai
Sent: Wednesday, October
I haven't seen the first book (DAAG) mentioned so far, I have it and
think it's very good. Anyway, I recommend you buy all R books (and
perhaps take some extra time off to study them): your employer can
well afford that, given the cash you're saving by not using
proprietary software.
With regards to your first question, here's a function I used a couple
of times to get plots similar to those you're looking for. (Search the
list for how to find the source code. Also, there's a reference other
than MASS on the ?rpart page.)
#bogdan romocea 2006-06
#adapted source code from
A function I've been using for a while returned a surprising [to me,
given the data] error recently:
Error in plot.window(xlim, ylim, log, asp, ...) :
Logarithmic axis must have positive limits
After some digging I realized what was going on:
x <- c(10460.97, 10808.67, 29499.98, 1,
One obvious alternative is an SQL join, which you could do directly in
a DBMS, or from R via RMySQL / RSQLite /... Keep in mind that creating
indexes on user/userid before the join may save a lot of time.
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf
You forgot to mention your OS. This was asked before and if I recall
correctly the answer for Windows was no. An acceptable solution (imho)
is to edit the Rprofile.site files and add something like
pngplotwidth <- 990 ; pngplotheight <- 700
pdfplotwidth <- 14 ; pdfplotheight <- 10
Then, use these
A simple function will do what you want, customize this as needed:
lprint <- function(lst,prefix)
{
for (i in 1:length(lst)) {
cat(paste(prefix,"$",names(lst)[i],sep=""),"\n")
print(lst[[i]])
cat("\n")
}
}
P <- list(A=a,B=b)
lprint(P,"Prefix")
-Original Message-
From: [EMAIL PROTECTED]
Dear useRs,
I'd like to produce some scatter plots where N units on the X axis are
equal to N units on the Y axis (as measured with a ruler, on screen or
paper). This approach
x <- sample(10:200,40) ; y <- sample(20:100,40)
windows(width=max(x),height=max(y))
plot(x,y)
is better than
By far, the cheapest and easiest solution (and the very first to try)
is to add more memory. The cost depends on what kind you need, but
here's for example 2 GB you can buy for only $150:
http://www.newegg.com/Product/Product.asp?Item=N82E16820144157
Project constraints?! If they don't want to
It's possible and straightforward (just don't use R). IMHO the GNU
Core Utilities
http://www.gnu.org/software/coreutils/
plus a few other tools such as sed, awk, grep etc are much more
appropriate than R for processing massive text files. (Get a good book
about UNIX shell scripting. On Windows you
One option is
library(R2HTML)
?HTML.cormat
The thing you're after is traffic highlighting (via CSS or HTML tags).
If HTML.cormat() doesn't do exactly what you want, modify the source
code. (By the way, I haven't used R2HTML so far so maybe there's a
more appropriate function.)
-Original
Not sure about your data set, but if you have some kind of
(weighted/stratified) sample of hospitals you need to pay special
attention. Survey data violates the assumptions of the classical
linear models (infinite population, identically distributed errors
etc) and needs to be analyzed
Here's an example. By the way, I find that it's more convenient (where
applicable) to keep the data in 3 vectors/factors rather than one
matrix/data frame.
a <- matrix(sample(1:5,100,replace=TRUE),nrow=10,dimnames=list(1:10,5*1:10))
x <- y <- z <- vector()
for (i in 1:nrow(a)) {
x <-
I wouldn't use a DBMS at all -- it is not necessary and I don't see
what you would get in return. Instead I would split very large log
files into a number of pieces so that each piece fits in memory (see
below for an example), then process them in a loop. See the list and
the documentation if you
Compare
system.time({
v <- vector()
for (i in 1:10^5) v <- c(v,1)
})
with
system.time({
v <- vector(length=10^5)
for (i in 1:10^5) v[i] <- 1
})
If you don't know exactly how long v will be, use a value that's large
enough, then throw away what's extra.
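The trimming step might look like this (a minimal sketch, with a made-up fill count):

```r
# Pre-allocate generously, count what you actually fill, trim the rest
v <- vector(length = 100)      # guessed upper bound
n <- 0
for (i in 1:37) { n <- n + 1; v[n] <- i }
v <- v[seq_len(n)]             # throw away the unused tail
length(v)
```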
-Original Message-
Macro stuff à la SAS is something that should be avoided whenever
possible - it's messy, limited, and limiting. (I've done it
occasionally and it works, but I think it's best not to go there.) Read
the documentation on lists (in particular named lists), and keep
everything in one or more lists. For
Repeated merge()-ing does not always increase the space requirements
linearly. Keep in mind that a join between two tables where the same
value appears M and N times will produce M*N rows for that particular
value. My guess is that the number of rows in atot explodes because
you have some
Your approach seems very inefficient - it looks like you're executing
thousands of update statements. Try something like this instead:
#---build a table 'updates' (id and value)
...
#---do all updates via a single left join
UPDATE bigtable a LEFT JOIN updates b
ON a.id = b.id
SET a.col1 = b.value;
I'll see if I can reproduce the steps under Knoppix[1]. Then you can
run Knoppix with a Persistent Disk Image (PDI)[2] that contains R,
the DBI, and RMySQL on just about any machine that runs Knoppix.
Don't bother, it's been done already. See
http://dirk.eddelbuettel.com/quantian.html
This goes the other way - all SQL manipulations are a subset of what
can be done with R. Read up on indexing and see ?merge, ?aggregate,
?by, ?tapply, among others. (For the R equivalent to your query, check
?grep and ?order, and search the list if needed.) Also, this example
might be a good
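A quick sketch of some of those equivalents, on made-up tables (not the original poster's data):

```r
# R counterparts of common SQL operations
a <- data.frame(id = c(1, 2, 3), x = c(10, 20, 30))
b <- data.frame(id = c(2, 3, 4), y = c(200, 300, 400))
inner <- merge(a, b)                                   # inner join on id
outer <- merge(a, b, all = TRUE)                       # full outer join
grouped <- aggregate(a$x, by = list(id = a$id), FUN = sum)  # GROUP BY id
ordered <- a[order(-a$x), ]                            # ORDER BY x DESC
</imports>
```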
Here's an example.
dfr <- data.frame(A1=1:10,A2=21:30,B1=31:40,B2=41:50)
vars <- colnames(dfr)
for (v in vars[grep("B",vars)]) print(mean(dfr[,v]))
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Farrel
Buchinsky
Sent: Wednesday, May 03, 2006 10:46 AM
plot(1:10,axes=FALSE)
axis(1,at=1:10,labels=10:1)
axis(2,at=1:10,labels=5*10:1)
box()
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
Christopher Brown
Sent: Tuesday, May 02, 2006 12:13 PM
To: r-help@stat.math.ethz.ch
Subject: [R] Axis labels
I
Another good option is SQL, the fastest and most scalable solution. If
you decide to give it a try pay close attention to indexes.
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Steve Miller
Sent: Monday, May 01, 2006 8:55 AM
To: 'Guojun Zhu';
There is an aspect, worthy of careful consideration, you don't seem to
be aware of. I'll ask the question for you: How does the
explanatory/predictive potential of a dataset vary as the dataset gets
larger and larger?
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL
I agree it would be worthwhile to make some cosmetic changes to
r-project.org (nothing fancy though - no javascript, Flash etc). The
general public may not be fully aware of how R compares to other
statistical software, and I doubt that a web site which looks like it
was put together 10 years ago
Forget about R for now and port the application to MySQL/PostgreSQL
etc, it is possible and worthwhile. In case you happen to use (and
really need) some SAS DATA STEP looping features you might be forced
to look into SQL cursors, otherwise the port should be (very)
straightforward.
Here's an example.
lst <- list()
for (i in 1:5) {
lst[[i]] <- data.frame(v=sample(1:20,10),sample(1:5,10,replace=TRUE))
colnames(lst[[i]])[2] <- paste("x",i,sep="")
}
dfr <- lst[[1]]
for (i in 2:length(lst)) dfr <- merge(dfr,lst[[i]],all=TRUE)
dfr <- dfr[order(dfr[,1]),]
print(dfr)
Installing R on SuSE 10.0 may be less than trivial for a beginner (I
ended up compiling GCC plus 3-4 other things). In case you lose your
patience I'd suggest trying Mepis Linux: it's very easy to install and
the package management GUI (Synaptic) is great. Installing R together
with a bunch of R
Apparently you do not understand the point, and seem to (want to) see
patterns all over the place. A good start for the treatment of this
interesting disease is 'Fooled by Randomness' by Nassim Nicholas
Taleb. The main point of the book is that many things may be a lot
more random than one might
There are several kinds of standardization, and 'normalization' is
only one of them. For some details you could check
http://support.sas.com/91doc/getDoc/statug.hlp/stdize_index.htm
(see Details for standardization methods).
Standardization is required prior to clustering to control for the
Adapt the function below to suit your needs. If you really want to
plot 5 minutes at a time, round the time series to the last MM:00
times (where MM is in 5*0:11) and have idx below loop over them.
splitplot <- function(x,points)
{
boundaries <- c(1,points*1:floor(length(x)/points),length(x))
for
?assign, but _don't_ use it; lists are better.
dfr <- list()
for(j in 1:9) {
dfr[[as.character(j)]] <- ...
}
Don't try to imitate the limited macro approach of other software
(e.g. SAS). You can do all that in R, but it's much simpler and much
safer to rely on list indexing and functions that
\r is a carriage return character which some editors may use as a line
terminator when writing files. My guess is that RSQLite writes your
data frame to a temp file using \r as a line terminator and then runs
a script to have SQLite import the data (together with \r - this would
be the problem),
For a general solution without warnings try
interleave <- function(v1,v2)
{
ord1 <- 2*(1:length(v1))-1
ord2 <- 2*(1:length(v2))
c(v1,v2)[order(c(ord1,ord2))]
}
interleave(rep(1,5),rep(3,8))
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Gabor
Here's one way,
x <- data.frame(V=c(1,1,1,1,2,2,4,4,4,9,10,10,10,10,10))
y <- data.frame(V=c(2,9,10))
xy <- merge(x,y,all=FALSE)
Pay close attention to what happens if you have duplicate values in y, say
y <- data.frame(V=c(2,9,10,10))
-Original Message-
From: [EMAIL PROTECTED]
t1 <- as.data.frame(table(1:10)) ; colnames(t1)[2] <- "A"
t2 <- as.data.frame(table(5:20)) ; colnames(t2)[2] <- "B"
t3 <- merge(t1,t2,all=TRUE)
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Eric Pante
Sent: Tuesday, February 07, 2006 4:22 PM
To:
Here's another approach which can be easily implemented in SQL.
1. Start with the dates as character vectors,
dt <- as.character(Sys.time())
2. Extract the minutes and round them to 0,15,30,45:
minutes <- floor(as.numeric(substr(dt,15,16))/15)*15
final.mins <- as.character(minutes)
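A possible continuation (a step 3 reassembling the rounded timestamp; sketched with a fixed sample string rather than Sys.time(), and not part of the original code):

```r
# 3. Pad the minutes to two digits and splice them back into the string
dt <- "2007-08-21 15:53:42"                         # sample timestamp
minutes <- floor(as.numeric(substr(dt, 15, 16)) / 15) * 15
final.mins <- formatC(minutes, width = 2, flag = "0")
rounded <- paste(substr(dt, 1, 14), final.mins, ":00", sep = "")
rounded   # "2007-08-21 15:45:00"
```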
By the way, you might find this sed one-liner useful:
sed -n '11981q;11970,11980p' filename.txt
It will print the offending line and its neighbors. If you're on
Windows you need to install Windows Services For Unix or Cygwin.
-Original Message-
From: [EMAIL PROTECTED]
See
http://en.wikipedia.org/wiki/Levenshtein_distance
http://thread.gmane.org/gmane.comp.lang.r.general/31499
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Werner
Wernersen
Sent: Tuesday, January 10, 2006 2:00 PM
To: Gabor Grothendieck
Cc:
Dear useRs,
I got stuck trying to generate a palette of topographic colors that
would satisfy these two requirements:
- the palette must be 'anchored' at 0 (just like on a map), with
light blue/lawn green corresponding to data values close to 0 (dark
blue to light blue for negative values,
ronggui wrote:
If i am familiar with
database software, using database (and R) is the best choice,but
convert the file into database format is not an easy job for me.
Good working knowledge of a DBMS is almost invaluable when it comes to
working with very large data sets. In addition, learning
Peter Muhlberger wrote:
But, there is a second point here, which is how difficult it
was for me [...] to find what seem to me like standard key
features I've taken for granted in other packages.
There is another side to this. Don't consider only how difficult it
was to find what you were
Check the way you imported the data / the SQLite documentation. The
\r\n that you see (you're on Windows, right?) is used to indicate the
end of the data lines in the source file - \r is a carriage return,
and \n is a new line character.
-Original Message-
From: [EMAIL PROTECTED]
In fact it's just as easy in Internet Explorer: right-click + Open in
New Window, or Shift-Click, followed by Ctrl+D. Or, right-click + Add
to Favorites.
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
Charles Annis, P.E.
Sent: Monday, January 02,
Your 2-million loop is overkill, because apparently in the (vast)
majority of cases you don't need to loop at all. You could try
something like this:
1. Split the price by id, e.g.
price.list - split(price,id)
For each id,
2a. When price is not NA, assign it to next price _without_ using a
for
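One vectorized way to carry the last non-NA price forward (a sketch with made-up data; the original poster's rule may run in the other direction):

```r
# Carry the last observed price forward without a for loop
price <- c(NA, 10, NA, NA, 12, NA)
idx <- cumsum(!is.na(price))                 # index of last observed price
filled <- c(NA, price[!is.na(price)])[idx + 1]
filled
```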
Here's one approach,
v1 <- sample(c(-1,0,1),30,replace=TRUE)
v2 <- sample(c(0.05,0,0.1),30,replace=TRUE)
lst <- split(v1,v2)
counted <- lapply(lst,table)
mat <- do.call(rbind,counted)
print(counted)
print(mat)
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf
Are you talking about Rgui on Windows? Use the shortcut, Alt-F-N.
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Ronnie
Babigumira
Sent: Wednesday, December 28, 2005 9:21 AM
To: R Help
Subject: [R] Open a new script from R command prompt
Hi, (this is a
Sean Davis wrote:
but you will have to create the table by hand
There's no need for manual steps. To take advantage of MySQL's
extremely fast 'load data infile' you could dump the data in CSV
format, write a script for mysql (the command line tool), for example
q <- function(table,infile)
{
That was just an example -- it's not difficult to write an R function
to generate the mysql create table syntax for a data frame with 60 or
600 columns. (BTW, I would never type 67 columns.)
On 12/12/05, Sean Davis [EMAIL PROTECTED] wrote:
On 12/12/05 9:21 AM, bogdan romocea [EMAIL PROTECTED
What do you need a bunch of functions for? I'm not familiar with the
details of difftime objects, however an easy way out of here is to get
the time difference in seconds, which you can then add or subtract as
you please from date-times.
x <- Sys.time(); y <- Sys.time()+3600
diff <-
What if the distributions are not normal etc? You might want to try a
simulation to get an answer. Draw random samples from each
distribution (without assuming normality etc - one way to do this is
to get the quantiles, then draw a sample of quantiles, then draw a
value from each quantile), throw
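A resampling sketch along those lines, with rexp() standing in for the unknown, non-normal distributions (entirely made-up data, simpler than the quantile scheme described above):

```r
# Simulate the difference in means without assuming normality
set.seed(1)
a <- rexp(500, rate = 1)      # stand-ins for the two observed samples
b <- rexp(500, rate = 0.8)
sim.diff <- replicate(2000, mean(sample(a, replace = TRUE)) -
                            mean(sample(b, replace = TRUE)))
quantile(sim.diff, c(0.025, 0.975))   # simulated interval for the difference
```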
Don't use assign(), named lists are much better (check the stuff on
indexing lists). Here's an example:
a <- list()
a[["one"]] <- c(1,2,3)
a[["two"]] <- c(4,5,6)
a[["two"]]
do.call(rbind,a)
do.call(cbind,a)
lapply(a,sum)
With regards to your question, did you try printing varname[i] in your
loop to see
Here's a function that you can customize to fit your needs. lst is a named list.
multicomp <- function(lst)
{
clr <- c("darkgreen","red","blue","brown","magenta")
alldens <- lapply(lst,function(x) {density(x,from=min(x),to=max(x))})
allx <- sapply(alldens,function(d) {d$x})
ally <- sapply(alldens,function(d)
Leaf Sun wrote:
The histogram is highly skewed to the right, say, the range
of the vector is [0, 2], but 95% of the value is squeezed in
the interval (0.01, 0.2).
I guess the histogram is as you wrote. See
http://web.maths.unsw.edu.au/~tduong/seminars/intro2kde/
for a short explanation.
Assuming you don't end up with too many clusters, you could take the
classification and use it as the target for a tree, random forest,
discriminant analysis or multinomial logistic regression. The random
forest may be the best option.
-Original Message-
From: alessandro carletti
Those are obviously days, not seconds. A simple test would have
answered your question:
test <- strptime("20051026 15:26:19",format="%Y%m%d %H:%M:%S") -
strptime("20051024 16:23:01",format="%Y%m%d %H:%M:%S")
class(test)
test
cat(test,"\n")
If you prefer you can use difftime for conversion:
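The difftime() conversion might look like this (same timestamps as above, with explicit units instead of guessing):

```r
# difftime() with explicit units avoids the days-vs-seconds confusion
t1 <- strptime("20051024 16:23:01", format = "%Y%m%d %H:%M:%S")
t2 <- strptime("20051026 15:26:19", format = "%Y%m%d %H:%M:%S")
as.numeric(difftime(t2, t1, units = "secs"))   # 169398
as.numeric(difftime(t2, t1, units = "days"))
```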
Welcome to R. See
?merge
then
?aggregate
or
require(Hmisc)
?summarize
or
?by
You can probably find many examples in the archives, if needed.
-Original Message-
From: Michael Graber [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 25, 2005 3:45 PM
To: R-Mailingliste
Here's one approach.
values <- c(rnorm(1000,-5,1),rnorm(1000,10,0.5))
boxplot(values)
text(1,0,labels="better use violin plots",col="red")
#--
require(vioplot)
vioplot(values)
text(1,0,labels="better than box plots",col="red",pos=4)
-Original Message-
From: Keith Sabol [mailto:[EMAIL
Simple addition and subtraction works as well:
as.Date("1995/12/01",format="%Y/%m/%d") + 30
If you have datetime values you can use
strptime("1995-12-01 08:00:00",format="%Y-%m-%d %H:%M:%S") + 30*24*3600
where 30*24*3600 = 30 days expressed in seconds.
-Original Message-
From: Marc
Dear useRs,
I'm wondering why the for() loop below runs slower as it progresses.
On a Win XP box, the iterations at the beginning run much faster than
those at the end:
1%, iteration 2000, 10:10:16
2%, iteration 4000, 10:10:17
3%, iteration 6000, 10:10:17
98%, iteration 196000, 10:24:04
99%,
Never mind, I found the fix. Declaring the length of out eliminates
the performance decrease:
out <- vector(mode="numeric",length=length(test))
On 10/10/05, bogdan romocea [EMAIL PROTECTED] wrote:
Dear useRs,
I'm wondering why the for() loop below runs slower as it progresses.
On a Win XP
Dear useRs,
Is there a way to 'properly' format %d when plotting more than one
page on png()? 'Properly' means to me with leading 0s, so that the
PNGs become easy to navigate in a file/image browser. Lacking a better
solution I ended up using the code below, but would much prefer
something like
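For what it's worth, the png() filename accepts a C integer format, so the zero-padding can be requested directly with %03d (a sketch; the width/height values are made up):

```r
# %03d zero-pads the page counter: plot001.png, plot002.png, ...
png(file = file.path(tempdir(), "plot%03d.png"), width = 990, height = 700)
for (i in 1:3) plot(rnorm(100), main = paste("page", i))
dev.off()
list.files(tempdir(), pattern = "plot[0-9]{3}[.]png")
```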
A related comment - don't rely (too much) on boxplots. They show only
a few things, which may be limiting in many cases and completely
misleading in others. Here are a couple of suggestions for plots which
you may find more useful than the standard box plots:
- figure 3.27 from
Dear useRs,
I'm having a hard time installing RMySQL on a FC4 x86_64 box (R 2.1.0
and MySQL 4.1.11-2 installed through yum). After an initial
configuration error (could not find the MySQL installation include
and/or library directories) I managed to install RMySQL with
# export
Most powerful in what way? Quite a lot depends on the jobs you're going to run.
- To run CPU-bound jobs, more CPUs is better. (Even though R doesn't
do threading, you can manually split some CPU-bound jobs in several
parts and run them simultaneously.) Apart from multiple CPUs and
One solution is
test <- c("1.11","10.11","11.11","113.31","114.2","114.3")
id <- unlist(lapply(strsplit(test,"[.]"),function(x) {x[2]}))
-Original Message-
From: Bernd Weiss [mailto:[EMAIL PROTECTED]
Sent: Thursday, August 18, 2005 12:10 PM
To: r-help@stat.math.ethz.ch
Subject: [R] Regular
This appears to be an SQL issue. Look for a way to speed up your
queries in Postgresql. I presume you haven't created an index on
'index', which means that every time you run your SELECT, Postgresql
is forced to do a full table scan (not good). If the index doesn't
solve the problem, look for some
The first one is an index, not a data set. Anyway, just use SAS to
export the data sets in text format (CSV, tab-delimited etc). You can
then easily read those in R. (By the way, the help for read.xport says
that 'The file must be in SAS XPORT format.' Is .sas7bdat an XPORT
file? Hint: no.)
You need the day to convert to a date format. Assuming day=15:
x.date <- as.Date(paste(as.character(x),"-15",sep=""),format="%Y-%m-%d")
-Original Message-
From: alessandro carletti [mailto:[EMAIL PROTECTED]
Sent: Wednesday, August 10, 2005 9:37 AM
To: rHELP
Subject: [R] date format
There's something else you could try - since you can't hide the code,
obfuscate it. Hide the real thing in a large pile of useless,
complicated, awfully formatted code that would stop anyone except the
most desperate (including yourself, after a couple of weeks/months)
from trying to understand
If happenat is not a datetime value, convert it with strptime(). Then,
one solution is to transform it in the following way:
num.time <- as.numeric(format(happenat,"%Y%m%d%H%M%S"))
This way, 07/22/05 00:05:14 becomes 20050722000514, and you can subset
your data frame with
dfr[which(num.time >=
never close the
connection after a query.)
hth,
b.
-Original Message-
From: Thieme, Lutz [mailto:[EMAIL PROTECTED]
Sent: Friday, July 22, 2005 2:04 AM
To: bogdan romocea
Cc: R-help@stat.math.ethz.ch
Subject: Re: [R] Rprof fails in combination with RMySQL
Hello Bogdan
So your conclusion is that the only choice is to make mistakes and get
in trouble. (That's what Excel excels at.)
Two options I haven't seen mentioned are:
1. Create your deliverables in HTML format, and change the extension
from .htm to .xls; Excel will import them automatically. The way the
I think you're barking up the wrong tree. Optimize the MySQL code
separately from optimizing the R code. A very nice reference about the
former is http://highperformancemysql.com/. Also, if possible, do
everything in MySQL.
hth,
b.
-Original Message-
From: Thieme, Lutz [mailto:[EMAIL
How about avoiding SAS XPORT altogether and exporting everything in
the simple, clean, non-proprietary, extremely reliable,
platform-independent ... etc text format (CSV, tab delimited etc)?
-Original Message-
From: Nelson, Gary (FWE) [mailto:[EMAIL PROTECTED]
Sent: Thursday, July
Why don't you do the simulations in SAS? If you prefer otherwise,
setup the SAS code for running in batch mode (output and log
redirection), then call it from R with (on Windows, untested)
system("start ' ' C:\\etc\\sas.exe -sysin garch.sas")
To keep the parameters from the estimate, have the SAS job
The best 3 things you can do in this situation are:
1. don't use Excel.
2. never use Excel.
3. never ever use Excel again.
Spreadsheets are _not_ databases. In particular, Excel is a time bomb
- use it long enough and you'll get burned (perhaps without even
realizing it). See
It may be better to do this in SQL. The code below works for an
arbitrary number of IDs and handles missing values.
test <- data.frame(id=rep(c(1,2),10),date=sort(c(1:10,1:10)),ret=0.01*-9:10)
idret <- list()
ids <- sort(unique(test$id))
for (i in ids) {
idret[[as.character(i)]] <-
Dear useRs,
I timed the same code (simulation with for loops) on the same box
(dual Xeon EM64T, 1.5 Gb RAM) under 3 OSs and was surprised by the
results:
Windows XP Pro (32-bit): Time difference of 5.97 mins
64-bit GNU/Linux (Fedora Core 4): Time difference of 6.97 mins
32-bit