Re: [R] Excel

2007-08-28 Thread bogdan romocea
On a related note, there's one other amazingly stupid thing that Excel (2002 SP3) does - it exports to CSV the numbers as you see them displayed, and not as they were entered/imported in the first place. For example, 1.2345678 will be exported to CSV/tab delimited as 1.23 if that column is

Re: [R] summing columns of data frame by group

2007-08-21 Thread bogdan romocea
Here's one way, lapply(split(DF, your.vector), function(x) {apply(x, 2, sum)}) -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Daniel O'Shea Sent: Tuesday, August 21, 2007 3:53 PM To: r-help@stat.math.ethz.ch Subject: [R] summing columns of data
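A minimal sketch of the split/lapply idea above, on made-up data. `DF` and `your.vector` here are hypothetical stand-ins for the poster's objects, and the `rowsum()` call is an alternative that is not mentioned in the message.

```r
# Hypothetical data: two numeric columns and a grouping vector.
DF <- data.frame(a = 1:6, b = c(10, 20, 30, 40, 50, 60))
your.vector <- c("x", "y", "x", "y", "x", "y")

# The approach from the message: split the rows by group,
# then sum every column within each piece.
by.group <- lapply(split(DF, your.vector), function(x) apply(x, 2, sum))
print(by.group)

# Alternative (an assumption, not from the message): rowsum() returns
# the same totals with one row per group.
totals <- rowsum(DF, your.vector)
print(totals)
```

The list form is handy when the groups are processed further one at a time; the `rowsum()` form is more compact when only the totals are needed.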

Re: [R] Speed up R

2007-06-21 Thread bogdan romocea
Don't rush to buy new hardware yet (other than perhaps more RAM for your existing desktop). First of all you should make sure that your R code can't be made any faster. (I've seen cases where careful re-writes increased speed by a factor of 10 or more.) There are some rules (such as pre-allocate

Re: [R] RMySQL question, sql with R vector or list

2007-06-05 Thread bogdan romocea
With regards to your concern - export the R object to a MySQL table (the RMySQL documentation tells you how), then run an inner join. Or if the table to query isn't that big, pull it in R and subset it with %in%. You could use system.time() to see which runs faster. -Original Message-

Re: [R] upgrade to 2.5

2007-05-03 Thread bogdan romocea
I find it easier to install all the packages again:
#---run in previous version
packages <- installed.packages()[,"Package"]
save(packages, file="Rpackages")
#---run in new version
load("Rpackages")
for (p in setdiff(packages, installed.packages()[,"Package"])) install.packages(p)
-Original

Re: [R] Reasons to Use R

2007-04-06 Thread bogdan romocea
(1)Institutions (not only academia) using R http://www.r-project.org/useR-2006/participants.html (2)Hardware requirements, possibly benchmarks Since you mention huge data sets, GNU/Linux running on 64-bit machines with as much RAM as your budget allows. (3)R clusters, R multiple CPU

Re: [R] How to create a list that grows automatically

2007-03-09 Thread bogdan romocea
This is a bad idea as it can greatly slow things down (the details were discussed several times on this list). What you want to do is define from the start the length of your vector/list, then grow it (by a large margin) only if it becomes full.
lst <- vector(mode="list", length=10) #assuming
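The message is cut off before the full example, so the following is only a sketch of the idea it describes: start with a fixed capacity and enlarge it by a large margin when it fills up. The doubling rule, the counter `n`, and the 25 items are assumptions made for illustration.

```r
# Start with a fixed capacity, as in the message.
lst <- vector(mode = "list", length = 10)
n <- 0                                  # number of slots actually in use

for (i in 1:25) {                       # 25 items is an arbitrary example
  if (n == length(lst)) {
    # The list is full: double its capacity (new slots are NULL).
    length(lst) <- 2 * length(lst)
  }
  n <- n + 1
  lst[[n]] <- i^2
}

lst <- lst[seq_len(n)]                  # throw away the unused slots
print(length(lst))
```

Growing in large steps keeps the number of reallocations small, which is the point the message makes against appending one element at a time.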

Re: [R] R and SAS proc format

2007-03-06 Thread bogdan romocea
See ?cut for continuous variables, and ?factor, ?levels for the others. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of lamack lamack Sent: Tuesday, March 06, 2007 12:49 PM To: R-help@stat.math.ethz.ch Subject: [R] R and SAS proc format Dear all,
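As a rough illustration of the pointers above (the variable names, break points, and labels are invented), `cut()` maps a continuous variable to labelled ranges and `factor()` relabels discrete codes, which is roughly what a SAS format does:

```r
# Continuous variable -> labelled ranges, via cut().
age <- c(5, 17, 18, 42, 70)
age.group <- cut(age, breaks = c(0, 17, 64, Inf),
                 labels = c("child", "adult", "senior"))
# Intervals are closed on the right by default: (0,17], (17,64], (64,Inf]
print(age.group)

# Discrete codes -> descriptive labels, via factor().
sex <- factor(c("M", "F", "F", "M", "F"),
              levels = c("F", "M"), labels = c("Female", "Male"))
print(table(sex))
```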

Re: [R] Fatigued R

2007-02-13 Thread bogdan romocea
The problem with your code is that it doesn't check for errors. See ?try, ?tryCatch. For example:
my.download <- function(forloop) {
  notok <- vector()
  for (i in forloop) {
    cdaily <- try(blpGetData(...))
    if (class(cdaily) == "try-error") {
      notok <- c(notok, i)
    } else {
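The example in the message is truncated, so here is a self-contained sketch of the same pattern. `risky()` is a hypothetical stand-in for the failing download call, and using `inherits()` for the check is a substitution of mine rather than what the message shows.

```r
# Hypothetical stand-in for a call that sometimes fails.
risky <- function(i) {
  if (i %% 3 == 0) stop("download failed for item ", i)
  i * 10
}

my.download <- function(forloop) {
  notok <- vector()                      # items that raised an error
  results <- list()                      # successful results, keyed by item
  for (i in forloop) {
    res <- try(risky(i), silent = TRUE)
    if (inherits(res, "try-error")) {
      notok <- c(notok, i)               # remember the failure and carry on
    } else {
      results[[as.character(i)]] <- res
    }
  }
  list(results = results, notok = notok)
}

out <- my.download(1:7)
print(out$notok)
```

The loop completes even when individual items fail, and `out$notok` can be fed back into `my.download()` for a retry.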

Re: [R] How can I calculate conditional mean in a large dataset including date data

2007-02-01 Thread bogdan romocea
days <- seq(as.Date("1970/1/1"), as.Date("2003/12/31"), "days")
temp <- rnorm(length(days), mean=10, sd=8)
tapply(temp, format(days,"%Y-%m"), mean)
tapply(temp, format(days,"%b"), mean)
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Majid Iravani Sent:

Re: [R] sequential processing

2007-01-22 Thread bogdan romocea
One option for processing very large files with R is split:
## split a large file into pieces
#--parameters: the folder, file and number of parts
FLD=/home/user/data
F=very_large_file.dat
parts=50
#---split
cd $FLD
fn=`echo $F | awk -F\. '{print $1}'` #file name without extension

[R] hiccup in apply?

2007-01-19 Thread bogdan romocea
Hello, I don't understand the behavior of apply() on the data frame below. test <- structure(list(Date = structure(c(13361, 13361, 13361, 13361, 13361, 13361, 13361, 13361, 13362, 13362, 13362, 13362, 13362, 13362, 13362, 13362, 13363, 13363, 13363, 13363, 13363, 13363, 13363, 13363, 13364, 13364,

Re: [R] Access, Process and Read Information from Web Sites

2007-01-09 Thread bogdan romocea
Not sure about R, but for a Perl example check http://yosucker.sourceforge.net/ . -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tudor Bodea Sent: Monday, January 08, 2007 11:53 AM To: r-help@stat.math.ethz.ch Cc: Tudor Bodea Subject: [R] Access,

[R] export many plots to one file

2007-01-04 Thread bogdan romocea
Dear useRs, I have a few hundred plots that I'd like to export to one document. pdf() isn't an option, because the file created is prohibitively huge (due to scatter plots with many points). So I have to use png() instead, but then I end up with a lot of files (would prefer just one). 1. Is

Re: [R] loading data and executing queries with R and Mysql

2007-01-03 Thread bogdan romocea
Never mind the CPU usage, the likely problem is that your queries are inefficient in one or more ways (i.e., you don't use indexes when you really should - it's impossible to guess without knowing what the data and the queries look like, which somehow you've decided are not important enough to

Re: [R] Google Desktop Search and R script files

2006-12-27 Thread bogdan romocea
If you're on Windows switch to http://www.copernic.com/en/products/desktop-search/index.html , last time I looked it was quite a lot better than Google Desktop Search. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Farrel Buchinsky Sent: Wednesday,

Re: [R] fit sine?

2006-12-19 Thread bogdan romocea
Read up on the discrete Fourier transform: http://en.wikipedia.org/wiki/Discrete_Fourier_transform http://en.wikipedia.org/wiki/Frequency_spectrum#Spectrum_analysis -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Randy Zelick Sent: Tuesday, December

Re: [R] CPU or memory

2006-11-07 Thread bogdan romocea
Does any one know of comparisons of the Pentium 9x0, Pentium(r) Extreme/Core 2 Duo, AMD(r) Athlon(r) 64 , AMD(r) Athlon(r) 64 FX/Dual Core AM2 and similar chips when used for this kind of work. I think your best option, by far, is to answer the question on your own. Put R and your programs on

Re: [R] memory management

2006-10-30 Thread bogdan romocea
This was asked before. Collapse the data frame into a vector, e.g. v <- apply(DF,1,function(x) {paste(x,collapse="_")}) then work with the values of that vector (table, unique etc). If your data frame is really large run this in a DBMS. -Original Message- From: [EMAIL PROTECTED]
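A small sketch of the collapsing trick on invented data (the contents of `DF` are an assumption). Each row becomes one string, so ordinary vector tools can count or de-duplicate whole rows:

```r
# Hypothetical data frame with repeated rows.
DF <- data.frame(a = c(1, 1, 2, 1), b = c("x", "x", "y", "x"))

# One string per row, as in the message.
v <- apply(DF, 1, function(x) paste(x, collapse = "_"))

counts <- table(v)      # how often each distinct row occurs
distinct <- unique(v)   # the distinct rows, as strings
print(counts)
print(distinct)
```

One caveat: `apply()` converts the data frame to a character matrix first, so numeric columns of differing widths may be padded with spaces in the resulting strings.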

Re: [R] match lists

2006-10-30 Thread bogdan romocea
What is it that you don't know how to do? Loop over the matrices from the 2 lists and merge them two by two, for example
AB <- list() ; id <- 1
for (i in 1:length(A)) for (j in 1:length(B)) {
  AB[[id]] <- merge(A[[i]],B[[j]],...)
  id <- id + 1
}
To better keep track of who's who, you may want to

Re: [R] Automatic File Reading

2006-10-18 Thread bogdan romocea
Forget about assign() & Co. Search R-help for 'assign', read the documentation on lists, and realize that it's quite a lot better to use lists for this kind of stuff. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Scionforbai Sent: Wednesday, October

Re: [R] Book recommendation for newbie to stats and R?

2006-10-18 Thread bogdan romocea
I haven't seen the first book (DAAG) mentioned so far, I have it and think it's very good. Anyway, I recommend you buy all R books (and perhaps take some extra time off to study them): your employer can well afford that, given the cash you're saving by not using proprietary software.

Re: [R] Some questions on Rpart algorithm

2006-10-17 Thread bogdan romocea
With regards to your first question, here's a function I used a couple of times to get plots similar to those you're looking for. (Search the list for how to find the source code. Also, there's a reference other than MASS on the ?rpart page.) #bogdan romocea 2006-06 #adapted source code from

[R] unexpected behavior of boxplot(x, notch=TRUE, log=y)

2006-10-05 Thread bogdan romocea
A function I've been using for a while returned a surprising [to me, given the data] error recently: Error in plot.window(xlim, ylim, log, asp, ...) : Logarithmic axis must have positive limits After some digging I realized what was going on: x <- c(10460.97, 10808.67, 29499.98, 1,

Re: [R] Alternatives to merge for large data sets?

2006-09-07 Thread bogdan romocea
One obvious alternative is an SQL join, which you could do directly in a DBMS, or from R via RMySQL / RSQLite /... Keep in mind that creating indexes on user/userid before the join may save a lot of time. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf

Re: [R] screen resolution effects on graphics

2006-08-28 Thread bogdan romocea
You forgot to mention your OS. This was asked before and if I recall correctly the answer for Windows was no. An acceptable solution (imho) is to edit the Rprofile.site files and add something like
pngplotwidth <- 990 ; pngplotheight <- 700
pdfplotwidth <- 14 ; pdfplotheight <- 10
Then, use these

Re: [R] prefixing list names in print

2006-08-08 Thread bogdan romocea
A simple function will do what you want, customize this as needed:
lprint <- function(lst,prefix) {
  for (i in 1:length(lst)) {
    cat(paste(prefix,"$",names(lst)[i],sep=""),"\n")
    print(lst[[i]])
    cat("\n")
  }
}
P <- list(A="a",B="b")
lprint(P,"Prefix")
-Original Message- From: [EMAIL PROTECTED]

[R] scatter plot with axes drawn on the same scale

2006-07-28 Thread bogdan romocea
Dear useRs, I'd like to produce some scatter plots where N units on the X axis are equal to N units on the Y axis (as measured with a ruler, on screen or paper). This approach
x <- sample(10:200,40) ; y <- sample(20:100,40)
windows(width=max(x),height=max(y))
plot(x,y)
is better than

Re: [R] how to use large data set ?

2006-07-20 Thread bogdan romocea
By far, the cheapest and easiest solution (and the very first to try) is to add more memory. The cost depends on what kind you need, but here's for example 2 GB you can buy for only $150: http://www.newegg.com/Product/Product.asp?Item=N82E16820144157 Project constraints?! If they don't want to

Re: [R] Is it possible to only read a subset by read.table ?

2006-07-12 Thread bogdan romocea
It's possible and straightforward (just don't use R). IMHO the GNU Core Utilities http://www.gnu.org/software/coreutils/ plus a few other tools such as sed, awk, grep etc are much more appropriate than R for processing massive text files. (Get a good book about UNIX shell scripting. On Windows you

Re: [R] print color

2006-07-10 Thread bogdan romocea
One option is library(R2HTML) ?HTML.cormat The thing you're after is traffic highlighting (via CSS or HTML tags). If HTML.cormat() doesn't do exactly what you want, modify the source code. (By the way, I haven't used R2HTML so far so maybe there's a more appropriate function.) -Original

Re: [R] modeling logit(y/n) using lrm

2006-06-16 Thread bogdan romocea
Not sure about your data set, but if you have some kind of (weighted/stratified) sample of hospitals you need to pay special attention. Survey data violates the assumptions of the classical linear models (infinite population, identically distributed errors etc) and needs to be analyzed

Re: [R] bubbleplot for matrix

2006-06-15 Thread bogdan romocea
-14 at 16:47 -0400, bogdan romocea wrote: Here's an example. By the way, I find that it's more convenient (where applicable) to keep the data in 3 vectors/factors rather than one matrix/data frame.
a <- matrix(sample(1:5,100,replace=TRUE),nrow=10,dimnames=list(1:10,5*1:10))
x <- y <- z

Re: [R] bubbleplot for matrix

2006-06-14 Thread bogdan romocea
Here's an example. By the way, I find that it's more convenient (where applicable) to keep the data in 3 vectors/factors rather than one matrix/data frame.
a <- matrix(sample(1:5,100,replace=TRUE),nrow=10,dimnames=list(1:10,5*1:10))
x <- y <- z <- vector()
for (i in 1:nrow(a)) {
  x <-

Re: [R] R usage for log analysis

2006-06-12 Thread bogdan romocea
I wouldn't use a DBMS at all -- it is not necessary and I don't see what you would get in return. Instead I would split very large log files into a number of pieces so that each piece fits in memory (see below for an example), then process them in a loop. See the list and the documentation if you

Re: [R] progressive slowdown during script execution?

2006-06-01 Thread bogdan romocea
Compare
system.time({
  v <- vector()
  for (i in 1:10^5) v <- c(v,1)
})
with
system.time({
  v <- vector(length=10^5)
  for (i in 1:10^5) v[i] <- 1
})
If you don't know exactly how long v will be, use a value that's large enough, then throw away what's extra. -Original Message-

Re: [R] Manipulating code?

2006-05-23 Thread bogdan romocea
Macro stuff à la SAS is something that should be avoided whenever possible - it's messy, limited, and limiting. (I've done it occasionally and it works, but I think it's best not to go there.) Read the documentation on lists (in particular named lists), and keep everything in one or more lists. For

Re: [R] win2k memory problem with merge()'ing repeatedly (long email)

2006-05-22 Thread bogdan romocea
Repeated merge()-ing does not always increase the space requirements linearly. Keep in mind that a join between two tables where the same value appears M and N times will produce M*N rows for that particular value. My guess is that the number of rows in atot explodes because you have some
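The M*N effect described above is easy to reproduce on a toy example. The two data frames below are invented for illustration; the key value 1 appears 3 times in one and 2 times in the other.

```r
# id 1 appears M = 3 times here ...
a <- data.frame(id = c(1, 1, 1, 2), x = 1:4)
# ... and N = 2 times here.
b <- data.frame(id = c(1, 1, 2), y = c("p", "q", "r"))

ab <- merge(a, b, by = "id")

# 3 * 2 rows for id 1, plus 1 * 1 row for id 2.
print(nrow(ab))
print(table(ab$id))
```

Checking `table()` of the key in each input before merging is a cheap way to spot the duplicated values that make the result explode.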

Re: [R] Fast update of a lot of records in a database?

2006-05-19 Thread bogdan romocea
Your approach seems very inefficient - it looks like you're executing thousands of update statements. Try something like this instead:
#---build a table 'updates' (id and value) ...
#---do all updates via a single left join
UPDATE bigtable a LEFT JOIN updates b ON a.id = b.id SET a.col1 = b.value;

Re: [R] Using DBI and RMySQL

2006-05-11 Thread bogdan romocea
I'll see if I can reproduce the steps under Knoppix[1]. Then you can run Knoppix with a Persistent Disk Image (PDI)[2] that contains R, the DBI, and RMySQL on just about any machine that runs Knoppix. Don't bother, it's been done already. See http://dirk.eddelbuettel.com/quantian.html

Re: [R] SQL like manipulations on data frames

2006-05-05 Thread bogdan romocea
This goes the other way - all SQL manipulations are a subset of what can be done with R. Read up on indexing and see ?merge, ?aggregate, ?by, ?tapply, among others. (For the R equivalent to your query, check ?grep and ?order, and search the list if needed.) Also, this example might be a good
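The original query is not shown in this excerpt, so the sketch below uses invented tables (`orders`, `custs`) only to show how the functions named above line up with common SQL clauses.

```r
# Hypothetical tables.
orders <- data.frame(cust = c("ann", "bob", "ann", "cid"),
                     amount = c(10, 20, 30, 40))
custs <- data.frame(cust = c("ann", "bob", "cid"),
                    region = c("east", "west", "east"))

# WHERE amount > 15  -> logical indexing
big <- orders[orders$amount > 15, ]

# JOIN ... ON cust   -> merge()
joined <- merge(orders, custs, by = "cust")

# SELECT region, SUM(amount) ... GROUP BY region  -> aggregate()
by.region <- aggregate(amount ~ region, data = joined, FUN = sum)

# ORDER BY amount DESC  -> order()
by.region <- by.region[order(-by.region$amount), ]
print(by.region)
```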

Re: [R] Listing Variables

2006-05-03 Thread bogdan romocea
Here's an example.
dfr <- data.frame(A1=1:10,A2=21:30,B1=31:40,B2=41:50)
vars <- colnames(dfr)
for (v in vars[grep("B",vars)]) print(mean(dfr[,v]))
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Farrel Buchinsky Sent: Wednesday, May 03, 2006 10:46 AM

Re: [R] Axis labels

2006-05-02 Thread bogdan romocea
plot(1:10,axes=FALSE)
axis(1,at=1:10,labels=10:1)
axis(2,at=1:10,labels=5*10:1)
box()
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Christopher Brown Sent: Tuesday, May 02, 2006 12:13 PM To: r-help@stat.math.ethz.ch Subject: [R] Axis labels I

Re: [R] efficiency in merging two data frames

2006-05-01 Thread bogdan romocea
Another good option is SQL, the fastest and most scalable solution. If you decide to give it a try pay close attention to indexes. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Steve Miller Sent: Monday, May 01, 2006 8:55 AM To: 'Guojun Zhu';

Re: [R] regression modeling

2006-04-25 Thread bogdan romocea
There is an aspect, worthy of careful consideration, you don't seem to be aware of. I'll ask the question for you: How does the explanatory/predictive potential of a dataset vary as the dataset gets larger and larger? -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL

Re: [R] www.r-project.org

2006-04-25 Thread bogdan romocea
I agree it would be worthwhile to make some cosmetic changes to r-project.org (nothing fancy though - no javascript, Flash etc). The general public may not be fully aware of how R compares to other statistical software, and I doubt that a web site which looks like it was put together 10 years ago

Re: [R] Considering port of SAS application to R

2006-04-21 Thread bogdan romocea
Forget about R for now and port the application to MySQL/PostgreSQL etc, it is possible and worthwhile. In case you happen to use (and really need) some SAS DATA STEP looping features you might be forced to look into SQL cursors, otherwise the port should be (very) straightforward.

Re: [R] Need R code

2006-04-21 Thread bogdan romocea
Here's an example.
lst <- list()
for (i in 1:5) {
  lst[[i]] <- data.frame(v=sample(1:20,10),sample(1:5,10,replace=TRUE))
  colnames(lst[[i]])[2] <- paste("x",i,sep="")
}
dfr <- lst[[1]]
for (i in 2:length(lst)) dfr <- merge(dfr,lst[[i]],all=TRUE)
dfr <- dfr[order(dfr[,1]),]
print(dfr)

Re: [R] I am surprised (and a little irritated)

2006-04-19 Thread bogdan romocea
Installing R on SuSE 10.0 may be less than trivial for a beginner (I ended up compiling GCC plus 3-4 other things). In case you lose your patience I'd suggest trying Mepis Linux: it's very easy to install and the package management GUI (Synaptic) is great. Installing R together with a bunch of R

Re: [R] Multivariate linear regression

2006-04-06 Thread bogdan romocea
Apparently you do not understand the point, and seem to (want to) see patterns all over the place. A good start for the treatment of this interesting disease is 'Fooled by Randomness' by Nassim Nicholas Taleb. The main point of the book is that many things may be a lot more random than one might

Re: [R] pros and cons of robust regression? (i.e. rlm vs lm)

2006-04-06 Thread bogdan romocea
There are several kinds of standardization, and 'normalization' is only one of them. For some details you could check http://support.sas.com/91doc/getDoc/statug.hlp/stdize_index.htm (see Details for standardization methods). Standardization is required prior to clustering to control for the

Re: [R] create a gui with a button to change graphic?

2006-03-20 Thread bogdan romocea
Adapt the function below to suit your needs. If you really want to plot 5 minutes at a time, round the time series to the last MM:00 times (where MM is in 5*0:11) and have idx below loop over them.
splitplot <- function(x,points) {
  boundaries <- c(1,points*1:floor(length(x)/points),length(x))
  for

Re: [R] renaming dataframe1 using column names from dataframe2?

2006-03-17 Thread bogdan romocea
?assign, but _don't_ use it; lists are better.
dfr <- list()
for(j in 1:9) { dfr[[as.character(j)]] <- ... }
Don't try to imitate the limited macro approach of other software (e.g. SAS). You can do all that in R, but it's much simpler and much safer to rely on list indexing and functions that

Re: [R] \r with RSQLite

2006-03-15 Thread bogdan romocea
\r is a carriage return character which some editors may use as a line terminator when writing files. My guess is that RSQLite writes your data frame to a temp file using \r as a line terminator and then runs a script to have SQLite import the data (together with \r - this would be the problem),

Re: [R] Interleaving elements of two vectors?

2006-03-07 Thread bogdan romocea
For a general solution without warnings try
interleave <- function(v1,v2) {
  ord1 <- 2*(1:length(v1))-1
  ord2 <- 2*(1:length(v2))
  c(v1,v2)[order(c(ord1,ord2))]
}
interleave(rep(1,5),rep(3,8))
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Gabor

Re: [R] dataframe subset

2006-02-08 Thread bogdan romocea
Here's one way,
x <- data.frame(V=c(1,1,1,1,2,2,4,4,4,9,10,10,10,10,10))
y <- data.frame(V=c(2,9,10))
xy <- merge(x,y,all=FALSE)
Pay close attention to what happens if you have duplicate values in y, say
y <- data.frame(V=c(2,9,10,10))
-Original Message- From: [EMAIL PROTECTED]
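To make the warning about duplicates concrete, the sketch below counts the rows produced in both cases, using the data from the message. The `%in%` alternative at the end is an addition of mine, not part of the reply.

```r
x <- data.frame(V = c(1, 1, 1, 1, 2, 2, 4, 4, 4, 9, 10, 10, 10, 10, 10))
y <- data.frame(V = c(2, 9, 10))

# Unique keys in y: 2 twos + 1 nine + 5 tens.
n.unique <- nrow(merge(x, y, all = FALSE))

# A duplicated 10 in y matches every 10 in x twice.
y.dup <- data.frame(V = c(2, 9, 10, 10))
n.dup <- nrow(merge(x, y.dup, all = FALSE))

# Alternative (assumption): %in% keeps each row of x at most once,
# no matter how often the value appears in y.
keep <- x[x$V %in% y.dup$V, , drop = FALSE]

print(c(n.unique, n.dup, nrow(keep)))
```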

Re: [R] matching tables

2006-02-07 Thread bogdan romocea
t1 <- as.data.frame(table(1:10)) ; colnames(t1)[2] <- "A"
t2 <- as.data.frame(table(5:20)) ; colnames(t2)[2] <- "B"
t3 <- merge(t1,t2,all=TRUE)
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Eric Pante Sent: Tuesday, February 07, 2006 4:22 PM To:

Re: [R] 15-min mean values

2006-02-02 Thread bogdan romocea
Here's another approach which can be easily implemented in SQL.
1. Start with the dates as character vectors, dt <- as.character(Sys.time())
2. Extract the minutes and round them to 0,15,30,45: minutes <- floor(as.numeric(substr(dt,15,16))/15)*15 final.mins <- as.character(minutes)
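The remaining steps are cut off in this excerpt, so the following is only a guess at how the rounding might be used to compute 15-minute means. The timestamps, the `value` vector, and the `slot` label are invented for illustration.

```r
# Hypothetical timestamps (as character) with one measurement each.
dt <- c("2006-02-02 10:07:31", "2006-02-02 10:14:00",
        "2006-02-02 10:29:59", "2006-02-02 10:45:00")
value <- c(1, 3, 5, 7)

# Characters 15-16 hold the minutes; round them down to 0, 15, 30 or 45.
minutes <- floor(as.numeric(substr(dt, 15, 16)) / 15) * 15

# Rebuild a "YYYY-MM-DD HH:MM" label for each 15-minute slot.
slot <- paste0(substr(dt, 1, 14), sprintf("%02d", minutes))

# Mean value per slot.
means <- tapply(value, slot, mean)
print(means)
```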

Re: [R] read.table problem

2006-01-25 Thread bogdan romocea
By the way, you might find this sed one-liner useful: sed -n '11981q;11970,11980p' filename.txt It will print the offending line and its neighbors. If you're on Windows you need to install Windows Services For Unix or Cygwin. -Original Message- From: [EMAIL PROTECTED]

Re: [R] matching country name tables from different sources

2006-01-10 Thread bogdan romocea
See http://en.wikipedia.org/wiki/Levenshtein_distance http://thread.gmane.org/gmane.comp.lang.r.general/31499 -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Werner Wernersen Sent: Tuesday, January 10, 2006 2:00 PM To: Gabor Grothendieck Cc:

[R] need palette of topographic colors similar to topo.colors()

2006-01-07 Thread bogdan romocea
Dear useRs, I got stuck trying to generate a palette of topographic colors that would satisfy these two requirements: - the palette must be 'anchored' at 0 (just like on a map), with light blue/lawn green corresponding to data values close to 0 (dark blue to light blue for negative values,

Re: [R] Suggestion for big files [was: Re: A comment about R:]

2006-01-05 Thread bogdan romocea
ronggui wrote: If i am familiar with database software, using database (and R) is the best choice,but convert the file into database format is not an easy job for me. Good working knowledge of a DBMS is almost invaluable when it comes to working with very large data sets. In addition, learning

Re: [R] Wald tests and Huberized variances (was: A comment about R:)

2006-01-05 Thread bogdan romocea
Peter Muhlberger wrote: But, there is a second point here, which is how difficult it was for me [...] to find what seem to me like standard key features I've taken for granted in other packages. There is another side to this. Don't consider only how difficult it was to find what you were

Re: [R] Q about RSQLite

2006-01-03 Thread bogdan romocea
Check the way you imported the data / the SQLite documentation. The \r\n that you see (you're on Windows, right?) is used to indicate the end of the data lines in the source file - \r is a carriage return, and \n is a new line character. -Original Message- From: [EMAIL PROTECTED]

Re: [R] bookmarking a page inside r-project.org

2006-01-03 Thread bogdan romocea
In fact it's just as easy in Internet Explorer: right-click + Open in New Window, or Shift-Click, followed by Ctrl+D. Or, right-click + Add to Favorites. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Charles Annis, P.E. Sent: Monday, January 02,

Re: [R] For loop gets exponentially slower as dataset gets larger...

2006-01-03 Thread bogdan romocea
Your 2-million loop is overkill, because apparently in the (vast) majority of cases you don't need to loop at all. You could try something like this: 1. Split the price by id, e.g. price.list - split(price,id) For each id, 2a. When price is not NA, assign it to next price _without_ using a for

Re: [R] Count or summary data

2005-12-30 Thread bogdan romocea
Here's one approach,
v1 <- sample(c(-1,0,1),30,replace=TRUE)
v2 <- sample(c(0.05,0,0.1),30,replace=TRUE)
lst <- split(v1,v2)
counted <- lapply(lst,table)
mat <- do.call(rbind,counted)
print(counted)
print(mat)
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf

Re: [R] Open a new script from R command prompt

2005-12-28 Thread bogdan romocea
Are you talking about Rgui on Windows? Use the shortcut, Alt-F-N. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ronnie Babigumira Sent: Wednesday, December 28, 2005 9:21 AM To: R Help Subject: [R] Open a new script from R command prompt Hi, (this is a

Re: [R] export from R to MySQL

2005-12-12 Thread bogdan romocea
Sean Davis wrote: but you will have to create the table by hand There's no need for manual steps. To take advantage of MySQL's extremely fast 'load data infile' you could dump the data in CSV format, write a script for mysql (the command line tool), for example q <- function(table,infile) {

Re: [R] export from R to MySQL

2005-12-12 Thread bogdan romocea
That was just an example -- it's not difficult to write an R function to generate the mysql create table syntax for a data frame with 60 or 600 columns. (BTW, I would never type 67 columns.) On 12/12/05, Sean Davis [EMAIL PROTECTED] wrote: On 12/12/05 9:21 AM, bogdan romocea [EMAIL PROTECTED

Re: [R] date/time arithmetic

2005-11-30 Thread bogdan romocea
What do you need a bunch of functions for? I'm not familiar with the details of difftime objects, however an easy way out of here is to get the time difference in seconds, which you can then add or subtract as you please from date-times.
x <- Sys.time(); y <- Sys.time()+3600
diff <-
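The example in the message breaks off, so this is a sketch of the seconds-based approach it describes rather than the author's code. The fixed timestamps and the variable names `d` and `z` are my own choices.

```r
x <- as.POSIXct("2005-11-30 10:00:00", tz = "UTC")
y <- x + 3600                      # adding a number to a date-time adds seconds

# Time difference as a plain number of seconds.
d <- as.numeric(difftime(y, x, units = "secs"))

# The number can be added to (or subtracted from) any date-time.
z <- x + d
print(d)
print(z)
```

Asking `difftime()` for `units = "secs"` explicitly avoids surprises, since the default unit depends on the size of the difference.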

Re: [R] OT: Statistics question

2005-11-30 Thread bogdan romocea
What if the distributions are not normal etc? You might want to try a simulation to get an answer. Draw random samples from each distribution (without assuming normality etc - one way to do this is to get the quantiles, then draw a sample of quantiles, then draw a value from each quantile), throw

Re: [R] assign() problem

2005-11-23 Thread bogdan romocea
Don't use assign(), named lists are much better (check the stuff on indexing lists). Here's an example:
a <- list()
a[["one"]] <- c(1,2,3)
a[["two"]] <- c(4,5,6)
a[["two"]]
do.call(rbind,a)
do.call(cbind,a)
lapply(a,sum)
With regards to your question, did you try printing varname[i] in your loop to see

Re: [R] newbie graphics question: Two density plots in same frame ?

2005-11-03 Thread bogdan romocea
Here's a function that you can customize to fit your needs. lst is a named list.
multicomp <- function(lst) {
  clr <- c("darkgreen","red","blue","brown","magenta")
  alldens <- lapply(lst,function(x) {density(x,from=min(x),to=max(x))})
  allx <- sapply(alldens,function(d) {d$x})
  ally <- sapply(alldens,function(d)

Re: [R] Visualizing a Data Distribution -- Was: breaks in hist()

2005-11-02 Thread bogdan romocea
Leaf Sun wrote: The histogram is highly screwed to the right, say, the range of the vector is [0, 2], but 95% of the value is squeezed in the interval (0.01, 0.2). I guess the histogram is as you wrote. See http://web.maths.unsw.edu.au/~tduong/seminars/intro2kde/ for a short explanation.

Re: [R] clustering

2005-10-28 Thread bogdan romocea
Assuming you don't end up with too many clusters, you could take the classification and use it as the target for a tree, random forest, discriminant analysis or multinomial logistic regression. The random forest may be the best option. -Original Message- From: alessandro carletti

Re: [R] How to convert time to days

2005-10-26 Thread bogdan romocea
Those are obviously days, not seconds. A simple test would have answered your question:
test <- strptime("20051026 15:26:19",format="%Y%m%d %H:%M:%S") - strptime("20051024 16:23:01",format="%Y%m%d %H:%M:%S")
class(test)
test
cat(test,"\n")
If you prefer you can use difftime for conversion:

Re: [R] data.frame-question

2005-10-25 Thread bogdan romocea
Welcome to R. See ?merge then ?aggregate or require(Hmisc) ?summarize or ?by You can probably find many examples in the archives, if needed. -Original Message- From: Michael Graber [mailto:[EMAIL PROTECTED] Sent: Tuesday, October 25, 2005 3:45 PM To: R-Mailingliste

Re: [R] Boxplot labels

2005-10-20 Thread bogdan romocea
Here's one approach.
values <- c(rnorm(1000,-5,1),rnorm(1000,10,0.5))
boxplot(values)
text(1,0,labels="better use violin plots",col="red")
#--
require(vioplot)
vioplot(values)
text(1,0,labels="better than box plots",col="red",pos=4)
-Original Message- From: Keith Sabol [mailto:[EMAIL

Re: [R] adding 1 month to a date

2005-10-12 Thread bogdan romocea
Simple addition and subtraction works as well:
as.Date("1995/12/01",format="%Y/%m/%d") + 30
If you have datetime values you can use
strptime("1995-12-01 08:00:00",format="%Y-%m-%d %H:%M:%S") + 30*24*3600
where 30*24*3600 = 30 days expressed in seconds. -Original Message- From: Marc

[R] decreasing performance of for() loop

2005-10-10 Thread bogdan romocea
Dear useRs, I'm wondering why the for() loop below runs slower as it progresses. On a Win XP box, the iterations at the beginning run much faster than those at the end: 1%, iteration 2000, 10:10:16 2%, iteration 4000, 10:10:17 3%, iteration 6000, 10:10:17 98%, iteration 196000, 10:24:04 99%,

Re: [R] decreasing performance of for() loop

2005-10-10 Thread bogdan romocea
Never mind, I found the fix. Declaring the length for out eliminates the performance decrease, out <- vector(mode="numeric",length=length(test)) On 10/10/05, bogdan romocea [EMAIL PROTECTED] wrote: Dear useRs, I'm wondering why the for() loop below runs slower as it progresses. On a Win XP

[R] add leading 0s to %d from png() {was Automatic creation of file names}

2005-10-08 Thread bogdan romocea
Dear useRs, Is there a way to 'properly' format %d when plotting more than one page on png()? 'Properly' means to me with leading 0s, so that the PNGs become easy to navigate in a file/image browser. Lacking a better solution I ended up using the code below, but would much prefer something like
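The code the poster ended up with is not visible in this excerpt. One way to get zero-padded names, offered here only as a sketch, is to build them with `sprintf()`. The `plot_` prefix and the width of 4 are arbitrary choices. As far as I know, `png()` also accepts a C-style integer format in `filename` (its documented default is "Rplot%03d.png"), so a width can be requested there directly.

```r
# Zero-padded file names sort correctly in a file or image browser.
pages <- c(1, 25, 300)
files <- sprintf("plot_%04d.png", pages)
print(files)
```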

Re: [R] boxplot statistics

2005-10-06 Thread bogdan romocea
A related comment - don't rely (too much) on boxplots. They show only a few things, which may be limiting in many cases and completely misleading in others. Here are a couple of suggestions for plots which you may find more useful than the standard box plots: - figure 3.27 from

[R] RMySQL installation problem on FC4 x86_64

2005-09-07 Thread bogdan romocea
Dear useRs, I'm having a hard time installing RMySQL on a FC4 x86_64 box (R 2.1.0 and MySQL 4.1.11-2 installed through yum). After an initial configuration error (could not find the MySQL installation include and/or library directories) I managed to install RMySQL with # export

Re: [R] Linux Standalone Server Suggestions for R

2005-09-01 Thread bogdan romocea
Most powerful in what way? Quite a lot depends on the jobs you're going to run. - To run CPU-bound jobs, more CPUs is better. (Even though R doesn't do threading, you can manually split some CPU-bound jobs in several parts and run them simultaneously.) Apart from multiple CPUs and

Re: [R] Regular expressions sub

2005-08-18 Thread bogdan romocea
One solution is
test <- c("1.11","10.11","11.11","113.31","114.2","114.3")
id <- unlist(lapply(strsplit(test,"[.]"),function(x) {x[2]}))
-Original Message- From: Bernd Weiss [mailto:[EMAIL PROTECTED] Sent: Thursday, August 18, 2005 12:10 PM To: r-help@stat.math.ethz.ch Subject: [R] Regular
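For comparison, the same result can be had with a single regular expression, which is what the thread subject asks about. The `sub()` pattern below is my own suggestion and assumes every value contains exactly one dot.

```r
test <- c("1.11", "10.11", "11.11", "113.31", "114.2", "114.3")

# The strsplit() approach from the message: keep the part after the dot.
id <- unlist(lapply(strsplit(test, "[.]"), function(x) x[2]))

# Alternative (assumption): delete everything up to and including the dot.
id2 <- sub("^[^.]*[.]", "", test)

print(id)
print(identical(id, id2))
```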

Re: [R] retrieving large columns using RODBC

2005-08-15 Thread bogdan romocea
This appears to be an SQL issue. Look for a way to speed up your queries in Postgresql. I presume you haven't created an index on 'index', which means that every time you run your SELECT, Postgresql is forced to do a full table scan (not good). If the index doesn't solve the problem, look for some

Re: [R] Concerning reading of SAS-files

2005-08-12 Thread bogdan romocea
The first one is an index, not a data set. Anyway, just use SAS to export the data sets in text format (CSV, tab-delimited etc). You can then easily read those in R. (By the way, the help for read.xport says that 'The file must be in SAS XPORT format.' Is .sas7bdat an XPORT file? Hint: no.)

Re: [R] date format

2005-08-10 Thread bogdan romocea
You need the day to convert to a date format. Assuming day=15: x.date <- as.Date(paste(as.character(x),"-15",sep=""),format="%Y-%m-%d") -Original Message- From: alessandro carletti [mailto:[EMAIL PROTECTED] Sent: Wednesday, August 10, 2005 9:37 AM To: rHELP Subject: [R] date format
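
A runnable version of the conversion above, as a minimal sketch. The input `x` is assumed to hold year-month strings such as "2005-08"; the original question is not shown in this listing, so that format is a guess.

```r
# Hypothetical input: year-month strings with no day component
x <- c("2005-08", "2004-12")

# Append an assumed day (15) so that as.Date() has a complete date to parse
x.date <- as.Date(paste(as.character(x), "-15", sep = ""), format = "%Y-%m-%d")

print(x.date)

# Back to year-month for display
print(format(x.date, "%Y-%m"))
```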

Re: [R] How to hiding code for a package

2005-08-01 Thread bogdan romocea
There's something else you could try - since you can't hide the code, obfuscate it. Hide the real thing in a large pile of useless, complicated, awfully formatted code that would stop anyone except the most desperate (including yourself, after a couple of weeks/months) from trying to understand

Re: [R] choose between dates and times

2005-07-26 Thread bogdan romocea
If happenat is not a datetime value, convert it with strptime(). Then, one solution is to transform it in the following way: num.time <- as.numeric(format(happenat,"%Y%m%d%H%M%S")) This way, 07/22/05 00:05:14 becomes 20050722000514, and you can subset your data frame with dfr[which(num.time =
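
The numeric-timestamp idea can be sketched end to end as follows. The data frame contents, the input format string, and the selection window are assumptions made for illustration; only `happenat`, `num.time` and `dfr` come from the reply.

```r
# Hypothetical data frame with timestamps stored as text
dfr <- data.frame(
  happenat = c("07/21/05 23:59:59", "07/22/05 00:05:14", "07/23/05 12:00:00"),
  value = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Parse the text into date-times (the format is an assumption about the input)
happenat <- strptime(dfr$happenat, format = "%m/%d/%y %H:%M:%S", tz = "UTC")

# Collapse each date-time into a sortable number, e.g. 20050722000514
num.time <- as.numeric(format(happenat, "%Y%m%d%H%M%S"))

# Keep the rows that fall inside an illustrative one-day window
keep <- which(num.time >= 20050722000000 & num.time <= 20050722235959)
print(dfr[keep, ])
```

Comparing date-time objects directly would also work; the numeric form is simply what the reply describes.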

Re: [R] Rprof fails in combination with RMySQL

2005-07-22 Thread bogdan romocea
never close the connection after a query.) hth, b. -Original Message- From: Thieme, Lutz [mailto:[EMAIL PROTECTED] Sent: Friday, July 22, 2005 2:04 AM To: bogdan romocea Cc: R-help@stat.math.ethz.ch Subject: Re: [R] Rprof fails in combination with RMySQL Hello Bogdan

Re: [R] Is it possible to create highly customized report in *.xls format by using R/S+?

2005-07-21 Thread bogdan romocea
So your conclusion is that the only choice is to make mistakes and get in trouble. (That's what Excel excels at.) Two options I haven't seen mentioned are: 1. Create your deliverables in HTML format, and change the extension from .htm to .xls; Excel will import them automatically. The way the
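
Option 1 could be sketched roughly as below: write a plain HTML table and save it under an .xls name. The report data, the file name, and the hand-built HTML are all illustrative assumptions, not the author's code.

```r
# Hypothetical report data
report <- data.frame(region = c("East", "West"), sales = c(120.5, 98.25))

# Header row built from the column names
header <- paste0("<tr>",
                 paste0("<th>", names(report), "</th>", collapse = ""),
                 "</tr>")

# One <tr> per data row, converting each cell to text
rows <- vapply(seq_len(nrow(report)), function(i) {
  cells <- vapply(report[i, ], as.character, character(1))
  paste0("<tr>", paste0("<td>", cells, "</td>", collapse = ""), "</tr>")
}, character(1))

html <- c("<html><body><table>", header, rows, "</table></body></html>")

# Save the HTML under an .xls extension, as the reply suggests
out_file <- file.path(tempdir(), "report.xls")
writeLines(html, out_file)
```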

Re: [R] Rprof fails in combination with RMySQL

2005-07-21 Thread bogdan romocea
I think you're barking up the wrong tree. Optimize the MySQL code separately from optimizing the R code. A very nice reference about the former is http://highperformancemysql.com/. Also, if possible, do everything in MySQL. hth, b. -Original Message- From: Thieme, Lutz [mailto:[EMAIL

Re: [R] read.xport

2005-07-14 Thread bogdan romocea
How about avoiding SAS XPORT altogether and exporting everything in the simple, clean, non-proprietary, extremely reliable, platform-independent ... etc text format (CSV, tab delimited etc)? -Original Message- From: Nelson, Gary (FWE) [mailto:[EMAIL PROTECTED] Sent: Thursday, July

Re: [R] how to call sas in R

2005-07-05 Thread bogdan romocea
Why don't you do the simulations in SAS? If you prefer otherwise, set up the SAS code for running in batch mode (output and log redirection), then call it from R with (on Windows, untested) system("start ' ' C:\\etc\\sas.exe -sysin garch.sas") To keep the parameters from the estimate, have the SAS job

Re: [R] Trouble with Excel table connection

2005-06-30 Thread bogdan romocea
The best 3 things you can do in this situation are: 1. don't use Excel. 2. never use Excel. 3. never ever use Excel again. Spreadsheets are _not_ databases. In particular, Excel is a time bomb - use it long enough and you'll get burned (perhaps without even realizing it). See

Re: [R] Make matrix from SQL query result

2005-06-24 Thread bogdan romocea
It may be better to do this in SQL. The code below works for an arbitrary number of IDs and handles missing values. test <- data.frame(id=rep(c(1,2),10),date=sort(c(1:10,1:10)),ret=0.01*-9:10) idret <- list() ids <- sort(unique(test$id)) for (i in ids) { idret[[as.character(i)]] <-
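
The loop in the reply is cut off in this listing, so the sketch below does not reproduce it. It swaps in `tapply()` to build a date-by-id matrix from the same toy data, with one row dropped to show how a missing observation turns into NA. The name `retmat` is illustrative.

```r
# Same toy data as in the reply: two ids, ten dates, one return per pair
test <- data.frame(id = rep(c(1, 2), 10),
                   date = sort(c(1:10, 1:10)),
                   ret = 0.01 * -9:10)

# Drop one row (id 1, date 2) to mimic a missing observation
test <- test[-3, ]

# One row per date, one column per id; absent combinations become NA
retmat <- tapply(test$ret, list(test$date, test$id), function(v) v[1])

print(retmat)
```

This works for any number of ids, since the columns are derived from the values present in `test$id`.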

[R] how to make R faster under GNU/Linux

2005-06-20 Thread bogdan romocea
Dear useRs, I timed the same code (simulation with for loops) on the same box (dual Xeon EM64T, 1.5 GB RAM) under 3 OSs and was surprised by the results: Windows XP Pro (32-bit): Time difference of 5.97 mins 64-bit GNU/Linux (Fedora Core 4): Time difference of 6.97 mins 32-bit
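
Timings reported as "Time difference of ..." are what R prints for the difference of two `Sys.time()` values. A minimal sketch of that measurement follows; the `simulate()` function is a hypothetical stand-in for the simulation code, which is not shown in this listing.

```r
# Hypothetical stand-in for the simulation: a for loop filling a preallocated vector
simulate <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) out[i] <- mean(rnorm(10))
  out
}

# Wall-clock timing; the result prints as "Time difference of ..."
start <- Sys.time()
res <- simulate(10000)
elapsed <- Sys.time() - start
print(elapsed)

# system.time() reports user, system and elapsed seconds for the same call
print(system.time(simulate(10000)))
```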
