Re: [R] subsetting from a vector or matrix

2009-09-25 Thread andrew
both the following will probably do the trick.

?subset
?[

Basically on the second one, you want to come down to something that
looks like

x[L]

where x is a matrix/vector, and L is a logical vector that has the
same dimension as x, but is TRUE on the values of x that you want to
select.

for instance

x <- rnorm(10)
L <- x > 3
x[L] will return all values of x that are greater than 3.

or you can just do

x[x > 3]
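
A minimal sketch of the logical-indexing idiom described above, for a vector and for a matrix. The seed, the threshold of 0.5 (used because values above 3 are rare for rnorm, so x[x > 3] is usually empty) and the matrix m are illustrative assumptions, not part of the original question.

```r
set.seed(123)            # assumed seed, only for reproducibility
x <- rnorm(10)

L <- x > 0.5             # logical vector, TRUE where the condition holds
x[L]                     # values of x greater than 0.5
x[x > 0.5]               # same thing without the intermediate vector
subset(x, x > 0.5)       # subset() gives the same result for a vector

m <- matrix(1:12, nrow = 3)
big <- m[m > 6]          # logical matrix index: returns a plain vector
rows <- m[m[, 1] > 1, ]  # logical row index: keeps the matrix shape
```

Indexing a matrix with a logical matrix drops the dimensions, while indexing rows (or columns) with a logical vector keeps them.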



On Sep 25, 9:45 am, Jim Bouldin jrboul...@ucdavis.edu wrote:
 I realize this should be simple but I'm having trouble subsetting vectors
 and matrices, for example extracting all values meeting a certain
 criterion, from a vector. Cannot seem to figure out the correct syntax and
 help page not very helpful.  Or should I be using some other function than
 subset.  Thanks for any help.

 Jim Bouldin


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] graphics mailing list?

2009-09-25 Thread baptiste auguie
OK, it makes sense. Let's try that.

Best,

baptiste

2009/9/25 Paul Murrell p.murr...@auckland.ac.nz:
 Hi


 baptiste.auguie wrote:

 (Sorry about the double post earlier, googlemail is having hiccups today)

 2009/9/24 Romain Francois romain.franc...@dbmail.com:

 Why just grid ? why not a list for all kind of graphics ?

 I figured that a good share of the traffic on r-help might be
 considered graphics-related, while I was aiming at discussing less
 documented areas. But I agree that the distinction shouldn't be made
 on a particular package or system.


 I'm not wildly keen on a SIG (it could only mean fewer eyes seeing the
 discussion).  I think r-devel should serve quite well for these discussions,
 at least until people start complaining that it is being overrun with
 graphics questions ...

 Paul


 Best,

 baptiste



 On 09/24/2009 04:34 PM, baptiste.auguie wrote:

 Dear all,

 Would it make sense to have a separate mailing list (special interest
 group*) for Grid graphics? (or is there one already?)

 I don't feel comfortable asking questions about the design of a new
 grid class in R-help where I'm guessing most people won't be
 interested. Of course having yet another mailing list would only make
 sense if it's to be followed by those people who work with Grid
 (lattice, vcd, ggplot2, latticeExtra, Rgraphics, etc.). Having read a
 bit of code from these packages recently, I get the feeling that
 several people may have been facing similar problems or reinventing
 the same things.

 Just a thought,

 Best regards,

 baptiste

 *: http://www.r-project.org/mail.html

 --
 Romain Francois
 Professional R Enthusiast
 +33(0) 6 28 91 30 30
 http://romainfrancois.blog.free.fr
 |- http://tr.im/ztCu : RGG #158:161: examples of package IDPmisc
 |- http://tr.im/yw8E : New R package : sos
 `- http://tr.im/y8y0 : search the graph gallery from R




 --
 Dr Paul Murrell
 Department of Statistics
 The University of Auckland
 Private Bag 92019
 Auckland
 New Zealand
 64 9 3737599 x85392
 p...@stat.auckland.ac.nz
 http://www.stat.auckland.ac.nz/~paul/




Re: [R] superimposing xyplots on same scale

2009-09-25 Thread baptiste auguie
2009/9/25 Felix Andrews fe...@nfrac.org:
 Sorry, doubleYScale is not appropriate, since you specifically want a
 common y scale.

 I think Baptiste was suggesting to use layer(), rather than
 as.layer():

Truth be told, I wasn't quite sure what the initial request meant. I
took it quite literally, as superimposing two existing xyplots.
Clearly the other options that were given are much better.

Best,

baptiste



[R] differing behaviour between xts (0.6-7) and zoo (1.5-8)

2009-09-25 Thread Murali.MENON
Folks,
 
I have some weekly dataseries that I convert to monthly xts (with
yearmon indices), and obtain the two following extracts:
 
> str(sig)
An 'xts' object from Apr 1998 to Sep 1998 containing:
  Data: num [1:6, 1] 0.0083 0.2799 -0.2524 -0.0119 0.18 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr "e1"
  Indexed by objects of class: [yearmon] TZ: GMT
  xts Attributes:  
 NULL

> str(ret)
An 'xts' object from Mar 1998 to Aug 1998 containing:
  Data: num [1:6, 1] -0.007829 0.006452 -0.000276 -0.000644 0.002572 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr "twi.Close"
  Indexed by objects of class: [yearmon] TZ: GMT
  xts Attributes:  
 NULL
 
I understand that mathematical operations on xts objects will be performed
only on the datapoints with common indices, in this case Apr 1998 to Aug
1998. So I do:
 
> sig * ret
Data:
numeric(0)
 
Index:
NULL

Which doesn't give me what I expect. However, if I do:
 
> as.zoo(sig) * as.zoo(ret)
e1
Apr 1998  5.351189e-05
May 1998 -7.716467e-05
Jun 1998  1.624531e-04
Jul 1998 -3.055679e-05
Aug 1998  4.122321e-04

Which is as I expect.

I took a look at the structure of the two objects:

> dput(sig)
structure(c(0.00829354917358671, 0.279914830605598, -0.252440486192738, 
-0.0118822201758384, 0.179972233000564, -0.209066714293924), index =
c(891388800, 
893980800, 896659200, 899251200, 901929600, 904608000), .Dim = c(6L, 
1L), .Dimnames = list(NULL, "e1"), class = c("xts", "zoo"), .indexTZ =
structure("GMT", .Names = "TZ"), .indexCLASS = "yearmon")

> dput(ret)
structure(c(-0.00782945094736132, 0.00645222996118644,
-0.000275671952124412, 
-0.000643530245146628, 0.00257163991836062, 0.00229053194651918
), index = c(890784000, 893808000, 896227200, 898646400, 901670400, 
904089600), .Dim = c(6L, 1L), .Dimnames = list(NULL, "twi.Close"),
.indexCLASS = "yearmon", .indexTZ = structure("GMT", .Names = "TZ"),
class = c("xts", 
"zoo"))

So clearly the internal values of the supposedly overlapping parts of
the indices are different, although they are both 'yearmon' and seem to
represent the same months.

If I do

> dput(as.zoo(ret))
structure(c(-0.00782945094736132, 0.00645222996118644,
-0.000275671952124412, 
-0.000643530245146628, 0.00257163991836062, 0.00229053194651918
), .Dim = c(6L, 1L), .Dimnames = list(NULL, "twi.Close"), index =
structure(c(1998.167, 
1998.25, 1998.333, 1998.417, 1998.5, 1998.583
), class = "yearmon"), class = "zoo")

> dput(as.zoo(sig))
structure(c(0.00829354917358671, 0.279914830605598, -0.252440486192738, 
-0.0118822201758384, 0.179972233000564, -0.209066714293924), .Dim =
c(6L, 
1L), .Dimnames = list(NULL, "e1"), index = structure(c(1998.25, 
1998.333, 1998.417, 1998.5, 1998.583, 
1998.667), class = "yearmon"), class = "zoo")

Now the indices have the expected overlaps.
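
One way to inspect the mismatch, using only base R and the index values from the dput() output above: treat the numbers as seconds since 1970-01-01 GMT and look at the calendar dates they encode. This is a diagnostic sketch of the posted values only; it assumes the index is stored as POSIX seconds and says nothing definite about how xts 0.6-7 aligns yearmon indices internally. The helper name to_date is made up for this example.

```r
# Index values copied from dput(sig) and dput(ret)
sig_index <- c(891388800, 893980800, 896659200, 899251200, 901929600, 904608000)
ret_index <- c(890784000, 893808000, 896227200, 898646400, 901670400, 904089600)

# Hypothetical helper: seconds since the epoch (GMT) to a Date
to_date <- function(secs) {
  as.Date(as.POSIXct(secs, origin = "1970-01-01", tz = "GMT"))
}

sig_dates <- to_date(sig_index)
ret_dates <- to_date(ret_index)
print(sig_dates)
print(ret_dates)

# No raw index value is shared between the two series ...
common_raw <- intersect(sig_index, ret_index)

# ... although the year-month labels do overlap
common_months <- intersect(format(sig_dates, "%Y-%m"), format(ret_dates, "%Y-%m"))
print(common_months)
```

If the raw values differ within a month, as they do here, rebuilding both indices from the same year-month representation before the arithmetic may be worth trying.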

I'm not sure if this is a bug in xts? 

 sessionInfo()
R version 2.9.2 (2009-08-24) 
i386-pc-mingw32 

locale:
LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United
Kingdom.1252;LC_MONETARY=English_United
Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats graphics  grDevices datasets  tcltk utils methods
base 

other attached packages:
[1] xts_0.6-7   zoo_1.5-8   svSocket_0.9-43 svMisc_0.9-48
TinnR_1.0.3 R2HTML_1.59-1   Hmisc_3.6-1 rcom_2.2-1
rscproxy_1.3-1 

loaded via a namespace (and not attached):
[1] cluster_1.12.0  grid_2.9.2  lattice_0.17-25 tools_2.9.2

Please advise.

Thanks,

Murali



Re: [R] basic cubic spline smoothing

2009-09-25 Thread hm567


hm567 wrote:
 
 I am unsure about spar being the smoothness parameter, about where to put
 the standard errors of the points, and about the return of the
 smooth.spline function:

 Smoothing Parameter  spar= 0.5  lambda= 0.006833112 

 
 best regards,

 
Basically, the implementation based on the attached paper, for a standard
error of points =1.0,
the smoothing is too insensitive to the lambda smoothness parameter.
From 1 to almost 0.01, there is almost no smoothing... Only from 0.01 to 0
does one start to see smoothing in action with the limit at 0 being a
straight line.
Note that this implementation's parameter is (1 - parameter)

With R smooth.spline, 'spar' reflects well the smoothness in that:
. at 0%, the spline interpolates
. at 40% already, its shape is very different from the 0% one  ( for my
implementation, they are still same )
. at 90% it is almost a straight line
. at 100% it is definitely a straight line

This is the behavior that I wish to have.
It seems I need to change my lambda with some transformation that is similar
to the one in the doc of smooth.spline   (spar to lambda). Perhaps the
reverse one. But I can't see how to do it.

The other question is the standard errors. What do they correspond to in the
doc of smooth.spline?
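
A small sketch of how spar and lambda relate in smooth.spline. The help page gives the relation lambda = r * 256^(3*spar - 1), where r depends on the data and weights. The simulated x and y, the seed and the chosen spar values are assumptions for illustration. The last call treats the w argument as 1/se^2, which is a common convention for per-point standard errors but is an assumption here, not something the thread confirms.

```r
set.seed(1)                                  # assumed seed
x <- seq(0, 1, length.out = 50)
y <- sin(2 * pi * x) + rnorm(50, sd = 0.2)   # made-up noisy data

spars <- c(0, 0.4, 0.9, 1)
fits <- lapply(spars, function(s) smooth.spline(x, y, spar = s))

# lambda grows and the equivalent degrees of freedom shrink as spar increases
res <- data.frame(
  spar   = spars,
  lambda = sapply(fits, function(f) f$lambda),
  df     = sapply(fits, function(f) f$df)
)
print(res)

# Assumed convention: weights proportional to 1 / (standard error)^2
se <- rep(0.2, length(x))
fit_w <- smooth.spline(x, y, w = 1 / se^2, spar = 0.4)
```

Because lambda depends on spar exponentially, equal steps in spar correspond to large multiplicative steps in lambda, which may explain why a penalty used on a linear scale looks insensitive over most of its range.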

Regards,


[R] R 2.10.0 is scheduled for October 26

2009-09-25 Thread Peter Dalgaard
This is to announce that we plan to release R version 2.10.0 on Monday,
October 26, 2009.

Release procedures start today. The detailed schedule can
be found on http://developer.r-project.org

The source tarballs will be made available daily (barring build
troubles), starting September 28, and the tarballs can be picked up at

http://cran.r-project.org/src/base-prerelease/

a little later.

Binary builds are expected to appear soon thereafter.

For the Core Team
Peter Dalgaard


-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - (p.dalga...@biostat.ku.dk)  FAX: (+45) 35327907

___
r-annou...@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-announce


Re: [R] Downloading data from from internet

2009-09-25 Thread Bogaso

Thank you so much for the help. However, I need a little more. On the page
http://www.rateinflation.com/consumer-price-index/usa-historical-cpi.php
if I scroll down there is an option "Historical CPI Index For USA".
If I then click on "Get Data", another table pops up, without any
significant change in the address bar. This table holds more data, starting
from 1999. Can you please help me get the values from this table?

Thanks


Duncan Temple Lang wrote:
 
 
 Thanks for explaining this, Charlie.
 
 Just for completeness and to make things a little easier,
 the XML package has a function named readHTMLTable()
 and you can call it with a URL and it will attempt
 to read all the tables in the page.
 
  tbls =
 readHTMLTable('http://www.rateinflation.com/consumer-price-index/usa-cpi.php')
 
 yields a list with 10 elements, and the table of interest with the data is
 the 10th one.
 
  tbls[[10]]
 
 The function does the XPath voodoo and sapply() work for you and uses some
 heuristics. There are various controls one can specify and also various
 methods for working with sub-parts of the HTML document directly.
 
   D.
 
 
 
 cls59 wrote:
 
 
 Bogaso wrote:
 Hi all,

 I want to download data from these two different sources, directly into R:

 http://www.rateinflation.com/consumer-price-index/usa-cpi.php
 http://eaindustry.nic.in/asp2/list_d.asp

 The first one is the CPI of the US and the second one is the WPI of India.
 Can anyone please give any clue how to download them directly into R? I want
 to make them zoo objects for further analysis.

 Thanks,

 
 The following site did not load for me:
 
 http://eaindustry.nic.in/asp2/list_d.asp
 
 But I was able to extract the table from the US CPI site using Duncan
 Temple Lang's XML package:
 
   library(XML)
 
 
 First, download the website into R:
 
   html.raw <- readLines(
     'http://www.rateinflation.com/consumer-price-index/usa-cpi.php' )
 
 Then, convert to an HTML object using the XML package:
 
   html.data <- htmlTreeParse( html.raw, asText = T, useInternalNodes = T )
 
 A quick scan of the page source in the browser reveals that the table you
 want is encased in a div with a class of dynamicContent-- we will use an
 xpath specification[1] to retrieve all rows in that table:
 
   table.html <- getNodeSet( html.data,
     '//div[@class="dynamicContent"]/table/tr' )
 
 Now, the data values can be extracted from the cells in the rows using a
 little sapply and xpathSApply voodoo:
 
   table.data <- t( sapply( table.html, function( row ){
 
     row.data <- xpathSApply( row, './td', xmlValue )
     return( row.data )
 
   }))
 
 
 Good luck!
 
 -Charlie
  
   [1]:  http://www.w3schools.com/XPath/xpath_syntax.asp
 
 -
 Charlie Sharpsteen
 Undergraduate
 Environmental Resources Engineering
 Humboldt State University
 
 
 



Re: [R] problem on SUSE Linux Enterprise Server 10 (ia64)

2009-09-25 Thread Paul Hiemstra

Hi,

You need to install the headers/libs for readline. Probably using your 
package manager, look for something like readline-devel.


cheers,
Paul

Yuan Zhidong wrote:

Dear Sir,
When I install R on SUSE Linux Enterprise Server 10 (ia64)
(Linux a450 2.6.16.21-0.8-default #1 SMP Mon Jul 3 18:25:39 UTC 2006 
ia64 ia64 ia64 GNU/Linux)

it reported the following error messages at the end:

# ./configure
checking build system type... ia64-unknown-linux-gnu
checking host system type... ia64-unknown-linux-gnu
loading site script './config.site'
loading build specific script './config.site'
checking for pwd... /bin/pwd
checking whether builddir is srcdir... yes
checking for working aclocal... found
checking for working autoconf... found
.


checking for readline/readline.h... no
checking for rl_callback_read_char in -lreadline... no
checking for main in -lncurses... yes
checking for rl_callback_read_char in -lreadline... no
checking for history_truncate_file... no
configure: error: --with-readline=yes (default) and headers/libs are 
not available



Could you tell me how to fix the problem? Thank you!

Best wishes,

Yuan Zhidong




--
Drs. Paul Hiemstra
Department of Physical Geography
Faculty of Geosciences
University of Utrecht
Heidelberglaan 2
P.O. Box 80.115
3508 TC Utrecht
Phone:  +3130 274 3113 Mon-Tue
Phone:  +3130 253 5773 Wed-Fri
http://intamap.geo.uu.nl/~paul



[R] variation in one variable

2009-09-25 Thread Samuel Okoye
Hello,

Could you please tell me whether there is any function in R that tells me how 
many subgroups I have in one variable? For example, if my data are
x <- c(rnorm(50,50,3),rgamma(50,2,1),runif(50,0,1))

I want to know how many groups I have.

Many thanks in advance,
Samuel


--- On Thu, 9/17/09, Samuel Okoye samu...@yahoo.com wrote:

From: Samuel Okoye samu...@yahoo.com
Subject: SVM
To: r-h...@stat.math.ethz.ch
Date: Thursday, September 17, 2009, 4:39 AM

Hello,

I have 12 samples, each with 1000 observations, i.e. I have a matrix X 
with 1000 rows and 12 columns!
 
m <- svm(t(X))
p <- predict (m)

Can anyone tell me how to use svmtrain() in R?

Many thanks,
Samuel





  


  


[R] Binomial

2009-09-25 Thread Ashta
Dear R-users,

Suppose I have the following sample of data,

 0   1   2  4  3
 1   2   1  3  1
 1   3   3  4  1
 0   1   2  1  2
 1   4   1  4  2
  1   2   2  1  1

The first variable is the response variable, where 0 is defective and 1 is
normal. The other four are factors (x1, x2, x3, x4) that influence the outcome. I
want to fit a binomial model. How do I do that? I am guessing the response
variable should be transformed, but I am not sure which family of transformations
to use.
It is easy to do in SAS, but I want to learn how to do it in R.
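
A hedged sketch of one common approach: a logistic regression with glm() and family = binomial, which takes the 0/1 response as it is (the logit link is applied inside the model, so no manual transformation is needed). The six rows in the post are too few to fit four predictors, so the data below are simulated; the sample size, seed and coefficients are made-up assumptions. The predictors are treated as numeric here; wrapping them in factor() would treat them as categorical instead.

```r
set.seed(42)                      # assumed seed
n <- 200                          # assumed sample size
dat <- data.frame(
  x1 = sample(1:4, n, replace = TRUE),
  x2 = sample(1:4, n, replace = TRUE),
  x3 = sample(1:4, n, replace = TRUE),
  x4 = sample(1:4, n, replace = TRUE)
)
# Simulated 0/1 response; the coefficients are arbitrary
dat$y <- rbinom(n, size = 1, prob = plogis(-1 + 0.6 * dat$x1 - 0.4 * dat$x2))

fit <- glm(y ~ x1 + x2 + x3 + x4, family = binomial, data = dat)
print(summary(fit))

# Fitted probabilities that y == 1
p_hat <- predict(fit, type = "response")
```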

Any help is highly appreciated

Ashta



[R] R CMD INSTALL --build: Folders /inst and /etc not in zip-file and WindowsXP locks /library/[package]/etc/

2009-09-25 Thread Tobias Schoch

Dear R users,

My set-up: OS=Windows XP, R-2.9.2, Rtools210
I faced the following problem with package compilation: there is no
/inst or /etc subdirectory in the package zip file, and the content of
the /etc subdirectory is lost, too. I tried a simplified test package.
The test package has the following structure (see also attachment: test
package as source file):

/test
   |---/inst
   | |---/etc
   |   |---menus.txt
   |
   |---/man
   | |---mymean.Rd
   | |---test-package.Rd
   |
   |---/R
   | |---mymean.R
   |
   |---NAMESPACE
   |---DESCRIPTION
   
The file menus.txt (inspired by the Rcmdr menu structure) contains one
single comment line. The file mymean.R contains a simple function that
computes the mean.

Situation A) [R CMD BUILD test] works fine and the tar-ball contains the
subdirectory /inst/etc/ and the file menus.txt.

Situation B) [R CMD INSTALL --build test] generates the test_1.0.zip file
without any error message. But: 1) This zip file contains neither the
/etc folder nor the menus.txt file. 2) The installation created the
folder /test in the library path /R-2.9.2/library/test/ , but this folder
is locked by WindowsXP. That is, access is denied, and this can only be
resolved by re-defining the owner and the permission rights. 3) After
removing the package from the library, installing the zip package
using "install from local zip files..." completes. But the
/etc folder is empty; the file menus.txt is missing.

Is there a known problem/bug in the R CMD INSTALL --build process dealing
with the subdirectories /inst and /etc and the contents of these
folders? How can I resolve the problem?

Thanks 
Tobias

http://www.nabble.com/file/p25609569/source_test.zip source_test.zip 


Re: [R] synchronisation of time series data using interpolation

2009-09-25 Thread Gabor Grothendieck
Create the series as zoo series from the data, and then merge them and
fill in NAs with interpolated values using na.approx.  Finally use
window to pick off the times that were in z1 and plot.  See the three
vignettes that come with zoo and for time and dates see the article in
R News 4/1 and its references.

Lines1 <- "time,datum
01:00:00,500
01:00:15,600
01:00:30,750
01:00:45,720
01:01:00,700
01:01:15,725
01:01:30,640
01:01:45,710"

Lines2 <- "time,datum
01:00:12,20
01:01:01,55
01:01:55,22"

library(zoo)
library(chron)

z1 <- read.zoo(textConnection(Lines1), header = TRUE, sep = ",", FUN = times)
z2 <- read.zoo(textConnection(Lines2), header = TRUE, sep = ",", FUN = times)

z3 <- window(na.approx(merge(z1, z2)), time(z1))
plot(z3$z1, z3$z2)
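
For comparison, the same interpolation idea can be sketched in base R with approx() instead of zoo. This is an alternative technique, not the zoo approach above. The helper to_secs is a made-up name for this example; the data are the two sets from the question.

```r
# Hypothetical helper: convert "hh:mm:ss" strings to seconds since midnight
to_secs <- function(hms) {
  sapply(strsplit(hms, ":", fixed = TRUE),
         function(p) sum(as.numeric(p) * c(3600, 60, 1)))
}

t1 <- to_secs(c("01:00:00", "01:00:15", "01:00:30", "01:00:45",
                "01:01:00", "01:01:15", "01:01:30", "01:01:45"))
y1 <- c(500, 600, 750, 720, 700, 725, 640, 710)
t2 <- to_secs(c("01:00:12", "01:01:01", "01:01:55"))
y2 <- c(20, 55, 22)

# Linear interpolation of data set 2 at the time stamps of data set 1.
# With the default rule = 1, time stamps outside the range of t2 give NA.
y2_on_t1 <- approx(t2, y2, xout = t1)$y

aligned <- data.frame(secs = t1, y1 = y1, y2_interp = y2_on_t1)
print(aligned)
```

The first time stamp (01:00:00) lies before the first observation of data set 2, so it has no interpolated value; approx(..., rule = 2) would instead carry the nearest observed value outward.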


On Fri, Sep 25, 2009 at 1:41 AM, e-letter inp...@gmail.com wrote:
 Readers,

 I have data with different time stamps that I wish to plot (for example):

 data set 1
 time(hh:mm:ss),datum
 01:00:00,500
 01:00:15,600
 01:00:30,750
 01:00:45,720
 01:01:00,700
 01:01:15,725
 01:01:30,640
 01:01:45,710
 data set 2
 time,datum
 01:00:12,20
 01:01:01,55
 01:01:55,22

 The time interval in data set 1 does not change, but the time interval
 in data set 2 does change, such that for a specific total time range
 (e.g. 60 minutes) there will be more data in data set 1 than in data
 set 2.

 I thought I could solve this problem using interpolation, to create a
 new data set using data from data set 2, interpolated to the time
 stamps in data set 1:

 data set 3
 time,datum
 01:00:00,18
 01:00:15,23
 01:00:30,30
 01:00:45,41
 01:01:00,53
 01:01:15,46
 01:01:30,38
 01:01:45,29

 Therefore I would then be able to plot the data in data set 1 against
 the interpolated data in data set 3, because there would be equal
 quantities of data in both data sets. I've looked at the interp
 function in the help manual, but I don't understand if this function
 can perform the task I want. Any advice please?

 Yours,

 rhelp at conference.jabber.org
 r 251 (27-06-07)
 mandriva 2008





[R] Problem on plotting TS using GGPLOT

2009-09-25 Thread bogaso.christofer
Hi, I have the following code:

library(zoo); library(ggplot2); library(plyr)

dat <- rnorm(306); vv <- letters[1:6]; dat1 <- data.frame(dat, vv)
dat2 = zooreg(rnorm(51), as.yearmon(as.Date("2000-01-01")), frequency=12)

ggplot(dat1) +
  geom_line(aes(y=dat, x=index(dat2), colour=vv), group=vv, size = 1.3)

However, I got an error while plotting:

Error in data.frame(x = c(2000, 2000.083, 2000.167, 2000.25,  : 
  arguments imply differing number of rows: 51, 306

I could not find why that error occurs. Any idea, please?
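
The error message suggests that x = index(dat2) supplies 51 values while dat1 has 306 rows. One possible way around it, assuming the 306 values are six series of 51 monthly observations each (an assumption; the post does not say how dat is laid out), is to build a long data frame in which every row carries its own date. The column name month is made up for this sketch, which uses base R only.

```r
n_months <- 51
series   <- letters[1:6]

# One row per (series, month) pair: 6 * 51 = 306 rows
long <- data.frame(
  month = rep(seq(as.Date("2000-01-01"), by = "month", length.out = n_months),
              times = length(series)),
  vv    = rep(series, each = n_months),
  dat   = rnorm(n_months * length(series))
)

print(nrow(long))
print(table(long$vv))
```

With such a frame, every aesthetic can refer to a column of the same data, for example aes(x = month, y = dat, colour = vv, group = vv).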

 

Thanks,




[R] Java to R interface.

2009-09-25 Thread vikrant S

I want to call R functions from Java. I read in a couple of forums that I should
install the rJava package in R.
However, I am not able to install the rJava package on Ubuntu Linux. I tried
two approaches.
The first was
install.packages("rJava")

For the second, I downloaded the rJava_0.7-0.tar.gz file from the R site
and ran the command R CMD INSTALL rJava_0.7-0.tar.gz.
I got the following errors:


Warning in install.packages("rJava") :
  argument 'lib' is missing: using
'/home/vikrant/R/i486-pc-linux-gnu-library/2.9'
trying URL 'http://cran.uk.r-project.org/src/contrib/rJava_0.7-0.tar.gz'
Content type 'application/x-gzip' length 249486 bytes (243 Kb)
opened URL
==
downloaded 243 Kb

* Installing *source* package ‘rJava’ ...
mv: cannot move `/home/vikrant/R/i486-pc-linux-gnu-library/2.9/rJava' to
`/home/vikrant/R/i486-pc-linux-gnu-library/2.9/00LOCK/rJava': Permission
denied
checking for gcc... gcc -std=gnu99
checking for C compiler default output file name... a.out
checking whether the C compiler works... yes
checking whether we are cross compiling... no
checking for suffix of executables... 
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc -std=gnu99 accepts -g... yes
checking for gcc -std=gnu99 option to accept ISO C89... none needed
checking how to run the C preprocessor... gcc -std=gnu99 -E
checking for grep that handles long lines and -e... /bin/grep
checking for egrep... /bin/grep -E
checking for ANSI C header files... yes
checking for sys/wait.h that is POSIX.1 compatible... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking for string.h... (cached) yes
checking sys/time.h usability... yes
checking sys/time.h presence... yes
checking for sys/time.h... yes
checking for unistd.h... (cached) yes
checking for an ANSI C-conforming const... yes
checking whether time.h and sys/time.h may both be included... yes
configure: checking whether gcc -std=gnu99 supports static inline...
yes
checking Java support in R... present:
interpreter : '/usr/bin/java'
archiver: '/usr/bin/jar'
compiler: '/usr/bin/javac'
header prep.: '/usr/bin/javah'
cpp flags   : '-I/usr/lib/jvm/java-6-openjdk/jre/../include'
java libs   : '-L/usr/lib/jvm/java-6-openjdk/jre/lib/i386/client
-L/usr/lib/jvm/java-6-openjdk/jre/lib/i386
-L/usr/lib/jvm/java-6-openjdk/jre/../lib/i386 -L
-L/usr/java/packages/lib/i386 -L/lib -L/usr/lib -L/usr/lib/jni -ljvm'
checking whether JNI programs can be compiled... yes
checking JNI data types... configure: error: One or more JNI types differ
from the corresponding native type. You may need to use non-standard
compiler flags or a different compiler in order to fix this.
ERROR: configuration failed for package ‘rJava’

Please help me to install rJava. Could anyone also suggest whether there is a
better way to call R from Java,
and point me to a tutorial for it?
Thanks in advance



[R] Data import from .csv-file with numeric header

2009-09-25 Thread Tobias Ruff
Hello everybody out there using R,

How can I import data with a numeric header from a .csv-file?
My file example.csv has the following content (a duplicate measurement of 
potentials for three different currents):
1; 2; 6
1.0; 2.1; 5.9
1.1; 2.0; 6.0

I try to import the data by using:
measurement <- read.table("example.csv", sep=";", header=T)
However, the values in the header are renamed to the column names X1, X2 and X3.
When I try to plot the data, I don't get the right x-values (the three 
different currents 1, 2 and 6), but 1.0, 2.0 and 3.0:
plot(mean(measurement))
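
A possible approach, sketched under the assumption that the header values should serve as the x-axis: read the file with check.names = FALSE so the header is not turned into X-prefixed names, convert the column names to numbers, and plot the column means against them. The text = argument stands in for the file example.csv so that the sketch is self-contained.

```r
csv_text <- "1; 2; 6
1.0; 2.1; 5.9
1.1; 2.0; 6.0"

# check.names = FALSE keeps the header values as they are ("1", "2", "6")
measurement <- read.table(text = csv_text, sep = ";", header = TRUE,
                          check.names = FALSE)

currents   <- as.numeric(names(measurement))   # x-values taken from the header
potentials <- colMeans(measurement)            # mean of the duplicate readings

plot(currents, potentials, type = "b",
     xlab = "current", ylab = "mean potential")
```

For the real file, read.table("example.csv", sep = ";", header = TRUE, check.names = FALSE) would take the place of the text = call.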

Thanks in advance.






[R] Nested select

2009-09-25 Thread premmad

my data :

library(doBy)

Lines <- "lo ptcl5 ptcl99 variable 
 430. 8787a
430   3422343 m
 430. 89mr
4314564774a
431 299   2777m
 4319996  mr
 432333 3433   a
 432 .7377m
 432. 676  mr"
 
DF <- read.table(con <- textConnection(Lines), skip = 1)

close(con) 

What I want is to select lo when ptcl5 is missing and variable is either a or m.
I tried the following query:
sqldf("select lo from DF where lo=(select lo where ptcl5='.' and
variable='m') or lo=(select lo where ptcl5='.' and variable='a')")
But I'm getting the entire data set instead of rows limited by the condition.
Is my query right? Please help me with this.
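
The selection described above can also be sketched in base R without a nested query. The rows below are hypothetical stand-ins with the same column names, because the posted data did not survive intact; ptcl5 is kept as character so that "." marks a missing value, as in the question.

```r
# Hypothetical rows with the same columns as DF
DF <- data.frame(
  lo       = c(430, 430, 430, 431, 431, 431),
  ptcl5    = c(".", "342", ".", "456", "299", "."),
  ptcl99   = c("8787", "2343", "89", "4774", "2777", "9996"),
  variable = c("a", "m", "mr", "a", "m", "mr"),
  stringsAsFactors = FALSE
)

# lo for rows where ptcl5 is missing and variable is either "a" or "m"
hits <- DF$lo[DF$ptcl5 == "." & DF$variable %in% c("a", "m")]
print(hits)
```

In SQL the same condition can usually be written in a single WHERE clause, for example: select lo from DF where ptcl5 = '.' and variable in ('a', 'm').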


Re: [R] re peated measures

2009-09-25 Thread pompon

Hi,

Thank you. 
It was that.

Julien.



Tal Galili wrote:
 
 check for missing values.
 Tal
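
A minimal sketch of the kind of check suggested above, on made-up long-format data with the column names id, time and fecundity from the question. Missing responses leave subjects with unequal numbers of observations, which is presumably how a within-subject term can end up in the between-subject stratum.

```r
set.seed(1)                                   # assumed seed
wingless <- expand.grid(id = 1:4, time = 0:3) # hypothetical 4 subjects x 4 times
wingless$fecundity <- rpois(nrow(wingless), 3)
wingless$fecundity[c(2, 7)] <- NA             # two made-up missing values

# Number of missing values per column
na_per_column <- colSums(is.na(wingless))
print(na_per_column)

# Number of non-missing observations per subject; unequal counts
# indicate an unbalanced design
obs_per_id <- tapply(!is.na(wingless$fecundity), wingless$id, sum)
print(obs_per_id)
```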
 
 
 
 On Wed, Sep 23, 2009 at 3:27 PM, pompon julien.pom...@agr.gc.ca wrote:
 

 Hi,

 I am performing a repeated measures 2-way ANOVA to assess the influence of
 plant and leaf on aphid fecundity. Fecundity is measured for each aphid on a
 single leaf.
 Here is what I typed.

 wingless <- reshape(Wingless,
    varying = list(c("d0","d1","d2","d3","d4","d5","d6","d7","d8","d9",
                     "d10","d11","d12","d13","d14","d15","d16")),
    v.names = c("fecundity"), timevar = "time",
    direction = "long")

 wingless.aov <- aov(fecundity ~ factor(time) * clip.cage * plant +
 Error(factor(id)), data = wingless)

 summary(wingless.aov)

 and I obtained

 Error: factor(id)
                        Df Sum Sq Mean Sq F value  Pr(>F)
 factor(time)            4 56.789  14.197  3.0613 0.05925 .
 clip.cage               1 14.149  14.149  3.0509 0.10621
 plant                   1  3.251   3.251  0.7010 0.41880
 factor(time):clip.cage  1  0.304   0.304  0.0655 0.80240
 clip.cage:plant         1 17.114  17.114  3.6903 0.07880 .
 Residuals              12 55.652   4.638
 ---
 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 Error: Within
                               Df Sum Sq Mean Sq F value  Pr(>F)
 factor(time)                  16 340.83   21.30 11.5222 < 2e-16 ***
 factor(time):clip.cage        16  27.34    1.71  0.9242 0.54195
 factor(time):plant            16  46.36    2.90  1.5673 0.07783 .
 factor(time):clip.cage:plant  16  24.50    1.53  0.8281 0.65304
 Residuals                    255 471.44    1.85
 ---
 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 I don't understand why I have factor(time) in my between-subject results,
 whereas with a similar set of data I don't.

 Thank you very much,
 Julien Pompon.

 
 
 
 -- 
 --
 
 
 My contact information:
 Tal Galili
 Phone number: 972-50-3373767
 FaceBook: Tal Galili
 My Blogs:
 http://www.r-statistics.com/
 http://www.talgalili.com
 http://www.biostatistics.co.il
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 



[R] if else and loop for code in R

2009-09-25 Thread Minh Duy Mai
I am using if/else and a loop to sort out the data set, that is, to pick out
the values less than 0 or more than 100. I could not get outTable with the
loop.

Please help me to correct the code.
I used:

# Read
a_data <- read.table("D:/SNP/copy.sas", header=T, sep="\t")
tr <- a_data$truck
ca <- a_data$cars
length <- nrow(a_data)
outTable <- matrix(nrow=length,ncol=3)

stat <- for (i in 1:length) {
if (tr<0) {0} else
if (ca>100) {0}else
{ca}
outTable <- c(i, stat, tr)
}
# Writing the output file
colnames(outTable) <- c("number", "stat", "tr")

write.table(outTable,"D:/SNP/mixed.txt",append=FALSE,quote=FALSE,sep='\t',
row.names=F)

# Graph
plot(stat, type="o", col="red", axes=FALSE, ann=FALSE)
# Create a title with a red, bold/italic font
title(main="Autos", col.main="red", font.main=4)

# Start PNG device driver to save output to figure.png
png(filename="D:/SNP/figure.png", height=295, width=300, bg="white")
.
R complains:
Error: object 'stat' not found
In addition: Warning message:
In if (tr < 0) { :
  the condition has length > 1 and only the first element will be used
...

Thanks a lot



Re: [R] Data import from .csv-file with numeric header

2009-09-25 Thread Henrique Dallazuanna
Try this:

measurement <- read.table("example.csv", sep = ";",
                          header = TRUE, check.names = FALSE)
plot(mean(measurement), names(measurement), xaxt = 'n')
axis(1, names(measurement))
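
The call above relies on mean() returning column means for a data frame, which current R versions no longer do. A minimal sketch of the same idea using colMeans(), with the sample file supplied inline through the text argument of read.table (an assumption made only to keep the example self-contained) and the header converted to numeric x-values:

```r
# Sample data from the question, supplied inline instead of example.csv.
csv_text <- "1; 2; 6\n1.0; 2.1; 5.9\n1.1; 2.0; 6.0"

measurement <- read.table(text = csv_text, sep = ";", header = TRUE,
                          check.names = FALSE, strip.white = TRUE)

# The header holds the currents; convert the column names to numbers.
currents <- as.numeric(names(measurement))

# Mean potential per current (column-wise mean of the duplicate measurements).
potentials <- colMeans(measurement)

# Plot mean potential against the true current values.
if (interactive()) {
  plot(currents, potentials, type = "b",
       xlab = "Current", ylab = "Mean potential")
}
```

The axis labels are placeholders; the point is that the x-values now come from the header rather than from the column positions.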

On Fri, Sep 25, 2009 at 3:53 AM, Tobias Ruff lisem...@ymail.com wrote:
 Hello everybody out there using R,

 How can I import data with a numeric header from a .csv-file?
 My file example.csv has the following content (a duplicate measurement of 
 potentials for three different currents):
 1; 2; 6
 1.0; 2.1; 5.9
 1.1; 2.0; 6.0

 I try to import the data by using:
measurement <- read.table("example.csv", sep=";", header=T)
 However, the values in the header are renamed to the column names X1, X2 and 
 X3.
 When I try to plot the data, I don't get the right x-values (the three 
 different currents 1, 2 and 6), but 1.0, 2.0 and 3.0:
plot(mean(measurement))

 Thanks in advance.








-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O



Re: [R] Data import from .csv-file with numeric header

2009-09-25 Thread Duncan Murdoch

Tobias Ruff wrote:

Hello everybody out there using R,

How can I import data with a numeric header from a .csv-file?
My file example.csv has the following content (a duplicate measurement of 
potentials for three different currents):
1; 2; 6
1.0; 2.1; 5.9
1.1; 2.0; 6.0

I try to import the data by using:

measurement <- read.table("example.csv", sep=";", header=T)

However, the values in the header are renamed to the column names X1, X2 and X3.
When I try to plot the data, I don't get the right x-values (the three
different currents 1, 2 and 6), but 1.0, 2.0 and 3.0:

plot(mean(measurement))


I got X1, X2 and X6, because 1, 2, and 6 aren't legal variable names.  
If you want to use them as names anyway, use the check.names=FALSE argument.
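
The renaming can be seen directly with make.names(), which is what read.table applies to the header when check.names is TRUE. A short sketch, with the sample file supplied inline through the text argument (an assumption for self-containment):

```r
# Header values from the example file, as text.
header <- c("1", "2", "6")

# make.names() turns them into syntactically valid names by prefixing "X".
valid <- make.names(header, unique = TRUE)
print(valid)

# The same file read with and without name checking.
csv_text <- "1; 2; 6\n1.0; 2.1; 5.9\n1.1; 2.0; 6.0"
checked   <- read.table(text = csv_text, sep = ";", header = TRUE,
                        strip.white = TRUE)
unchecked <- read.table(text = csv_text, sep = ";", header = TRUE,
                        strip.white = TRUE, check.names = FALSE)
print(names(checked))
print(names(unchecked))
```

With check.names = FALSE the columns keep their numeric-looking names, so they have to be accessed with quotes or backticks, for example unchecked[["6"]].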


I don't know how you tried to plot them so I can't help you with that.

Duncan Murdoch



Re: [R] Bug

2009-09-25 Thread dhansekaran

Thank you very much for the help. Here is a sample of my Sale Date values:

test[1:100, 76]
1    1989-08-01
2    1900-01-01
3    2003-11-18
4    2003-05-30
5    2005-08-18
6    1990-04-01
7    1989-01-01
8    1900-01-01
9    1996-03-12
10   1900-01-01
11   2005-11-14
12   2002-05-08
13   2000-10-10
14   1900-01-01
15   2007-03-27








Gabor Grothendieck wrote:
 
 You have found a bug.

 It would be best to use dput(test1) to display unambiguously what
 is in test1, but in the absence of that I will assume that it's as in
 test1 shown below.

 library(sqldf)
 test1 <- data.frame(sale_date = as.Date(c("2008-08-01", "2031-01-09",
   "1990-01-03", "2007-02-03", "1997-01-03", "2004-02-04")))

 sqldf("select max(sale_date) from test1")
   max(sale_date)
 1         9864.0

 Evidently it is taking the internal numeric representation and then
 storing it in the database as characters and then taking the maximum
 of those characters.  As the fifth entry starts with 9 it's the maximum
 when sorted alphabetically:

 as.numeric(test1[[1]])
 [1] 14092 22288  7307 13547  9864 12452

 I will have to investigate whether the problem is in sqldf or the
 underlying software.  In the meantime if you represent the Date data
 as character you should be ok:

 test2 <- transform(test1, sale_date = as.character(sale_date))
 sqldf("select max(sale_date) from test2")
   max(sale_date)
 1     2031-01-09
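
The diagnosis can be reproduced in base R without sqldf or a database. This is only a sketch of the comparison logic described above, not of what sqldf does internally:

```r
sale_date <- as.Date(c("2008-08-01", "2031-01-09", "1990-01-03",
                       "2007-02-03", "1997-01-03", "2004-02-04"))

# Internal representation: days since 1970-01-01.
days <- as.numeric(sale_date)

# Comparing the day counts as text picks the string starting with "9".
max_as_text <- max(as.character(days))

# Comparing ISO-formatted date strings as text gives the latest date,
# because year-month-day order sorts the same way as the dates themselves.
max_as_iso <- max(as.character(sale_date))

print(max_as_text)
print(max_as_iso)
```

This is why the character workaround is safe for dates written as yyyy-mm-dd, but would not be for formats such as dd/mm/yyyy.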
 
 
 
 
 packageDescription("sqldf")$Version
 [1] "0-1.7"
 R.version.string
 [1] "R version 2.9.2 Patched (2009-09-08 r49647)"
 
 Please provide the output of dput(test1) so that we know unambiguously
 what your data looks like.
 
 On Thu, Sep 24, 2009 at 9:07 AM, dhanasekaran dhana...@gmail.com wrote:
 The data looks like

 2008-08-01
 2031-01-09
 1990-01-03
 2007-02-03
 1997-01-03
 2004-02-04

 Thanks.

 On Thu, Sep 24, 2009 at 5:20 PM, Gabor Grothendieck
 ggrothendi...@gmail.com wrote:

 Please read and follow the last line to every message on r-help.

 On Thu, Sep 24, 2009 at 5:32 AM, dhansekaran dhana...@gmail.com wrote:
 
  Hello R users
 
  I tried to get the maximum sale date from my data frame using sqldf in R.
  The first time I executed the following code

  sqldf("select max(sale_date) from test1")

  I got the result 9997.0

  BUT

  when I ran the same code a second time, the result was 2031-04-09
  (which is the correct one!)

  Why did this happen?

  Thanks.
 



 
 


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Binomial

2009-09-25 Thread Marc Schwartz

On Sep 25, 2009, at 6:29 AM, Ashta wrote:


Dear R-users,

Suppose I have the following sample of data,

0   1   2  4  3
1   2   1  3  1
1   3   3  4  1
0   1   2  1  2
1   4   1  4  2
 1   2   2  1  1

The first variable is the response variable, where 0 is defective and 1 is
normal. The other four are factors (x1, x2, x3, x4) that influence the
outcome. I want to fit a binomial model. How do I do that? I am guessing the
response variable should be transformed, but I am not sure which family of
transformation to use.
It is easy to do in SAS, but I want to learn how to do it in R.

Any help is highly appreciated

Ashta


Presuming that your reference to SAS is to PROC LOGISTIC, then in R  
you would use glm() with 'family = binomial'.


Using:

  help.search("logistic regression")

would get you a lot of hints.

See ?glm for more information, or alternatively, the lrm() function in  
Frank's 'rms' package on CRAN.
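
A minimal sketch of such a fit on simulated data. The data frame d, its size and the coefficients used to simulate y are all made up for illustration; only the glm() call with family = binomial reflects the advice above.

```r
set.seed(123)  # simulated data, purely illustrative
n <- 200
d <- data.frame(
  x1 = sample(1:4, n, replace = TRUE),
  x2 = sample(1:4, n, replace = TRUE),
  x3 = sample(1:4, n, replace = TRUE),
  x4 = sample(1:4, n, replace = TRUE)
)

# Simulate a 0/1 response whose log-odds depend on x1 and x2 (arbitrary choice).
d$y <- rbinom(n, size = 1, prob = plogis(-1 + 0.6 * d$x1 - 0.3 * d$x2))

# Logistic regression: y is not transformed by hand, the logit link is part
# of family = binomial. Wrap a predictor in factor() if it is categorical.
fit <- glm(y ~ x1 + x2 + x3 + x4, family = binomial, data = d)
print(summary(fit))
```

Here the predictors are treated as numeric; with factor(x1) and so on, each level gets its own coefficient, which needs correspondingly more data.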


I would however hope that your actual working data set is much larger,  
as you don't have enough data above to support a single covariate,  
much less 4.


HTH,

Marc Schwartz



Re: [R] Java to R interface.

2009-09-25 Thread Cedrick Johnson

You could also try Rserve

http://www.rforge.net/Rserve/

-cj


vikrant S wrote:

I want to call R functions from Java. I read on a couple of forums that I
should install the rJava package in R.
However, I am not able to install the rJava package on Ubuntu Linux. I tried
two commands.
One is
install.packages("rJava")

For the other, I downloaded the rJava_0.7-0.tar.gz file from the R site
and ran the command R CMD INSTALL rJava_0.7-0.tar.gz.
I got the following errors:



Warning in install.packages("rJava") :
  argument 'lib' is missing: using
'/home/vikrant/R/i486-pc-linux-gnu-library/2.9'
trying URL 'http://cran.uk.r-project.org/src/contrib/rJava_0.7-0.tar.gz'
Content type 'application/x-gzip' length 249486 bytes (243 Kb)
opened URL
==
downloaded 243 Kb

* Installing *source* package ‘rJava’ ...
mv: cannot move `/home/vikrant/R/i486-pc-linux-gnu-library/2.9/rJava' to
`/home/vikrant/R/i486-pc-linux-gnu-library/2.9/00LOCK/rJava': Permission
denied
checking for gcc... gcc -std=gnu99
checking for C compiler default output file name... a.out
checking whether the C compiler works... yes
checking whether we are cross compiling... no
checking for suffix of executables... 
checking for suffix of object files... o

checking whether we are using the GNU C compiler... yes
checking whether gcc -std=gnu99 accepts -g... yes
checking for gcc -std=gnu99 option to accept ISO C89... none needed
checking how to run the C preprocessor... gcc -std=gnu99 -E
checking for grep that handles long lines and -e... /bin/grep
checking for egrep... /bin/grep -E
checking for ANSI C header files... yes
checking for sys/wait.h that is POSIX.1 compatible... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking for string.h... (cached) yes
checking sys/time.h usability... yes
checking sys/time.h presence... yes
checking for sys/time.h... yes
checking for unistd.h... (cached) yes
checking for an ANSI C-conforming const... yes
checking whether time.h and sys/time.h may both be included... yes
configure: checking whether gcc -std=gnu99 supports static inline...
yes
checking Java support in R... present:
interpreter : '/usr/bin/java'
archiver: '/usr/bin/jar'
compiler: '/usr/bin/javac'
header prep.: '/usr/bin/javah'
cpp flags   : '-I/usr/lib/jvm/java-6-openjdk/jre/../include'
java libs   : '-L/usr/lib/jvm/java-6-openjdk/jre/lib/i386/client
-L/usr/lib/jvm/java-6-openjdk/jre/lib/i386
-L/usr/lib/jvm/java-6-openjdk/jre/../lib/i386 -L
-L/usr/java/packages/lib/i386 -L/lib -L/usr/lib -L/usr/lib/jni -ljvm'
checking whether JNI programs can be compiled... yes
checking JNI data types... configure: error: One or more JNI types differ
from the corresponding native type. You may need to use non-standard
compiler flags or a different compiler in order to fix this.
ERROR: configuration failed for package ‘rJava’

Please help me to install rJava. Could anyone also suggest whether there is
a better way to call R from Java, and point me to a tutorial for it?
Thanks in advance






Re: [R] packGrob and dynamic resizing

2009-09-25 Thread baptiste auguie
Thank you Paul, I was convinced I tried this option but I obviously didn't!

In ?packGrob, the user is warned that packing grobs can be slow. In
order to quantify this, I made the following comparison of 3
functions,

- table1 uses frameGrob and packGrob
- table2 uses frameGrob but calculates the sizes manually and uses placeGrob
- table3 creates a grid.layout and draws the grobs in the different viewports.

The three functions have (almost) the same output, but the timing does
differ quite substantially !

system.time(table1(content))
#    user  system elapsed
# 126.733   2.414 135.450
system.time(table2(content))
#   user  system elapsed
# 22.387   0.508  24.457
system.time(table3(content))
#  user  system elapsed
# 4.868   0.124   5.695

A few questions:

- why should the placeGrob approach of table2 be 5 times slower than
table3 (pushing viewports) ?

- if so, what are the merits of using a frameGrob over creating a
layout manually?

- can one add some padding to the content placed with a placeGrob approach?


Best regards,

baptiste

The code follows below,

sessionInfo()
R version 2.9.2 (2009-08-24)
i386-apple-darwin8.11.1

locale:
en_GB.UTF-8/en_GB.UTF-8/C/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats graphics  grDevices utils datasets  grid  methods
[8] base

### code starts ###
library(grid)

# a few helping functions

rowMax.units <- function(u, nrow){ # rowMax with a fake matrix of units
  matrix.indices <- matrix(seq_along(u), nrow=nrow)
  do.call(unit.c, lapply(seq(1, nrow), function(ii) {
    max(u[matrix.indices[ii, ]])
  }))
}

colMax.units <- function(u, ncol){ # colMax with a fake matrix of units
  matrix.indices <- matrix(seq_along(u), ncol=ncol)
  do.call(unit.c, lapply(seq(1, ncol), function(ii) {
    max(u[matrix.indices[, ii]])
  }))
}


textii <- function(d, gp=gpar(), name="row-label-"){
  function(ii)
    textGrob(label=d[ii], gp=gp, name=paste(name, ii, sep=""))
}

# create a list of text grobs from a data.frame
makeContent <- function(d){
  content <- as.character(unlist(c(d)))

  makeOneLabel <- textii(d=content, gp=gpar(col="blue"), name="content-label-")
  lg <- lapply(seq_along(content), makeOneLabel)

  list(lg=lg, nrow=nrow(d), ncol=ncol(d))
}

### the comparison starts here ###

## table1 uses grid.pack
table1 <- function(content){

  gcells = frameGrob(name="table.cells",
                     layout = grid.layout(content$nrow, content$ncol))

  label.ind <- 1   # index running across labels

  for (ii in seq(1, content$ncol, 1)) {
    for (jj in seq(1, content$nrow, 1)) {
      gcells = packGrob(gcells, content$lg[[label.ind]], row=jj, col=ii,
                        dynamic=TRUE)
      label.ind <- label.ind + 1
    }
  }
  grid.draw(gTree(children=gList(gcells)))
}

## table2 uses grid.place
table2 <- function(content){

  padding <- unit(4, "mm")
  lg <- content$lg
  ## retrieve the widths and heights of all textGrobs (including some zeroGrobs)
  wg <- lapply(lg, grobWidth) # list of grob widths
  hg <- lapply(lg, grobHeight) # list of grob heights

  ## concatenate these units
  widths.all <- do.call(unit.c, wg) # all grob widths
  heights.all <- do.call(unit.c, hg) # all grob heights

  ## matrix-like operations on units to define the table layout
  widths <- colMax.units(widths.all, content$ncol)  # all column widths
  heights <- rowMax.units(heights.all, content$nrow)  # all row heights

  gcells = frameGrob(name="table.cells",
                     layout = grid.layout(content$nrow, content$ncol,
                       width=widths+padding, height=heights+padding))

  label.ind <- 1   # index running across labels

  for (ii in seq(1, content$ncol, 1)) {
    for (jj in seq(1, content$nrow, 1)) {
      gcells = placeGrob(gcells, content$lg[[label.ind]], row=jj, col=ii)
      label.ind <- label.ind + 1
    }
  }
  grid.draw(gTree(children=gList(gcells)))

}

## table3 uses grid.layout
table3 <- function(content){

  padding <- unit(4, "mm")
  lg <- content$lg
  ## retrieve the widths and heights of all textGrobs (including some zeroGrobs)
  wg <- lapply(lg, grobWidth) # list of grob widths
  hg <- lapply(lg, grobHeight) # list of grob heights

  ## concatenate these units
  widths.all <- do.call(unit.c, wg) # all grob widths
  heights.all <- do.call(unit.c, hg) # all grob heights

  ## matrix-like operations on units to define the table layout
  widths <- colMax.units(widths.all, content$ncol)  # all column widths
  heights <- rowMax.units(heights.all, content$nrow)  # all row heights

  cells = viewport(name="table.cells", layout =
    grid.layout(content$nrow, content$ncol,
                width=widths+padding, height=heights+padding) )

  pushViewport(cells)

  label.ind <- 1   # index running across labels

  ## loop over columns and rows
  for (ii in seq(1, content$ncol, 1)) {
    for (jj in seq(1, content$nrow, 1)) {
  ## 

Re: [R] keeping all rows with the same values, and not only unique ones

2009-09-25 Thread Dimitri Liakhovitski
Thank you so much, everyone!
Very helpful!
Dimitri

On Thu, Sep 24, 2009 at 7:46 PM, Moshe Olshansky m_olshan...@yahoo.com wrote:
 test[which(test[,"total"] %in% needed),]
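
The reason == returns only two rows is that needed is recycled along total, so each element of total is compared with just one of 7 or 9, whereas %in% tests each element against the whole set. A small sketch with the data from the question (the which() wrapper is optional here because total has no missing values):

```r
test <- data.frame(x = c(1, 2, 3, 4, 5, 6, 7, 8),
                   y = c(2, 3, 4, 5, 6, 7, 8, 9),
                   total = c(7, 7, 8, 8, 9, 9, 10, 10))
needed <- c(7, 9)

# == recycles needed to 7, 9, 7, 9, ... and compares position by position.
recycled <- test$total == needed
print(which(recycled))

# %in% asks, for every element of total, whether it occurs anywhere in needed.
result <- test[test$total %in% needed, ]
print(result)
```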

 --- On Fri, 25/9/09, Dimitri Liakhovitski ld7...@gmail.com wrote:

 From: Dimitri Liakhovitski ld7...@gmail.com
 Subject: [R] keeping all rows with the same values, and not only unique ones
 To: R-Help List r-h...@stat.math.ethz.ch
 Received: Friday, 25 September, 2009, 8:52 AM
 Dear R-ers,

 I have a data frame test:
 test <- data.frame(x=c(1,2,3,4,5,6,7,8), y=c(2,3,4,5,6,7,8,9), total=c(7,7,8,8,9,9,10,10))
 test

 I have a vector needed:
 needed <- c(7,9)
 needed

 I need the result to look like this:
 1 2 7
 2 3 7
 5 6 9
 6 7 9

 When I do the following:
 result <- test[test["total"]==needed,]
 result

 I only get unique rows that have 7 or 9 in total:
 1 2 7
 6 7 9

 How could I keep ALL rows that have 7 or 9 in total?

 Thanks a million!

 --
 Dimitri Liakhovitski
 Ninah.com
 dimitri.liakhovit...@ninah.com






-- 
Dimitri Liakhovitski
Ninah.com
dimitri.liakhovit...@ninah.com



Re: [R] if else and loop for code in R

2009-09-25 Thread ONKELINX, Thierry
It looks like you are trying to mimic the SAS data step. In R you can
vectorise this.

a_data <- read.table("D:/SNP/copy.sas", header=T, sep="\t")
a_data$stat <- with(a_data, ifelse(truck < 0, 0, ifelse(cars > 100, 0,
cars)))
a_data$i <- seq_len(nrow(a_data))
outTable <- a_data[, c("i", "stat", "truck")]
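
The loop in the question fails for two reasons: if() needs a single TRUE or FALSE while tr < 0 is a whole vector, and stat is used inside the loop before it has ever been assigned. A quick check of the vectorised version on made-up values (the four rows below are illustrative, not taken from copy.sas):

```r
# Illustrative data standing in for the file read with read.table().
a_data <- data.frame(truck = c(-5, 10, 20, 3),
                     cars  = c(50, 150, 80, 100))

# ifelse() evaluates the conditions element by element:
# negative truck -> 0, cars above 100 -> 0, otherwise keep cars.
a_data$stat <- with(a_data, ifelse(truck < 0, 0, ifelse(cars > 100, 0, cars)))

# Row counter, then the three columns in the requested order.
a_data$i <- seq_len(nrow(a_data))
outTable <- a_data[, c("i", "stat", "truck")]
print(outTable)
```

Note that a value of exactly 100 is kept, because the condition is a strict "greater than".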

HTH,

Thierry



ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature
and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics,
methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium
tel. + 32 54/436 185
thierry.onkel...@inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey



Re: [R] Problem on plotting TS using GGPLOT

2009-09-25 Thread ONKELINX, Thierry

You are mixing data from two datasets with different lengths. Your x
variable has 51 elements, while the y variable has 306 elements. What
did you expect to happen with that?

Use only one dataset within a geom. Otherwise you are likely to get
into trouble.
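
One way to follow this advice is to put all six series into a single long data frame, one row per series and month, and map every aesthetic from that data frame. A minimal sketch under stated assumptions: plain Date values stand in for the zoo yearmon index, the column names month, vv and dat are illustrative, and the ggplot2 part is skipped if the package is not installed. Note that rep(..., each = ) makes each block of 51 rows belong to one series; data.frame(dat, vv) with vv of length 6 would instead recycle the labels a, b, ..., f row by row.

```r
set.seed(42)
n_months <- 51
series <- letters[1:6]

# Monthly index starting January 2000 (Dates stand in for zoo's yearmon).
month <- seq(as.Date("2000-01-01"), by = "month", length.out = n_months)

# One long data frame: 6 blocks of 51 rows, each block one series.
long <- data.frame(
  month = rep(month, times = length(series)),
  vv    = rep(series, each = n_months),
  dat   = rnorm(n_months * length(series))
)

# All aesthetics come from the same data frame, so the lengths always agree.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(x = month, y = dat, colour = vv, group = vv)) +
    geom_line()
  if (interactive()) print(p)
}
```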

HTH,

Thierry 





-Original message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On behalf of bogaso.christofer
Sent: Friday 25 September 2009 14:47
To: r-h...@stat.math.ethz.ch
Subject: [R] Problem on plotting TS using GGPLOT

Hi, I have the following code:

library(zoo); library(ggplot2); library(plyr)

dat <- rnorm(306); vv <- letters[1:6]; dat1 <- data.frame(dat, vv)

dat2 = zooreg(rnorm(51), as.yearmon(as.Date("2000-01-01")),
frequency=12)

ggplot(dat1) +
  geom_line(aes(y=dat, x=index(dat2), colour=vv), group=vv,
            size = 1.3)

However, I got an error while plotting them:

Error in data.frame(x = c(2000, 2000.083, 2000.167, 2000.25, :
  arguments imply differing number of rows: 51, 306

I could not find out why that error occurs. Any idea, please?

Thanks,




Re: [R] Does anybody know how to connect to SAS from within R?

2009-09-25 Thread Michael
Yeah, I would also like to know what synergy I can get from combining
the power of R and SAS...

Maybe there are some things that are particularly strong in R and
others that are particularly strong in SAS?

Thanks!

On Thu, Sep 24, 2009 at 10:26 PM, Indrajit Sengupta
indra_cali...@yahoo.com wrote:
 Here's a good website on using R & SAS. I am not sure if this site or the
 book mentioned talks about connecting to SAS from R, but nevertheless it's
 worth going through. I have used both, and really can't see much benefit
 other than transferring datasets.

 Regards,
 Indrajit



 - Original Message 
 From: Michael comtech@gmail.com
 To: r-help r-h...@stat.math.ethz.ch
 Sent: Friday, September 25, 2009 6:06:58 AM
 Subject: [R] Does anybody know how to connect to SAS from within R?

 And what might be the benefit doing that?

 Thanks a lot!



Re: [R] Problem on plotting TS using GGPLOT

2009-09-25 Thread Bogaso

Thanks for this reply. My goal here is to plot multiple time series in the
same plotting window. The y variable has 306 elements; however, each value
is associated with a factor level given by the vv variable.

I want to plot 6 time series in total. For example, the first 51 values of y,
labelled a, should be treated as a single TS, with the index of dat2 as its
index. Similarly, the second 51 values of y, labelled b, should be treated as
another single TS with the index of dat2, and so on.

What is the problem here?

Thanks,





Re: [R] Problem on plotting TS using GGPLOT

2009-09-25 Thread Bogaso

Let me be more specific. My goal is to plot the following multiple TS using
ggplot2:

dat1 <- zooreg(matrix(rnorm(306), 51), as.yearmon(as.Date("2000-01-01")),
frequency=12)
colnames(dat1) <- letters[1:6]
dat1

I still cannot see what the problem is in my ggplot2 code. Please give me some
ideas.
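
A wide object like this, with one column per series, can be reshaped into the long layout that ggplot2 works with most easily. A sketch in base R, assuming a plain matrix in place of the zooreg object and Date values in place of its yearmon index (with zoo loaded, coredata(dat1) and index(dat1) would be the natural sources for these two pieces); the names wide, month, series and value are illustrative:

```r
set.seed(1)

# Stand-in for the 51 x 6 zooreg object: one column per series.
wide <- matrix(rnorm(306), nrow = 51,
               dimnames = list(NULL, letters[1:6]))

# Stand-in for the monthly time index.
month <- seq(as.Date("2000-01-01"), by = "month", length.out = nrow(wide))

# Stack the columns: as.vector() reads the matrix column by column,
# so it lines up with series repeated in blocks of nrow(wide).
long <- data.frame(
  month  = rep(month, times = ncol(wide)),
  series = rep(colnames(wide), each = nrow(wide)),
  value  = as.vector(wide)
)

print(head(long))
```

The resulting data frame can then be passed to a single ggplot() call with x = month, y = value, and colour and group mapped to series.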

Best,


Bogaso wrote:
 
 Thanks for this reply. My goal is to plot multiple time series in the
 same plotting window. The y variable has 306 elements, and each value
 is associated with a factor level given by the vv variable.
 
 I want to plot 6 time series in total. For example, the first 51 values of y,
 labelled "a", should be treated as a single series indexed by the index of
 dat2; the second 51 values of y, labelled "b", should be treated as another
 series with the same index, and so on.
 
 What is the problem here?
 
 Thanks,
 
 
 
 ONKELINX, Thierry wrote:
 
 
 You are mixing data from two datasets with different lengths. Your x
 variable has 51 elements, while the y variable has 306 elements. What
 did you expect to happen with that?
 
 Use only one dataset within a geom(). Otherwise you are likely to run
 into trouble.
 
 HTH,
 
 Thierry 
 
 
 
 
 ir. Thierry Onkelinx
 Instituut voor natuur- en bosonderzoek / Research Institute for Nature
 and Forest
 Cel biometrie, methodologie en kwaliteitszorg / Section biometrics,
 methodology and quality assurance
 Gaverstraat 4
 9500 Geraardsbergen
 Belgium
 tel. + 32 54/436 185
 thierry.onkel...@inbo.be
 www.inbo.be
 
 To call in the statistician after the experiment is done may be no more
 than asking him to perform a post-mortem examination: he may be able to
 say what the experiment died of.
 ~ Sir Ronald Aylmer Fisher
 
 The plural of anecdote is not data.
 ~ Roger Brinner
 
 The combination of some data and an aching desire for an answer does not
 ensure that a reasonable answer can be extracted from a given body of
 data.
 ~ John Tukey
 
 -Original message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
 On behalf of bogaso.christofer
 Sent: Friday 25 September 2009 14:47
 To: r-h...@stat.math.ethz.ch
 Subject: [R] Problem on plotting TS using GGPLOT
 
 Hi, I have the following code:
 
 library(zoo); library(ggplot2); library(plyr)
 
 dat <- rnorm(306); vv <- letters[1:6]; dat1 <- data.frame(dat, vv)
 
 dat2 = zooreg(rnorm(51), as.yearmon(as.Date("2000-01-01")),
               frequency=12)
 
 ggplot(dat1) +
   geom_line(aes(y=dat, x=index(dat2), colour=vv), group=vv,
             size = 1.3)
 
 However, I get an error when plotting:
 
 Error in data.frame(x = c(2000, 2000.083, 2000.167, 2000.25,  :
   arguments imply differing number of rows: 51, 306
 
 I could not find why that error occurs. Any ideas, please?
 
 
 
 Thanks,
 
 
  [[alternative HTML version deleted]]
 
 

-- 
View this message in context: 
http://www.nabble.com/Problem-on-plotting-TS-using-GGPLOT-tp25611332p25612307.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem on plotting TS using GGPLOT

2009-09-25 Thread Gabor Grothendieck
First get the correct representation which here would be a multivariate
zoo series with 51 time points and 6 components series and then plot it
using zoo's plot function:

z <- zoo(matrix(dat, 51), time(dat2))

# all in one panel
plot(z, pch = letters[1:6], screen = 1, type = "b", col = 1:6)

# or in separate panels (same but omit screen = 1)
plot(z, pch = letters[1:6], type = "b", col = 1:6)

There are many examples of plotting zoo series in the 3 vignettes that
come with zoo and also in ?plot.zoo and ?xyplot.zoo

If you wish to use ggplot2 you can extract the data and times
into a new data frame and use that data frame for further computation.

DF <- cbind(tt = time(z), as.data.frame(z))
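
For the ggplot2 route, one possible continuation (a sketch only, not
something posted in this thread) is to stack the six series into a single
"long" data frame so that x, y and the grouping variable all have 306 rows.
The helper name to_long and the column names series and value are made up
for illustration; with a zoo object z you would pass coredata(z) and
as.numeric(time(z)). The ggplot2 call is guarded so the reshaping part runs
even where ggplot2 is not installed.

```r
# Stack a wide matrix (one column per series) into a long data frame.
# to_long() is a hypothetical helper, not part of zoo or ggplot2.
to_long <- function(m, tt) {
  data.frame(
    tt     = rep(tt, times = ncol(m)),            # same time index for every series
    series = rep(colnames(m), each = nrow(m)),    # "a" x 51, then "b" x 51, ...
    value  = as.vector(m),                        # column-major: series a first
    stringsAsFactors = FALSE
  )
}

set.seed(1)
m  <- matrix(rnorm(306), 51, dimnames = list(NULL, letters[1:6]))
tt <- 2000 + (0:50) / 12     # numeric years, standing in for as.numeric(time(z))
long <- to_long(m, tt)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(x = tt, y = value, colour = series, group = series)) +
    geom_line()
  print(p)
}
```

Because every aesthetic now comes from the same data frame, the "differing
number of rows: 51, 306" situation cannot arise.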


On Fri, Sep 25, 2009 at 9:32 AM, Bogaso bogaso.christo...@gmail.com wrote:

 Thanks for this reply. My goal is to plot multiple time series in the
 same plotting window. The y variable has 306 elements, and each value
 is associated with a factor level given by the vv variable.

 I want to plot 6 time series in total. For example, the first 51 values of y,
 labelled "a", should be treated as a single series indexed by the index of
 dat2; the second 51 values of y, labelled "b", should be treated as another
 series with the same index, and so on.

 What is the problem here?

 Thanks,





__
R-help@r-project.org mailing list

Re: [R] Problem on plotting TS using GGPLOT

2009-09-25 Thread Bogaso

Thanks, Gabor, for your input. I know zoo has an option to plot
multiple time series, but I want to go with ggplot2 because it looks
better. If anyone can point out where the problem is in my ggplot2 code,
I would be truly grateful.

Thanks,



Gabor Grothendieck wrote:
 
 First get the correct representation which here would be a multivariate
 zoo series with 51 time points and 6 components series and then plot it
 using zoo's plot function:
 
 z <- zoo(matrix(dat, 51), time(dat2))
 
 # all in one panel
 plot(z, pch = letters[1:6], screen = 1, type = "b", col = 1:6)
 
 # or in separate panels (same but omit screen = 1)
 plot(z, pch = letters[1:6], type = "b", col = 1:6)
 
 There are many examples of plotting zoo series in the 3 vignettes that
 come with zoo and also in ?plot.zoo and ?xyplot.zoo
 
 If you wish to use ggplot2 you can extract the data and times
 into a new data frame and use that data frame for further computation.
 
 DF <- cbind(tt = time(z), as.data.frame(z))
 
 

[R] Spliting columns, strings or reg exp returning substrings

2009-09-25 Thread Dry, Jonathan R
Currently the first column of my data frame holds string values in the
format xx_yy. I want to create a new column containing just the substring xx
(for each row in turn). I can think of three possible approaches:

(1) split the string on '_' using strsplit and put the first of the resulting
    pieces into a new column; I have been unable to do this for each row of
    my data frame in turn (trying to use apply);
(2) split the column into two based on '_', but I am not sure if this is
    possible;
(3) use a regular expression to return the substring up to the '_', but I am
    unsure how to make a regular expression return the substring it matches
    in R.

Any ideas on all three counts would be gratefully received.

--
AstraZeneca UK Limited is a company incorporated in Engl...{{dropped:21}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem on plotting TS using GGPLOT

2009-09-25 Thread Gabor Grothendieck
Once you have reduced it to a data frame as already discussed, it's
just a ggplot2 problem, so you can take it to the ggplot2 group:
http://groups.google.com/group/ggplot2

On Fri, Sep 25, 2009 at 9:58 AM, Bogaso bogaso.christo...@gmail.com wrote:

 Thanks, Gabor, for your input. I know zoo has an option to plot
 multiple time series, but I want to go with ggplot2 because it looks
 better. If anyone can point out where the problem is in my ggplot2 code,
 I would be truly grateful.

 Thanks,




[R] summarize-plyr package

2009-09-25 Thread Veerappa Chetty
Hi, I am using the amazing package 'plyr'. I have one problem and would
appreciate help fixing the following error. Thanks.
__

> library(plyr)
> data(baseball)
> summarise(baseball,
+ duration = max(year) - min(year),
+ nteams = length(unique(team)))
Error: could not find function "summarise"
> ddply(baseball, "id", summarise,
+ duration = max(year) - min(year),
+ nteams = length(unique(team)))
Error in llply(.data = .data, .fun = .fun, ..., .progress = .progress) :
  object "summarise" not found


-- 
Professor of Family Medicine
Boston University
Tel: 617-414-6221, Fax:617-414-3345
emails: chett...@gmail.com,vche...@bu.edu

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Spliting columns, strings or reg exp returning substrings

2009-09-25 Thread Henrique Dallazuanna
Try this:

DF <- data.frame(A = c('11_12', '22_23', '33_34'),
                 B = sample(3))

#1) Using strsplit
transform(DF, C = sapply(strsplit(as.character(DF$A), "_"), '[', 1))

#2) Using substr
transform(DF, C = substr(DF$A, 1, 2))

#3) Using regex
transform(DF, C = gsub("_.*", "", DF$A))
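
On the third question (making a regular expression return what it matched),
here is a small base-R sketch that goes slightly beyond the three lines
above. It is only one way to do it: sub() with a capture group keeps the
captured part, and regmatches() with regexpr() returns the matched text
directly. The variable names are made up for the example, and the last part
shows splitting the column into two.

```r
x <- c("11_12", "22_23", "33_34")

# Capture group: keep only what the parentheses matched.
by_sub <- sub("^([^_]*)_.*$", "\\1", x)

# Return the matched substring itself (everything before the first "_").
by_match <- regmatches(x, regexpr("^[^_]+", x))

# Split the column into two new columns.
parts <- do.call(rbind, strsplit(x, "_", fixed = TRUE))
DF <- data.frame(A = x, B = 3:1, stringsAsFactors = FALSE)
DF$first  <- parts[, 1]
DF$second <- parts[, 2]
DF
```

Note that the rbind step assumes every string contains exactly one "_";
strings with a different number of pieces would need separate handling.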


On Fri, Sep 25, 2009 at 11:01 AM, Dry, Jonathan R
jonathan@astrazeneca.com wrote:
 Currently as the first column in a data frame I have string values in the 
 format xx_yy - I want to create a new column with just the substring xx (for 
 each row in turn).  Three possible ways to do this might be (1) split the 
 string by '_' using strsplit and paste the first of the resulting variables 
 into a new column, but I have been unable to do this for each row of my data 
 frame in turn (trying to use apply); (2) split the column into two based on 
 '_', but I am not sure if this is possible; (3) use a regular expression to 
 return the substring up to the '_', but I am unsure how to make a regular 
 expression return the substring it matches to in R.

 Any ideas on all three counts would be gratefully received.

 --
 AstraZeneca UK Limited is a company incorporated in Engl...{{dropped:21}}

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] summarize-plyr package

2009-09-25 Thread baptiste auguie
Hi,

it works for me with plyr version 0.1.9. Try upgrading to the latest
version, or post your sessionInfo()
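
A quick way to check what is installed before upgrading might look like the
sketch below. has_export is a hypothetical helper written for this example,
not a plyr function; it only reports whether an installed package exports a
given name, and returns NA when the package is missing.

```r
# Does an installed package export a given function name?
# (has_export is a made-up helper for illustration.)
has_export <- function(pkg, name) {
  if (!requireNamespace(pkg, quietly = TRUE)) return(NA)
  name %in% getNamespaceExports(pkg)
}

if (requireNamespace("plyr", quietly = TRUE)) {
  print(packageVersion("plyr"))        # installed version
  print(has_export("plyr", "summarise"))
} else {
  message("plyr is not installed")
}
```

If the second value is FALSE, the installed plyr predates summarise and
update.packages() or install.packages("plyr") would be the next step.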


HTH,

baptiste

2009/9/25 Veerappa Chetty chett...@gmail.com:
 Hi, I am using the amazing package 'plyr'. I have one problem and would
 appreciate help fixing the following error. Thanks.
 __

 > library(plyr)
 > data(baseball)
 > summarise(baseball,
 + duration = max(year) - min(year),
 + nteams = length(unique(team)))
 Error: could not find function "summarise"
 > ddply(baseball, "id", summarise,
 + duration = max(year) - min(year),
 + nteams = length(unique(team)))
 Error in llply(.data = .data, .fun = .fun, ..., .progress = .progress) :
   object "summarise" not found
 

 --
 Professor of Family Medicine
 Boston University
 Tel: 617-414-6221, Fax:617-414-3345
 emails: chett...@gmail.com,vche...@bu.edu

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] summarize-plyr package

2009-09-25 Thread Girish A.R.

Works alright for me:

> summarise(baseball, duration = max(year) - min(year), nteams = length(unique(team)))
  duration nteams
1      136    132

> ddply(baseball, "id", summarise, duration = max(year) - min(year), nteams = length(unique(team)))
         id duration nteams
1 aaronha01       22      3
2 abernte02       17      7
3 adairje01       12      4
4 adamsba01       20      2
5 adamsbo03       13      4



cheers,
-Girish

> sessionInfo()
R version 2.9.2 (2009-08-24)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] RWinEdt_1.8-1  gtools_2.6.1   gmodels_2.15.0 ggplot2_0.8.3  reshape_0.8.3
[6] plyr_0.1.9     proto_0.3-8    doBy_4.0.2

loaded via a namespace (and not attached):
 [1] cluster_1.12.0   Formula_0.1-3    gdata_2.6.1      Hmisc_3.7-0      kinship_1.1.0-23
 [6] lattice_0.17-25  MASS_7.2-48      nlme_3.1-94      plm_1.1-4        sandwich_2.2-1
[11] splines_2.9.2    survival_2.35-7  tools_2.9.2


-

Veerappa Chetty wrote:
 
 Hi, I am using the amazing package 'plyr'. I have one problem and would
 appreciate help fixing the following error. Thanks.
 __
 
 > library(plyr)
 > data(baseball)
 > summarise(baseball,
 + duration = max(year) - min(year),
 + nteams = length(unique(team)))
 Error: could not find function "summarise"
 > ddply(baseball, "id", summarise,
 + duration = max(year) - min(year),
 + nteams = length(unique(team)))
 Error in llply(.data = .data, .fun = .fun, ..., .progress = .progress) :
   object "summarise" not found
 
 
 -- 
 Professor of Family Medicine
 Boston University
 Tel: 617-414-6221, Fax:617-414-3345
 emails: chett...@gmail.com,vche...@bu.edu
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 

-- 
View this message in context: 
http://www.nabble.com/summarize-plyr-package-tp25612974p25613167.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] multicomp plotting

2009-09-25 Thread Nair, Murlidharan T
I have been trying the following:

require(multcomp)
tmp <- list(confint=sig.data)
attr(tmp, "type") <- "none"
old.oma <- par(oma=c(0,1,0,0))
multcomp:::plot.confint.glht(tmp)
par(old.oma)

I have not been able to get it to work. I would greatly appreciate any
suggestions.
Thanks .../Murli
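
For comparison, a base-graphics sketch that draws the same kind of interval
plot directly from the estimate, lower and upper columns, without touching
multcomp internals. This is an alternative under stated assumptions, not a
fix for the plot.confint.glht call above: plot_intervals is a made-up
function name, and the three demo rows are simply the first three rows of
sig.data as posted in the original message.

```r
# Draw one horizontal interval per comparison (hypothetical helper).
plot_intervals <- function(d, label = "Cell.lines") {
  n <- nrow(d)
  old <- par(mar = c(4, 10, 1, 1))      # wide left margin for the labels
  on.exit(par(old))
  plot(d$estimate, seq_len(n), pch = 16, yaxt = "n",
       xlim = range(d$lower, d$upper, 0),
       xlab = "Difference", ylab = "")
  segments(d$lower, seq_len(n), d$upper, seq_len(n))
  abline(v = 0, lty = 2)                # reference line at no difference
  axis(2, at = seq_len(n), labels = as.character(d[[label]]),
       las = 1, cex.axis = 0.7)
  invisible(n)
}

# First three rows of the posted data, typed in by hand.
demo <- data.frame(
  Cell.lines = c("DU145-Caki-2", "LAPC4-Caki-2", "LNCaP-Caki-2"),
  estimate   = c(-2759.302703, -3690.072718, -2607.150854),
  lower      = c(-3326.68652, -4257.45653, -3174.53467),
  upper      = c(-2191.918891, -3122.688906, -2039.767042)
)
plot_intervals(demo)
```

With the full data one would call plot_intervals(sig.data) and probably
reduce cex.axis further so that all 63 labels fit.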






From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of 
Nair, Murlidharan T [mn...@iusb.edu]
Sent: Thursday, September 24, 2009 2:06 PM
To: r-help@r-project.org
Subject: [R] multicomp plotting

I am trying to plot my multiple comparison data. Can anyone give me some input
on the error I am getting? The data and code are appended below.
Thanks ../Murli





library(multcomp)
sig.data <- structure(list(X = 1:63, Cell.lines = structure(c(1L, 6L, 13L,
25L, 33L, 42L, 2L, 7L, 14L, 26L, 34L, 43L, 3L, 4L, 5L, 18L, 22L,
52L, 58L, 8L, 27L, 35L, 45L, 9L, 36L, 46L, 10L, 15L, 28L, 37L,
47L, 11L, 16L, 29L, 38L, 44L, 12L, 17L, 30L, 39L, 48L, 19L, 23L,
53L, 59L, 20L, 21L, 24L, 54L, 60L, 31L, 40L, 49L, 50L, 32L, 41L,
51L, 55L, 61L, 56L, 62L, 57L, 63L), .Label = c("DU145-Caki-2",
"DU145-Calu1", "HCE-7-DU145", "HCT116-DU145", "HT29-DU145", "LAPC4-Caki-2",
"LAPC4-Calu1", "LAPC4-EC-17", "LAPC4-Fet", "LAPC4-HCE-7", "LAPC4-HCT116",
"LAPC4-HT29", "LNCaP-Caki-2", "LNCaP-Calu1", "LNCaP-HCE-7", "LNCaP-HCT116",
"LNCaP-HT29", "LS174-DU145", "LS174-LAPC4", "LS174-LNCaP", "MCF7-LNCaP",
"MDA-MB-468-DU145", "MDA-MB-468-LAPC4", "MDA-MB-468-LNCaP", "PC3-Caki-2",
"PC3-Calu1", "PC3-EC-17", "PC3-HCE-7", "PC3-HCT116-2", "PC3-HT29",
"PC3-LS174", "PC3-MDA-MB-468", "RWPE1-Caki-2", "RWPE1-Calu1",
"RWPE1-EC-17", "RWPE1-Fet", "RWPE1-HCE-7", "RWPE1-HCT116", "RWPE1-HT29",
"RWPE1-LS174", "RWPE1-MDA-MB-468", "RWPE2-Caki-2", "RWPE2-Calu1",
"RWPE2-E-HCT116", "RWPE2-EC-17", "RWPE2-Fet", "RWPE2-HCE-7",
"RWPE2-HT29", "RWPE2-LS174", "RWPE2-MCF7", "RWPE2-MDA-MB-468",
"SW480-DU145", "SW480-LAPC4", "SW480-LNCaP", "SW480-PC3", "SW480-RWPE1",
"SW480-RWPE2", "TE3-DU145", "TE3-LAPC4", "TE3-LNCaP", "TE3-PC3",
"TE3-RWPE1", "TE3-RWPE2"), class = "factor"), estimate = c(-2759.302703,
-3690.072718, -2607.150854, -3282.218985, -3635.312686, -3786.281227,
-1189.109264, -2119.879279, -1036.957415, -1712.025546, -2065.119246,
-2216.087787, 1253.075395, 1009.183561, 808.413018, 2038.189972,
788.61518, 1453.525701, 1001.526663, -1135.02519, -727.171457,
-1080.265157, -1231.233698, -682.040377, -627.280345, -778.248885,
-2183.84541, -1100.923546, -1775.991677, -2129.085377, -2280.053918,
-1939.953576, -857.031712, -1532.099843, -1885.193544, -2036.162085,
-1739.183033, -656.261169, -1331.3293, -1684.423001, -1835.391542,
2968.959987, 1719.385195, 2384.295716, 1932.296678, 1886.038123,
-578.466846, 636.463331, 1301.373852, 849.374814, -2561.106254,
-2914.199954, -3065.168495, -600.663526, -1311.531462, -1664.625162,
-1815.593703, 1976.441983, 1524.442945, 2329.535683, 1877.536646,
2480.504224, 2028.505187), lower = c(-3326.68652, -4257.45653,
-3174.53467, -3849.6028, -4202.6965, -4353.66504, -1756.49308,
-2687.26309, -1604.34123, -2279.40936, -2632.50306, -2783.4716,
685.69158, 441.79975, 241.02921, 1470.80616, 221.23137, 886.14189,
434.14285, -1702.409, -1294.55527, -1647.64897, -1798.61751,
-1249.42419, -1194.66416, -1345.6327, -2751.22922, -1668.30736,
-2343.37549, -2696.46919, -2847.43773, -2507.33739, -1424.41552,
-2099.48366, -2452.57736, -2603.5459, -2306.56685, -1223.64498,
-1898.71311, -2251.80681, -2402.77535, 2401.57617, 1152.00138,
1816.9119, 1364.91287, 1318.65431, -1145.85066, 69.07952, 733.99004,
281.991, -3128.49007, -3481.58377, -3632.55231, -1168.04734,
-1878.91527, -2232.00897, -2382.97752, 1409.05817, 957.05913,
1762.15187, 1310.15283, 1913.12041, 1461.12137), upper = c(-2191.918891,
-3122.688906, -2039.767042, -2714.835173, -3067.928873, -3218.897414,
-621.725451, -1552.495466, -469.573602, -1144.641733, -1497.735434,
-1648.703975, 1820.459207, 1576.567374, 1375.796831, 2605.573784,
1355.998992, 2020.909513, 1568.910476, -567.641377, -159.787644,
-512.881345, -663.849886, -114.656565, -59.896532, -210.865073,
-1616.461597, -533.539733, -1208.607864, -1561.701565, -1712.670106,
-1372.569764, -289.6479, -964.716031, -1317.809731, -1468.778272,
-1171.799221, -88.877357, -763.945488, -1117.039188, -1268.007729,
3536.343799, 2286.769007, 2951.679528, 2499.680491, 2453.421935,
-11.083033, 1203.847143, 1868.757664, 1416.758627, -1993.722441,
-2346.816142, -2497.784683, -33.279714, -744.147649, -1097.24135,
-1248.209891, 2543.825795, 2091.826758, 2896.919496, 2444.920458,
3047.888037, 2595.888999), p.val.raw = c(2.22e-15, 0, 8.22e-15,
0, 0, 0, 6.2e-08, 7.41e-13, 6.07e-07, 6.36e-11, 1.29e-12, 2.85e-13,
2.47e-08, 9.33e-07, 2.3e-05, 1.71e-12, 3.18e-05, 1.59e-09, 1.05e-06,
1.37e-07, 8.74e-05, 3.13e-07, 3.37e-08, 0.000184, 0.000452, 3.77e-05,
3.91e-13, 2.29e-07, 3.02e-11, 6.75e-13, 1.54e-13, 4.84e-12, 1.05e-05,
5.77e-10, 8.81e-12, 1.75e-12, 4.62e-11, 0.000281, 8.24e-09, 8.83e-11,
1.53e-11, 4.44e-16, 5.83e-11, 5.82e-14, 5.26e-12, 8.73e-12, 0.001,
0.000389, 1.25e-08, 1.18e-05, 1.2e-14, 6.66e-16, 2.22e-16, 0.000698,
1.08e-08, 1.12e-10, 1.92e-11, 

Re: [R] packGrob and dynamic resizing

2009-09-25 Thread hadley wickham
On Fri, Sep 25, 2009 at 7:55 AM, baptiste auguie
baptiste.aug...@googlemail.com wrote:
 Thank you Paul, I was convinced I tried this option but I obviously didn't!

 In ?packGrob, the user is warned that packing grobs can be slow. In
 order to quantify this, I made the following comparison of 3
 functions,

 - table1 uses frameGrob and packGrob
 - table2 uses frameGrob but calculates the sizes manually and uses placeGrob
 - table3 creates a grid.layout and draws the grobs in the different viewports.

 The three functions have (almost) the same output, but the timing does
 differ quite substantially !

This matches my experience with ggplot2 - I have been gradually moving
away from frameGrob and packGrob because doing the placement myself is
much faster (and for most of the cases I'm interested in, the full
power of packGrob is not needed)
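
To make the comparison concrete, here is a small sketch of the two styles
using only the grid package. It is not baptiste's table1/table2 code, which
is not shown in this message; pack_row and place_row are made-up names, and
the example only lays out three text grobs in a row.

```r
library(grid)

grobs <- lapply(c("alpha", "beta", "gamma"), textGrob)

# Style 1: let packGrob work out the layout as each grob is added.
pack_row <- function(grobs) {
  fg <- frameGrob()
  for (g in grobs) fg <- packGrob(fg, g, side = "right")
  fg
}

# Style 2: compute the column widths once, then place each grob in its cell.
place_row <- function(grobs) {
  widths <- do.call(unit.c, lapply(grobs, grobWidth))
  fg <- frameGrob(layout = grid.layout(1, length(grobs), widths = widths))
  for (i in seq_along(grobs)) fg <- placeGrob(fg, grobs[[i]], row = 1, col = i)
  fg
}

packed <- pack_row(grobs)
placed <- place_row(grobs)
```

Wrapping grid.newpage(); grid.draw(packed) and the same for placed in
system.time() is one way to compare the two on a given machine; no timings
are claimed here.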

Hadley

-- 
http://had.co.nz/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reading data

2009-09-25 Thread Michael A. Miller
Sometimes it is easiest to open a file using a file selection
widget.  I keep this in my .Rprofile:

getOpenFile <- function(...){
  require(tcltk)
  return(tclvalue(tkgetOpenFile()))
}

With this you can find your file and open it with

  rel <- read.table(getOpenFile(), quote="", header=FALSE, sep="",
                    col.names=c("id","orel","nrel"))

or

  filename <- getOpenFile()
  rel <- read.table(filename, quote="", header=FALSE, sep="",
                    col.names=c("id","orel","nrel"))

Mike


P.S. I keep a couple of functions on hand for choosing writable files
and directories too...

getSaveFile <- function(...){
  require(tcltk)
  return(tclvalue(tkgetSaveFile()))
}

chooseDir <- function(...){
  require(tcltk)
  return(tclvalue(tkchooseDirectory()))
}



[R] Error with Mixdist in R

2009-09-25 Thread Suchit सुिचतShah
Dear R User,

I am an electrical engineering student and have just come across a
curve-fitting problem. I need to find the constituent Gaussian distribution
curves that fit the data attached in Workbook1.txt. I tried to use Mixdist in
R but ran into the following problem. Can you suggest where I am going wrong?

> super <- read.table("Workbook1.txt", sep="\t")
> plot(super)
> fitmixdata <- as.mixdata(super)
> plot(fitmixdata)
> plotfit1 <- mix(super, mixparam(c(-75,-67,-38),10), "norm", mixconstr(consigma="NONE"))

Error in nlm(mixlike, lmixdat = mixdat, lmixpar = fitpar, ldist = dist,  : 
  missing value in parameter

I get this error.

Awaiting a reply soon.

Thanking you,

Regards,


 Suchit Shah 
BOSTON


Workbook1.txt (two tab-separated columns):

-150     0
-146     0
-142     0
-138     0
-135     0
-131     0
-127     44.293
-123     118.463
-120     27.862
-116     13.076
-112     173.738
-108     322.141
-105     633.341
-101     1153.095
-97.2    1775.843
-93.5    2956.544
-89.7    4966.666
-86      8216.293
-82.2    13535.566
-78.5    21288.975
-74.8    28815.691
-71      36041.516
-67.3    46679.93
-63.5    59945.395
-59.8    73005.781
-56      89597.742
-52.3    114438.898
-48.6    142680.047
-44.8    170931.375
-41.1    201308.688
-37.3    219909.109
-33.6    209581.188
-29.8    171905.469
-26.1    119971.742
-22.4    71685.445
-18.6    38779.398
-14.9    20554.045
-11.1    10713.763
-7.39    5092.355
-3.65    2304.784
0.0962   720.503
3.84     71.953
7.58     0
11.3     0
15.1     0
18.8     0
22.6     0
26.3     0
30       0
33.8     0
37.5     0
41.3     0
45       0
48.8     0
52.5     0
56.2     0
60       0
63.7     0
67.5     0
71.2     0
74.9     0
78.7     0
82.4     0
86.2     0
89.9     0
93.7     0
97.4     0
101      0
105      0
109      0
112      0
116      0
120      0
124      0
127      0
131      0
135      0
139      0
142      0
146      0
150      0


Re: [R] Nested select

2009-09-25 Thread jim holtman
try this:

> lines <- "lo ptcl5 ptcl99 variable
+ 430 . 8787 a
+ 430 342 2343 m
+ 430 . 89 mr
+ 431 456 4774 a
+ 431 299 2777 m
+ 431 99 96 mr
+ 432 333 3433 a
+ 432 . 7377 m
+ 432 . 676 mr
+ "
> DF <- read.table(textConnection(lines), header=TRUE)
>
> closeAllConnections()
> subset(DF, (ptcl5 == '.') & (variable %in% c('a', 'm')))
   lo ptcl5 ptcl99 variable
1 430     .   8787        a
8 432     .   7377        m
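
A variation on the same idea, offered only as a sketch: since "." stands for a missing value, it can be read as NA so that ptcl5 stays numeric, and the selection then uses is.na(). The read.table(text = ...) argument assumes a reasonably current version of R; the name lo_sel is made up for illustration.

```r
# Same data as in the post; "." marks a missing ptcl5 value.
txt <- "lo ptcl5 ptcl99 variable
430 . 8787 a
430 342 2343 m
430 . 89 mr
431 456 4774 a
431 299 2777 m
431 99 96 mr
432 333 3433 a
432 . 7377 m
432 . 676 mr"

# Treat "." as NA so that ptcl5 is read as a numeric column.
DF <- read.table(text = txt, header = TRUE, na.strings = ".",
                 stringsAsFactors = FALSE)

# lo values where ptcl5 is missing and variable is "a" or "m".
lo_sel <- DF$lo[is.na(DF$ptcl5) & DF$variable %in% c("a", "m")]
lo_sel
```

This returns only the lo column, which is what the original sqldf query was aiming for.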




On Fri, Sep 25, 2009 at 4:42 AM, premmad <mtechp...@gmail.com> wrote:
>
> my data :
>
> library(doBy)
>
> lines <- "lo ptcl5 ptcl99 variable
> 430 . 8787 a
> 430 342 2343 m
> 430 . 89 mr
> 431 456 4774 a
> 431 299 2777 m
> 431 99 96 mr
> 432 333 3433 a
> 432 . 7377 m
> 432 . 676 mr
> "
> DF <- read.table(con <- textConnection(Lines), skip = 1)
>
> close(con)
>
> what i want is select lo when ptcl5 is missing and variable is either a or m.
> I tried the following query
> sqldf("select lo from DF where lo=(select lo where ptcl5='.' and
> variable='m') or lo=(select lo where ptcl5='.' and variable='a')").
> But I'm getting entire data instead of limited by the condition.
> Is my query right please help me in this.




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



[R] error while plotting

2009-09-25 Thread Nair, Murlidharan T
I am getting the following errors when I am trying to plot the data below. I 
cannot figure out the error.
Error in plot.window(...) : need finite 'xlim' values
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
3: In min(x) : no non-missing arguments to min; returning Inf
4: In max(x) : no non-missing arguments to max; returning -Inf


# I am using the following code
#==
library(multcomp)
sig.data <- structure(list(X = 1:10, Cell.lines = structure(c(2L, 5L, 8L,
9L, 3L, 6L, 10L, 1L, 4L, 7L), .Label = c("T(70%)a-N(0%)c", "T(70%)a-N(0%)f",
"T(70%)a-N(0%)i", "T(70%)c-N(0%)c", "T(70%)c-N(0%)f", "T(70%)c-N(0%)i",
"T(80%)a-N(0%)c", "T(80%)a-N(0%)f", "T(90%)-N(0%)f", "T(90%)-N(0%)i"
), class = "factor"), estimate = c(9859.74333, -5553.64802, 6227.17947,
8063.6472, 6548.86032, -8864.53103, 4752.7642, 9057.72021, -6355.67115,
5425.15635), lower = c(5560.57875, -9852.8126, 1928.01489, 3764.48262,
2249.69575, -13163.69561, 453.59962, 4758.55563, -10654.83573,
1125.99177), upper = c(14158.90791, -1254.48344, 10526.34405,
12362.81178, 10848.0249, -4565.36645, 9051.92877, 13356.88479,
-2056.50657, 9724.32092), p.val.raw = c(1.15e-08, 5.78e-05, 1.36e-05,
3.21e-07, 6.91e-06, 6.97e-08, 0.000331, 4.87e-08, 1.04e-05, 7.63e-05
), p.val.bon = c(2.66e-06, 0.0133, 0.00315, 7.41e-05, 0.0016,
1.61e-05, 0.0764, 1.13e-05, 0.0024, 0.0176), p.val.adj = c(2.65e-13,
0.000592, 2.82e-05, 9.72e-08, 6.56e-05, 8.76e-09, 0.0117, 6.22e-09,
6.44e-06, 0.000334)), .Names = c("X", "Cell.lines", "estimate",
"lower", "upper", "p.val.raw", "p.val.bon", "p.val.adj"), class = "data.frame",
row.names = c("T(70%)a-N(0%)f",
"T(70%)c-N(0%)f", "T(80%)a-N(0%)f", "T(90%)-N(0%)f", "T(70%)a-N(0%)i",
"T(70%)c-N(0%)i", "T(90%)-N(0%)i", "T(70%)a-N(0%)c", "T(70%)c-N(0%)c",
"T(80%)a-N(0%)c"))

rownames(sig.data) <- sig.data[,2]
my.hmtest <- structure(list(
  estimate = t(t(structure(sig.data[,"estimate"], .Names =
rownames(sig.data)))),
  conf.int = sig.data[,4:5],
  ctype = "ABCC4-2007"),
  class = "hmtest")
par(mex=0.5) # This helps to accommodate the margins when text is getting cut off
plot(my.hmtest, cex.axis=0.7)

 



[R] Fitting a asymmetric logistic peak curve

2009-09-25 Thread Marc


UseRs,

I am analysing green area growth in winter wheat and how the amount of water
affects it. I am trying to fit an asymmetric logistic peak curve to my data,
as described by Royo et al., Europ. J. Agronomy 20 (2004) 419. I want to
calculate the maximum green area, maximum growth rate and senescence for each
cultivar in each treatment.

I started by calculating the means of all cultivars in all treatments at each
sampling date and fitting these data to the curve using the nls function in
the stats package. I am new to non-linear regression and I am getting the
error shown below. After some searching, it seems that the problem is the
starting values of the coefficients; some suggestions involved linearizing
the data in order to obtain better starting values. I have no idea how to do
this with my data.
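
One generic way to get a rough feel for sensible starting values, sketched here under stated assumptions: fit a cheap quadratic with lm() to the same means and read off an approximate peak position and height. This is not the linearization mentioned above and it is not taken from Royo et al.; the names quad, peak_day and peak_height are made up, and mapping these two numbers onto the parameters a, b, d, e and f of the peak curve still has to be done by hand.

```r
# Means per sampling date, copied from the data printed in this post.
x <- data.frame(
  ndays = c(99, 112, 125, 133, 139, 148, 162, 169, 175, 182),
  y = c(0.4047951, 0.5894659, 0.6570246, 0.7065050, 0.6634155,
        0.7051833, 0.6794740, 0.6399054, 0.4850703, 0.2961120)
)

# A quadratic is linear in its coefficients, so lm() needs no start values.
quad <- lm(y ~ ndays + I(ndays^2), data = x)
cf   <- coef(quad)

# Vertex of the fitted parabola: approximate day and height of the peak.
peak_day    <- unname(-cf[2] / (2 * cf[3]))
peak_height <- unname(predict(quad, data.frame(ndays = peak_day)))

c(peak_day = peak_day, peak_height = peak_height)
```

Plotting the candidate curve over the data for a few trial parameter sets, before calling nls(), is a cheap check that the starting values are in the right region.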

Any help in solving this problem will be appreciated.

Marc.

> sessionInfo()
R version 2.9.2 (2009-08-24) 
i486-pc-linux-gnu 

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base 

other attached packages:
[1] chron_2.3-30

> x
   ndays         y
1     99 0.4047951
2    112 0.5894659
3    125 0.6570246
4    133 0.7065050
5    139 0.6634155
6    148 0.7051833
7    162 0.6794740
8    169 0.6399054
9    175 0.4850703
10   182 0.2961120

> model3 <- nls(x ~ a +
+ (b/e)*{(1+exp(ndays+d*log(e)-f)/d)^-((e+1)/e)}*{(exp(ndays+d*log(e)-f)/d)^-(e+1)/e}*(e+1)^{(e+1)/e},
+   data = x, start = list(a = -1, b = 0.5, f=-0.4, d=0.6, e=2)
+   )
Error in nlsModel(formula, mf, start, wts) : 
  singular gradient matrix at initial parameter estimates



Re: [R] Logistic Regression for Multinomial Data using R

2009-09-25 Thread Nimal Fernando
Hi

I want to do logistic regression for multinomial data.

How can I do it in R?

Thanks a lot

Nimal Fernando



Re: [R] read.delim very slow in reading files with lots of columns

2009-09-25 Thread Ping-Hsun Hsieh
Thanks, Ben.

The matrix is a pure numeric matrix (6x70, 31mb).
I tried the colClasses='numeric' as well as nrows=7(one of these is header 
line) on the matrix.
Also I tested it with not setting the two options in read.delim()

Here is the time spent on reading the matrix for each test.

> system.time( tmp <- read.delim("test_data.txt"))
     user    system   elapsed
50985.421    27.665 51013.384

> system.time(tmp <-
+ read.delim("test_data.txt", colClasses="numeric", nrows=7, comment.char=""))
     user    system   elapsed
51301.563    60.491 51362.208

It seems setting the options does not speed up the reading at all.
Is it because of the header line? I will test it.
Did I misunderstand something?

One additional and interesting observation:
The one with the options does save memory a lot. It took ~150mb, while the 
other took ~4GB for reading the matrix.

I will try the scan() and see if it helps.

Thanks!
Mike


-Original Message-
From: Benilton Carvalho [mailto:bcarv...@jhsph.edu] 
Sent: Wednesday, September 23, 2009 4:56 PM
To: Ping-Hsun Hsieh
Cc: r-help@r-project.org
Subject: Re: [R] read.delim very slow in reading files with lots of columns

use the 'colClasses' argument and you can also set 'nrows'.

b

On Sep 23, 2009, at 8:24 PM, Ping-Hsun Hsieh wrote:

> Hi,
>
> I am trying to read a tab-delimited file into R (Ver. 2.8). The
> machine I am using is 64bit Linux with 16 GB.
>
> The file is basically a matrix(~600x70) and as large as 3GB.
>
> The read.delim() ran extremely slow (hours) even with a subset of
> the file (31 MB with 6x70)
>
> I monitored the memory usage, and found it constantly only took less
> than 1% of 16GB memory.
>
> Does read.delim() have difficulty to read files with lots of columns?
>
> Any suggestions?
>
> Thanks,
>
> Mike



[R] SE and CI

2009-09-25 Thread Ashta
How can I get the standard error and confidence interval for the
prediction in a multiple regression model using R?

For a simple regression I used

predict(xc, newdata=data.frame(var1=10.), se=T)

where xc is the glm model using binomial and var1 is the variable.
I can get the upper and lower intervals of the prediction
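
A minimal sketch of one common way to do this for a binomial glm with several predictors: predict on the link scale with se.fit = TRUE, build the interval there, and back-transform with plogis(). The data are simulated and every name except xc and var1 (var2, resp, dat, new, ci) is made up for illustration; the interval is a Wald-type approximation, not the only possible choice.

```r
set.seed(1)

# Hypothetical data: a binary response and two predictors.
n   <- 200
dat <- data.frame(var1 = runif(n, 0, 20), var2 = rnorm(n))
dat$resp <- rbinom(n, 1, plogis(-2 + 0.2 * dat$var1 + 0.5 * dat$var2))

xc <- glm(resp ~ var1 + var2, family = binomial, data = dat)

# newdata must supply a value for every predictor in the model.
new <- data.frame(var1 = 10, var2 = 0)

# Standard error on the link (logit) scale, then a 95% interval that is
# back-transformed to the probability scale.
p  <- predict(xc, newdata = new, type = "link", se.fit = TRUE)
z  <- qnorm(0.975)
ci <- data.frame(fit     = plogis(p$fit),
                 se_link = p$se.fit,
                 lower   = plogis(p$fit - z * p$se.fit),
                 upper   = plogis(p$fit + z * p$se.fit))
ci
```

Building the interval on the link scale keeps the limits inside (0, 1), which a symmetric interval around the predicted probability does not guarantee.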

Any help is welcome



Re: [R] Non-parametric test for location with two unpaired setsof data measured on ordinal scale.

2009-09-25 Thread Greg Snow
Yes, I agree that the median makes the most sense here, but there could be
other measures of location that would be of interest (quartiles, some version
of the rank sum).

Here is some sample code for a permutation test on the medians (there are a
couple of packages that will do this as well, but this is pretty
straightforward with plain R code):

set1 <- c(1,3,2,2,4,3,3,2,2)
set2 <- c(4,4,4,3,3,5,4,4)

sets <- c(set1,set2)
g1 <- seq_along(set1)

orig <- median( sets[ -g1 ] ) - median( sets[ g1 ] )

perms <- replicate( 1999, { tmp <- sample(sets)
	median( tmp[ -g1 ] ) - median( tmp[ g1 ] ) } )

#  or

pb <- winProgressBar(max=1999)

setWinProgressBar(pb, 0)

perms <- replicate(1999, { setWinProgressBar( pb, getWinProgressBar(pb) + 1 )
	tmp <- sample(sets)
	median( tmp[ -g1 ] ) - median( tmp[ g1 ] ) } )

close(pb)


perms <- c(orig,perms)
sum( perms >= orig )
mean( perms >= orig )
prop.test( sum(perms>=orig), length(perms) )
hist(perms)
abline(v=orig, col='blue')

(If you want the progress bar on an OS other than Windows, then use the tcltk
package and the tkProgressBar.)
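
For comparison, two of the alternatives raised earlier in the thread (the Mann-Whitney/Wilcoxon test and a t-test on ranks) can be sketched in a few lines. This is only an illustration on the same two samples; the object names w, r, g and tt are made up, and the caveats about what these tests actually test still apply.

```r
set1 <- c(1, 3, 2, 2, 4, 3, 3, 2, 2)
set2 <- c(4, 4, 4, 3, 3, 5, 4, 4)

# Rank-sum test; exact = FALSE because the data contain many ties, so the
# normal approximation (with continuity correction) is used.
w <- wilcox.test(set1, set2, exact = FALSE)

# t-test on the pooled ranks: replace all values by their ranks first.
r  <- rank(c(set1, set2))
g  <- rep(c("set1", "set2"), c(length(set1), length(set2)))
tt <- t.test(r ~ g)

c(wilcoxon = w$p.value, rank_t = tt$p.value)
```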

Hope this helps,


-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -----Original Message-----
> From: John Sorkin [mailto:jsor...@grecc.umaryland.edu]
> Sent: Thursday, September 24, 2009 2:52 PM
> To: Greg Snow; r-help@r-project.org
> Subject: Re: [R] Non-parametric test for location with two unpaired
> sets of data measured on ordinal scale.
>
> Greg,
> I used the term location because I did not want to use the terms mean
> or median for the exact reason that you gave; these two values can be
> different in a given distribution. I want to test the null hypothesis
> that the data come from a single distribution. This is often done by
> comparing a measure of location (e.g. mean for ANOVA), but as you know
> the mean need not be the only measure of location that is tested.
> Given that my data are measured on an ordinal scale, the mean is
> without meaning, so I suspect that the best measure for me would be a
> comparison of medians, but I am open to other suggestions.
> John
>
> John David Sorkin M.D., Ph.D.
> Chief, Biostatistics and Informatics
> University of Maryland School of Medicine Division of Gerontology
> Baltimore VA Medical Center
> 10 North Greene Street
> GRECC (BT/18/GR)
> Baltimore, MD 21201-1524
> (Phone) 410-605-7119
> (Fax) 410-605-7913 (Please call phone number above prior to faxing)
>
> >>> Greg Snow <greg.s...@imail.org> 9/24/2009 4:30 PM >>>
> What do you mean by location?  I can think of examples where 2
> distributions have the same median but different means, or the same
> means but different medians.
>
> Are you willing to assume that the distributions are exactly the same
> under the null hypothesis? (not just the same 'center/location')
>
> I would probably do a permutation test on the difference between the
> means or medians (whichever you think is more meaningful); this
> assumes the exact same distribution under the null.
>
> You can also do a Mann-Whitney/Wilcoxon test (but I don't like
> explaining, or sometimes even thinking about, what it is actually
> testing), you could do a bootstrap confidence interval on the
> difference between means/medians (does not assume distributions are the
> same, just have same mean/median), or you could just replace all values
> by their ranks and do a t-test (essentially transforms the data to a
> uniform distribution, the CLT for the uniform kicks in around n=5, but
> I would simulate just to check).
>
> This is not the nice simple answer that you were probably looking for,
> but hopefully it gives you some things to think about that will help,
>
> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.s...@imail.org
> 801.408.8111
>
>
> > -----Original Message-----
> > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
> > project.org] On Behalf Of John Sorkin
> > Sent: Thursday, September 24, 2009 1:08 PM
> > To: r-help@r-project.org
> > Subject: [R] Non-parametric test for location with two unpaired sets
> > of data measured on ordinal scale.
> >
> > Please forgive a stats question.
> >
> > I have two sets of data (unpaired) measured on an ordinal scale. I
> > want to test to see if the two sets are different (i.e. do they have
> > the same location):
> >
> > set1: 1,3,2,2,4,3,3,2,2
> > set:   4,4,4,3,3,5,4,4
> >
> > What is the most appropriate non-parametric test to test location?
> >
> > Thanks,
> > John
> >
> > Confidentiality Statement:
> > This email message, including any attachments, is for
> > th...{{dropped:6}}
 

Re: [R] Logistic Regression for Multinomial Data using R

2009-09-25 Thread JLucke
Use polr from the MASS package
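
A hedged sketch of what that could look like. polr() fits a proportional-odds model, which assumes the response categories are ordered; if they are not, multinom() from the nnet package is a commonly used alternative (it was not mentioned in the thread). The example uses the housing data shipped with MASS and assumes both MASS and nnet are installed; the names fit_ord and fit_nom are made up.

```r
library(MASS)   # polr() and the housing data
library(nnet)   # multinom()

# Ordered response (Sat: Low < Medium < High): proportional-odds model.
fit_ord <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)

# Unordered alternative: multinomial logit treating the categories as nominal.
fit_nom <- multinom(Sat ~ Infl + Type + Cont, weights = Freq, data = housing,
                    trace = FALSE)

# Coefficients of the two fits; summary() gives standard errors as well.
coef(fit_ord)
coef(fit_nom)
```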




Nimal Fernando <pnp...@gmail.com>
Sent by: r-help-boun...@r-project.org
09/25/2009 12:33 PM

To: r-help@r-project.org
cc:
Subject: Re: [R] Logistic Regression for Multinomial Data using R

> Hi
>
> I want to do logistic regression for multinomial data.
>
> How can I do it in R?
>
> Thanks a lot
>
> Nimal Fernando


Re: [R] read.delim very slow in reading files with lots of columns

2009-09-25 Thread Charles C. Berry

On Fri, 25 Sep 2009, Ping-Hsun Hsieh wrote:

> Thanks, Ben.
>
> The matrix is a pure numeric matrix (6x70, 31mb).
> I tried the colClasses='numeric' as well as nrows=7(one of these is header
> line) on the matrix.
> Also I tested it with not setting the two options in read.delim()

A couple of things come to mind.

First, I have not read the internals of scan, but suspect that parsing a
really long line may be slowing things down.

Since you are attempting to read in a numeric matrix, you can simply do a
global replacement of your delimiter with a newline and use scan on
the result. On unix-like systems, something like

tmp <- scan( pipe( 'tr "\t" "\n" < test_data.txt' ) )

ought to help.

Second, the memory occupied by each line - once it has been processed - is
spread over the full 32MB (or 3.2 GB for the 600 by 70 version) region
of memory. I am guessing that this is causing your cache to work hard to
put it in place.

If you really want the result to be a 600 by 70 matrix, you might try
to read it in smaller blocks using scan( pipe( "cut ..." ) ) to feed
selected blocks of columns of your text file to R.
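
A small, self-contained sketch of the scan() route, without the pipe: read the header line on its own, scan the remaining numbers as one long vector, and fill a matrix by row. The temporary file stands in for test_data.txt and the names m, hdr, vals and m2 are made up; whether this is faster on the real file has not been measured here.

```r
# Hypothetical stand-in for test_data.txt: a small tab-delimited numeric
# matrix with one header line.
m <- matrix(seq_len(12) / 2, nrow = 3,
            dimnames = list(NULL, paste0("c", 1:4)))
path <- tempfile(fileext = ".txt")
write.table(m, path, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the header separately, then scan the numbers as one long vector.
hdr  <- scan(path, what = character(), sep = "\t", nlines = 1, quiet = TRUE)
vals <- scan(path, what = numeric(), sep = "\t", skip = 1, quiet = TRUE)

# The file is stored row by row, so the matrix has to be filled by row.
m2 <- matrix(vals, ncol = length(hdr), byrow = TRUE,
             dimnames = list(NULL, hdr))
```

Because scan() returns a plain numeric vector, no per-column data frame machinery is involved, which is the point of the suggestion above.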


HTH,

Chuck




> Here is the time spent on reading the matrix for each test.
>
> > system.time( tmp <- read.delim("test_data.txt"))
>      user    system   elapsed
> 50985.421    27.665 51013.384
>
> > system.time(tmp <-
> + read.delim("test_data.txt", colClasses="numeric", nrows=7, comment.char=""))
>      user    system   elapsed
> 51301.563    60.491 51362.208
>
> It seems setting the options does not speed up the reading at all.
> Is it because of the header line? I will test it.
> Did I misunderstand something?
>
> One additional and interesting observation:
> The one with the options does save memory a lot. It took ~150mb, while the
> other took ~4GB for reading the matrix.
>
> I will try the scan() and see if it helps.
>
> Thanks!
> Mike
>
>
> -----Original Message-----
> From: Benilton Carvalho [mailto:bcarv...@jhsph.edu]
> Sent: Wednesday, September 23, 2009 4:56 PM
> To: Ping-Hsun Hsieh
> Cc: r-help@r-project.org
> Subject: Re: [R] read.delim very slow in reading files with lots of columns
>
> use the 'colClasses' argument and you can also set 'nrows'.
>
> b
>
> On Sep 23, 2009, at 8:24 PM, Ping-Hsun Hsieh wrote:
>
>> Hi,
>>
>> I am trying to read a tab-delimited file into R (Ver. 2.8). The
>> machine I am using is 64bit Linux with 16 GB.
>>
>> The file is basically a matrix(~600x70) and as large as 3GB.
>>
>> The read.delim() ran extremely slow (hours) even with a subset of
>> the file (31 MB with 6x70)
>>
>> I monitored the memory usage, and found it constantly only took less
>> than 1% of 16GB memory.
>>
>> Does read.delim() have difficulty to read files with lots of columns?
>>
>> Any suggestions?
>>
>> Thanks,
>>
>> Mike



Charles C. Berry(858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu   UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901



Re: [R] Reading data

2009-09-25 Thread Henrik Bengtsson
You can use R.utils (on CRAN) to help you figure out why the file is
not found or not readable.

library(R.utils);
pathname <- "C:/Documents and Settings/ashta/My Documents/R_data/rel.dat";
pathname <- Arguments$getReadablePathname(pathname);
rel <- read.table(pathname, quote="", header=FALSE, sep="",
                  col.names=c("id","orel","nrel"));

If the file is not found it gives an error and tries to tell you why, e.g.

> Arguments$getReadablePathname("C:/Windows/system32/cmd.exe")
[1] "C:/Windows/system32/cmd.exe"

> Arguments$getReadablePathname("C:/Windows/system323/cmd.exe")
Error in list(`Arguments$getReadablePathname("C:/Windows/system323/cmd.exe")` =
<environment>,  :

[2009-09-25 10:11:57] Exception: Pathname not found:
C:/Windows/system323/cmd.exe (C:/Windows/ exists, but nothing beyond)
  at throw(Exception(...))
  at throw.default("Pathname not found: ", pathname, reason)
  at throw("Pathname not found: ", pathname, reason)
  at method(static, ...)
  at Arguments$getReadablePathname("C:/Windows/system323/cmd.exe")

It will also tell you if the file exists, but you don't have the
permission to read it.


Second, your error message reports on a pathname that starts with
'file=', which I've never seen;

  cannot open file 'file=C:/Documents and
Settings/sewalem/MyDocuments/R_data/rel.dat': Invalid argument

What version of R are you using, i.e. what does sessionInfo() give?


Third, it is true that backslashes need to be escaped.  However,
*forward-slashes* work on *any platform*.  I stick with the latter so I
don't have to think about it.  It should make no difference in your case.
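
The same kind of check can be sketched with base R only, for anyone who does not want to install a package. The temporary file stands in for rel.dat and its contents are made up; file.exists(), file.access() and normalizePath() are the base functions used.

```r
# Hypothetical file standing in for rel.dat.
pathname <- file.path(tempdir(), "rel.dat")
writeLines(c("1 10 20", "2 30 40"), pathname)

file.exists(pathname)                  # does the path exist at all?
file.access(pathname, mode = 4) == 0   # a result of 0 means it is readable
normalizePath(pathname, winslash = "/", mustWork = TRUE)

# Pass the path itself; "file" is an argument name and stays outside the quotes.
rel <- read.table(file = pathname, quote = "", header = FALSE, sep = "",
                  col.names = c("id", "orel", "nrel"))
```

With mustWork = TRUE, normalizePath() stops with an error if the path cannot be resolved, which gives an early and explicit failure.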


My $.02

/Henrik

On Fri, Sep 25, 2009 at 7:32 AM, Michael A. Miller <mmill...@iupui.edu> wrote:
> Sometimes it is easiest to open a file using a file selection
> widget.  I keep this in my .Rprofile:
>
> getOpenFile <- function(...){
>   require(tcltk)
>   return(tclvalue(tkgetOpenFile()))
> }
>
> With this you can find your file and open it with
>
>   rel <- read.table(getOpenFile(), quote="", header=FALSE, sep="",
>                     col.names=c("id","orel","nrel"))
>
> or
>
>   filename <- getOpenFile()
>   rel <- read.table(filename, quote="", header=FALSE, sep="",
>                     col.names=c("id","orel","nrel"))
>
> Mike
>
>
> P.S. I keep a couple functions on hand for choosing writable files
> and directories too...
>
> getSaveFile <- function(...){
>   require(tcltk)
>   return(tclvalue(tkgetSaveFile()))
> }
>
> chooseDir <- function(...){
>   require(tcltk)
>   return(tclvalue(tkchooseDirectory()))
> }




Re: [R] Reading data

2009-09-25 Thread Henrik Bengtsson
On Fri, Sep 25, 2009 at 10:18 AM, Henrik Bengtsson <h...@stat.berkeley.edu>
wrote:
> You can use R.utils (on CRAN) to help you figure out why the file is
> not found or not readable.
>
> library(R.utils);
> pathname <- "C:/Documents and Settings/ashta/My Documents/R_data/rel.dat";
> pathname <- Arguments$getReadablePathname(pathname);
> rel <- read.table(pathname, quote="", header=FALSE, sep="",
>                   col.names=c("id","orel","nrel"));
>
> If the file is not found it gives an error an tries to tell you why, e.g.
>
> > Arguments$getReadablePathname("C:/Windows/system32/cmd.exe")
> [1] "C:/Windows/system32/cmd.exe"
>
> > Arguments$getReadablePathname("C:/Windows/system323/cmd.exe")
> Error in list(`Arguments$getReadablePathname("C:/Windows/system323/cmd.exe")` =
> <environment>,  :
>
> [2009-09-25 10:11:57] Exception: Pathname not found:
> C:/Windows/system323/cmd.exe (C:/Windows/ exists, but nothing beyond)
>   at throw(Exception(...))
>   at throw.default("Pathname not found: ", pathname, reason)
>   at throw("Pathname not found: ", pathname, reason)
>   at method(static, ...)
>   at Arguments$getReadablePathname("C:/Windows/system323/cmd.exe")
>
> It will also tell you if the file exists, but you don't have the
> permission to read it.
>
>
> Second, your error message reports on a pathname that starts with
> 'file=', which I've never seen;
>
>   cannot open file 'file=C:/Documents and
> Settings/sewalem/MyDocuments/R_data/rel.dat': Invalid argument
>
> what version of R are you use, i.e. what does sessionInfo() give?

Did you *really* do?

  rel <- read.table("C:/Documents and Settings/sewalem/MyDocuments/R_data/rel.dat",
                    quote="", header=FALSE, sep="",
                    col.names=c("id","orel","nrel"))

or did you try to do:

  rel <- read.table(file="C:/Documents and Settings/sewalem/MyDocuments/R_data/rel.dat",
                    quote="", header=FALSE, sep="",
                    col.names=c("id","orel","nrel"))

but wrote?

  rel <- read.table("file=C:/Documents and Settings/sewalem/MyDocuments/R_data/rel.dat",
                    quote="", header=FALSE, sep="",
                    col.names=c("id","orel","nrel"))

/H

> Third, it is true that backslashes need to be escaped.  However,
> *forward-slashes* work with *any
> platform*.  I stick with the latter so I don't have to think about it.
>  It should make no difference in your case.
>
>
> My $.02
>
> /Henrik
>
> On Fri, Sep 25, 2009 at 7:32 AM, Michael A. Miller <mmill...@iupui.edu> wrote:
>> Sometimes it is easiest to open a file using a file selection
>> widget.  I keep this in my .Rprofile:
>>
>> getOpenFile <- function(...){
>>   require(tcltk)
>>   return(tclvalue(tkgetOpenFile()))
>> }
>>
>> With this you can find your file and open it with
>>
>>   rel <- read.table(getOpenFile(), quote="", header=FALSE, sep="",
>>                     col.names=c("id","orel","nrel"))
>>
>> or
>>
>>   filename <- getOpenFile()
>>   rel <- read.table(filename, quote="", header=FALSE, sep="",
>>                     col.names=c("id","orel","nrel"))
>>
>> Mike
>>
>>
>> P.S. I keep a couple functions on hand for choosing writable files
>> and directories too...
>>
>> getSaveFile <- function(...){
>>   require(tcltk)
>>   return(tclvalue(tkgetSaveFile()))
>> }
>>
>> chooseDir <- function(...){
>>   require(tcltk)
>>   return(tclvalue(tkchooseDirectory()))
>> }





[R] grep or other complex string matching approach to capture necessary information...

2009-09-25 Thread Jason Rupert
Say I have the following data:


house_number <- floor(runif(100, 200, 600))
water_evaluation <- c("No water damage", "Water damage", "Water On", "Water off",
                      "water pipes damaged", "leaking water")
water_evaluation_selection <- floor(runif(100, 1, 6))
house_info <- data.frame(water_evaluation[water_evaluation_selection],
                         house_number)

And, that I only want to pull out the ones with negative water evaluations,
i.e. "Water damage", "water pipes damaged", and "leaking water".

Should/could I use grep in order to pull the house numbers out of house_info 
with those negative water evaluations?  

I guess I want to know the house numbers from house_info where the water 
evaluation is negative.  Is there a way to use grep or another R function in 
order to acquire that information? 

Thank you again in advance for any insights.



Re: [R] How to download from github

2009-09-25 Thread Felipe Carrillo
That's strange, my pc is not that slow, it has 3 mb of Ram. The download button
doesn't respond either using my computer at work or at home. When you click the
download button, do you get a dialog box prompting you where to save the files?

--- On Thu, 9/24/09, Charlie Sharpsteen <ch...@sharpsteen.net> wrote:

> From: Charlie Sharpsteen <ch...@sharpsteen.net>
> Subject: Re: [R] How to download from github
> To: Felipe Carrillo <mazatlanmex...@yahoo.com>
> Cc: r-help@r-project.org
> Date: Thursday, September 24, 2009, 9:06 PM
>
> Hmm, clicking on the 'Download' button and then on either the 'TAR' or
> 'ZIP' icons is working fine for me. It might take a while for the
> actual download to start-- GitHub has to compress the files
> which can take a half a minute or more.
>
> Also, GitHub appears to be preparing for a move to a new set of
> servers-- this may cause some instability and weirdness in the way
> the website responds.
>
> Good luck!
> -Charlie
>
> On Thu, Sep 24, 2009 at 8:20 PM, Felipe Carrillo
> <mazatlanmex...@yahoo.com> wrote:
>
>> Hi:
>>
>> This is my first attempt to download from github. Nothing
>> happens by clicking on the 'download' button. Could
>> anyone give me a hint on how to get all the files from the
>> link below? Thanks
>>
>>  http://github.com/hadley/ggplot2-bayarea/tree/0a8bf71dea38cfbf2d928eb713d24dfd928359fc
>>
>> Felipe D. Carrillo
>> Supervisory Fishery Biologist
>> Department of the Interior
>> US Fish & Wildlife Service
>> California, USA


Re: [R] grep or other complex string matching approach to capture necessary information...

2009-09-25 Thread Tony Plate

You could use grep, but it's probably easier to use %in% (see also 
is.element()), e.g.:


> house_info[ house_info[,1] %in% c("Water damage", "water pipes damaged", "leaking water"), ]

   water_evaluation.water_evaluation_selection. house_number
6                           water pipes damaged          489
8                           water pipes damaged          512
11                          water pipes damaged          597
19                                 Water damage          478
21                          water pipes damaged          373
23                                 Water damage          465


> house_info[ house_info[,1] %in% c("Water damage", "water pipes damaged", "leaking water"), 2]

 [1] 489 512 597 478 373 465 337 362 234 535 551 351 415 495 220 216 317 443 346 577 585 268 463 441 225 200 304 486 390 476 485 247
[33] 399 504 262 551 575 359 538

> sort(unique(house_info[ house_info[,1] %in% c("Water damage", "water pipes damaged", "leaking water"), 2]))

 [1] 200 216 220 225 234 247 262 268 304 317 337 346 351 359 362 373 390 399 415 441 443 463 465 476 478 485 486 489 495 504 512 535
[33] 538 551 575 577 585 597





Also, an easier way to generate random integers is sample(), e.g.

> sample(1:3, size = 5, replace = TRUE)

[1] 3 1 2 1 1



(This is more straightforward, and more easily avoids possibly unintended
errors such as floor(runif(100, 1, 6)) never generating a 6, but do be careful of
the gotcha that sample(2:3, ...) will generate a selection of 2's and 3's,
while sample(3, ...) will generate samples from 1, 2, and 3.)
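
A minimal sketch of the two points in the parenthetical above, assuming nothing beyond base R. The seed and the sample size of 1000 are arbitrary choices for illustration and are not from the thread.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# floor(runif(n, 1, 6)) draws from the interval (1, 6), so after floor()
# the result can only be 1, 2, 3, 4 or 5.
idx_floor <- floor(runif(1000, 1, 6))
range(idx_floor)

# sample() over an explicit set of values can return every element of
# that set, including the largest one.
idx_sample <- sample(1:6, size = 1000, replace = TRUE)
range(idx_sample)

# The gotcha: a single number n >= 1 is treated as 1:n, not as the value n.
draws_from_1_to_3 <- sample(3, size = 1000, replace = TRUE)
draws_from_2_to_3 <- sample(2:3, size = 1000, replace = TRUE)
table(draws_from_1_to_3)
table(draws_from_2_to_3)
```

If the set of values might shrink to a single number at run time, indexing with sample.int(length(v)) is one way to sidestep the gotcha.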

-- Tony Plate

Jason Rupert wrote:

Say I have the following data:


house_number <- floor(runif(100, 200, 600))
water_evaluation <- c("No water damage", "Water damage", "Water On", "Water off", "water pipes damaged", "leaking water")
water_evaluation_selection <- floor(runif(100, 1, 6))
house_info <- data.frame(water_evaluation[water_evaluation_selection],
   house_number)

And, that I only want to pull out the ones with negative water evaluations, i.e. Water damage, water pipes damaged, and leaking water. 

Should/could I use grep in order to pull the house numbers out of house_info with those negative water evaluations?  

I guess I want to know the house numbers from house_info where the water evaluation is negative.  Is there a way to use grep or another R function in order to acquire that information? 
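
For comparison, here is a hedged sketch of both routes to the house numbers: exact matching with %in% and pattern matching with grepl(). The column name `evaluation`, the seed, and the regular expression are assumptions made for this illustration; the data are simulated in the spirit of the example above, not taken from it.

```r
set.seed(42)  # arbitrary seed
water_evaluation <- c("No water damage", "Water damage", "Water On",
                      "Water off", "water pipes damaged", "leaking water")
house_info <- data.frame(
  evaluation   = sample(water_evaluation, size = 100, replace = TRUE),
  house_number = sample(200:599, size = 100, replace = TRUE),
  stringsAsFactors = FALSE
)

negative <- c("Water damage", "water pipes damaged", "leaking water")

# Exact matching against a known set of labels.
hit_set <- house_info$evaluation %in% negative
by_set  <- sort(unique(house_info$house_number[hit_set]))

# Pattern matching. "^Water damage$" is anchored so that it cannot
# match the label "No water damage".
pattern <- "^Water damage$|pipes damaged|leaking"
hit_grep <- grepl(pattern, house_info$evaluation)
by_grep  <- sort(unique(house_info$house_number[hit_grep]))

identical(by_set, by_grep)
```

With a fixed, known list of labels, %in% is the less error-prone choice; grepl() pays off when the labels vary in wording.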


Thank you again in advance for any insights.



Re: [R] How to download from github

2009-09-25 Thread Henrique Dallazuanna
Which browser are you using to download?

Try the direct link to download:
http://github.com/hadley/ggplot2-bayarea/zipball/0a8bf71dea38cfbf2d928eb713d24dfd928359fc








-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O



Re: [R] Non-parametric test for location with two unpaired sets of data measured on ordinal scale.

2009-09-25 Thread Marc Schwartz

Greg and John,

Just to throw it out there, the data sets here are small enough that
you could do a fully enumerated permutation test by replacing your
replicate() call with:


  perms <- combn(17, 9, function(x) median(sets[x]) - median(sets[-x]))

This is based on an off-list communication that I had with Peter
Dalgaard about 3 years ago for a different scenario and gives you:


> choose(17, 9)
[1] 24310

permutations.


It does not take long:

> system.time(perms <- combn(17, 9, function(x) median(sets[x]) - median(sets[-x])))

   user  system elapsed
  3.863   0.019   3.898


Which yields:

> str(perms)
 num [1:24310(1d)] -2 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...


> table(perms)
perms
  -2 -1.5   -1 -0.5    0  0.5    1  1.5    2
 285  175 2595 7000 8050  875 3720 1425  185


perms <- c(orig,perms)

prop.test( sum(perms>=orig), length(perms) )
# or
binom.test(sum(perms >= orig), length(perms))


# Variation on the graphic...
plot(table(perms), type = "h")
abline(v = orig, col = "blue")


See ?combn for more information.
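
A self-contained sketch of the exact (fully enumerated) permutation test, using the two data sets quoted further down in this message. The names `exact` and `p_exact` are introduced here for illustration. The function passed to combn() subtracts in the same order as `orig` (second group minus first); this matters because the groups have different sizes, so the permutation distribution need not be symmetric.

```r
set1 <- c(1, 3, 2, 2, 4, 3, 3, 2, 2)
set2 <- c(4, 4, 4, 3, 3, 5, 4, 4)
sets <- c(set1, set2)
g1   <- seq_along(set1)

# Observed statistic: median of the second group minus median of the first.
orig <- median(sets[-g1]) - median(sets[g1])

# Enumerate every way of choosing which 9 of the 17 values play the role
# of the first group, and compute the statistic for each split.
exact <- combn(length(sets), length(set1),
               function(i) median(sets[-i]) - median(sets[i]))

length(exact) == choose(17, 9)

# One-sided exact p-value: the share of splits at least as extreme as observed.
# The observed split is itself one of the enumerated splits.
p_exact <- mean(exact >= orig)
p_exact
```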

HTH,

Marc Schwartz



On Sep 25, 2009, at 11:47 AM, Greg Snow wrote:

Yes, I agree that the median makes the most sense here, but there  
could be other measures of location that would be of interest  
(quartiles, some version of the rank sum).


Here is some sample code for a permutation test on the medians  
(there are a couple of packages that will do this as well, but this  
is pretty straight forward with straight R code):


set1 <- c(1,3,2,2,4,3,3,2,2)
set2 <- c(4,4,4,3,3,5,4,4)

sets <- c(set1,set2)
g1 <- seq_along(set1)

orig <- median( sets[ -g1 ] ) - median( sets[ g1 ] )

perms <- replicate( 1999, { tmp <- sample(sets)
median( tmp[ -g1 ] ) - median( tmp[ g1 ] ) } )

#  or

pb <- winProgressBar(max=1999)

setWinProgressBar(pb, 0)

perms <- replicate(1999, { setWinProgressBar( pb, getWinProgressBar(pb) + 1 )
tmp <- sample(sets)
median( tmp[ -g1 ] ) - median( tmp[ g1 ] ) } )

close(pb)


perms <- c(orig,perms)
sum( perms >= orig )
mean( perms >= orig )
prop.test( sum(perms>=orig), length(perms) )
hist(perms)
abline(v=orig, col='blue')

(if you want the progress bar on an os other than windows, then use  
the tcltk package and the tkProgressBar).
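
A platform-independent sketch of the sampled permutation test. It swaps in txtProgressBar() from the utils package, which prints to the console and needs no GUI toolkit, in place of the tkProgressBar named above. The seed and the names `n_perm` and `p_value` are assumptions for this illustration.

```r
set.seed(2009)  # arbitrary seed

set1 <- c(1, 3, 2, 2, 4, 3, 3, 2, 2)
set2 <- c(4, 4, 4, 3, 3, 5, 4, 4)
sets <- c(set1, set2)
g1   <- seq_along(set1)

orig <- median(sets[-g1]) - median(sets[g1])

n_perm <- 1999
pb <- txtProgressBar(min = 0, max = n_perm, style = 3)

# One shuffled relabelling per iteration; the bar advances with the index.
perms <- vapply(seq_len(n_perm), function(i) {
  setTxtProgressBar(pb, i)
  tmp <- sample(sets)
  median(tmp[-g1]) - median(tmp[g1])
}, numeric(1))

close(pb)

# Include the observed statistic so the estimated p-value is never zero.
perms   <- c(orig, perms)
p_value <- mean(perms >= orig)
p_value
```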


Hope this helps,


--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111



-Original Message-
From: John Sorkin [mailto:jsor...@grecc.umaryland.edu]
Sent: Thursday, September 24, 2009 2:52 PM
To: Greg Snow; r-help@r-project.org
Subject: Re: [R] Non-parametric test for location with two unpaired
sets of data measured on ordinal scale.

Greg,
I used the term location because I did not want to use the terms mean
or median for the exact reason that you gave; these two values can be
different in a given distribution. I want to test the null hypothesis
that the data come from a single distribution. This is often done by
comparing a measure of location (e.g. mean for ANOVA), but as you know
the mean need not be the only measure of location that is tested.
Given that my data are measured on an ordinal scale, the mean is
without meaning, so I suspect that the best measure for me would be a
comparison of medians, but I am open to other suggestions.
John

John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)


Greg Snow greg.s...@imail.org 9/24/2009 4:30 PM 

What do you mean by location?  I can think of examples where 2
distributions have the same median but different means, or the same
means but different medians.

Are you willing to assume that the distributions are exactly the same
under the null hypothesis? (not just the same 'center/location')

I would probably do a permutation test on the difference between the
means or medians (which ever you think is more meaningful), this
assumes the exact same distribution under the null.

You can also do a Mann-Whitney/Wilcoxon test (but I don't like
explaining, or sometimes even thinking about, what it is actually
testing), you could do a bootstrap confidence interval on the
difference between means/medians (does not assume distributions are the
same, just have same mean/median), or you could just replace all values
by their ranks and do a t-test (essentially transforms the data to a
uniform distribution, the CLT for the uniform kicks in around n=5, but
I would simulate just to check).
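
The three alternatives listed above can be sketched in base R as follows. This is an illustration under stated assumptions, not a recommendation: the data are the two sets posted later in the thread, the bootstrap is a simple percentile interval with 2000 resamples drawn separately within each group, and the seed is arbitrary.

```r
set1 <- c(1, 3, 2, 2, 4, 3, 3, 2, 2)
set2 <- c(4, 4, 4, 3, 3, 5, 4, 4)

# 1. Mann-Whitney / Wilcoxon rank-sum test. exact = FALSE because the
#    data contain ties, so only the normal approximation is available.
mw <- wilcox.test(set1, set2, exact = FALSE)

# 2. t-test on the ranks of the pooled data.
r      <- rank(c(set1, set2))
g1     <- seq_along(set1)
rank_t <- t.test(r[g1], r[-g1])

# 3. Percentile bootstrap interval for the difference in medians.
set.seed(123)  # arbitrary seed
boot <- replicate(2000,
                  median(sample(set2, replace = TRUE)) -
                    median(sample(set1, replace = TRUE)))
ci <- quantile(boot, c(0.025, 0.975))

mw$p.value
rank_t$p.value
ci
```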

This is not the nice simple answer that you were probably looking for,
but hopefully it gives you some things to think about that will help,

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111



-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
project.org] On Behalf Of John 

Re: [R] read.delim very slow in reading files with lots of columns

2009-09-25 Thread Benilton Carvalho
It may be worth writing a script to transpose the data (in awk, it
takes 10min on my laptop)... then read in the transposed data...



> system.time({x <- read.delim("testTransposed.txt", header=FALSE, colClasses="numeric", nrow=70); x <- t(x)})

   user  system elapsed
  4.958   0.412   5.477

b

On Sep 25, 2009, at 1:35 PM, Ping-Hsun Hsieh wrote:


Thanks, Ben.

The matrix is a pure numeric matrix (6x70, 31mb).
I tried the colClasses='numeric' as well as nrows=7 (one of these is the
header line) on the matrix.

Also I tested it without setting the two options in read.delim().

Here is the time spent on reading the matrix for each test.


> system.time( tmp <- read.delim("test_data.txt"))

     user    system   elapsed
50985.421    27.665 51013.384

> system.time(tmp <- read.delim("test_data.txt", colClasses="numeric", nrows=7, comment.char=""))

     user    system   elapsed
51301.563    60.491 51362.208

It seems setting the options does not speed up the reading at all.
Is it because of the header line? I will test it.
Did I misunderstand something?

One additional and interesting observation:
The one with the options does save a lot of memory. It took ~150mb,
while the other took ~4GB for reading the matrix.


I will try the scan() and see if it helps.

Thanks!
Mike


-Original Message-
From: Benilton Carvalho [mailto:bcarv...@jhsph.edu]
Sent: Wednesday, September 23, 2009 4:56 PM
To: Ping-Hsun Hsieh
Cc: r-help@r-project.org
Subject: Re: [R] read.delim very slow in reading files with lots of  
columns


use the 'colClasses' argument and you can also set 'nrows'.
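
A small sketch of what that suggestion looks like in practice. The file here is a temporary stand-in generated on the fly (6 rows by 50 columns, both numbers chosen for illustration), not the poster's data. Note that later in this thread the options are reported to reduce memory use but not run time for a very wide file, so treat this as a starting point rather than a guaranteed speed-up.

```r
set.seed(1)  # arbitrary seed
n_rows <- 6
n_cols <- 50

# Build a small tab-delimited numeric file with a header line.
m    <- matrix(round(rnorm(n_rows * n_cols), 3), nrow = n_rows)
path <- tempfile(fileext = ".txt")
write.table(m, path, sep = "\t", quote = FALSE, row.names = FALSE)

# Declare every column numeric and state the number of data rows up front,
# so read.delim() does not have to guess column types or grow its buffers.
x <- read.delim(path, colClasses = "numeric", nrows = n_rows,
                comment.char = "")

dim(x)
unlink(path)
```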

b

On Sep 23, 2009, at 8:24 PM, Ping-Hsun Hsieh wrote:


Hi,

I am trying to read a tab-delimited file into R (Ver. 2.8). The
machine I am using is 64bit Linux with 16 GB.

The file is basically a matrix(~600x70) and as large as 3GB.

The read.delim() ran extremely slowly (hours) even with a subset of
the file (31 MB with 6x70).

I monitored the memory usage, and found it constantly took less
than 1% of the 16GB memory.

Does read.delim() have difficulty reading files with lots of columns?

Any suggestions?

Thanks,

Mike



Re: [R] How to download from github

2009-09-25 Thread Felipe Carrillo

Henrique:
It worked nicely, I am using IE 6.0. Thanks a lot for your help








Re: [R] Non-parametric test for location with two unpaired sets of data measured on ordinal scale.

2009-09-25 Thread Greg Snow
Thanks Marc,

The sampling is so easy that I often forget that we can do the exact 
permutation test for smaller samples (and I can never remember when small is 
small enough for this).  With the exact permutations we really don't need to do 
the prop.test or binom.test, I usually do that to get the confidence interval 
on the p-value due to sampling from the permutations rather than doing all 
possible (and this tells me if I need to increase the number of permutations to 
be sure my p-value is precise enough).  With all possible permutations, there 
is no sampling, and no need for an interval, the p-value is exact.

Thanks again, I need to remember combn.

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111



Re: [R] read.delim very slow in reading files with lots of columns

2009-09-25 Thread jim holtman
Here is how much time it took to read a file with 10 lines and 700,000
columns per line separated with comma:

> system.time(input <- scan("/tempxx.txt", what=0, sep=','))
Read 7000000 items
   user  system elapsed
  15.62    0.22   15.84
 object.size(input)
56000024 bytes


'scan' should be sufficient and it will not take another 10 minutes in awk.
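
A minimal sketch of the scan() route, including the step of turning the flat vector back into a matrix. The file is a small temporary stand-in (10 lines of 1000 comma-separated values, sizes chosen for illustration); the reshaping with byrow = TRUE is an addition here, since scan() returns the values in the order they appear in the file.

```r
n_rows <- 10
n_cols <- 1000

# Write a small comma-separated numeric file without header or row names.
m    <- matrix(as.numeric(seq_len(n_rows * n_cols)), nrow = n_rows, byrow = TRUE)
path <- tempfile(fileext = ".txt")
write.table(m, path, sep = ",", row.names = FALSE, col.names = FALSE)

# what = 0 tells scan() to expect numeric values only.
input <- scan(path, what = 0, sep = ",", quiet = TRUE)

# The values arrive line by line, so fill the matrix row-wise.
x <- matrix(input, nrow = n_rows, byrow = TRUE)

dim(x)
unlink(path)
```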

On Fri, Sep 25, 2009 at 1:17 PM, Charles C. Berry cbe...@tajo.ucsd.edu wrote:
 On Fri, 25 Sep 2009, Ping-Hsun Hsieh wrote:

 Thanks, Ben.

 The matrix is a pure numeric matrix (6x70, 31mb).
 I tried the colClasses='numeric' as well as nrows=7(one of these is header
 line) on the matrix.
 Also I tested it with not setting the two options in read.delim()


 A couple of things come to mind.

 First, I have not read the internals of scan, but suspect that parsing a
 really long line may be slowing things down.

 Since you are attempting to read in a numeric matrix, you can simply do a
 global replacement of your delimiter with a newline and use scan on the
 result. On unix-like systems, something like

         tmp <- scan( pipe( "tr '\\t' '\\n' < test_data.txt" ) )

 ought to help.

 Second, the memory occupied by each line - once it has been processed - is
 spread over the full 32MB (or 3.2 GB for the 600 by 70 version) region
 of memory. I am guessing that this is causing your cache to work hard to put
 it in place.

 If you really want the result to be a 600 by 70 matrix, you might try to
 read it in smaller blocks using scan( pipe( cut ...  ) ) to feed selected
 blocks of columns of your text file to R.

 HTH,

 Chuck




 Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
 E mailto:cbe...@tajo.ucsd.edu               UC San Diego
 http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901





-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



[R] On what (shared) machine will R perform faster?

2009-09-25 Thread Dimitri Liakhovitski
Dear R-ers,

I need your advice on hardware (beware: I am not knowledgeable about that).

I find R runs wonderfully on laptops. In my company, we decided to get
some kind of a powerful computer (server?) so that we could run big
jobs on it (e.g., in R, SAS, SPSS, Excel). We were thinking of
something more powerful than a simple additional laptop or desktop,
but something with superior computing power - something everyone could
log into remotely, and something that several people could work on
simultaneously.

Our IT has given us what they call a virtual server, but it runs
more slowly than my laptop - I checked!

Any advice on what we should be looking for? What should that be? What
technical characteristics should it have?
And should we then just install the regular R on it or something else?



Thank you very much for your advice!


-- 
Dimitri Liakhovitski
Ninah.com
dimitri.liakhovit...@ninah.com



Re: [R] read.delim very slow in reading files with lots of columns

2009-09-25 Thread Benilton Carvalho

or that! :-D thanks jim.
b


[R] How to open only one file in a .gz file?

2009-09-25 Thread Peng Yu
Hi,

Suppose that there are multiple files in a .gz file. How can I open only
one file in it? I can't find such an option in the help.

Regards,
Peng



[R] Function question

2009-09-25 Thread njhuang86

Hi. I was wondering how I can write a function that generates the outcome
values for a user specified equation. For example, function(x^2, 4) will
return back 16 and function(x^3 - 10, 2) will give back -2...

I've been playing around with various lines of code but somehow, I just
cannot get R to recognize the equation that I pass to it is just a random
variable and doesn't need to be initialized...
-- 
View this message in context: 
http://www.nabble.com/Function-question-tp25619434p25619434.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Function question

2009-09-25 Thread Henrique Dallazuanna
Try this:

foo <- function(expr, x){
  eval(substitute(expr))
}

foo(x^2, 4)
foo(x^3-10, 2)
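
A slightly more explicit variant, offered only as a sketch and not as the method posted above: capture the expression with substitute() and evaluate it with x bound in a list, so the result does not depend on which other variables happen to exist inside the function. The name eval_at is made up for this illustration.

```r
# Hypothetical helper: evaluate an unevaluated expression at a given x.
eval_at <- function(expr, x) {
  e <- substitute(expr)               # capture x^2 etc. without evaluating it
  eval(e, list(x = x), parent.frame())  # x comes from the list; other names from the caller
}

eval_at(x^2, 4)
eval_at(x^3 - 10, 2)
```

Looking up other symbols in parent.frame() means constants defined by the caller can still be used inside the expression.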

On Fri, Sep 25, 2009 at 6:16 PM, njhuang86 njhuan...@yahoo.com wrote:

 Hi. I was wondering how I can write a function that generates the outcome
 values for a user specified equation. For example, function(x^2, 4) will
 return back 16 and function(x^3 - 10, 2) will give back -2...

 I've been playing around with various lines of code but somehow, I just
 cannot get R to recognize the equation that I pass to it is just a random
 variable and doesn't need to be initialized...




-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O



[R] NLM

2009-09-25 Thread Andrew Wang
I am trying to understand the nlm() function, so I generated a data set consisting of
y and x using

y = a + b*x + c*x^2 + N(0,10), with a=3.5, b=4.5, c=5.5


Given y and x, I am trying to use nlm() to estimate the parameters a, b and
c that minimize the least-squares error.

My code looks like

f <- function(y,x,a,b,c) {sum((y-(a+b*x+c*x^2))^2)}
nlm(f, y, x, a=3, b=4, c=5)


But it comes up with a ridiculous result. My understanding of nlm() must be
wrong.

Please help!

Andy
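
For what it is worth, nlm(f, p, ...) minimizes over its second argument p, so in the call above it is y that gets treated as the parameter vector. A minimal sketch of the usual arrangement follows, assuming that N(0,10) means a standard deviation of 10 and using a made-up x grid; the names sse and fit are illustrative only.

```r
set.seed(1)
x <- seq(-5, 5, length.out = 200)            # assumed design points
y <- 3.5 + 4.5 * x + 5.5 * x^2 + rnorm(length(x), mean = 0, sd = 10)

# The objective takes the parameter vector first; data are passed through `...`
sse <- function(p, x, y) sum((y - (p[1] + p[2] * x + p[3] * x^2))^2)

fit <- nlm(sse, p = c(3, 4, 5), x = x, y = y)
fit$estimate                                  # estimates of a, b, c

# Cross-check against ordinary least squares, which solves the same problem
ls_fit <- coef(lm(y ~ x + I(x^2)))
ls_fit
```

Because the model is linear in a, b and c, lm() gives the same answer directly; nlm() is only needed once the model is nonlinear in its parameters.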



Re: [R] Error in make.names when trying to read.table in if statement

2009-09-25 Thread Juliet Hannah
Does this work for you?

data_list <- list()
filepattern="modrate*"
all_files <- list.files(pattern=filepattern)
data_list <- lapply(all_files, read.table,header=TRUE,sep=",")
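
One thing to keep in mind is that the pattern argument of list.files() is a regular expression, not a shell wildcard, so "modrate*" matches any file name containing "modrat". The sketch below uses throwaway files in a temporary directory (all file names are invented) to show a pattern anchored to the .csv extension:

```r
# Set up a scratch directory with two CSV files and one non-CSV file
tmp <- tempfile("modrate_demo")
dir.create(tmp)
write.csv(data.frame(a = 1:2, b = 3:4), file.path(tmp, "modrate1.csv"), row.names = FALSE)
write.csv(data.frame(a = 5:6, b = 7:8), file.path(tmp, "modrate2.csv"), row.names = FALSE)
writeBin(as.raw(c(0xff, 0xd8, 0xff, 0xe0)), file.path(tmp, "modrate_plot.jpg"))

list.files(tmp, pattern = "modrate*")            # regex: picks up all three files

# Anchor the pattern so only names starting with "modrate" and ending in ".csv" match
csv_files <- list.files(tmp, pattern = "^modrate.*\\.csv$", full.names = TRUE)
data_list <- lapply(csv_files, read.table, header = TRUE, sep = ",")
names(data_list) <- basename(csv_files)
str(data_list)
```

glob2rx("modrate*.csv") produces an equivalent regular expression if the wildcard form is easier to read.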



Re: [R] Error in make.names when trying to read.table in if statement

2009-09-25 Thread David Winsemius


On Sep 21, 2009, at 12:19 PM, Cynthia Sadler wrote:


Hi,

I'm trying to read data from a collection of CSV files for  
processing and graphing. All of my files begin with modrate and  
end with .csv. I think I have the regex working but I am stumped  
at trying to get read.table to work within an if statement.


This works:

> filepattern="modrate*"
> files <- list.files(pattern=filepattern)
> data <- read.table(files[1], header=TRUE, sep=",")

But I cannot get this to work:

> for (i in seq(along=files)) {
+ data <- read.table(files[i], header=TRUE, sep=",")
+ }
Error in make.names(col.names, unique = TRUE) :
 invalid multibyte string at 'ffd8ffe0'


The error message suggests that you might have strange characters in  
your header lines which make.names()  is choking on. What is the  
result of substituting readLines with parameter of n=1 for the  
read.table() call on those files? Or perhaps you should be using [[  
rather than [ to access the files list?
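
A possible way to follow up on that suggestion: the bytes ff d8 ff e0 in the error are the leading bytes of a JPEG (JFIF) file, so one of the matched files may not be a CSV at all. The sketch below prints the first four bytes of every matched file; the scratch files and their names are invented for the illustration.

```r
# Scratch directory with one real CSV and one file that only looks like a match
tmp <- tempfile("hdr_demo")
dir.create(tmp)
writeLines(c("a,b", "1,2"), file.path(tmp, "modrate1.csv"))
writeBin(as.raw(c(0xff, 0xd8, 0xff, 0xe0)), file.path(tmp, "modrate_plot.jpg"))

files <- list.files(tmp, pattern = "modrate", full.names = TRUE)

# Read the first four bytes of each file and show them as hex
first_bytes <- sapply(files, function(f) {
  paste(as.character(readBin(f, what = "raw", n = 4)), collapse = "")
})
names(first_bytes) <- basename(files)
first_bytes
```

Any file whose entry reads "ffd8ffe0" is an image rather than text and should be excluded from the loop.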





I'm sure I'm making a newbie mistake here.


Despite several years, I still consider myself a newbie.

I'm using R version 2.9.2 (2009-08-24) on Mac OS X, and didn't find  
anything about this in the help archives (though, as I'm new to  
this, I may not have searched in the best way). Any advice? Thanks.




David Winsemius, MD
Heritage Laboratories
West Hartford, CT



[R] help with multi-objective program using lpSolve API

2009-09-25 Thread becky_vcu

Hello, I am struggling with R and have little experience. I need help or
suggestions to create a multi-objective program. I have a table as follows
http://www.nabble.com/file/p25615459/smallmodel01.xls smallmodel01.xls 

My constraints are subject to each origin, their corresponding numbers and
the specific number for that origin constraint. This creates my single
objective model just fine. However, I need to hold these constraints true,
in addition to subjecting the destinations to a specific constraint as well.
The destination constraint is to ensure that all transportation to each
destination is filled. As in, I need to make this decision binary saying
that all transportation to that specific destination is being used or not
used. 

Thank you,
Becky
-- 
View this message in context: 
http://www.nabble.com/help-with-multi-objective-program-using-lpSolve-API-tp25615459p25615459.html
Sent from the R help mailing list archive at Nabble.com.



[R] Starting values in “arima.sim” function

2009-09-25 Thread Lina Rusyte
Hello,
 
Could someone please tell me how I can find out which starting values R
used for the simulation?

I have an AR(2) model:

y(t)=0.2*y(t-1)+0.2*y(t-2) + e(t)

(e(t) follows the standard normal distribution)

I need the y(0) value (that is, y(t-1) when t=1) for my subsequent calculations (it is
a very important parameter).
Should I assume that y(0)=mean(yt) or set y(0)=0?

How can I find out which values R uses for y(0), y(-1) and so on?
 
Thank you very much for the answer!
 
Best regards,
Lina Rusyte
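
As far as ?arima.sim describes it, the function does not report starting values: it generates extra innovations for a burn-in period (the n.start argument) and discards that part of the series. One way to keep the starting values under your own control, sketched here on the assumption that running the AR(2) recursion yourself is acceptable, is stats::filter() with method = "recursive". The values of y0 and ym1 are arbitrary placeholders.

```r
set.seed(42)
n   <- 100
phi <- c(0.2, 0.2)          # coefficients on y(t-1) and y(t-2)
e   <- rnorm(n)             # e(t) ~ N(0, 1)

y0  <- 1                    # chosen y(0)   (placeholder value)
ym1 <- 2                    # chosen y(-1)  (placeholder value)

# init takes the pre-sample values in reverse time order: y(0), then y(-1)
y <- stats::filter(e, filter = phi, method = "recursive", init = c(y0, ym1))

# The same recursion written out by hand, as a check
y_manual <- numeric(n)
prev1 <- y0
prev2 <- ym1
for (t in seq_len(n)) {
  y_manual[t] <- phi[1] * prev1 + phi[2] * prev2 + e[t]
  prev2 <- prev1
  prev1 <- y_manual[t]
}
all.equal(as.numeric(y), y_manual)
```

Setting y0 and ym1 to 0 corresponds to starting the process at its mean, since this model has no intercept.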


  
[[alternative HTML version deleted]]



[R] evaluate a set of symbols within an IF statement

2009-09-25 Thread zubin
Hello, writing some R code to cleanse a data set, if the following set 
of symbols are identified then perform some actions.  trying to write 
the minimum code to do this. 



tname = "VIX"
checkticker = c("VIX", "TYX", "TNX", "IRX")

   if (tname == checkticker) {
   # perform some operations
   }

result i get is

> tname == checkticker
[1]  TRUE FALSE FALSE FALSE

how do i evaluate this whole list to a single boolean True or False?  If 
any of these are true the whole statement is True, else False.   this 
only seems to work for the first ticker, the rest don't perform the 
operations within the loop.



> tname = "IRX"
> tname == checkticker
[1] FALSE FALSE FALSE  TRUE



Re: [R] evaluate a set of symbols within an IF statement

2009-09-25 Thread cls59


zubin-2 wrote:
 
 
 how do i evaluate this whole list to a single boolean True or False?  If 
 any of these are true the whole statement is True, else False.   this 
 only seems to work for the first ticker, the rest don't perform the 
 operations within the loop.
 
 

Try %in%

tname %in% checkticker
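
A short sketch of how that fits into the if statement from the question, reusing the ticker values posted there; the variable names is_ticker and same are illustrative only.

```r
tname <- "IRX"
checkticker <- c("VIX", "TYX", "TNX", "IRX")

tname == checkticker                   # element-wise: one logical per ticker

is_ticker <- tname %in% checkticker    # a single TRUE/FALSE for a length-one tname
same      <- any(tname == checkticker) # equivalent way to collapse the comparison

if (is_ticker) {
  message(tname, " is one of the tickers to clean")
}
```

if() needs a single logical value, which is why the element-wise comparison only appeared to work for the first ticker.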

-Charlie

-
Charlie Sharpsteen
Undergraduate
Environmental Resources Engineering
Humboldt State University
-- 
View this message in context: 
http://www.nabble.com/evaluate-a-set-of-symbols-within-an-IF-statement-tp25620871p25620900.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Non-parametric test for location with two unpaired sets of data measured on ordinal scale.

2009-09-25 Thread Peter Ehlers


Greg and Marc,

Not that it's needed here but, of course, perm.test() in
pkg:exactRankTests or oneway_test() in pkg:coin can be
used.

Using Marc's / Greg's computations, the (two-sided) p-value
is

 > sum(abs(perms) >= abs(orig)) / length(perms)
 [1] 0.01937395

perm.test() and oneway_test give a p-value of 0.003249691,
using the exact option.

Question:
Why the difference?

Answer:
perm.test() uses the *mean* difference instead of the
median difference. (Easy to check: just replace 'median'
with 'mean' in Marc's computation of perms.)
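
To make the comparison concrete, here is a sketch of an exact two-sample permutation test in which the statistic can be swapped between mean and median. The two samples are invented for the illustration and are not the data discussed in this thread; perm_p is a made-up helper name.

```r
# Made-up ordinal scores for two unpaired groups
a <- c(1, 2, 2, 3, 3)
b <- c(3, 4, 4, 5, 5, 5)
pooled <- c(a, b)

# Every way of choosing which observations are labelled "group a"
idx <- combn(length(pooled), length(a))

# Exact two-sided p-value for the difference in a location statistic
perm_p <- function(stat) {
  perms <- apply(idx, 2, function(i) stat(pooled[i]) - stat(pooled[-i]))
  orig  <- stat(a) - stat(b)
  # small tolerance so ties are not lost to floating-point rounding
  sum(abs(perms) >= abs(orig) - 1e-12) / length(perms)
}

p_mean   <- perm_p(mean)
p_median <- perm_p(median)
c(mean = p_mean, median = p_median)
```

With heavily tied integer data the median difference takes few distinct values, so its permutation distribution is much coarser than that of the mean difference.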

As Greg correctly points out, different test statistics
can sensibly be used. But which statistic, mean difference
or median difference, is more appropriate for the given data?

Assumptions:

1. the null hypothesis is that the two sets of observations
   represent random samples from the same distribution;

2. the range of the distribution consists of a small set
   of integers.

Fire away!

Peter Ehlers


Greg Snow wrote:

Thanks Marc,

The sampling is so easy that I often forget that we can do the exact 
permutation test for smaller samples (and I can never remember when small is 
small enough for this).  With the exact permutations we really don't need to do 
the prop.test or binom.test, I usually do that to get the confidence interval 
on the p-value due to sampling from the permutations rather than doing all 
possible (and this tells me if I need to increase the number of permutations to 
be sure my p-value is precise enough).  With all possible permutations, there 
is no sampling, and no need for an interval, the p-value is exact.

Thanks again, I need to remember combn.





[R] Design Package - Penalized Logistic Reg. - Query

2009-09-25 Thread Lars Bishop
Dear R experts,

The lrm function in the Design package can perform penalized (Ridge)
logistic regression. It is my understanding that the ridge solutions are not
equivalent under scaling of the inputs, so one normally standardizes the
inputs. Do you know if input standardization is done internally in lrm or I
would have to do it prior to applying this function.

Also, as I'm new in R (coming from SAS) I don't know how well R will handle
relatively large data sets (e.g. 1/2 million observations on 40 variables).

I'll appreciate your comments.

Many thanks in advance.

Lars/

[[alternative HTML version deleted]]



[R] Fourier Transform (FFT) Example

2009-09-25 Thread delic

LOL Rolf. Yes I am sure it isn't homework. I am working on an aeroacoustics
problem and was trying to figure out how to implement a fourier transform in
R. I normally don't work in this field so this stuff was new to me at the
time of writing. I have since figured it out.

Unfortunately I don't have my actual code where I am now but here is an
older version, it might have some bugs in it since I never verified this
version. Anyway I hope it helps someone, even if it's your homework!
Apparently some don't realize that there are different ways of learning,
learning by example being one of those ways.



func <- function(x)
{
   mag2 <- mag^2
   f <- f
   approx(f,mag2,x)$y
}


layout(matrix(c(1,2,3,4), 4, 1, byrow = TRUE))
#SETUP
   T    <- 5.    #time 0 -> T
   dt   <- 0.01  #s
   n    <- T/dt
   F    <- 1/dt  # freq domain -F/2 -> F/2
   df   <- 1/T
   t    <- seq(0,T,by=dt)
   freq <- 5     #Hz

#SIGNAL FUNCTION
   y <- 10*sin(2*pi*freq*t)

#FREQ ARRAY
   f <- 1:length(t)/T

#FOURIER WORK
   Y     <- fft(y)
   mag   <- sqrt(Re(Y)^2+Im(Y)^2)*2/n #Amplitude
   phase <- Arg(Y)*180/pi
   Yr    <- Re(Y)
   Yi    <- Im(Y)

#PLOT SIGNALS
   plot(t,y,type="l",xlim=c(0,T))
   grid(NULL,NULL, col = "lightgray", lty = "dotted",lwd = 1)
   par(mar=c(5, 4, 0, 2) + 0.1)

   plot(f[1:length(f)/2],phase[1:length(f)/2],type="l",xlab="Frequency, Hz",ylab="Phase,deg")
   grid(NULL,NULL, col = "lightgray", lty = "dotted",lwd = 1)

   plot(f[1:length(f)/2],mag[1:length(f)/2],type="l",xlab="Frequency, Hz",ylab="Amplitude")
   grid(NULL,NULL, col = "lightgray", lty = "dotted",lwd = 1)

   plot(f[1:length(f)/2],(mag^2)[1:length(f)/2],type="l",xlab="Frequency, Hz",ylab="Power, Amp^2",log="xy",ylim=c(10^-6,100))

   pref  <- 20E-6 #pa
   p     <- integrate(func,f[1],f[length(f)/2])
   pwrDB <- 10*log10(p$value/pref^2)
   cat("Area under power curve: ",p$value,"Pa ",pwrDB," dB\n")
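
Since the script above is flagged as unverified, here is a smaller sketch that can be checked by hand. It is not the author's final code. It assumes 500 samples so that 5 Hz falls exactly on a Fourier frequency, starts the frequency axis at 0 Hz, and adds parentheses to the half-range index, because 1:length(f)/2 is parsed as (1:length(f))/2.

```r
dt   <- 0.01                        # sampling interval, s
n    <- 500                         # number of samples, so n * dt = 5 s
t    <- seq(0, by = dt, length.out = n)
freq <- 5                           # signal frequency, Hz
y    <- 10 * sin(2 * pi * freq * t)

Y    <- fft(y)
f    <- (0:(n - 1)) / (n * dt)      # bin k (1-based) sits at (k - 1) / (n * dt) Hz
half <- 1:(n / 2)                   # parentheses matter: 1:n/2 gives 0.5, 1, 1.5, ...
mag  <- Mod(Y) * 2 / n              # single-sided amplitude (the 0 Hz bin would need 1/n)

peak <- which.max(mag[half])
c(frequency = f[peak], amplitude = mag[peak])

plot(f[half], mag[half], type = "l", xlab = "Frequency, Hz", ylab = "Amplitude")
```

With these settings the spike should land at 5 Hz with an amplitude of about 10, matching the input signal.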









Rolf Turner-3 wrote:
 
 
 On 17/09/2009, at 3:39 AM, delic wrote:
 

 I wrote a script in which I anticipated seeing a spike at 10Hz with the
 function 10* sin(2*pi*10*t).
 I can't figure out why my plots do not show spikes at the frequencies I
 expect. Am I doing something wrong, or are my expectations wrong?
 
 (a) Is this a homework question?
 
 (b) Have you figured it out yet?
 
 (c) Hint:  You have spikes at +/- 40 in a range from -50 to 50. You  
 *want* spikes
 at 10 and 90 Hz. Could it be that you haven't set your frequency  
 vector ``f''
 quite right? :-)
 
   cheers,
 
   Rolf Turner
 
 P. S. You won't get spikes bang on at 10 and 90 Hz. because these are  
 *not*
 Fourier frequencies when n = 256.  If you want spikes in your  
 periodogram
 at bang on 10 and 90 Hz use a value of n that is divisible by 10,  
 e.g. n=500.
 Why would you want a power of 2 anyhow?  (Well, the fft goes faster  
 when n
 is a power of 2, but who cares?)
 
   R. T.
 
 
 

-- 
View this message in context: 
http://www.nabble.com/Fourier-Transform-fft-help-tp25475063p25621211.html
Sent from the R help mailing list archive at Nabble.com.



[R] renaming intercept column when retrieving coefficients from lme using coef function

2009-09-25 Thread Eric McKibben
I am still fairly new to R and have a fairly rudimentary question.  I am
trying to name a vector of coefficients retrieved from a multilevel model
using the coef function.  I guess the default name is Intercept and I
cannot figure out how to rename it.  

 

I have tried the using the code below to name the column of coefficients
ind.y derived from an lme model.  Unfortunately, the name ind.y is not
applied to the column.  What can I do to name the column?

 

> toy <- data.frame(ID=c(1,1,1,2,2,2,3,3,3,4,4,4), x=rnorm(12), y=rnorm(12))
> model.toy <- lme(y~1, random=~1|ID, data=toy)
> coef.y <- (ind.y=coef(model.toy))
> coef.y
  (Intercept)
1  0.52065015
2  0.04066776
3  0.29793571
4  0.11213693

 

Thanks,

 

Eric McKibben

Doctoral Candidate 

I-O Psychology

Clemson University


[[alternative HTML version deleted]]



Re: [R] renaming intercept column when retrieving coefficients from lme using coef function

2009-09-25 Thread jim holtman
Is this what you want:

> coef.y
  (Intercept)
1  0.03109602
2  0.03109602
3  0.03109603
4  0.03109602
> str(coef.y)
Classes ‘coef.lme’, ‘ranef.lme’ and 'data.frame':   4 obs. of  1 variable:
 $ (Intercept): num  0.0311 0.0311 0.0311 0.0311
 - attr(*, "level")= int 1
 - attr(*, "label")= chr "Coefficients"
 - attr(*, "effectNames")= chr "(Intercept)"
 - attr(*, "standardized")= logi FALSE
 - attr(*, "grpNames")= chr "ID"
> names(coef.y) <- 'myName'
> coef.y
      myName
1 0.03109602
2 0.03109602
3 0.03109603
4 0.03109602
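
The same idea in a form that runs without fitting a model: since coef() on an lme fit returns a data-frame-like object, a plain data frame with a "(Intercept)" column stands in for it here (the numbers are copied from the question; the stand-in itself is an assumption). Note that coef.y <- (ind.y = coef(model.toy)) assigns the same object to two variable names; it does not rename the column.

```r
# Stand-in for coef(model.toy): one column called "(Intercept)"
coef.y <- data.frame(`(Intercept)` = c(0.52065015, 0.04066776, 0.29793571, 0.11213693),
                     check.names = FALSE)
names(coef.y)

# Rename the column; colnames(coef.y)[1] <- "ind.y" would do the same
names(coef.y) <- "ind.y"
coef.y
```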



On Fri, Sep 25, 2009 at 9:10 PM, Eric McKibben emck...@clemson.edu wrote:
 I am still fairly new to R and have a fairly rudimentary question.  I am
 trying to name a vector of coefficients retrieved from a multilevel model
 using the coef function.  I guess the default name is Intercept and I
 cannot figure out how to rename it.



 I have tried the using the code below to name the column of coefficients
 ind.y derived from an lme model.  Unfortunately, the name ind.y is not
 applied to the column.  What can I do to name the column?



 toy <- data.frame(ID=c(1,1,1,2,2,2,3,3,3,4,4,4), x=rnorm(12), y=rnorm(12))

 model.toy <- lme(y~1, random=~1|ID, data=toy)

 coef.y <- (ind.y=coef(model.toy))

 coef.y

  (Intercept)

 1  0.52065015

 2  0.04066776

 3  0.29793571

 4  0.11213693



 Thanks,



 Eric McKibben

 Doctoral Candidate

 I-O Psychology

 Clemson University


        [[alternative HTML version deleted]]





-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



Re: [R] Design Package - Penalized Logistic Reg. - Query

2009-09-25 Thread David Winsemius


On Sep 25, 2009, at 8:33 PM, Lars Bishop wrote:


Dear R experts,

The lrm function in the Design package can perform penalized (Ridge)
logistic regression. It is my understanding that the ridge solutions  
are not
equivalent under scaling of the inputs, so one normally standardizes  
the
inputs. Do you know if input standardization is done internally in  
lrm or I

would have to do it prior to applying this function.

Also, as I'm new in R (coming from SAS) I don't know how well R will  
handle
relatively large data sets (e.g. 1/2 million observations on 40  
variables).


I don't have the answer to your first question but I routinely work  
with a dataset that is several times that large using the Design (and  
now) the rms packages. (You do need to have sufficient physical  
memory, but it is not R that is the limiting factor.)


--

David Winsemius, MD
Heritage Laboratories
West Hartford, CT



Re: [R] QQ plotting of various distributions...

2009-09-25 Thread Sunil Suchindran
#same shape

some_data <- rgamma(500,shape=6,scale=2)
test_data <- rgamma(500,shape=6,scale=2)
plot(sort(some_data),sort(test_data))
# You can also use qqplot(some_data,test_data)
abline(0,1)

# different shape

some_data <- rgamma(500,shape=6,scale=2)
test_data <- rgamma(500,shape=4,scale=2)
plot(sort(some_data),sort(test_data))
abline(0,1)

It is helpful to assess the sampling variability, by
creating repeated sets of test_data, and plotting
all of these along with your observations to create
a confidence envelope.
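
A rough sketch of such an envelope, assuming a gamma reference distribution with known shape and scale and simulated data standing in for the observations; the number of replicates and the 95% band are arbitrary choices for the illustration.

```r
set.seed(1)
obs <- rgamma(500, shape = 6, scale = 2)   # stand-in for the observed data
n   <- length(obs)

# Theoretical quantiles of the reference distribution
theo <- qgamma(ppoints(n), shape = 6, scale = 2)

# Repeated sorted samples from the reference distribution (one per column)
sims <- replicate(200, sort(rgamma(n, shape = 6, scale = 2)))

# Pointwise 2.5% and 97.5% limits for each order statistic
env <- apply(sims, 1, quantile, probs = c(0.025, 0.975))

plot(theo, sort(obs), xlab = "Theoretical quantiles", ylab = "Sample quantiles")
lines(theo, env[1, ], lty = 2)
lines(theo, env[2, ], lty = 2)
abline(0, 1)
```

Points that wander outside the dashed lines in a systematic way suggest that the reference distribution does not fit.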

The SuppDists package provides the inverse Gaussian distribution.


On Thu, Sep 17, 2009 at 11:46 AM, Petar Milin pmi...@ff.uns.ac.rs wrote:

 Hello!
 I am trying with this question again:
 I would like to test few distributional assumptions for some behavioral
 response data. There are few theories about true distribution of those data,
 like: normal, lognormal, gamma, ex-Gaussian (exponential-Gaussian), Wald
 (inverse Gaussian) etc. The best way would be via qq-plot, to show to
 students differences. First two are trivial:
 qqnorm(dat$X)
 qqnorm(log(dat$X))
 Then, things are getting more hairy. I am not sure how to make plots for
 the rest. I tried gamma with:
 qqmath(~ X, data=dat, distribution=function(X)
   qgamma(X, shape, scale))
 Which should be the same as:
 plot(qgamma(ppoints(dat$X), shape, scale), sort(dat$X))
 Shape and scale parameters I got via mhsmm package that has gammafit() for
 shape and scale parameters estimation.
 Am I on right track? Does anyone know how to plot the rest: ex-Gaussian
 (exponential-Gaussian), Wald (inverse Gaussian)?

 Thanks,
 PM



[[alternative HTML version deleted]]



Re: [R] Design Package - Penalized Logistic Reg. - Query

2009-09-25 Thread Frank E Harrell Jr

Lars Bishop wrote:

Dear R experts,

The lrm function in the Design package can perform penalized (Ridge)
logistic regression. It is my understanding that the ridge solutions are not
equivalent under scaling of the inputs, so one normally standardizes the
inputs. Do you know if input standardization is done internally in lrm or I
would have to do it prior to applying this function.


It's done internally, as buried in the documentation somewhere. 
Actually lrm puts the scaling factors (standard deviations for 
continuous variables) into the penalty matrix.
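
To illustrate why putting the standard deviations into the penalty matrix amounts to standardizing, here is a sketch using linear ridge regression, where the solution has a closed form. This is only an analogy on simulated data and is not lrm's actual code; all names below are made up.

```r
set.seed(1)
n <- 200
X <- cbind(x1 = rnorm(n, sd = 1), x2 = rnorm(n, sd = 50))   # very different scales
X <- scale(X, center = TRUE, scale = FALSE)                 # centre the predictors
y <- drop(X %*% c(2, 0.03) + rnorm(n))
y <- y - mean(y)

lambda <- 5
s <- apply(X, 2, sd)

# (a) Standardize, apply an identity penalty, then back-transform the coefficients
Z <- sweep(X, 2, s, "/")
gamma_hat <- solve(crossprod(Z) + lambda * diag(2), crossprod(Z, y))
beta_std  <- unname(drop(gamma_hat) / s)

# (b) Leave X alone and put the squared scale factors into the penalty matrix
beta_pen <- unname(drop(solve(crossprod(X) + lambda * diag(s^2), crossprod(X, y))))

all.equal(beta_std, beta_pen)

# (c) Identity penalty on the raw scale gives a different answer
beta_naive <- unname(drop(solve(crossprod(X) + lambda * diag(2), crossprod(X, y))))
rbind(beta_pen, beta_naive)
```

Routes (a) and (b) agree, while (c) depends on the units in which each predictor is measured.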


Frank



Also, as I'm new in R (coming from SAS) I don't know how well R will handle
relatively large data sets (e.g. 1/2 million observations on 40 variables).

I'll appreciate your comments.

Many thanks in advance.

Lars/

[[alternative HTML version deleted]]





--
Frank E Harrell Jr   Professor and Chair   School of Medicine
 Department of Biostatistics   Vanderbilt University



Re: [R] basic cubic spline smoothing

2009-09-25 Thread spencerg
 The best reference I know for this is something I wrote with Jim 
Ramsay and Giles Hooker:  "Functional Data Analysis with R and Matlab" 
(Springer, 2009).  Others may have better material. 



 After install.packages('fda'), I suggest you try 
system.file('scripts', package='fda'), as suggested in the Preface.  
This will point you to a subdirectory of your local installation of 
fda that contains files with names like fdarm-ch01.R, 
fdarm-ch02.R, ..., fdarm-ch11.R.  You will likely be most interested in 
Figure 9.4, sections 9.4.2 and 9.4.3, and script fdarm-ch09.R.  The script 
by itself may answer your question.  If not, you may wish to consult the 
book. 



 Hope this helps. 
 Spencer Graves



hm567 wrote:

hm567 wrote:
  

I am unsure about spar being the smoothness parameter, about where to put
the standard errors of the points, and about the return of the
smooth.spline function:



  
Smoothing Parameter  spar= 0.5  lambda= 0.006833112 



  

best regards,



  
Basically, the implementation based on the attached paper, for a standard

error of points =1.0,
the smoothing is too insensitive to the lambda smoothness parameter.
From 1 to almost 0.01, there is almost no smoothing... Only from 0.01 to 0
does one start to see smoothing in action with the limit at 0 being a
straight line.
Note that this implementation's parameter is (1 - parameter)

With R smooth.spline, 'spar' reflects well the smoothness in that:
. at 0%, the spline interpolates
. at 40% already, its shape is very different from the 0% one  ( for my
implementation, they are still same )
. at 90% it is almost a straight line
. at 100% it is definitely a straight line

This is the behavior that I wish to have.
It seems I need to change my lambda with some transformation that is similar
to the one in the doc of smooth.spline   (spar to lambda). Perhaps the
reverse one. But I can't see how to do it.

The other question is the standard errors. What do they correspond to in the
doc of smooth.spline?
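
A small sketch that may help with both questions, using simulated data. It assumes the per-point standard errors enter through the w argument as weights 1/se^2, which is a common convention and should be checked against ?smooth.spline for the case at hand; the spar values are arbitrary.

```r
set.seed(1)
x  <- seq(0, 1, length.out = 50)
se <- rep(1.0, length(x))                    # assumed standard error of each point
y  <- 5 * sin(2 * pi * x) + rnorm(length(x), sd = se)

spars <- c(0.1, 0.4, 0.9)
fits  <- lapply(spars, function(s) smooth.spline(x, y, w = 1 / se^2, spar = s))

# lambda grows very quickly with spar, while the effective degrees of freedom shrink
lambdas <- sapply(fits, function(f) f$lambda)
dfs     <- sapply(fits, function(f) f$df)
data.frame(spar = spars, lambda = lambdas, df = dfs)
```

The printed table shows how a roughly linear change in spar maps onto lambda values spread over several orders of magnitude, which is the kind of transformation asked about above.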

Regards,
  



--
Spencer Graves, PE, PhD
President and Chief Operating Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph:  408-655-4567



Re: [R] Downloading data from from internet

2009-09-25 Thread Duncan Temple Lang


Bogaso wrote:
 Thank you so much for the help. However, I need a little more help. On the
 site
 "http://www.rateinflation.com/consumer-price-index/usa-historical-cpi.php"
 if I scroll down there is an option "Historical CPI Index For USA".
 If I then click on "Get Data", another table pops up without
 any significant change in the address bar. This table holds more data, starting
 from 1999. Can you please help me get the values of this table?
 


Hi again

Well, this is a little bit more involved, as this is an HTML form
and so we need to be able to emulate submitting a form with
values for the different parameters the form expects, along with
ensuring they are correct inputs.  Ordinarily, this would involve
looking at the source of the HTML document, finding the relevant
form element, getting its action attribute, and all its inputs
and figuring out the possible inputs.  This is straightforward
but involved. But we have an R package that does this reasonably
well in an automated form. This is the RHTMLForms from the
www.omegahat.org/R repository.

We can use this with
 install.packages("RHTMLForms", repos = "http://www.omegahat.org/R")

Then

library(RHTMLForms)

ff = getHTMLFormDescription("http://www.rateinflation.com/consumer-price-index/usa-historical-cpi.php")

# The form we want is the third one. We can determine this
# from the names of the parameters.
# So we request that this form description be turned into an R function

g = createFunction(ff[[3]])

  # Now we call this.
xx = g(2001, 2008)


  # This returns the content of an HTML document
  # so we parse it and then pass this to readHTMLTable()
  # This is why we have methods for

library(XML)
doc = htmlParse(xx, asText = TRUE)
tbls = readHTMLTable(doc)

  # we want the last of the tables.
tbls[[length(tbls)]]


So hopefully that helps solve your problem and introduces another Omegahat package that
we hope people find through Google. The RHTMLForms package is an approach to the
poor man's Web services (HTML forms) rather than the REST and SOAP services that are
becoming more relevant each day.  The RCurl and SSOAP packages address the latter.

  D.





 Thanks
 
 
 Duncan Temple Lang wrote:

 Thanks for explaining this, Charlie.

 Just for completeness and to make things a little easier,
 the XML package has a function named readHTMLTable()
 and you can call it with a URL and it will attempt
 to read all the tables in the page.

  tbls = readHTMLTable('http://www.rateinflation.com/consumer-price-index/usa-cpi.php')

 yields a list with 10 elements, and the table of interest with the data is
 the 10th one.

  tbls[[10]]

 The function does the XPath voodoo and sapply() work for you and uses some
 heuristics.
 There are various controls one can specify and also various methods for
 working
 with sub-parts of the HTML document directly.

   D.



 cls59 wrote:

 Bogaso wrote:
 Hi all,

 I want to download data from those two different sources, directly into
 R
 :

 http://www.rateinflation.com/consumer-price-index/usa-cpi.php
 http://eaindustry.nic.in/asp2/list_d.asp

 First one is CPI of US and 2nd one is WPI of India. Can anyone please
 give
 any clue how to download them directly into R. I want to make them zoo
 object for further analysis.

 Thanks,

 The following site did not load for me:

 http://eaindustry.nic.in/asp2/list_d.asp

 But I was able to extract the table from the US CPI site using Duncan
 Temple
 Lang's XML package:

   library(XML)


 First, download the website into R:

   html.raw <- readLines(
 'http://www.rateinflation.com/consumer-price-index/usa-cpi.php' )

 Then, convert to an HTML object using the XML package:

   html.data <- htmlTreeParse( html.raw, asText = T, useInternalNodes = T )

 A quick scan of the page source in the browser reveals that the table you
 want is encased in a div with a class of dynamicContent-- we will use a
 xpath specification[1] to retrieve all rows in that table:

   table.html <- getNodeSet( html.data,
 '//d...@class=dynamicContent]/table/tr' )

 Now, the data values can be extracted from the cells in the rows using a
 little sapply and xpathSApply voodoo:

   table.data <- t( sapply( table.html, function( row ){

 row.data <- xpathSApply( row, './td', xmlValue )
 return( row.data)

   }))


 Good luck!

 -Charlie
  
   [1]:  http://www.w3schools.com/XPath/xpath_syntax.asp

 -
 Charlie Sharpsteen
 Undergraduate
 Environmental Resources Engineering
 Humboldt State University





Re: [R] Downloading data from from internet

2009-09-25 Thread Bogaso

Thanks Duncan for your input. However I could not install the package
RHTMLForms, it is saying as not not available :

 install.packages(RHTMLForms, repos = http://www.omegahat.org/R;) 
Warning in install.packages(RHTMLForms, repos =
http://www.omegahat.org/R;) :
  argument 'lib' is missing: using
'C:\Users\Arrun's\Documents/R/win-library/2.9'
Warning message:
In getDependencies(pkgs, dependencies, available, lib) :
  package ‘RHTMLForms’ is not available

I found this package on the net: http://www.omegahat.org/RHTMLForms/ However,
it is a .gz file, which I cannot use as I am a Windows user. Could you please
point me to an alternate source?

Thanks,



Duncan Temple Lang wrote:
 
 
 
 Bogaso wrote:
 Thank you so much for the help. However, I need a little more help. On the
 site
 "http://www.rateinflation.com/consumer-price-index/usa-historical-cpi.php",
 if I scroll down there is an option "Historical CPI Index For USA". If I
 then click on "Get Data", another table pops up, without any significant
 change in the address bar. This table holds more data, starting from 1999.
 Can you please help me get the values of this table?
 
 
 
 Hi again
 
 Well, this is a little bit more involved, as this is an HTML form
 and so we need to be able to emulate submitting a form with
 values for the different parameters the form expects, along with
 ensuring they are correct inputs.  Ordinarily, this would involve
 looking at the source of the HTML document, finding the relevant
 form element, getting its action attribute, and all its inputs
 and figuring out the possible inputs.  This is straightforward
 but involved. But we have an R package that does this reasonably
 well in an automated form. This is the RHTMLForms package from the
 www.omegahat.org/R repository.
 
 We can use this with
  install.packages("RHTMLForms", repos = "http://www.omegahat.org/R")
 
 Then
 
 library(RHTMLForms)
 
 ff =
 getHTMLFormDescription("http://www.rateinflation.com/consumer-price-index/usa-historical-cpi.php")
 
 # The form we want is the third one. We can determine this
 # from the names of the parameters.
 # So we request that this form description be turned into an R function
 
 g = createFunction(ff[[3]])
 
   # Now we call this.
 xx = g(2001, 2008)
 
 
   # This returns the content of an HTML document
   # so we parse it and then pass this to readHTMLTable()
   # This is why we have methods for
 
 library(XML)
 doc = htmlParse(xx, asText = TRUE)
 tbls = readHTMLTable(doc)
 
   # we want the last of the tables.
 tbls[[length(tbls)]]
 
 
 So hopefully that helps solve your problem and introduces another Omegahat
 package that we hope people find through Google. The RHTMLForms package is
 an approach to the poor-man's Web services (HTML forms) rather than the
 REST and SOAP services that are becoming more relevant each day. The RCurl
 and SSOAP packages address the latter.
 
   D.
 
 
 
 
 
 Thanks
 
 
 Duncan Temple Lang wrote:

 Thanks for explaining this, Charlie.

 Just for completeness and to make things a little easier,
 the XML package has a function named readHTMLTable()
 and you can call it with a URL and it will attempt
 to read all the tables in the page.

  tbls =
 readHTMLTable('http://www.rateinflation.com/consumer-price-index/usa-cpi.php')

 yields a list with 10 elements, and the table of interest with the data
 is the 10th one.

  tbls[[10]]
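
 To see what readHTMLTable() returns without touching the network, here is
 a minimal sketch on a small page held in a string. The markup and values
 are hypothetical, chosen only for illustration; it assumes the XML package
 is installed.

```r
library(XML)

# Hypothetical one-table page (NOT taken from the real site)
page <- paste0(
  '<html><body><table>',
  '<tr><th>Year</th><th>CPI</th></tr>',
  '<tr><td>2008</td><td>211.08</td></tr>',
  '</table></body></html>'
)

# Parse the string, then let readHTMLTable() collect every table it finds
doc  <- htmlParse(page, asText = TRUE)
tbls <- readHTMLTable(doc, stringsAsFactors = FALSE)

# As with the real page, pick the table of interest by position
last <- tbls[[length(tbls)]]
```

 The result is a list of data frames, one per table found, so indexing by
 position (or by length(tbls) for the last one) works the same way as above.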

 The function does the XPath voodoo and sapply() work for you and uses
 some heuristics.
 There are various controls one can specify and also various methods for
 working with sub-parts of the HTML document directly.

   D.



 cls59 wrote:

 Bogaso wrote:
 Hi all,

 I want to download data from these two different sources directly into R:

 http://www.rateinflation.com/consumer-price-index/usa-cpi.php
 http://eaindustry.nic.in/asp2/list_d.asp

 The first one is the CPI of the US and the second one is the WPI of India.
 Can anyone please give any clue how to download them directly into R? I
 want to make them zoo objects for further analysis.

 Thanks,

 The following site did not load for me:

 http://eaindustry.nic.in/asp2/list_d.asp

 But I was able to extract the table from the US CPI site using Duncan
 Temple Lang's XML package:

   library(XML)


 First, download the website into R:

   html.raw <- readLines(
     'http://www.rateinflation.com/consumer-price-index/usa-cpi.php' )

 Then, convert to an HTML object using the XML package:

   html.data <- htmlTreeParse( html.raw, asText = TRUE,
     useInternalNodes = TRUE )

 A quick scan of the page source in the browser reveals that the table you
 want is encased in a div with a class of dynamicContent; we will use an
 XPath specification [1] to retrieve all rows in that table:

   table.html <- getNodeSet( html.data,
     '//div[@class="dynamicContent"]/table/tr' )

 Now, the data values can be extracted from the cells in the rows using a
 little sapply and xpathSApply voodoo:

   table.data <- t( sapply( table.html, function( row ){

     row.data <- xpathSApply( row, './td',