date:20101104

Re: [R] Logical vectors

2010-11-04 Thread Gerrit Eichner


On Wed, 3 Nov 2010, Stephen Liu wrote:

[snip]


2)

x

[1] 1 2 3 4 5

temp <- x > 1
temp

[1] FALSE  TRUE  TRUE  TRUE  TRUE


Why NOT

temp

[1] TRUE  FALSE  FALSE FALSE  FALSE

?



Maybe because of the definition of > (greater (!) than)? Or do you 
expect 1 to be greater than 1 and not greater than 2, 3, 4, and 5?


 Regards  --  Gerrit

-
AOR Dr. Gerrit Eichner   Mathematical Institute, Room 212
gerrit.eich...@math.uni-giessen.de   Justus-Liebig-University Giessen
Tel: +49-(0)641-99-32104  Arndtstr. 2, 35392 Giessen, Germany
Fax: +49-(0)641-99-32109        http://www.uni-giessen.de/cms/eichner

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Logical vectors

2010-11-04 Thread Joshua Wiley

On Wed, Nov 3, 2010 at 10:50 PM, Stephen Liu sati...@yahoo.com wrote:
 Hi folks,

 Pls help me to understand follow;

 An Introduction to R

 2.4 Logical vectors
 http://cran.r-project.org/doc/manuals/R-intro.html#R-and-statistics

 1)
 x
 [1] 1 2 3 4 5

a vector, x, is defined with 5 elements, {1, 2, 3, 4, 5}

 temp <- x != 1

perform the logical test that x does not equal 1 returning either TRUE or FALSE.

1 == 1 so FALSE, 2 != 1 so TRUE, etc.  Next we assign *the results* of
the logical test to the vector 'temp'.

 temp
 [1] FALSE  TRUE  TRUE  TRUE  TRUE

print the vector to screen




 2)
 x
 [1] 1 2 3 4 5

note that x has not changed here, we assigned to temp, not to x.

 temp <- x > 1

now we assign the results of the logical test, x > 1

{1 <= 1 so FALSE, 2 > 1 so TRUE, 3 > 1 so TRUE, 4 > 1 so TRUE, 5 > 1 so TRUE}

we assign these results to a vector, 'temp'.  This *new* assignment
overwrites the old vector 'temp'

 temp
 [1] FALSE  TRUE  TRUE  TRUE  TRUE

print temp to screen; these are the results of our second logical test (x > 1).



 Why NOT
 temp
 [1] TRUE  FALSE  FALSE FALSE  FALSE

My best guess of where you got confused is that we assigned the
results to 'temp', so 'x' remained unchanged {1, 2, 3, 4, 5}, or that
you confused '<-', which is the assignment operator in R, with 'less than
negative' (< -)... *OR* with 'less than or equal' (<=).  We could write this
equivalently:

 1:5 > 1
[1] FALSE  TRUE  TRUE  TRUE  TRUE

this was the logical test, whose results were assigned to the vector, temp.

 assign(x = "temp", value = 1:5 > 1)

using the assign function (not often recommended) to avoid any
confusion with the assignment operator, <-.

 temp
[1] FALSE  TRUE  TRUE  TRUE  TRUE

print to screen
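A minimal sketch of the parsing point above, for a fresh R session; the variable y is purely illustrative. R reads <- as a single token when there is no space, so spacing decides between an assignment and a comparison with a negative number, and assign() is equivalent to the arrow form:

```r
y <- 5
y<-1          # no space: parsed as the assignment operator, so y becomes 1
y < -1        # with a space: "is y less than -1?", which is FALSE
y <= 1        # less than or equal: TRUE, since y is now 1

# assign() takes the variable name as a character string; here it does
# the same thing as temp <- 1:5 > 1
assign("temp", 1:5 > 1)
temp
identical(temp, 1:5 > 1)
```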

HTH,

Josh


 ?


 TIA

 B.R.
 Stephen L






-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/


[R] problem with RODBC installation

2010-11-04 Thread Jørgen Blystad Houge

Good morning,

I have some problems installing RODBC to R in a linux cluster. My R version
is:
R version 2.12.0 (2010-10-15)
Platform: x86_64-unknown-linux-gnu (64-bit)

I get the following error:
 install.packages('RODBC')
Installing package(s) into
'/home/jorgehou/R/x86_64-unknown-linux-gnu-library/2.12'
(as 'lib' is unspecified)
trying URL 'http://stat.ethz.ch/CRAN/src/contrib/RODBC_1.3-2.tar.gz'
Content type 'application/x-gzip' length 1108358 bytes (1.1 Mb)
opened URL
==
downloaded 1.1 Mb

* installing *source* package 'RODBC' ...
checking for gcc... gcc -std=gnu99
checking for C compiler default output file name... a.out
checking whether the C compiler works... yes
checking whether we are cross compiling... no
checking for suffix of executables...
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc -std=gnu99 accepts -g... yes
checking for gcc -std=gnu99 option to accept ANSI C... none needed
checking how to run the C preprocessor... gcc -std=gnu99 -E
checking for egrep... grep -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking sql.h usability... no
checking sql.h presence... no
checking for sql.h... no
checking sqlext.h usability... no
checking sqlext.h presence... no
checking for sqlext.h... no
configure: error: ODBC headers sql.h and sqlext.h not found
ERROR: configuration failed for package 'RODBC'
* removing '/home/jorgehou/R/x86_64-unknown-linux-gnu-library/2.12/RODBC'

The downloaded packages are in
'/tmp/Rtmpgb1Nxz/downloaded_packages'
Warning message:
In install.packages("RODBC") :
  installation of package 'RODBC' had non-zero exit status

I found some info on it here:
http://r.789695.n4.nabble.com/Problem-installing-RODBC-td2016736.html but
how should I use it???

(Yes, I am very novice to Linux (and R) so it might be a stupid
question)

Thanks!
Jørgen
--
Jørgen Blystad Houge
MSc Student Industrial Economics NTNU, Norway

[[alternative HTML version deleted]]


Re: [R] problem with RODBC installation

2010-11-04 Thread Prof Brian Ripley

Please read the RODBC manual (which comes with it).  You (or the 
cluster owner) need to install unixODBC, and if installing from RPMs 
etc, something like unixODBC-devel.


Please also note the R posting guide - no HTML mail, use an 
appropriate list (R-sig-db or R-devel here as this is about non-R 
programming).



--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Re: [R] problem with RODBC installation

2010-11-04 Thread Peter Dalgaard

On 11/04/2010 08:14 AM, Jørgen Blystad Houge wrote:
...
 '/tmp/Rtmpgb1Nxz/downloaded_packages'
 Warning message:
 In install.packages("RODBC") :
   installation of package 'RODBC' had non-zero exit status
 
 I found some info on it here:
 http://r.789695.n4.nabble.com/Problem-installing-RODBC-td2016736.html but
 how should I use it???
 
 (Yes, I am very novice to Linux (and R) so it might be a stupid
 question)


Well, the answer is in the output:

 configure: error: ODBC headers sql.h and sqlext.h not found

That's an installation issue with ODBC, not an R issue as such. Usually
a development package is missing, but exactly which one depends on your
particular flavour of Linux. In Fedora 13, it is here:

$ rpm -qf /usr/include/sqlext.h
unixODBC-devel-2.2.14-12.fc13.i686

so the unixODBC-devel package is required. In e.g. Ubuntu, it is -er-
somewhere else...
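A small shell sketch of the same diagnosis, assuming the headers would be installed under /usr/include (the usual location for distribution packages; a unixODBC built from source may put them under /usr/local/include instead). The Debian/Ubuntu package name printed below is an assumption to confirm with your package manager; the Fedora name is the one shown by rpm above:

```shell
#!/bin/sh
# Look for the two headers that RODBC's configure script reports as missing.
missing=0
for h in sql.h sqlext.h; do
  if [ -f "/usr/include/$h" ]; then
    echo "found:   /usr/include/$h"
  else
    echo "missing: /usr/include/$h"
    missing=1
  fi
done

if [ "$missing" -eq 1 ]; then
  # Package names differ between distributions.
  echo "Install the ODBC development package, e.g. unixODBC-devel (Fedora/RHEL)"
  echo "or unixodbc-dev (Debian/Ubuntu), then rerun install.packages(\"RODBC\") in R."
fi
```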

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com


[R] How to do bootstrap for the complex sample design?

2010-11-04 Thread Fei xu


Hello;
 
Our survey is structured as follows: the area to be investigated is divided
into 6 regions; within each region, one urban community and one rural
community are randomly selected, and then samples are randomly drawn from
each selected urban and rural community.
 
The problem is that in each urban/rural stratum we have only one sample.
In this case, how should we do the bootstrap?
 
Any comments or hints are greatly appreciated!
 
Faye  
[[alternative HTML version deleted]]


Re: [R] Logical vectors

2010-11-04 Thread Stephen Liu

Hi Gerrit,

Thanks for your advice.


In;

2.4 Logical vectors
http://cran.r-project.org/doc/manuals/R-intro.html#R-and-statistics

It states:-

The logical operators are <, <=, >, >=, == for exact equality and != for 
inequality 

# exact equality
!=   # inequality


I did follows;

 x <- 1:5
 x
[1] 1 2 3 4 5

 temp <- x != 1
 temp
[1] FALSE  TRUE  TRUE  TRUE  TRUE

That is correct.


 rm(temp)
 
 temp <- x > 1
 temp
[1] FALSE  TRUE  TRUE  TRUE  TRUE

That seems not correct.

My understanding is;
 [1] TRUE  FALSE  FALSE FALSE  FALSE

B.R.
Stephen L





- Original Message 
From: Gerrit Eichner gerrit.eich...@math.uni-giessen.de
To: Stephen Liu sati...@yahoo.com
Cc: r-help@r-project.org
Sent: Thu, November 4, 2010 2:34:55 PM
Subject: Re: [R] Logical vectors


Re: [R] Logical vectors

2010-11-04 Thread Stephen Liu

Hi Joshua,

Thanks for your advice.

 assign(x = "temp", value = 1:5 > 1)

 using the assign function (not often recommended) to avoid any
 confusion with the assignment operator, <-.

 temp
[1] FALSE  TRUE  TRUE  TRUE  TRUE


I got it.  Thanks


B.R.
Stephen L



- Original Message 
From: Joshua Wiley jwiley.ps...@gmail.com
To: Stephen Liu sati...@yahoo.com
Cc: r-help@r-project.org
Sent: Thu, November 4, 2010 2:46:15 PM
Subject: Re: [R] Logical vectors


Re: [R] Best Fit line trouble with rsruby

2010-11-04 Thread Alex Gutteridge

On Wed, 3 Nov 2010 18:24:43 -0700 (PDT), Deadpool deadpoo...@comcast.net
wrote:
 Hello, I am using R, through rsruby, to create a graph and best fit line
 for a set of data points, regarding data collected in a Chemistry class.
 The problem is that although the graph functions perfectly properly, the
 best fit line will not work.
 
 I initially used code I pretty much copied from a website with a tutorial
 on this, which was:
 
 graphData.png("/code/Beer's-Law Graph.png")
 concentration = p1Conc
 absorbance = p1AbsorbanceArray
 graphData.assign('x', p1Conc)
 graphData.assign('y', p1AbsorbanceArray)
 fit = graphData.lm('x ~ y')
 graphData.plot(concentration, absorbance)
 graphData.abline(fit["coefficients"]["(Intercept)"], fit["coefficients"]["y"])
 puts fit["coefficients"]
 graphData.eval_R("dev.off()")
 
 (p1Conc and p1AbsorbanceArray are arrays)
 
 This worked for the graph, but the best fit line looked (and the
 infinitesimally small slope supported) like it was based off a single
 point. The site said they had to define something in the R interpreter
 first, but didn't elaborate, so I gave it a go, and obviously it didn't
 work.

It looks to me like you have the response and explanatory variables
swapped in your model (or your plot).

Try:

fit = graphData.lm("y~x")
graphData.plot(concentration, absorbance)
graphData.abline(fit["coefficients"]["(Intercept)"], fit["coefficients"]["x"])

Or just swap the axes on your plot.
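For comparison, a minimal sketch of the same fit in plain R, with made-up calibration values standing in for p1Conc and p1AbsorbanceArray (they are not from the original post). The response (absorbance) goes on the left of ~ and on the y axis, so the fitted line lines up with the plotted points; abline() can take the fitted model directly:

```r
# Hypothetical Beer's-law calibration data, exactly linear for illustration.
concentration <- c(0.1, 0.2, 0.3, 0.4, 0.5)
absorbance    <- c(0.21, 0.41, 0.61, 0.81, 1.01)

# Model absorbance as a function of concentration: response ~ explanatory.
fit <- lm(absorbance ~ concentration)
coef(fit)

# x axis = concentration, y axis = absorbance, matching the model above.
pdf(NULL)      # no file is written; use pdf("beers_law.pdf") or png() to save
plot(concentration, absorbance)
abline(fit)    # abline() reads the intercept and slope from the lm object
invisible(dev.off())
```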

 I then tried something like this, as I thought the conversion from the
 array to the string in the assign function was causing the problem with
 the best fit line.

No - that should be fine. You aren't converting an array into a string
just assigning a Ruby Array to an R variable (vector) with the given name.
 
 graphData = RSRuby.instance
 graphData.png("/code/Beer's-Law Graph.png")
 concentration = graphData.c(p1Conc[0..(p1SampNum - 1)])
 absorbance = graphData.c(p1AbsorbanceArray[0..(p1SampNum - 1)])
 fit = graphData.lm(concentration ~ absorbance)
 graphData.plot(concentration, absorbance)
 graphData.abline(fit["coefficients"]["(Intercept)"], fit["coefficients"]["absorbance"])
 puts fit["coefficients"]
 print "\n"
 graphData.eval_R("dev.off()")
 
 Basically trying to bypass that, and feed the numbers straight from the
 array into the best fit line, but the program was giving me an error,
 saying it didn't know what ~ was for an array (should note I tried it
 first without doing the graphData.c thing, but that didn't work and as the
 .c function didn't seem to store things as an array, I thought that might
 work, it didn't, as it does store data as an array).

RSRuby doesn't know about R formulas so a bare '~' is a syntax error in
Ruby. You must pass the model specification as a string as you did the
first time. Unfortunately this means you either have to do the .assign()
workaround to get the data into variables R can see or pass the data via
the 'data' argument to lm. See this irb session for an example of the
second technique:

wsp00614206:~ GUTTEA$ irb
>> require 'rsruby'
=> true
>> r = RSRuby.instance
=> #<RSRuby:0x101176c20 @class_table={}, @default_mode=-1, @caching=true,
@cache={"get"=>#<RObj:0x101176798>, "helpfun"=>#<RObj:0x101172cd8>,
"help"=>#<RObj:0x101172cd8>, "NaN"=>NaN, "FALSE"=>false, "TRUE"=>true,
"F"=>false, "NA"=>-2147483648, "eval"=>#<RObj:0x101175230>, "T"=>true,
"parse"=>#<RObj:0x1011757a8>}, @proc_table={}>
>> x = (1..10).to_a
=> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>> y = (11..20).to_a
=> [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
>> fit = r.lm("y~x", :data => {'x' => x, 'y' => y})
=> {"model"=>{"x"=>[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], "y"=>[11, 12, 13, 14,
15, 16, 17, 18, 19, 20]}, "qr"=>{"qr"=>[[-3.16227766016838,
-17.3925271309261], [0.316227766016838, 9.08295106229247],
[0.316227766016838, 0.15621147358221], [0.316227766016838,
0.0461150970695743], [0.316227766016838, -0.0639812794430617],
[0.316227766016838, -0.174077655955698], [0.316227766016838,
-0.284174032468334], [0.316227766016838, -0.39427040898097],
[0.316227766016838, -0.504366785493606], [0.316227766016838,
-0.614463162006242]], "pivot"=>[1, 2], "rank"=>2, "tol"=>1.0e-07,
"qraux"=>[1.31622776601684, 1.26630785009485]}, "assign"=>[0, 1],
"rank"=>2, "residuals"=>{"6"=>3.24300739408472e-16,
"7"=>3.20784180753863e-16, "8"=>-3.4886619267584e-16,
"9"=>-1.01851656610554e-15, "1"=>-3.63520003547369e-15,
"2"=>1.72099416959944e-15, "3"=>1.22302883507243e-15,
"10"=>-3.55899309985059e-16, "4"=>9.97467671492785e-16,
"5"=>7.71906507913144e-16}, "df.residual"=>8,
"effects"=>{""=>1.77635683940025e-15, "x"=>9.08295106229247,
"(Intercept)"=>-49.0153037326099}, "xlevels"=>{},
"fitted.values"=>{"6"=>16.0, "7"=>17.0, "8"=>18.0, "9"=>19.0, "1"=>11.0,
"2"=>12.0, "3"=>13.0, "10"=>20.0, "4"=>14.0, "5"=>15.0},
"call"=>#<RObj:0x10111c810>, "terms"=>#<RObj:0x10111c798>,
"coefficients"=>{"x"=>1.0, "(Intercept)"=>10.0}}
>> fit["coefficients"]
=> {"x"=>1.0, "(Intercept)"=>10.0}
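On the R side, the :data hash passed from Ruby ends up as the ordinary data argument of lm(). A sketch of the same call in plain R, using the x and y values from the irb session, shows that the formula's variables are then looked up in the data frame rather than in variables created with assign():

```r
# Same values as the irb session: x = 1..10, y = 11..20.
d <- data.frame(x = 1:10, y = 11:20)

# Variables named in the formula are found in d, so nothing has to be
# assigned into the global environment first.
fit <- lm(y ~ x, data = d)
coef(fit)   # intercept 10 and slope 1, as in fit["coefficients"] above
```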

 So basically I'm stuck. Not sure if anyone has any experience with
 rsruby, but any help would be appreciated. I'm pretty sure the fit =
 graphData.lm(etcetera) line is where the trouble is, but not sure how to
 handle it.

You got pretty close!

-- 
Alex Gutteridge

Re: [R] Logical vectors

2010-11-04 Thread Gerrit Eichner


On Thu, 4 Nov 2010, Stephen Liu wrote:

[snip]


In;

2.4 Logical vectors
http://cran.r-project.org/doc/manuals/R-intro.html#R-and-statistics

It states:-

The logical operators are <, <=, >, >=, == for exact equality and != for
inequality 


   # exact equality

!=   # inequality


[snip]


Hello, Stephen,

in my understanding of the sentence

The logical operators are <, <=, >, >=, == for exact equality and != for 
inequality 


the phrase exact equality refers to the operator ==, i. e. to the last 
element == in the enumeration (<, <=, >, >=, ==), and not to its first.
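A quick way to see what each operator in that sentence does is to apply all of them to the x <- 1:5 used in this thread. Only == (and, for this particular x, <= because 1 is its smallest element) gives the TRUE FALSE FALSE FALSE FALSE that was expected; the helper names ops and results are just for illustration:

```r
x <- 1:5

# One row per operator from the manual's sentence, one column per element of x.
ops <- c("<", "<=", ">", ">=", "==", "!=")
results <- t(sapply(ops, function(op) do.call(op, list(x, 1))))
colnames(results) <- x
results
```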


 Regards  --  Gerrit



[R] Reading in irregular, daily time series data

2010-11-04 Thread veol


Hello,

I am trying to read in some time series data and am having trouble. Forgive
me, I am quite new to R. The data is in the form of:

TS.1
2000-07-28 1419.89
2000-07-31 1430.83
2000-08-01 1438.1
2000-08-02 1438.7
2000-08-03 1452.56
2000-08-04 1462.93
2000-08-07 1479.32
2000-08-08 1482.8
2000-08-09 1472.87
2000-08-10 1460.25
2000-08-11 1471.84
2000-08-14 1491.56
2000-08-15 1484.43
...

The data is daily data, but it is irregular (i.e. there are some missing
data points). 

I have tried reading in the data like so, but when I plot, the time series
does not preserve the dates.

sp500 <- ts(read.table("SP500.txt", header=TRUE))
plot.ts(sp500)

I have searched the forums, and have found the current method to work:

sp500 <- read.table("SP500.txt", header=TRUE)
date <- sp500[,1]
data <- sp500[,2]
z <- aggregate(zoo(data), as.Date(date), tail, 1)
merge(z, zoo(, as.Date(unclass(time(as.ts(z))), fill=0)))
plot.zoo(z)

However, now I cannot analyze the acf of the time series, as R complains
giving an error:
Error in na.fail.default(as.ts(x)) : missing values in object

Is there any way to fix this?

I would actually prefer not using the zoo class, as it breaks the acf
function in R. Would the new zoo object be compatible with all the ts
functions?

Thank you!
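A minimal base-R sketch, assuming the file looks exactly like the excerpt above (a one-word header line, then a date and a value on each line); the column names date and close are hypothetical. It keeps the calendar dates for plotting and runs acf() on the plain numeric column, which treats consecutive trading days as equally spaced, so the weekend gaps never become NAs. Whether ignoring those gaps is appropriate depends on the analysis; with an existing zoo object z, acf(coredata(z)) would be the analogous call.

```r
# The first rows of the post, standing in for SP500.txt.
txt <- "TS.1
2000-07-28 1419.89
2000-07-31 1430.83
2000-08-01 1438.1
2000-08-02 1438.7
2000-08-03 1452.56
2000-08-04 1462.93
2000-08-07 1479.32
2000-08-08 1482.8
2000-08-09 1472.87
2000-08-10 1460.25
2000-08-11 1471.84
2000-08-14 1491.56
2000-08-15 1484.43"

# Skip the "TS.1" header line and name the two columns explicitly.
sp500 <- read.table(text = txt, skip = 1, col.names = c("date", "close"),
                    colClasses = c("character", "numeric"))
sp500$date <- as.Date(sp500$date)

# Plotting against the Date column keeps the real dates on the x axis.
plot(sp500$date, sp500$close, type = "l", xlab = "Date", ylab = "Close")

# acf() on the numeric column: one lag = one trading day, no NAs involved.
sp500_acf <- acf(sp500$close, plot = FALSE)
sp500_acf
```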

-- 
View this message in context: 
http://r.789695.n4.nabble.com/Reading-in-irregular-daily-time-series-data-tp3026688p3026688.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Logical vectors

2010-11-04 Thread Ted Harding

On 04-Nov-10 08:56:42, Gerrit Eichner wrote:
 On Thu, 4 Nov 2010, Stephen Liu wrote:
 [snip]
 In;

 2.4 Logical vectors
 http://cran.r-project.org/doc/manuals/R-intro.html#R-and-statistics

 It states:-

 The logical operators are <, <=, >, >=, == for exact equality and !=
 for inequality 

# exact equality
 !=   # inequality
 
 [snip]
 
 
 Hello, Stephen,
 in my understanding of the sentence
 
 The logical operators are <, <=, >, >=, == for exact equality and !=
 for inequality 
 
 the phrase exact equality refers to the operator ==, i. e. to the
 last element == in the enumeration (<, <=, >, >=, ==), and not to its
 first.
 
   Regards  --  Gerrit

This indicates that the sentence can be mis-read. It should be
cured by a small change in punctuation (hence I copy to R-devel):

  The logical operators are <, <=, >, >=; == for exact equality;
  and != for inequality 

Hoping this helps!
Ted.


E-Mail: (Ted Harding) ted.hard...@wlandres.net
Fax-to-email: +44 (0)870 094 0861
Date: 04-Nov-10   Time: 09:08:37
-- XFMail --


Re: [R] Logical vectors

2010-11-04 Thread Stephen Liu

Hi Gerrit,

 the phrase exact equality refers to the operator ==, i. e. to the last 
 element == in the enumeration (<, <=, >, >=, ==), and not to its first.


 x <- 1:5
 x
[1] 1 2 3 4 5

 temp <- x == 1
 temp
[1]  TRUE FALSE FALSE FALSE FALSE

I got it thanks.

B.R.
Stephen L




- Original Message 
From: Gerrit Eichner gerrit.eich...@math.uni-giessen.de
To: Stephen Liu sati...@yahoo.com
Cc: r-help@r-project.org
Sent: Thu, November 4, 2010 4:56:42 PM
Subject: Re: [R] Logical vectors


[R] 3D Elliptic Fourier

2010-11-04 Thread ZHANG Yingqi

dear all,
 Is it possible to do some 3D Elliptic Fourier analysis with R? I've 
read Morphometrics with R (Use R). There are several functions in the book 
like efourier, iefourier, and NEF to deal with the 2D closed outlines. 
Does anybody have any idea about how to deal with 3D outlines with R? Are there 
any known package, function, or publications about this?
 Thanks a lot!

Yingqi



Yingqi ZHANG

Beijing P.O. Box 643, China 100044
Institute of Vertebrate Paleontology and Paleoanthropology (IVPP)
Chinese Academy of Sciences
Tel: +86-10-88369378 Fax: +86-10-68337001
Email: arvico...@gmail.com


[R] Suppressing (or changing) the colours when using spatstat plot.quadratcounts

2010-11-04 Thread David O'Sullivan


Hi,
Using the quadrats and quadratcount functions in spatstat, when I go to 
plot either of these, I get the quadrats coloured by their identity, 
i.e., using a color ramp applied to the sequence of quadrats.  This only 
happens when the quadrats are applied to an owin which is polygonal, 
i.e., when I have an irregularly shaped study area.


There doesn't seem to be any obvious way to over-ride this behaviour 
that I can find.  I would ideally like to be able to colour the quadrats 
by the number of points/events each contains, rather than by their identity.


Anyone run into this?  It seems like an obvious request, maybe I just 
haven't spotted the necessary option, although I think maybe not - it's 
something to do with how plot.tess() works.


Thanks

David
--
David O'Sullivan
Associate Professor of Geography
University of Auckland | Te Whare Wananga o Tamaki Makaurau
http://www.sges.auckland.ac.nz/the_school/our_people/osullivan_david/index.shtm


Re: [R] density() function: differences with S-PLUS

2010-11-04 Thread Nicola Sturaro Sommacal (Quantide srl)

Dear William,
I obtained the same x values even without the from= and to= arguments, by
using bw instead of width in R.

I then tried a two-step procedure for the y values:
 - in the first step I obtained the x values as below,
 - in the second step I used the minimum and maximum x values as the
from= and to= arguments.
In this way I obtain, in R, y values close to the S+ ones, but not the same.

R code and S+ code and output are below.

Thanks again.
Nicola



# R CODE
exdata = iris$Sepal.Length[iris$Species == "setosa"]
density(exdata, bw = 4, n = 50, cut = 0.75)$x  # SAME AS S+
density(exdata, bw = 4, n = 50, cut = 0.75)$y  # COMPLETELY DIFFERENT
density(exdata, width = 4, n = 50, from = 1.3, to = 8.8, cut = 0.75)$y  # CLOSE TO S+


# SPLUS CODE AND OUTPUT
 exdata = iris[, 1, 1]
 density(exdata, width = 4)
$x:
 [1] 1.30 1.453061 1.606122 1.759184 1.912245 2.065306
 [7] 2.218367 2.371429 2.524490 2.677551 2.830612 2.983673
[13] 3.136735 3.289796 3.442857 3.595918 3.748980 3.902041
[19] 4.055102 4.208163 4.361224 4.514286 4.667347 4.820408
[25] 4.973469 5.126531 5.279592 5.432653 5.585714 5.738776
[31] 5.891837 6.044898 6.197959 6.351020 6.504082 6.657143
[37] 6.810204 6.963265 7.116327 7.269388 7.422449 7.575510
[43] 7.728571 7.881633 8.034694 8.187755 8.340816 8.493878
[49] 8.646939 8.80

$y:
 [1] 0.0007849649 0.0013097474 0.0021225491 0.0033616520
 [5] 0.0052059615 0.0078856717 0.0116917555 0.0169685132
 [9] 0.0241073754 0.0335286785 0.0456521053 0.0608554862
[13] 0.0794235072 0.1014901241 0.1269807991 0.1555625999
[17] 0.1866111931 0.2192033788 0.2521417640 0.2840144993
[21] 0.3132881074 0.3384260582 0.3580208688 0.3709241384
[25] 0.3763578665 0.3739920600 0.3639778683 0.3469316232
[29] 0.3238721233 0.2961200278 0.2651731505 0.2325739601
[33] 0.1997853985 0.1680884651 0.1385105802 0.1117884914
[37] 0.0883644110 0.0684099972 0.0518702141 0.0385181792
[41] 0.0280126487 0.0199513951 0.0139159044 0.0095050745
[45] 0.0063575653 0.0041639082 0.0026680819 0.0016700727
[49] 0.0010169912 0.0005962089
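A short R check of the bandwidth relation behind these numbers, using the same setosa data. In R's density(), a numeric width is converted to a bandwidth by a kernel-dependent factor (4 for the default gaussian kernel), and the grid runs from min(x) - cut * bw to max(x) + cut * bw with cut = 3 by default. So width = 4 means bw = 1 and a grid from 1.3 to 8.8, the same x range as the S+ output, whereas bw = 4 is a four times wider kernel. Any remaining small differences in y are, per Bill Dunlap's reply below, due to S+ truncating the gaussian kernel at +/- 4 standard deviations.

```r
exdata <- iris$Sepal.Length[iris$Species == "setosa"]

# width = 4 with the gaussian kernel is converted to bw = width / 4 = 1.
d_width <- density(exdata, width = 4, n = 50)
d_bw1   <- density(exdata, bw = 1, n = 50)
d_width$bw

# Default cut = 3: grid from 4.3 - 3 = 1.3 to 5.8 + 3 = 8.8, as in S+ $x.
range(d_width$x)
all.equal(d_width$y, d_bw1$y)

# bw = 4 with cut = 0.75 gives the same grid (0.75 * 4 = 3),
# but a much wider kernel, hence the very different y values.
d_bw4 <- density(exdata, bw = 4, n = 50, cut = 0.75)
range(d_bw4$x)
```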




2010/11/3 William Dunlap wdun...@tibco.com

 Did you get my reply (1:31pm PST Tuesday)
 to your request?  It showed how you needed
 to use the from= and to= argument to density
 to get identical x components to the output
 and that the small differences in the y
 component were due to S+ truncating the
 gaussian kernel at +- 4 standard deviations
 from the center while R does not truncate
 the gaussian kernel (its output looks like it
 uses a Fourier transform to do the convolution).


 Bill Dunlap
 Spotfire, TIBCO Software
 wdunlap tibco.com

  -Original Message-
  From: r-help-boun...@r-project.org
  [mailto:r-help-boun...@r-project.org] On Behalf Of Nicola
  Sturaro Sommacal (Quantide srl)
  Sent: Wednesday, November 03, 2010 3:34 AM
  To: Joshua Wiley
  Cc: r-help@r-project.org
  Subject: Re: [R] density() function: differences with S-PLUS
 
  Dear Joshua,
 
  first of all, thank you very much for your reply. I hoped that someone
  familiar with both S+ and R could reply to me, because I spent some
  hours looking for a solution.
 
  If someone else would like to try, this is the S-PLUS code and output,
  while below there is the R code. I obtain the same x values, while the
  y values are different for both examples.
 
  Thank you very much.
 
  Nicola
 
 
  ### S-PLUS CODE AND OUTPUT ###
 
   density(1:1000, width = 4)
  $x:
   [1]   -2.0     18.51020   39.02041   59.53061   80.04082  100.55102  121.06122
   [8]  141.57143  162.08163  182.59184  203.10204  223.61224  244.12245  264.63265
  [15]  285.14286  305.65306  326.16327  346.67347  367.18367  387.69388  408.20408
  [22]  428.71429  449.22449  469.73469  490.24490  510.75510  531.26531  551.77551
  [29]  572.28571  592.79592  613.30612  633.81633  654.32653  674.83673  695.34694
  [36]  715.85714  736.36735  756.87755  777.38776  797.89796  818.40816  838.91837
  [43]  859.42857  879.93878  900.44898  920.95918  941.46939  961.97959  982.48980
  [50] 1003.0
 
  $y:
   [1] 4.565970e-006 1.31e-003 9.999374e-004 1.31e-003 9.999471e-004 1.31e-003
   [7] 9.999560e-004 1.30e-003 9.999643e-004 1.29e-003 9.999718e-004 1.28e-003
  [13] 9.999788e-004 1.26e-003 9.999852e-004 1.24e-003 9.10e-004 1.22e-003
  [19] 9.63e-004 1.19e-003 1.01e-003 1.16e-003 1.06e-003 1.13e-003
  [25] 1.10e-003 1.10e-003 1.13e-003 1.06e-003 1.16e-003 1.01e-003
  [31] 1.19e-003 9.63e-004 1.22e-003 9.10e-004 1.24e-003 9.999852e-004
  [37] 1.26e-003 9.999788e-004 1.28e-003 9.999718e-004 1.29e-003 9.999643e-004
  [43] 1.30e-003 9.999560e-004 1.31e-003 9.999471e-004 1.31e-003 9.999374e-004
  [49] 1.31e-003 4.432131e-006
 
 
   exdata = iris[, 1, 1]
   density(exdata, width = 4)
  $x:
   [1] 1.30 1.453061 1.606122 1.759184 1.912245 2.065306 2.218367

[R] postForm() in RCurl and library RHTMLForms

2010-11-04 Thread sayan dasgupta

Hi RUsers,

Suppose I want to see the data on the website
url <- "http://www.nseindia.com/content/indices/ind_histvalues.htm"

for the index S&P CNX NIFTY for
dates FromDate=01-11-2010, ToDate=02-11-2010

and then read the HTML table from the page using readHTMLTable().

I am using this code
webpage <- postForm(url, .params=list(
   FromDate="01-11-2010",
   ToDate="02-11-2010",
   IndexType="S&P CNX NIFTY",
   Indicesdata="Get Details"),
 .opts=list(useragent = getOption("HTTPUserAgent")))

But it doesn't give me the desired result.

I was also trying to use the function getHTMLFormDescription from the
package RHTMLForms, but there I can't use the argument
.opts=list(useragent = getOption("HTTPUserAgent")), which is needed for this
particular website.


Thanks and Regards
Sayan Dasgupta


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Logical vectors

2010-11-04 Thread Stephen Liu

Hi Ted,

Thanks for your advice and the correction on the document concerned.

B.R.
Stephen L



- Original Message 
From: ted.hard...@wlandres.net ted.hard...@wlandres.net
To: r-help@r-project.org
Cc: Stephen Liu sati...@yahoo.com; R-Devel r-de...@stat.math.ethz.ch
Sent: Thu, November 4, 2010 5:08:42 PM
Subject: Re: [R] Logical vectors

On 04-Nov-10 08:56:42, Gerrit Eichner wrote:
 On Thu, 4 Nov 2010, Stephen Liu wrote:
 [snip]
 In;

 2.4 Logical vectors
 http://cran.r-project.org/doc/manuals/R-intro.html#R-and-statistics

 It states:-

 The logical operators are <, <=, >, >=, == for exact equality and !=
 for inequality 

 ==   # exact equality
 !=   # inequality
 
 [snip]
 
 
 Hello, Stephen,
 in my understanding of the sentence
 
 The logical operators are <, <=, >, >=, == for exact equality and !=
 for inequality 
 
 the phrase exact equality refers to the operator ==, i.e. to the
 last element == in the enumeration (<, <=, >, >=, ==), and not to its
 first.
 
   Regards  --  Gerrit

This indicates that the sentence can be mis-read. It should be
cured by a small change in punctuation (hence I copy to R-devel):

  The logical operators are <, <=, >, >=; == for exact equality;
  and != for inequality 

Hoping this helps!
Ted.


E-Mail: (Ted Harding) ted.hard...@wlandres.net
Fax-to-email: +44 (0)870 094 0861
Date: 04-Nov-10   Time: 09:08:37
-- XFMail --
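The operator list under discussion, applied to the same x used earlier in this thread. This is a short sketch; the floating-point lines are an added illustration of what "exact" equality means in practice.

```r
x <- 1:5

x > 1    # strictly greater than: FALSE TRUE TRUE TRUE TRUE
x >= 1   # greater than or equal: all TRUE
x == 1   # exact equality: TRUE FALSE FALSE FALSE FALSE
x != 1   # inequality: FALSE TRUE TRUE TRUE TRUE

# "Exact" really is exact: floating-point rounding makes this FALSE,
# so all.equal() is the usual tool for approximate comparison
0.1 + 0.2 == 0.3
isTRUE(all.equal(0.1 + 0.2, 0.3))
```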




[R] Importing triple-s (standard survey structure) files to R

2010-11-04 Thread Andrie de Vries


Dear R mailing list

I am about to start working on a project for a market research 
customer.  Their survey data is in the triple-s format, and I need to 
import this into R.


The triple-s (standard survey structure) format is an open format for 
the exchange of survey data.  It consists of two files, both in plain 
text format.

- A text data file (.asc)
- A metadata (.sss) file that describes the survey and data structure

According to the triple-s website, this format is supported by a long list 
of survey and statistical software, including SPSS.  However, a search 
of Google, R-Site and the R mailing list archives reveals nothing at all.


I would be interested to know if anyone has prototype code for importing 
triple-s.  If nothing exists, I plan to write and contribute a package 
to do this, so a starting point would be very helpful.


The current specification of triple-s supports an XML format, so the way 
forward is probably to re-use the XML package code and build upon this.


More information at the triple-s website:
http://www.triple-s.org/oft.htm

Regards

Andrie
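A minimal base-R sketch of the data-file half of the job, assuming the variable names and start/finish column positions have already been extracted from the .sss metadata (for instance with the XML package, as suggested above). The names, positions and data lines below are invented for illustration and are not taken from the triple-s specification.

```r
# Hypothetical variable positions, as they might be read from a .sss file
meta <- data.frame(name   = c("id", "gender", "age"),
                   start  = c(1, 4, 5),
                   finish = c(3, 4, 6),
                   stringsAsFactors = FALSE)

# Hypothetical fixed-width lines from the matching .asc data file
asc_lines <- c("001134", "002227")

# Field widths follow from the positions; this sketch assumes the fields
# are contiguous and in order (gaps would need negative widths in read.fwf)
widths <- meta$finish - meta$start + 1

survey <- read.fwf(textConnection(asc_lines), widths = widths,
                   col.names = meta$name)
survey
```

With real files, `textConnection(asc_lines)` would be replaced by the path to the .asc file.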


[R] Closing unreferenced result sets in dbi / RSQLite

2010-11-04 Thread Andreas Borg


Hello R-help members,

I have one problem with the database interface dbi (more specifically, I 
work with RSQLite). Consider the following example, which writes a test 
table to a temporary SQLite database and sends a query to read from it:


library(RSQLite)
df <- as.data.frame(matrix(runif(4), nrow=2, ncol=2))
drv <- dbDriver("SQLite")
con <- dbConnect(drv)
dbWriteTable(con, "df", df)
dbSendQuery(con, "select * from df")


In the last line I forgot to assign the DBIResult object returned by 
dbSendQuery() to a variable, which happens from time to time when I work 
interactively. The following attempt to correct the mistake:


res <- dbSendQuery(con, "select * from df")

fails because the orphaned result set from the preceding call is still 
active. Consequently, I have to close the connection to keep on working, 
which is especially annoying when working with a temporary database 
where everything is discarded on disconnection. Is there any way to 
create a new reference to the pending result set or to close result sets 
which are not bound to a variable?


Thanks for any suggestion,

Andreas


--
Andreas Borg
Medizinische Informatik

UNIVERSITÄTSMEDIZIN
der Johannes Gutenberg-Universität
Institut für Medizinische Biometrie, Epidemiologie und Informatik
Obere Zahlbacher Straße 69, 55131 Mainz
www.imbei.uni-mainz.de

Phone +49 (0) 6131 175062
E-Mail: b...@imbei.uni-mainz.de

This e-mail contains confidential and/or legally protected information. 
If you are not the intended recipient or have received this e-mail in error, 
please notify the sender immediately and delete this e-mail. Unauthorized 
copying or forwarding of this e-mail and the information it contains is 
not permitted.


Re: [R] Closing unreferenced result sets in dbi / RSQLite

2010-11-04 Thread Michael Bedward

Hi Andreas,

Try this...

# forget to assign result set
dbSendQuery(con, "select * from df")

# retrieve the result set just created
rs <- dbListResults(con)[[1]]

Then you can call dbClearResult(rs), or whatever else you need.

Michael





Re: [R] Orthogonalization with different inner products

2010-11-04 Thread Michael Friendly


See gsorth() in the heplots package.

On 11/3/2010 1:55 PM, adet...@uw.edu wrote:

Suppose one wanted to consider random variables X_1, ..., X_n and from each subtract off the 
piece that is correlated with the previous variables in the list, i.e. make new 
variables Z_i so that Z_1 = X_1 and

$$Z_i = X_i - \sum_{k=1}^{i-1} \frac{\operatorname{cov}(X_i, Z_k)}{\operatorname{var}(Z_k)}\, Z_k .$$

I have code to do this but I keep getting a non-conformable array error in the line 
with the covariance.  Does anyone have any suggestions?  Here is my code:

gov=read.table(file.choose(), sep="\t",header=T)

gov1=gov[3:length(gov[1,])]
n_indices=length(names(gov1))

x=data.matrix(gov1)


v=x
R=matrix(rep(0,length(x[,1])*length(x[1,])),length(x[,1]))

for(j in 1:n_indices){
u=matrix(rep(0,length(v[,1])),length(v[,1]))

for(i in 1:j-1){
u = u+cov(v[,j],v[,i])*v[,i]/var(v[,i])#(error here)
}
v[,j]=v[,j]-u

}

Thanks,
 Andrew







--
Michael Friendly Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University  Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele StreetWeb:   http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA
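The non-conformable error most likely comes from the inner loop bound: in R, `1:j-1` parses as `(1:j) - 1`, so the loop starts at `i = 0`, and `v[, 0]` is a zero-column matrix. Below is a minimal sketch of the same sequential decorrelation (Gram-Schmidt with the covariance as inner product) using `seq_len(j - 1)` instead. The function name `decorrelate` and the simulated data are hypothetical, and this is not the `gsorth()` implementation from heplots.

```r
# Hypothetical helper: Z_1 = X_1 and
# Z_j = X_j - sum_{k<j} cov(X_j, Z_k) / var(Z_k) * Z_k
decorrelate <- function(x) {
  x <- as.matrix(x)
  z <- x
  for (j in seq_len(ncol(x))) {
    # seq_len(0) is empty, so column 1 is left unchanged
    for (k in seq_len(j - 1)) {
      z[, j] <- z[, j] - cov(x[, j], z[, k]) / var(z[, k]) * z[, k]
    }
  }
  z
}

# Example with simulated, deliberately correlated columns
set.seed(1)
x <- matrix(rnorm(300), ncol = 3)
x[, 2] <- x[, 2] + x[, 1]
x[, 3] <- x[, 3] + x[, 1] - x[, 2]
z <- decorrelate(x)
round(cov(z), 10)  # off-diagonal covariances are (numerically) zero
```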


Re: [R] cross-validation for choosing regression trees

2010-11-04 Thread Jonathan P Daily

Forgive me if I misunderstand your goals but I have no idea what you are 
trying to determine or what your data is. I can say, however, that setting 
mindev to 0 has always overfit data for me, and that you are more than 
likely looking at a situation in which that 1 node tree is more accurate.

Also, if you look at ?cv.tree, the default function to use is 
prune.tree(). Perhaps prune.tree() is trimming down to that terminal node?

If you want an alternative look at CART methods that may account for some 
of your issues, I would recommend the packages 'rpart' and 'party', as 
they may be more informative.
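Following the suggestion of rpart, a minimal sketch of growing a deliberately large tree and choosing its size by the 10-fold cross-validated error stored in `cptable`. The simulated data frame `d` is hypothetical, and this uses rpart's own cross-validation, not `cv.tree()` from the tree package.

```r
library(rpart)

# Hypothetical data: y depends on x1 only
set.seed(1)
n <- 500
d <- data.frame(x1 = runif(n), x2 = runif(n))
d$y <- ifelse(d$x1 > 0.5, 2, 0) + rnorm(n, sd = 0.5)

# cp = 0 grows a large tree; xval = 10 requests 10-fold cross-validation
fit <- rpart(y ~ x1 + x2, data = d, method = "anova",
             control = rpart.control(cp = 0, xval = 10))

# Pick the complexity parameter with the smallest cross-validated error
cpt <- fit$cptable
best_cp <- cpt[which.min(cpt[, "xerror"]), "CP"]
pruned <- prune(fit, cp = best_cp)

printcp(pruned)
```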
--
Jonathan P. Daily
Technician - USGS Leetown Science Center
11649 Leetown Road
Kearneysville WV, 25430
(304) 724-4480
Is the room still a room when its empty? Does the room,
 the thing itself have purpose? Or do we, what's the word... imbue it.
 - Jubal Early, Firefly



From:
Shiyao Liu lsy...@iastate.edu
To:
r-help@r-project.org
Date:
11/03/2010 09:04 PM
Subject:
[R] cross-validation for choosing regression trees
Sent by:
r-help-boun...@r-project.org



Dear All,

We came across a problem when using the tree package to analyze our data
set.

First, in the tree function, if we use the default value mindev=0.01,
the resulting regression tree has a single node. So, we set mindev=0, and
obtain a tree with 931 terminal nodes.

However, when we further use the cv.tree function to run a 10-fold
cross-validation, the error message is:

Error in prune.tree(list(frame = list(var = 1L, n = 6676, dev =
3.28220789569792,  : can not prune singlenode tree.

Is the cv.tree function respecting the mindev chosen in the tree function,
or what else might be wrong?

Thanks,
Shiyao





[R] Sorting data from one column with strings

2010-11-04 Thread Ramsvatn Silje


Hello,

I have tried to find this out some other way, but without success, so I
have to try this list.
I assume this should be quite simple.

I have a dataset with 4 columns, Sample_no, Species, Nitrogen and
Carbon, in csv format. In the Species column I have many different
species with varying numbers of observations per species.

Eg

Sample_no  Species  Nitrogen  Carbon
1          Cod      15.2      -19.0
2          Haddock  14.8      -20.2
3          Cod      15.6      -18.5
4          Cod      13.2      -20.1
5          Haddock  14.3      -18.8
etc.

I want to calculate the mean, standard deviation, etc. per species for the
Nitrogen and Carbon observations, and later make plots and run statistics on
the different species. I will eventually have many species, so it needs to
be automatic; I can't enter code for every species separately.

Can anyone help me with this? Or if this is the wrong list to send this
question to, where should I send it?

Thank you very much in advance.


Best regards

Silje Ramsvatn

PhD-candidate
University of Tromsø
Norway


Re: [R] Sorting data from one column with strings

2010-11-04 Thread David Winsemius






http://finzi.psych.upenn.edu/R/library/prettyR/html/brkdn.html

http://finzi.psych.upenn.edu/R/library/Hmisc/html/describe.html
e.g.

library(Hmisc)
with( dfrm, describe( ~Species) )

I think you could also probably do lapply(split(dfrm, dfrm$Species),  
describe)


the Hmisc::describe function is especially good at first examining a  
vector and applying the appropriate methods to the type of data. There  
are several other packages with different describe functions.


And there are several other packages such as doBy and plyr that will  
offer other concise methods for doing your by-category statistics.


--
David.




David Winsemius, MD
West Hartford, CT


Re: [R] multi-level cox ph with time-dependent covariates

2010-11-04 Thread Terry Therneau

Your question has two levels:
   1. What is the right model for this data
   2. Can model __ be fit

Wrt 2 and coxme: For a reliable fit you need to have more events than
random effects.  Thus for patient/tissue I would want to see multiple
events per patient/tissue pair.  This is a statistical issue -- when there
are too few events the confidence intervals for the random effects end
up being a mile wide.  (Exception, if the number of events is very
large, 10^5 say as sometimes occurs in economics studies, the estimates
can work.)  
   coxme works fine with start,stop data.

Wrt question 1.  Your models assume that marker1, marker2, ... each have
the same effect across tissue types.  Adding a random effect gave per
subject or per subject/tissue intercepts.  Do you instead want to do
shrinkage of the marker1, .. coefficients?

Terry Therneau


[R] Re: Sorting data from one column with strings

2010-11-04 Thread Petr PIKAL

Hi


No need for sorting. You can use R functions, particularly tapply, by or aggregate 
(see ?tapply, ?by, ?aggregate). Regarding plots you can consider lattice or ggplot2, 
but you can get good results with base graphics too.

aggregate(your.data[,3:4], list(your.data$Species), function(x) c(mean(x), 
sd(x)))
library(lattice)
xyplot(Nitrogen~Carbon|Species, data=your.data)

Regards
Petr
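A self-contained version of the aggregate/tapply approach, using the five example rows from the question. The data frame name `fish` is hypothetical; with the real data you would read the csv file instead.

```r
# The five example rows from the question; in practice something like
# fish <- read.csv("yourfile.csv")  (file name is hypothetical)
fish <- data.frame(
  Sample_no = 1:5,
  Species   = c("Cod", "Haddock", "Cod", "Cod", "Haddock"),
  Nitrogen  = c(15.2, 14.8, 15.6, 13.2, 14.3),
  Carbon    = c(-19.0, -20.2, -18.5, -20.1, -18.8)
)

# Mean, standard deviation and count of both measurements for every species
by_species <- aggregate(cbind(Nitrogen, Carbon) ~ Species, data = fish,
                        FUN = function(v) c(mean = mean(v), sd = sd(v), n = length(v)))
by_species

# A single statistic for a single column, as with tapply()
tapply(fish$Nitrogen, fish$Species, mean)
```

Because the grouping comes from the Species column itself, the same code works unchanged however many species the file contains.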


 

[R] How to set initial values in lme ?

2010-11-04 Thread Hoai Thu Thai


Hello R-users,

Does anyone know how to set initial values in lme? I have a problem with 
non-convergence and would like to try different initial values.
Is lmeScale the right function to do it, and how many parameters do we 
need to specify: only the fixed parameters, or also the random effects?


Thanks for any advice,

Best regards,

--
THAI Hoai Thu
INSERM U738 - Université Paris 7
16 rue Henri Huchard
75018 Paris, FRANCE
Tel: 01 57 27 75 39
Email: hoai-thu.t...@inserm.fr
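No reply appears in this part of the thread. As far as I know, `lme()` profiles the fixed effects out of the likelihood, so starting values are only needed for the variance-covariance parameters, and they can be supplied through a `pdMat` object in the `random` argument. A minimal sketch with the `Orthodont` data shipped with nlme; the starting matrix `diag(2)` and the `msMaxIter` setting are arbitrary choices for illustration, not recommendations.

```r
library(nlme)

# Random intercept and slope for age with a diagonal covariance structure,
# initialised from the matrix diag(2) (an arbitrary starting value)
fit <- lme(distance ~ age, data = Orthodont,
           random = list(Subject = pdDiag(diag(2), form = ~ age)),
           control = lmeControl(msMaxIter = 200))

summary(fit)
```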


Re: [R] Loop

2010-11-04 Thread Matevž Pavlič

Hi David, 

I am still having trouble with that loop ...

This code gives me (kind of) the name of the column/field in a data frame. Field 
names are of the form W1-W10. But there is a space between W and the number (W 10), 
and column (field) names do not contain numbers. 

for(i in 1:10) 
{
vari <- paste("W",i)
}
vari

[1] "W 10"

Now, as I understand it, I would then pull the different columns into R with 

w <- lit[[vari]]

Or am I wrong again?

Then I would probably need another loop to create the names of the variables in 
R, i.e. w1 to w10. Is that the general idea of the procedure?


Thanks for the help, m

-Original Message-
From: David Winsemius [mailto:dwinsem...@comcast.net] 
Sent: Wednesday, November 03, 2010 10:41 PM
To: Matevž Pavlič
Cc: r-help@r-project.org
Subject: Re: [R] Loop


On Nov 3, 2010, at 5:03 PM, Matevž Pavlič wrote:

 Hi,

 Thanks for the help and the manuals. Will come very handy i am sure.

 But regarding the code, I don't think this is what I want... basically I 
 would like to repeat the code below:

 w1<-table(lit$W1)
 w1<-as.data.frame(w1)

It appears you are not reading for meaning. Burns has advised you how to 
construct column names and use them in your initial steps. The `$` function is 
quite limited in comparison to `[[` , so he was showing you a method that would 
be more effective.  BTW the as.data.frame step is unnecessary, since the first 
thing write.table does is coerce an object to a data.frame. The write.table 
name is misleading. It should be write.data.frame. You cannot really write 
tables with write.table.

You would also use:

  file=paste(vari, "csv", sep=".") as the file argument to write.table

 write.table(w1,file="w1.csv",sep=";",row.names=T, dec=".")

What are these next actions supposed to do after the file is written?  
Are you trying to store a group of related w objects that will later be 
indexed in sequence? If so, then a list would make more sense.

--
David.

 w1<- w1[order(w1$Freq, decreasing=TRUE),]
 w1<-head(w1, 20)

 20 times, where W1-20 (capital letters) are the fields in a data.frame 
 called lit and w1-20 are the data.frames being created.

 Hope that explains it better,

 m

 -Original Message-
 From: Patrick Burns [mailto:pbu...@pburns.seanet.com]
 Subject: Re: [R] Loop

 If I understand properly, you'll want
 something like:

 lit[["w2"]]

 instead of

 lit$w2

 more accurately:

 for(i in 1:20) {
 vari <- paste("w", i)
 lit[[vari]]

 ...
 }

 The two documents mentioned in my
 signature may help you.

 On 03/11/2010 20:23, Matevž Pavlič wrote:
 Hi all,

 I managed to do what I want (with the great help of this mailing
 list) manually. Now I would like to automate it. I would probably 
 need a for loop to help me with this... but of course I have no 
 idea how to do that in R. Below is the code that I would like to be 
 replicated a number of times (let's say 20). I would like to 
 achieve that w1 would change to w2, w3, w4 ... up to w20 and by that 
 create 20 data.frames that I would then bind together with cbind.

 (I did it manually, as shown below)

 w1<-table(lit$W1)
 w1<-as.data.frame(w1)
 write.table(w1,file="w1.csv",sep=";",row.names=T, dec=".")
 w1<- w1[order(w1$Freq, decreasing=TRUE),]
 w1<-head(w1, 20)

 w2<-table(lit$W2)
 w2<-as.data.frame(w2)
 write.table(w2,file="w2.csv",sep=";",row.names=T, dec=".")
 w2<- w2[order(w2$Freq, decreasing=TRUE),]
 w2<-head(w2, 20)
 .
 .
 .

 Thanks for the help,m



David Winsemius, MD
West Hartford, CT


Re: [R] Loop

2010-11-04 Thread David Winsemius



On Nov 4, 2010, at 9:21 AM, Matevž Pavlič wrote:


Hi David,

I am still having trouble with that loop ...

This code gives me (kind of) the name of the column/field in a data  
frame. Field names are of the form W1-W10. But there is a space between W  
and the number (W 10), and column (field) names do not contain  
numbers.

for(i in 1:10)
{
vari <- paste("W",i)

Should be:

vari <- paste("W", i, sep="")

}
vari

[1] "W 10"

Now, as I understand it, I would then pull the different columns into R with

w <- lit[[vari]]

Or am I wrong again?

Maybe. Since you overwrote the first nine values, there is only one  
element in vari outside the loop. I would do the assignment inside the  
loop, and I suggested that the results be stored in a list that is  
indexed either by vari or by i (but without the quotes if you are  
typing lit[[vari]]).

--
David.

Then I would probably need another loop to create the names of the  
variables in R, i.e. w1 to w10. Is that the general idea of the  
procedure?

Thanks for the help, m

David Winsemius, MD
West Hartford, CT
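A minimal sketch of the loop discussed in this thread, with the results kept in a list as suggested. The small data frame `lit` below is a hypothetical stand-in for the poster's data, only two columns are processed, and the CSV files are written to a temporary directory.

```r
# Hypothetical stand-in for the poster's data frame with columns W1, W2, ...
set.seed(1)
lit <- data.frame(W1 = sample(letters[1:5], 50, replace = TRUE),
                  W2 = sample(letters[1:8], 50, replace = TRUE))

out_dir <- tempdir()   # write the CSV files somewhere harmless
top <- list()          # one frequency table per column, indexed by name

for (i in 1:2) {                              # use 1:20 for W1 ... W20
  vari <- paste("W", i, sep = "")             # "W1", "W2" (no space)
  freq <- as.data.frame(table(lit[[vari]]))   # columns Var1 and Freq
  write.table(freq, file = file.path(out_dir, paste(vari, "csv", sep = ".")),
              sep = ";", dec = ".", row.names = TRUE)
  freq <- freq[order(freq$Freq, decreasing = TRUE), ]
  top[[vari]] <- head(freq, 20)               # keep the 20 most frequent
}

top$W1
```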


Re: [R] Sorting data from one column with strings

2010-11-04 Thread Eigenhuis, Annemarie

Try tapply().

For example:

tapply(data$Nitrogen,factor(data$Species),mean)

For the Nitrogen column, the mean is calculated for each Species. (if the data 
frame below is in the object data)

Regards,
Annemarie Eigenhuis 


[R] (no subject)

2010-11-04 Thread Roes Da

Hello, I'm Roesda from Indonesia.
I have trouble performing parameter estimation by the MLE method
in R, because the distribution I want to use is not one of the
already known distributions such as the gamma, Poisson or binomial.
The distribution whose parameters I would like to estimate is the joint
distribution of the negative binomial and Lindley distributions. How do I
translate it into R if the distribution is still new, as I mentioned?
I hope everyone can help me. Thank you very much.
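No answer appears in this part of the thread. The usual recipe for a distribution that R does not provide is to code its (log-)density yourself and minimise the negative log-likelihood with optimize() (one parameter) or optim() (several). A minimal sketch using the Lindley distribution alone, f(x; theta) = theta^2/(theta+1) * (1 + x) * exp(-theta*x) for x > 0, chosen because its MLE has a closed form to check against. The helper names dlindley, rlindley and nll are hypothetical. For the negative binomial-Lindley model, its probability mass function would replace dlindley() and optim() would search over both parameters.

```r
# Hypothetical helpers for the Lindley distribution
dlindley <- function(x, theta, log = FALSE) {
  ld <- 2 * log(theta) - log1p(theta) + log1p(x) - theta * x
  if (log) ld else exp(ld)
}
# Lindley = mixture of Exp(theta) (weight theta/(theta+1)) and Gamma(2, theta)
rlindley <- function(n, theta) {
  from_exp <- runif(n) < theta / (theta + 1)
  ifelse(from_exp, rexp(n, rate = theta), rgamma(n, shape = 2, rate = theta))
}

# Negative log-likelihood on the log scale, so theta stays positive
nll <- function(log_theta, x) -sum(dlindley(x, exp(log_theta), log = TRUE))

set.seed(123)
x <- rlindley(500, theta = 1.5)
fit <- optimize(nll, interval = c(-5, 5), x = x, tol = 1e-8)
theta_hat <- exp(fit$minimum)

# Closed-form MLE: positive root of xbar*theta^2 + (xbar - 1)*theta - 2 = 0
xbar <- mean(x)
theta_closed <- (-(xbar - 1) + sqrt((xbar - 1)^2 + 8 * xbar)) / (2 * xbar)
c(numerical = theta_hat, closed_form = theta_closed)
```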
Simak
Baca secara fonetik

[[alternative HTML version deleted]]

[R] ggplot output

2010-11-04 Thread ashz


Dear All, 

I have this script:

dat <- data.frame(Month = hstat$Date, C_avg = hstat$C.avg, C_stdev = hstat$C.stdev)
ggplot(data = dat, aes(x = Month, y = C_avg,
                       ymin = C_avg - C_stdev, ymax = C_avg + C_stdev)) +
  geom_point() +
  geom_line() +
  geom_errorbar()

dat <- data.frame(Month = hstat$Date, K_avg = hstat$K.avg, K_stdev = hstat$K.stdev)
ggplot(data = dat, aes(x = Month, y = K_avg,
                       ymin = K_avg - K_stdev, ymax = K_avg + K_stdev)) +
  geom_point() +
  geom_line() +
  geom_errorbar()

dat <- data.frame(Month = hstat$Date, S_avg = hstat$S.avg, S_stdev = hstat$S.stdev)
ggplot(data = dat, aes(x = Month, y = S_avg,
                       ymin = S_avg - S_stdev, ymax = S_avg + S_stdev)) +
  geom_point() +
  geom_line() +
  geom_errorbar()

Running the script generates 3 separate graphs. How can I output them next
to each other?

Thanks

-- 
View this message in context: 
http://r.789695.n4.nabble.com/ggplot-output-tp3027026p3027026.html
Sent from the R help mailing list archive at Nabble.com.

Re: [R] ggplot output

2010-11-04 Thread Eigenhuis, Annemarie

Have you tried ?split.screen 



Annemarie Eigenhuis, MSc

University of Amsterdam
Department of Psychology, clinical area
Roetersstraat 15
1018 WB Amsterdam
The Netherlands

phone: +31(0)205256815
email: a.eigenh...@uva.nl 


-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On Behalf Of ashz
Sent: Thursday 4 November 2010 14:37
To: r-help@r-project.org
Subject: [R] ggplot output


Re: [R] ggplot output

2010-11-04 Thread ONKELINX, Thierry

The easiest way is to create one long dataset with four variables:
Month, avg, stdev and type, where type is either K, C or S.
Then you just need to add faceting to your code:

ggplot(data = dat, aes(x = Month, y = avg, ymin = avg - stdev, ymax =
avg + stdev)) +
   geom_point() +
   geom_line() +
   geom_errorbar() +
   facet_wrap(~type, nrow = 1)
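A minimal sketch of building that long data set, assuming hstat has the columns Date, C.avg, C.stdev, K.avg, K.stdev, S.avg and S.stdev used in the original script. The toy hstat below is made up purely so the code runs; the reshaping uses only base R, and the plot is drawn only if ggplot2 is installed.

```r
# Hypothetical stand-in for hstat (made-up values, 3 months)
hstat <- data.frame(
  Date  = as.Date(c("2010-01-01", "2010-02-01", "2010-03-01")),
  C.avg = c(1.0, 1.2, 1.1), C.stdev = c(0.1, 0.2, 0.1),
  K.avg = c(2.0, 2.3, 2.1), K.stdev = c(0.3, 0.2, 0.2),
  S.avg = c(0.5, 0.4, 0.6), S.stdev = c(0.05, 0.05, 0.1)
)

# Stack the three avg/stdev column pairs into one long data frame,
# with a 'type' column recording which pair each row came from
types <- c("C", "K", "S")
dat <- do.call(rbind, lapply(types, function(tp) {
  data.frame(Month = hstat$Date,
             avg   = hstat[[paste0(tp, ".avg")]],
             stdev = hstat[[paste0(tp, ".stdev")]],
             type  = tp)
}))

# One panel per type, side by side (nrow = 1)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(data = dat, aes(x = Month, y = avg,
                              ymin = avg - stdev, ymax = avg + stdev)) +
    geom_point() +
    geom_line() +
    geom_errorbar() +
    facet_wrap(~type, nrow = 1)
  print(p)
}
```

If the three variables live on very different ranges, facet_wrap(~type, nrow = 1, scales = "free_y") gives each panel its own y axis.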

HTH,

Thierry



ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek
team Biometrie & Kwaliteitszorg
Gaverstraat 4
9500 Geraardsbergen
Belgium

Research Institute for Nature and Forest
team Biometrics & Quality Assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium

tel. + 32 54/436 185
thierry.onkel...@inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey
  

 -Original message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On behalf of ashz
 Sent: Thursday 4 November 2010 14:37
 To: r-help@r-project.org
 Subject: [R] ggplot output
 
 
Re: [R] ggplot output

2010-11-04 Thread ONKELINX, Thierry

split.screen() and par() don't work with ggplot2, because ggplot2 draws with
grid graphics rather than base graphics.



 -Original message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On behalf of Eigenhuis, Annemarie
 Sent: Thursday 4 November 2010 14:53
 To: ashz; r-help@r-project.org
 Subject: Re: [R] ggplot output
 

[R] Problems with points in plots when importing from pdf to an SVG editor

2010-11-04 Thread Rafael Björk

Dear R-users

When trying to import graphics from a PDF file into a vector graphics editor
(I use Inkscape, but I've confirmed the same problem with Adobe products), all
points in the graphics turn out as "q"s.
This example displays the behaviour:

pdf(file = "points are weird.pdf")
plot(1:5)
dev.off()

When importing the file into Inkscape, I get five neatly arranged little "q"s.
The obvious workaround would be to change the points into another plotting
character, but this isn't the first time I've encountered this behaviour,
and it would be nice to solve it properly instead.
I realize this might not strictly be a question related to R, but if someone
who has encountered the problem has found a solution or workaround, it would
be greatly appreciated.

Kind regards/
Rafael

Re: [R] avoiding too many loops - reshaping data

2010-11-04 Thread Hadley Wickham

 Beware of facile comparisons of this sort -- they may be apples and nematodes.

And they also imply that the main time sink is the computation. In my
experience, figuring out how to solve the problem takes considerably more
time than 18/1000 seconds, so investing your energy in learning idioms that
apply in a wide range of situations is far more useful than figuring out the
fastest solution to a single problem.

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

Re: [R] Problems with points in plots when importing from pdf to an SVG editor

2010-11-04 Thread Ivan Calandra


Hi,

Try with RSvgDevice::devSVG()
I don't have any problems with either Inkscape or Illustrator (CS4)

HTH,
Ivan
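If you would rather stay with the pdf() device: pdf() has a useDingbats argument that controls whether small circles (such as the default pch = 1 points) are drawn as characters from the ZapfDingbats font. Setting it to FALSE draws them as ordinary paths instead, which is a commonly suggested workaround when another program substitutes that font and shows letters in place of circles. That this is the cause of the "q"s is an assumption, not something confirmed in the thread; the file name below is arbitrary and written to a temporary directory.

```r
# Draw the points as plain paths rather than ZapfDingbats glyphs
f_plain <- file.path(tempdir(), "points-no-dingbats.pdf")

pdf(file = f_plain, useDingbats = FALSE)
plot(1:5)
invisible(dev.off())
```

The resulting file can be imported into Inkscape or Illustrator for comparison with one produced by the plain pdf() call.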

On 11/4/2010 15:04, Rafael Björk wrote:


--
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calan...@uni-hamburg.de

**
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/mitarbeiter.php

Re: [R] ggplot output

2010-11-04 Thread ashz


Dear Thierry,

Your solution looks very elegant, but I cannot find a proper example.

Can you provide me one?

Thx

-- 
View this message in context: 
http://r.789695.n4.nabble.com/ggplot-output-tp3027026p3027108.html
Sent from the R help mailing list archive at Nabble.com.

Re: [R] Loop

2010-11-04 Thread Petr PIKAL

Hi

r-help-boun...@r-project.org wrote on 04.11.2010 14:21:38:

 Hi David, 
 
 I am still having troubles with that loop ...
 
 This code gives me (kinda) the name of the column/field in a data frame.
 Field names are of the form W1-W10. But there is a space between W and the
 number ("W 10"), and column (field) names do not contain numbers.
 
 for(i in 1:10)
 {
 vari <- paste("W", i)
 }
 vari
 
 [1] "W 10"
 
 Now, as I understand it, I would then call the different columns into R with
 
 w <- lit[[vari]]
 
 Or am I wrong again?
 
 Then I would probably need another loop to create the names of the
 variables in R, i.e. w1 to w10. Is that the general idea of the procedure?

Beware of such loops. Instead of littering your workspace with
files/objects constructed by some paste(whatever, i) solution, you can save
results in a list, data.frame or matrix and simply use basic subsetting
procedures or lapply/sapply functions.

I must say I have never used such a paste(...) construction, and I have
worked with R for quite a long time.
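A minimal sketch of the list-based approach, using a made-up stand-in for lit with three W columns (the real one has W1 to W20). The first two lines show where the space in "W 10" comes from: paste() joins its arguments with sep = " " by default, while paste0() uses no separator.

```r
# paste() inserts a space by default; paste0() does not
paste("W", 10)   # "W 10"
paste0("W", 10)  # "W10"

# Hypothetical stand-in for 'lit' with three of the W columns
set.seed(42)
lit <- data.frame(W1 = sample(letters[1:5], 50, replace = TRUE),
                  W2 = sample(letters[1:5], 50, replace = TRUE),
                  W3 = sample(letters[1:5], 50, replace = TRUE))

cols <- paste0("W", 1:3)   # 1:20 for the real data

# One frequency table per column, sorted by frequency, top 20 kept.
# The results live in one named list instead of separate objects w1, w2, ...
top <- lapply(cols, function(v) {
  tab <- as.data.frame(table(lit[[v]]))
  tab <- tab[order(tab$Freq, decreasing = TRUE), ]
  head(tab, 20)
})
names(top) <- cols

# Optional: one CSV per column (here the sorted top-20 tables),
# with file names built by paste0()
for (v in cols) {
  write.table(top[[v]], file = file.path(tempdir(), paste0(v, ".csv")),
              sep = ";", row.names = TRUE, dec = ".")
}
```

Individual results are then top[["W2"]] (or top$W2). do.call(cbind, top) binds the tables side by side, which requires them all to have the same number of rows.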

Regards
Petr


 
 
 Thanks for the help, m
 
 -Original Message-
 From: David Winsemius [mailto:dwinsem...@comcast.net] 
 Sent: Wednesday, November 03, 2010 10:41 PM
 To: Matevž Pavlič
 Cc: r-help@r-project.org
 Subject: Re: [R] Loop
 
 
 On Nov 3, 2010, at 5:03 PM, Matevž Pavlič wrote:
 
  Hi,
 
  Thanks for the help and the manuals. They will come in very handy, I am sure.
 
  But regarding the code, I don't think this is what I want... basically I
  would like to repeat the code below:
 
  w1 <- table(lit$W1)
  w1 <- as.data.frame(w1)
 
 It appears you are not reading for meaning. Burns has advised you how to
 construct column names and use them in your initial steps. The `$`
 function is quite limited in comparison to `[[`, so he was showing you a
 method that would be more effective. BTW the as.data.frame step is
 unnecessary, since the first thing write.table does is coerce an object
 to a data.frame. The write.table name is misleading. It should be
 write.data.frame. You cannot really write tables with write.table.
 
 You would also use:
 
   file=paste(vari, "csv", sep=".") as the file argument to write.table
 
  write.table(w1, file="w1.csv", sep=";", row.names=T, dec=".")
 
 What are these next actions supposed to do after the file is written?
 Are you trying to store a group of related "w" objects that will later
 be indexed in sequence? If so, then a list would make more sense.
 
 --
 David.
 
  w1 <- w1[order(w1$Freq, decreasing=TRUE),]
  w1 <- head(w1, 20)
 
  20 times, where W1-20 (capital letters) are the fields in a data.frame
  called lit and w1-20 are the data.frames being created.
 
  Hope that explains it better,
 
  m
 
  -Original Message-
  From: Patrick Burns [mailto:pbu...@pburns.seanet.com]
  Subject: Re: [R] Loop
 
  If I understand properly, you'll want
  something like:
 
  lit[["w2"]]
 
  instead of
 
  lit$w2
 
  more accurately:
 
  for(i in 1:20) {
  vari <- paste("w", i)
  lit[[vari]]
 
  ...
  }
 
  The two documents mentioned in my
  signature may help you.
 
  On 03/11/2010 20:23, Matevž Pavlič wrote:
  Hi all,
 
  I managed to do what I want manually (with the great help of this
  mailing list). Now I would like to automate it. I would probably need a
  for loop to help me with this... but of course I have no idea how to do
  that in R. Below is the code that I would like to be replicated a number
  of times (let's say 20). I would like w1 to change to w2, w3, w4 ... up
  to w20, and by that create 20 data.frames that I would then bind
  together with cbind.
 
  (I did it manually, as shown below)
 
  w1 <- table(lit$W1)
  w1 <- as.data.frame(w1)
  write.table(w1, file="w1.csv", sep=";", row.names=T, dec=".")
  w1 <- w1[order(w1$Freq, decreasing=TRUE),]
  w1 <- head(w1, 20)
 
  w2 <- table(lit$W2)
  w2 <- as.data.frame(w2)
  write.table(w2, file="w2.csv", sep=";", row.names=T, dec=".")
  w2 <- w2[order(w2$Freq, decreasing=TRUE),]
  w2 <- head(w2, 20)
  .
  .
  .
 
  Thanks for the help,m
 
 
 
 David Winsemius, MD
 West Hartford, CT
 

[R] Converting Strings to Variable names

2010-11-04 Thread Anand Bambhania

Hi all,

I am processing data from 24 samples and combining them into a single table
called CombinedSamples using the following:

CombinedSamples <- rbind(Sample1, Sample2, Sample3)

Now variables Sample1, Sample2 and Sample3 have many different columns.

To make it more flexible for other samples I'm replacing above code with a
for loop:

#Sample is a string vector containing all 24 sample names

for (k in 1:length(Sample))
{
  CombinedSamples <- rbind(get(Sample[k]))
}

This code only stores the last sample's data, as CombinedSamples gets
overwritten every time. Using CombinedSamples[k] or CombinedSamples[k,] causes
dimension-related errors, as each Sample has several rows and not just 24. So
how can I assign the data of all 24 samples to CombinedSamples?
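Two ways to collect all samples, sketched with three small made-up data frames in place of the real 24. mget() returns the named objects as a list, and do.call(rbind, ...) binds them in one step; alternatively, the loop works if each iteration appends to the previous result instead of overwriting it.

```r
# Hypothetical stand-ins for three of the 24 sample data frames
Sample1 <- data.frame(id = 1:2, value = c(0.1, 0.2))
Sample2 <- data.frame(id = 1:3, value = c(0.3, 0.4, 0.5))
Sample3 <- data.frame(id = 1L, value = 0.6)
Sample  <- c("Sample1", "Sample2", "Sample3")   # character vector of names

# Look up every name in the current environment (the workspace when run at
# top level) and bind all the data frames in a single call
CombinedSamples <- do.call(rbind, mget(Sample, envir = environment()))

# Loop version: append to the running result instead of overwriting it
CombinedLoop <- NULL
for (k in seq_along(Sample)) {
  CombinedLoop <- rbind(CombinedLoop, get(Sample[k]))
}
```

The do.call() version also avoids growing the result inside a loop, which gets slow when there are many samples.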

Thanks,

Anand

Re: [R] How to do bootstrap for the complex sample design?

2010-11-04 Thread Robert A LaBudde


At 01:38 AM 11/4/2010, Fei xu wrote:


Hello;

Our survey is structured as follows: the area to be investigated is divided
into 6 regions; within each region, one urban community and one rural
community are randomly selected, then samples are randomly drawn from each
selected urban and rural community.

The problem is that in each urban/rural stratum, we only have one sample.
In this case, how do we do the bootstrap?

Any comments or hints are greatly appreciated!

Faye


Just make a table of your data, with each row corresponding to a 
measurement. Your columns will be Region, UrbanCommunity, 
RuralCommunity and your response variables.


Bootstrap resampling is just generating random row indices into this 
table, with replacement. I.e.,


index <- sample(1:N, N, replace=TRUE)

Then your resample is myTable[index,].

Because you chose UrbanCommunity and RuralCommunity randomly, this 
shouldn't be a problem. The fact that you choose a subsample size of 
1 means you won't be able to estimate within-region variances unless 
you make some serious assumptions (e.g., UrbanCommunity effect 
independent of Region effect).
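A minimal sketch of the row-resampling bootstrap described above, with a made-up table in which a single Urban flag stands in for the UrbanCommunity/RuralCommunity columns. This plain resampling ignores the region/community design; the statistic here is simply the overall mean of the response.

```r
# Hypothetical data: one row per measurement
set.seed(123)
myTable <- data.frame(
  Region   = rep(1:6, each = 10),
  Urban    = rep(c(TRUE, FALSE), times = 30),
  Response = rnorm(60, mean = 10, sd = 2)
)
N <- nrow(myTable)
B <- 2000   # number of bootstrap resamples

# Each resample draws N row indices with replacement and recomputes the statistic
boot_means <- replicate(B, {
  index <- sample(1:N, N, replace = TRUE)
  mean(myTable[index, "Response"])
})

sd(boot_means)   # bootstrap standard error of the mean
```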



Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: r...@lcfltd.com
Least Cost Formulations, Ltd.URL: http://lcfltd.com/
824 Timberlake Drive Tel: 757-467-0954
Virginia Beach, VA 23464-3239Fax: 757-467-2947

Vere scire est per causas scire

Re: [R] ggplot output

2010-11-04 Thread ONKELINX, Thierry

Have a look at the ggplot2 website, which has a lot of examples:
http://had.co.nz/ggplot2/ (look at the bottom of that page for
facet_grid() and facet_wrap())

http://had.co.nz/ggplot2/facet_wrap.html is a direct link to facet_wrap()



 -Original message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On behalf of ashz
 Sent: Thursday 4 November 2010 15:32
 To: r-help@r-project.org
 Subject: Re: [R] ggplot output
 
 

Re: [R] Loop

2010-11-04 Thread Matevž Pavlič

Hi all, 

I understand that you most of you this is a peice of cake but i am a complete 
newbie in thisso any example would be greatly aprpeciated and also any hint 
as how to get around in R. Frankly i sometimes see the help files kinda 
confusing.

M

-Original Message-
From: Petr PIKAL [mailto:petr.pi...@precheza.cz] 
Sent: Thursday, November 04, 2010 3:40 PM
To: Matevž Pavlič
Cc: r-help@r-project.org
Subject: Re: [R] Loop

[R] removing indexing

2010-11-04 Thread Luis Ridao

R-help,

I was wondering how to remove indexing from an output, e.g.,

 aVector <- 1:10
 aVector
 [1]  1  2  3  4  5  6  7  8  9 10

 someFunction(aVector)
1  2  3  4  5  6  7  8  9 10


Thanks in advance

Re: [R] How to do bootstrap for the complex sample design?

2010-11-04 Thread Tim Hesterberg

Faye wrote:
Our survey is structured as follows: the area to be investigated is divided
into 6 regions; within each region, one urban community and one rural
community are randomly selected, then samples are randomly drawn from
each selected urban and rural community.

The problem is that in each urban/rural stratum, we only have one sample.
In this case, how do we do the bootstrap?

You are lucky that your sample size is 1.  If it were 2 you would
probably have proceeded without realizing that the answers were wrong.

Suppose you had two samples in each stratum.  If you proceed naturally,
drawing bootstrap samples of size 2 from each stratum, this would
underestimate variability by a factor of 2.

In general the ordinary nonparametric bootstrap estimates of variability
are biased downward by a factor of (n-1)/n -- exactly for the mean, 
approximately for other statistics.  In multiple-sample and stratified
situations, the bias depends on the stratum sizes.
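For the sample mean the factor can be checked directly. Write $\hat\sigma^2 = \frac{1}{n}\sum_{i}(x_i-\bar{x})^2$ (divisor $n$) and $s^2 = \frac{1}{n-1}\sum_{i}(x_i-\bar{x})^2$. Each bootstrap draw comes from the empirical distribution and so has variance $\hat\sigma^2$, giving

$$
\operatorname{Var}_*(\bar{x}^*) = \frac{\hat\sigma^2}{n}
  = \frac{1}{n^2}\sum_{i=1}^{n}(x_i-\bar{x})^2
  = \frac{n-1}{n}\cdot\frac{s^2}{n},
$$

while $s^2/n$ is the usual unbiased estimate of $\operatorname{Var}(\bar{x})$. For $n = 2$ the factor is $1/2$, i.e. the variability is underestimated by a factor of 2.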

Three remedies are:
* draw bootstrap samples of size n-1
* bootknife sampling - omit one observation (a jackknife sample), then
  draw a bootstrap sample of size n from that
* bootstrap from a kernel density estimate, with kernel covariance equal
  to empirical covariance (with divisor n-1) / n.
The latter two are described in 
Hesterberg, Tim C. (2004), Unbiasing the Bootstrap-Bootknife Sampling vs. 
Smoothing, Proceedings of the Section on Statistics and the Environment, 
American Statistical Association, 2924-2930.
http://home.comcast.net/~timhesterberg/articles/JSM04-bootknife.pdf
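A minimal sketch of bootknife sampling for the mean of one sample, following the description in the list above: drop one randomly chosen observation, then draw n values with replacement from the remaining n - 1. The helper name bootknife_means and the simulated data are made up for illustration; the helper stops for n < 2, where the method is undefined.

```r
# Bootknife resamples of the mean for a single sample x (illustrative helper)
bootknife_means <- function(x, B = 1000) {
  n <- length(x)
  if (n < 2) stop("bootknife sampling needs at least 2 observations")
  replicate(B, {
    keep <- x[-sample.int(n, 1)]                      # jackknife sample: drop one obs
    mean(keep[sample.int(n - 1, n, replace = TRUE)])  # bootstrap n from the n - 1 left
  })
}

set.seed(2010)
x <- rnorm(10)
bk <- bootknife_means(x, B = 20000)
ordinary <- replicate(20000, mean(sample(x, replace = TRUE)))

# Compare both variance estimates with the usual s^2 / n
c(bootknife = var(bk), ordinary = var(ordinary), s2_over_n = var(x) / length(x))
```

With n = 10 the ordinary bootstrap variance should come out near 0.9 times s^2/n, and the bootknife variance near s^2/n itself.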

All three are undefined for samples of size 1.  You need to go to some
other bootstrap, e.g. a parametric bootstrap with variability estimated
from other data.

Tim Hesterberg

Re: [R] removing indexing

2010-11-04 Thread Duncan Murdoch


On 04/11/2010 10:53 AM, Luis Ridao wrote:

The cat() function gives you lots of flexibility in how things print.  
Your example is a little complicated, because you appear to want the 
spacing that print() gives without the indexing, which probably means 
using sprintf() or format().  If that's just a relic of cut and paste, then


cat(aVector, "\n")

is fine.  (The "\n" is optional; it goes to a new line.)  If you really 
want the fancy spacing, something like


cat(format(aVector), "\n")

Duncan Murdoch
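The variants side by side. Note that cat(x, "\n") leaves a space before the newline, because cat() puts sep = " " between all of its arguments, including the "\n"; pasting the formatted values together and printing them with writeLines() avoids that.

```r
aVector <- 1:10

# No "[1]" index; values separated by single spaces
cat(aVector, "\n")

# format() pads every element to a common width first,
# which reproduces print()'s column spacing without the index
cat(format(aVector), "\n")

# Same spacing, without the trailing space before the newline
writeLines(paste(format(aVector), collapse = " "))
```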

Re: [R] ggplot output

2010-11-04 Thread Abhijit Dasgupta

The other way (in the same spirit as par(mfrow = ...) in base graphics) is to
use the grid.arrange function in the gridExtra package. See its documentation
for examples.
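A minimal sketch of the grid.arrange() route, assuming the same hstat columns as the original script (the values here are made up). Each plot is built by one small helper instead of three copies of the code, and the plotting part runs only when ggplot2 and gridExtra are installed.

```r
# Hypothetical stand-in for hstat (made-up values)
hstat <- data.frame(
  Date  = as.Date(c("2010-01-01", "2010-02-01", "2010-03-01")),
  C.avg = c(1.0, 1.2, 1.1), C.stdev = c(0.1, 0.2, 0.1),
  K.avg = c(2.0, 2.3, 2.1), K.stdev = c(0.3, 0.2, 0.2),
  S.avg = c(0.5, 0.4, 0.6), S.stdev = c(0.05, 0.05, 0.1)
)

# Data for one variable (prefix "C", "K" or "S")
panel_data <- function(prefix) {
  data.frame(Month = hstat$Date,
             avg   = hstat[[paste0(prefix, ".avg")]],
             stdev = hstat[[paste0(prefix, ".stdev")]])
}

if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("gridExtra", quietly = TRUE)) {
  library(ggplot2)
  make_plot <- function(prefix) {
    ggplot(panel_data(prefix), aes(x = Month, y = avg,
                                   ymin = avg - stdev, ymax = avg + stdev)) +
      geom_point() + geom_line() + geom_errorbar() +
      ggtitle(prefix)
  }
  plots <- lapply(c("C", "K", "S"), make_plot)
  # Place the three ggplot objects next to each other in one row
  gridExtra::grid.arrange(plots[[1]], plots[[2]], plots[[3]], nrow = 1)
}
```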


On Nov 4, 2010, at 9:36 AM, ashz wrote:

 
 Dear All, 
 
 I have this script:
 
 dat <- data.frame(Month = hstat$Date,C_avg = hstat$C.avg,C_stdev =
 hstat$C.stdev)
 ggplot(data = dat, aes(x = Month, y = C_avg, ymin = C_avg - C_stdev, ymax =
 C_avg + C_stdev)) +
  geom_point() +
  geom_line() +
  geom_errorbar()
 
 dat <- data.frame(Month = hstat$Date,K_avg = hstat$K.avg,K_stdev =
 hstat$K.stdev)
 ggplot(data = dat, aes(x = Month, y = K_avg, ymin = K_avg - K_stdev, ymax =
 K_avg + K_stdev)) +
  geom_point() +
  geom_line() +
  geom_errorbar()
 
 dat <- data.frame(Month = hstat$Date,S_avg = hstat$S.avg,S_stdev =
 hstat$S.stdev)
 ggplot(data = dat, aes(x = Month, y = S_avg, ymin = S_avg - S_stdev, ymax =
 S_avg + S_stdev)) +
  geom_point() +
  geom_line() +
  geom_errorbar()
 
 Running the script generates 3 separate graphs. How can I output them next
 to each other?  
 
 Thanks
 


Re: [R] density() function: differences with S-PLUS

2010-11-04 Thread William Dunlap

I suspect that R's help(density) explains
the difference between its bw and width arguments.
In Splus help(density) says about width
width 
width of the window.
... 
   The standard error of a Gaussian window is width/4.
   For the other windows width is the width of the interval
   on which the window is non-zero. 
I believe R's bw argument is the standard deviation of
the density used for the kernel.  In R 'width' has the same meaning
as in S+.
 
The small difference between estimates when using the same
bandwidth is mainly due to S+ using a truncated gaussian kernel
(at 4 standard deviations out) and R not truncating the kernel.
Part of the difference is due to R using the Fourier transform to
do the convolution of the kernel and the data, while S+ uses
a direct approach.
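The width/bw relationship described above can be checked directly in R. The sketch below reuses the setosa sepal lengths from Nicola's example and relies on R's density() converting a numeric width to bw = width / 4 for the gaussian kernel, so width = 4 and bw = 1 should give identical estimates. Because the S+ kernel is truncated and R's is not, the y values would still differ slightly from the S+ output.

```r
# Sepal lengths of the setosa flowers, as in Nicola's example
exdata <- iris$Sepal.Length[iris$Species == "setosa"]

# For the gaussian kernel, R turns a numeric width into bw = width / 4,
# so these two calls should use exactly the same kernel
d_width <- density(exdata, width = 4, n = 50, from = 1.3, to = 8.8)
d_bw    <- density(exdata, bw = 1,    n = 50, from = 1.3, to = 8.8)

d_width$bw                    # bandwidth actually used
all.equal(d_width$y, d_bw$y)  # the two estimates agree
range(d_width$x)              # grid runs from 1.3 to 8.8, as in the S+ output
```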

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

 




From: mailingl...@sturaro.net [mailto:mailingl...@sturaro.net]
On Behalf Of Nicola Sturaro Sommacal (Quantide srl)
Sent: Thursday, November 04, 2010 2:36 AM
To: William Dunlap
Subject: Re: [R] density() function: differences with S-PLUS


Dear William, 
I obtained the same x values also without the from= and to=
arguments, using bw instead of width in R.

At this point I tried a two-step procedure for the y:
 - in the first step I obtained the x as below,
 - in the second step I used the minimum and the maximum values
of x as the from= and to= arguments.
In this way I obtain, in R, y values close to the S+ ones, but
not the same.

R code and S+ code and output are below.

Thanks again.
Nicola



# R CODE
exdata = iris$Sepal.Length[iris$Species == "setosa"]
density(exdata, bw = 4, n = 50, cut = 0.75)$x  # SAME AS S+
density(exdata, bw = 4, n = 50, cut = 0.75)$y  # COMPLETELY DIFFERENT
density(exdata, width = 4, n = 50, from = 1.3, to = 8.8, cut = 0.75)$y  # CLOSE TO S+


# SPLUS CODE AND OUTPUT
 exdata = iris[, 1, 1]
 density(exdata, width = 4)
$x:
 [1] 1.30 1.453061 1.606122 1.759184 1.912245 2.065306
 [7] 2.218367 2.371429 2.524490 2.677551 2.830612 2.983673
[13] 3.136735 3.289796 3.442857 3.595918 3.748980 3.902041
[19] 4.055102 4.208163 4.361224 4.514286 4.667347 4.820408
[25] 4.973469 5.126531 5.279592 5.432653 5.585714 5.738776
[31] 5.891837 6.044898 6.197959 6.351020 6.504082 6.657143
[37] 6.810204 6.963265 7.116327 7.269388 7.422449 7.575510
[43] 7.728571 7.881633 8.034694 8.187755 8.340816 8.493878
[49] 8.646939 8.80

$y:
 [1] 0.0007849649 0.0013097474 0.0021225491 0.0033616520
 [5] 0.0052059615 0.0078856717 0.0116917555 0.0169685132
 [9] 0.0241073754 0.0335286785 0.0456521053 0.0608554862
[13] 0.0794235072 0.1014901241 0.1269807991 0.1555625999
[17] 0.1866111931 0.2192033788 0.2521417640 0.2840144993
[21] 0.3132881074 0.3384260582 0.3580208688 0.3709241384
[25] 0.3763578665 0.3739920600 0.3639778683 0.3469316232
[29] 0.3238721233 0.2961200278 0.2651731505 0.2325739601
[33] 0.1997853985 0.1680884651 0.1385105802 0.1117884914
[37] 0.0883644110 0.0684099972 0.0518702141 0.0385181792
[41] 0.0280126487 0.0199513951 0.0139159044 0.0095050745
[45] 0.0063575653 0.0041639082 0.0026680819 0.0016700727
[49] 0.0010169912 0.0005962089




2010/11/3 William Dunlap wdun...@tibco.com


Did you get my reply (1:31pm PST Tuesday)
to your request?  It showed how you needed
to use the from= and to= arguments to density
to get identical x components in the output,
and that the small differences in the y
component were due to S+ truncating the
gaussian kernel at +- 4 standard deviations
from the center while R does not truncate
the gaussian kernel (its output looks like it
uses a Fourier transform to do the convolution).



Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Nicola Sturaro Sommacal (Quantide srl)
 Sent: Wednesday, November 03, 2010 3:34 AM
 To: Joshua Wiley
 Cc: r-help@r-project.org
 Subject: Re: [R] density() function: differences with S-PLUS

 Dear Joshua,

 first of all, thank you very much for your reply. I hoped that

Re: [R] Problems with points in plots when importing from pdf to an SVG editor

2010-11-04 Thread Erik Iverson



Just read the help page :).
This is under 'Note' in ?pdf:


 On some systems the default plotting character ‘pch = 1’ is
 displayed in some PDF viewers incorrectly as a ‘q’ character.
 (These seem to be viewers based on the ‘poppler’ PDF rendering
 library). This may be due to incorrect or incomplete mapping of
 font names to those used by the system.  Adding the following
 lines to ‘~/.fonts.conf’ or ‘/etc/fonts/local.conf’ may circumvent
 this problem.



 <alias binding="same">
   <family>ZapfDingbats</family>
   <accept><family>Dingbats</family></accept>
 </alias>


I've found that in my case, this happens when viewing a PDF
with that plotting character under old versions of Evince, but
not newer.
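An R-side workaround that avoids editing the font configuration is the useDingbats argument of pdf(): with useDingbats = FALSE, the circles are drawn as paths rather than as characters from the Dingbats font. Depending on the R version, FALSE may already be the default, so the sketch below sets it explicitly. The output path uses tempdir() only to keep the example tidy.

```r
# Write the test plot to a temporary PDF file
out_file <- file.path(tempdir(), "points are weird.pdf")

# useDingbats = FALSE asks the pdf() device to draw the default circles
# (pch = 1) as paths instead of glyphs from the ZapfDingbats font, so a
# viewer or editor without that font has nothing to substitute a 'q' for
pdf(file = out_file, useDingbats = FALSE)
plot(1:5)
invisible(dev.off())
```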

--Erik

Rafael Björk wrote:

Dear R-users

When trying to import graphics from a PDF file into a vector graphics editor
(I use Inkscape, but I've confirmed the same problem in Adobe products), all
points in the graphics turn out as 'q's.
This example displays the behaviour:

pdf(file="points are weird.pdf")
plot(1:5)
dev.off()

When importing the file into Inkscape, I get five neatly arranged little 'q's.
The obvious workaround would be to change the points into another plotting
character, but this isn't the first time I've encountered this behaviour,
and it would be nice to solve it properly instead.
I realize this might not strictly be a question related to R, but if someone
who has encountered the problem has found a solution or workaround, it would
be greatly appreciated.

Kind regards/
Rafael


Re: [R] Loop

2010-11-04 Thread Erik Iverson


Hello,

The best way to get help from people on the list is
for you to give us *reproducible* examples of exactly
what it is you want.

Usually, you can come up with some sample data and code
that corresponds to your situation, and that we can run
directly by cutting and pasting from the email.

You can find more details about this in the posting
guide, linked to at the bottom of every email.

Best,
--Erik

Matevž Pavlič wrote:

Hi all,

I understand that for most of you this is a piece of cake, but I am a
complete newbie in this, so any example would be greatly
appreciated, as would any hint on how to get around in R. Frankly, I
sometimes find the help files rather confusing.

M

-Original Message- From: Petr PIKAL
[mailto:petr.pi...@precheza.cz] Sent: Thursday, November 04, 2010
3:40 PM To: Matevž Pavlič Cc: r-help@r-project.org Subject: Re: [R]
Loop

Hi

r-help-boun...@r-project.org wrote on 04.11.2010 14:21:38:


Hi David,

I am still having trouble with that loop ...

This code gives me (kind of) the name of the column/field in a data
frame. Field names are W1-W10, but there is a space between W and the
number (W 10), and column (field) names do not contain numbers.

for(i in 1:10) { vari <- paste("W",i) }
vari

[1] "W 10"

Now, as I understand it, I would then call the different columns in R with

w <- lit[[vari]]

Or am I wrong again?

Then I would probably need another loop to create the names of the
variables in R, i.e. w1 to w10. Is that the general idea for the procedure?


Beware of such loops. Instead of littering your workspace with
files/objects constructed by some paste(whatever, i) solution, you can
save results in a list, data.frame or matrix and simply use basic
subsetting procedures or lapply/sapply functions.

I must say I have never used such a paste(...) construction, and I have
worked with R for quite a long time.

Regards Petr
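Petr's advice, applied to the task in this thread, might look like the sketch below. The data frame lit is a hypothetical stand-in with three columns W1 to W3, and the helper top_freq() is made up for illustration. The sketch also shows why paste("W", i) produced "W 10": paste() separates its arguments with a space unless sep = "" is given.

```r
# Hypothetical stand-in for 'lit': a data frame with columns W1, W2, W3
set.seed(1)
lit <- data.frame(W1 = sample(letters[1:5], 30, replace = TRUE),
                  W2 = sample(letters[1:5], 30, replace = TRUE),
                  W3 = sample(letters[1:5], 30, replace = TRUE))

# paste() puts a space between its arguments by default ("W 1");
# sep = "" gives the actual column names ("W1")
vars <- paste("W", 1:3, sep = "")

# Hypothetical helper: frequency table of one column, most frequent first,
# keeping at most the first n rows
top_freq <- function(v, n = 20) {
  freq <- as.data.frame(table(v))
  head(freq[order(freq$Freq, decreasing = TRUE), ], n)
}

# One list element per column instead of separate objects w1, w2, w3
w <- lapply(lit[vars], top_freq)
names(w)   # "W1" "W2" "W3"
w[["W2"]]  # the table for column W2

# If files are still wanted, one CSV per column (W1.csv, W2.csv, ...) could be
# written with:
# for (v in vars) write.table(w[[v]], file = paste(v, "csv", sep = "."),
#                             sep = ";", dec = ".")
```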




Thanks for the help, m

-Original Message- From: David Winsemius
[mailto:dwinsem...@comcast.net] Sent: Wednesday, November 03, 2010
10:41 PM To: Matevž Pavlič Cc: r-help@r-project.org Subject: Re:
[R] Loop


On Nov 3, 2010, at 5:03 PM, Matevž Pavlič wrote:


Hi,

Thanks for the help and the manuals. They will come in very handy, I am
sure.

But regarding the code, I don't think this is what I want... Basically I
would like to repeat the code below:

w1 <- table(lit$W1)
w1 <- as.data.frame(w1)

It appears you are not reading for meaning. Burns has advised you how
to construct column names and use them in your initial steps. The `$`
function is quite limited in comparison to `[[`, so he was showing you
a method that would be more effective. BTW the as.data.frame step is
unnecessary, since the first thing write.table does is coerce an object
to a data.frame. The write.table name is misleading. It should be
write.data.frame. You cannot really write tables with write.table.

You would also use:

file=paste(vari, "csv", sep=".") as the file argument to
write.table


write.table(w1,file="w1.csv",sep=";",row.names=T, dec=".")

What are these next actions supposed to do after the file is
written? Are you trying to store a group of related w objects
that will later be indexed in sequence? If so, then a list would
make more sense.

-- David.


w1 <- w1[order(w1$Freq, decreasing=TRUE),]
w1 <- head(w1, 20)

20 times, where W1-20 (capital letters) are the fields in a
data.frame called lit and w1-20 are the data.frames being created.

Hope that explains it better, m

-Original Message- From: Patrick Burns
[mailto:pbu...@pburns.seanet.com] Subject: Re: [R] Loop

If I understand properly, you'll want something like:

lit[["w2"]]

instead of

lit$w2

more accurately:

for(i in 1:20) {
  vari <- paste("w", i)
  lit[[vari]]
  ...
}

The two documents mentioned in my signature may help you.
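The difference between `$` and `[[` that Patrick Burns points out can be seen in a small example with a hypothetical two-column lit:

```r
# Hypothetical two-column stand-in for 'lit'
lit <- data.frame(W1 = 1:3, W2 = 4:6)
vari <- "W2"

# `$` does not evaluate the name after it: this asks for a column literally
# called "vari", and there is none, so the result is NULL
lit$vari

# `[[` evaluates its argument, so the column named by the value of vari
# ("W2") is returned
lit[[vari]]
```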

On 03/11/2010 20:23, Matevž Pavlič wrote:

Hi all,

I managed to do what I want (with the great help of this mailing
list) manually. Now I would like to automate it. I would
probably need a for loop to help me with this... but of
course I have no idea how to do that in R. Below is the code
that I would like to be replicated a number of times (let's say 20).
I would like w1 to change to w2, w3, w4 ... up to w20, and thereby
create 20 data.frames that I would then bind together with
cbind.

(I did it as shown below, manually)

w1 <- table(lit$W1)
w1 <- as.data.frame(w1)
write.table(w1,file="w1.csv",sep=";",row.names=T, dec=".")
w1 <- w1[order(w1$Freq, decreasing=TRUE),]
w1 <- head(w1, 20)

w2 <- table(lit$W2)
w2 <- as.data.frame(w2)
write.table(w2,file="w2.csv",sep=";",row.names=T, dec=".")
w2 <- w2[order(w2$Freq, decreasing=TRUE),]
w2 <- head(w2, 20)
. . .

Thanks for the help, m

David Winsemius, MD West Hartford, CT


Re: [R] Converting Strings to Variable names

2010-11-04 Thread Erik Iverson




Anand Bambhania wrote:

Hi all,

I am processing data from 24 samples and combining them in a single table called
CombinedSamples using the following:

CombinedSamples <- rbind(Sample1,Sample2,Sample3)


Please use reproducible examples.



Now variables Sample1, Sample2 and Sample3 have many different columns.


Then you can't 'rbind' them, correct?

From ?rbind:

 If there are several matrix arguments, they must all have the same
 number of columns (or rows) and this will be the number of columns
 (or rows) of the result.


To make it more flexible for other samples, I'm replacing the above code with a
for loop:

#Sample is a string vector containing all 24 sample names

for (k in 1:length(Sample))
{
  CombinedSamples <- rbind(get(Sample[k]))
}

This code only stores the last sample's data, as CombinedSamples gets overwritten
every time. Using CombinedSamples[k] or CombinedSamples[k,] causes
dimension-related errors, as each Sample has several rows and not just 24. So
how can I assign the data of all 24 samples to CombinedSamples?


I don't know since I'm unsure of the structure of these objects.

If they all have the same structure, I'd store them in a list and
do:

CombinedSamples <- do.call(rbind, sampleList)

otherwise perhaps using

?Reduce and ?merge.  If you can provide a more complete example
to the list, please do. You need not resort to a for loop/get
hack for this.
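Erik's do.call() suggestion can be sketched as follows. The three sample data frames are hypothetical stand-ins with identical columns (rbind() needs matching columns), and mget() is used to turn the vector of object names into a list without a loop over get().

```r
# Hypothetical stand-ins for the real sample objects, all with the same columns
Sample1 <- data.frame(id = 1:2, value = c(0.5, 1.2))
Sample2 <- data.frame(id = 3:4, value = c(2.1, 0.7))
Sample3 <- data.frame(id = 5,   value = 3.3)

# Sample holds the object names, as in the question
Sample <- c("Sample1", "Sample2", "Sample3")

# mget() looks up all the named objects at once and returns them as a list
sampleList <- mget(Sample)

# do.call() passes every list element to rbind() as a separate argument
CombinedSamples <- do.call(rbind, sampleList)
nrow(CombinedSamples)  # 2 + 2 + 1 rows
```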

Best,
--Erik





Thanks,

Anand


Re: [R] postForm() in RCurl and library RHTMLForms

2010-11-04 Thread Santosh Srinivas

I don't have the implementation the way you want it, sorry, but
someone here will definitely know.

The group showed me how to do it this way, though:

library(zoo)
library(RCurl)

sNiftyURL =
"http://nseindia.com/content/indices/histdata/SP%20CNX%20NIFTY01-01-2000-02-11-2010.csv"
Nifty_Dat = getURLContent(sNiftyURL, verbose = TRUE, useragent =
getOption("HTTPUserAgent"))
tblNifty <- read.csv(textConnection(Nifty_Dat))
tblNifty <- subset(tblNifty,select=c(Date,Close))
tblNifty$Date <- as.Date(tblNifty$Date, format = "%d-%b-%Y")
tblNifty <- read.zoo(tblNifty)
closeAllConnections()

HTH.
S

From: sayan dasgupta [mailto:kitt...@gmail.com] 
Sent: 04 November 2010 15:09
To: r-help@r-project.org
Cc: dun...@wald.ucdavis.edu; santosh.srini...@gmail.com
Subject: postForm() in RCurl and library RHTMLForms

Hi RUsers,

Suppose I want to see the data on the website 
url <- "http://www.nseindia.com/content/indices/ind_histvalues.htm"

for the index SP CNX NIFTY for
dates FromDate=01-11-2010,ToDate=02-11-2010

then read the HTML table from the page using readHTMLTable()

I am using this code 
webpage <- postForm(url,.params=list(
                       FromDate="01-11-2010",
                       ToDate="02-11-2010",
                       IndexType="SP CNX NIFTY",
                       Indicesdata="Get Details"),
                 .opts=list(useragent = getOption("HTTPUserAgent")))

But it doesn't give me the desired result.

Also I was trying to use the function getHTMLFormDescription from the
package RHTMLForms, but there I can't use the argument
.opts=list(useragent = getOption("HTTPUserAgent")), which is needed for this
particular website.


Thanks and Regards
Sayan Dasgupta


Re: [R] Problems with points in plots when importing from pdf to an SVG editor

2010-11-04 Thread Rafael Björk

Hi Erik!

I googled and found that very helpful message at this address:
http://stat.ethz.ch/R-manual/R-devel/library/grDevices/html/pdf.html
But when I type ?pdf I get directed here:
http://127.0.0.1:29358/library/grDevices/html/pdf.html

which doesn't contain that information.

Thanks for the help

2010/11/4 Erik Iverson er...@ccbr.umn.edu


 Just read the help page :).
 This is under 'Note' in ?pdf:


 On some systems the default plotting character ‘pch = 1’ is
 displayed in some PDF viewers incorrectly as a ‘q’ character.
 (These seem to be viewers based on the ‘poppler’ PDF rendering
 library). This may be due to incorrect or incomplete mapping of
 font names to those used by the system.  Adding the following
 lines to ‘~/.fonts.conf’ or ‘/etc/fonts/local.conf’ may circumvent
 this problem.



 <alias binding="same">
   <family>ZapfDingbats</family>
   <accept><family>Dingbats</family></accept>
 </alias>


 I've found that in my case, this happens when viewing a PDF
 with that plotting character under old versions of Evince, but
 not newer.

 --Erik

 Rafael Björk wrote:

 Dear R-users

 When trying to import graphics from a PDF file into a vector graphics editor
 (I use Inkscape, but I've confirmed the same problem in Adobe products), all
 points in the graphics turn out as 'q's.
 This example displays the behaviour:

 pdf(file="points are weird.pdf")
 plot(1:5)
 dev.off()

 When importing the file into Inkscape, I get five neatly arranged little 'q's.
 The obvious workaround would be to change the points into another plotting
 character, but this isn't the first time I've encountered this behaviour,
 and it would be nice to solve it properly instead.
 I realize this might not strictly be a question related to R, but if someone
 who has encountered the problem has found a solution or workaround, it would
 be greatly appreciated.

 Kind regards/
 Rafael


Re: [R] Converting Strings to Variable names

2010-11-04 Thread Mike Rennie

Hi Anand,

Try creating a variable where you can store your data, and append it in your
loop. See added lines of code to include below...

On Thu, Nov 4, 2010 at 9:43 AM, Anand Bambhania amb1netwo...@gmail.com wrote:

 Hi all,

 I am processing data from 24 samples and combining them in a single table called
 CombinedSamples using the following:

 CombinedSamples <- rbind(Sample1,Sample2,Sample3)

 Now variables Sample1, Sample2 and Sample3 have many different columns.

 To make it more flexible for other samples, I'm replacing the above code with a
 for loop:

 #Sample is a string vector containing all 24 sample names


#create a variable to store your results

res <- NULL


 for (k in 1:length(Sample))
 {
  CombinedSamples <- rbind(get(Sample[k]))

  res <- rbind(res, CombinedSamples)

 }

Now, every iteration of your loop should append CombinedSamples to res, and
you won't overwrite your results every time.

HTH,

Mike



 This code only stores the last sample's data, as CombinedSamples gets overwritten
 every time. Using CombinedSamples[k] or CombinedSamples[k,] causes
 dimension-related errors, as each Sample has several rows and not just 24.
 So how can I assign the data of all 24 samples to CombinedSamples?

 Thanks,

 Anand


Re: [R] Sorting data from one column with strings

2010-11-04 Thread Mike Rennie

(apologies for any double hits; forgot to reply all...)

Or, you could just go back to basics, and write yourself a general loop that
goes through whatever levels of a variable and gives you back whatever
statistics you want... below is an example where you estimate means for each
level, but you could estimate any number of statistical parameters...

dat <- data.frame(c(rep("A",5), rep("B",5),rep("C",5)),c(1:15))
results <- NULL
for(i in unique(dat[,1]))
  {
  sub.dat <- subset(dat, dat[,1]==i)
  res <- mean(sub.dat[,2])
  results <- c(results,i,res)
  }
results.mat <- matrix(results, ncol=2, byrow=TRUE)
results.mat
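A more compact alternative to the loop above is base R's aggregate(), which applies a summary function to every group. The sketch uses the five example rows from the question; returning both the mean and the standard deviation from one function is a choice made here for brevity, and new species are picked up automatically.

```r
# The example rows from the question
fish <- data.frame(
  Sample_no = 1:5,
  Species   = c("Cod", "Haddock", "Cod", "Cod", "Haddock"),
  Nitrogen  = c(15.2, 14.8, 15.6, 13.2, 14.3),
  Carbon    = c(-19.0, -20.2, -18.5, -20.1, -18.8)
)

# Mean and standard deviation of Nitrogen and Carbon for every species;
# each summary column of the result is a matrix with columns "mean" and "sd"
stats <- aggregate(cbind(Nitrogen, Carbon) ~ Species, data = fish,
                   FUN = function(v) c(mean = mean(v), sd = sd(v)))
stats
```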


HTH,

Mike

On Thu, Nov 4, 2010 at 7:28 AM, Ramsvatn Silje silje.ramsv...@uit.no wrote:


 Hello,

 I have tried to find this out some other way, but without success, so I have to
 try this list.
 I assume this should be quite simple.

 I have a dataset with 4 columns, Sample_no, Species, Nitrogen,
 Carbon in csv format. In the species column I have many different
 species with a varying number of obs per species.

 Eg

 Sample_no  Species  Nitrogen  Carbon
 1          Cod      15.2      -19.0
 2          Haddock  14.8      -20.2
 3          Cod      15.6      -18.5
 4          Cod      13.2      -20.1
 5          Haddock  14.3      -18.8
 Etc..

 And I want to calculate the mean, standard deviation etc. per species for the
 observations Nitrogen and Carbon, and later do plots and stats with
 the different species. I will in the end have many species, so I need it to
 be automatic; I can't enter code for every species separately.

 Can anyone help me with this? Or if this is the wrong list to send this
 question to, where do I send it?

 Thank you very much in advance.


 Best regards

 Silje Ramsvatn

 PhD-candidate
 University of Tromsø
 Norway


[R] how to work with long vectors

2010-11-04 Thread Changbin Du

Hi, dear R community,

I have a data set like the one below. What I want to do is calculate the
cumulative coverage. The following code works for a small data set (100
rows), but when I feed it the whole data set, it is still running after
24 hours. Can someone give me some suggestions for working with long vectors?

id          reads
Contig79:1   4
Contig79:2   8
Contig79:3  13
Contig79:4  14
Contig79:5  17
Contig79:6  20
Contig79:7  25
Contig79:8  27
Contig79:9  32
Contig79:10 33
Contig79:11 34

matt <- read.table("/house/groupdirs/genetic_analysis/mjblow/ILLUMINA_ONLY_MICROBIAL_GENOME_ASSEMBLY/4083340/STANDARD_LIBRARY/GWZW.994.5.1129.trim_69.fastq.19621832.sub.sorted.bam.clone.depth",
sep="\t", skip=0, header=F,fill=T) #
dim(matt)
[1] 3384766   2

matt_plot <- function(matt, outputfile) {
names(matt) <- c("id","reads")

 cover <- matt$reads


#calculate the cumulative coverage.
cover_per <- function (data) {
output <- numeric(0)
for (i in data) {
  x <- (100*sum(ifelse(data = i, 1, 0))/length(data))
  output <- c(output, x)
}
return(output)
}


 result <- cover_per(cover)
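The loop above rescans the whole vector once per element and copies output every time it grows, so its cost grows roughly with the square of the number of rows, which is hopeless for 3.4 million rows. The comparison operator inside ifelse() was lost in this archive; the sketch below assumes it was data >= i (percentage of positions with at least i reads). Under that assumption, rank(-data, ties.method = "max") counts the elements greater than or equal to each element in one O(n log n) step. If the intended test was data <= i, rank(data, ties.method = "max") would be the counterpart. The input is the eleven read counts shown above.

```r
# The eleven read counts from the example rows above
cover <- c(4, 8, 13, 14, 17, 20, 25, 27, 32, 33, 34)

# Loop version, assuming the lost comparison was data >= i
cover_per_loop <- function(data) {
  output <- numeric(0)
  for (i in data) {
    x <- 100 * sum(data >= i) / length(data)
    output <- c(output, x)
  }
  output
}

# Vectorised version: for each element, rank(-data, ties.method = "max") is
# the number of elements greater than or equal to it
cover_per_fast <- function(data) {
  100 * rank(-data, ties.method = "max") / length(data)
}

all.equal(cover_per_loop(cover), cover_per_fast(cover))
```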


Thanks so much!


-- 
Sincerely,
Changbin
--


[R] Matrix Manipulation

2010-11-04 Thread emj83


Hi,

Is there a quick way to go from this matrix:
 A
     [,1] [,2] [,3]
[1,]    1    1    1
[2,]    2    2    2
[3,]    3    3    3
[4,]    4    4    4
[5,]    5   NA    5
[6,]   NA   NA    6
[7,]   NA   NA   NA

to this matrix:
 B
     [,1] [,2] [,3]
[1,]    1   NA   NA
[2,]    2   NA    1
[3,]    3    1    2
[4,]    4    2    3
[5,]    5    3    4
[6,]   NA    4    5
[7,]   NA   NA    6

without using a loop? 
For example, using a vector which describes how many NAs are required at
the top of each column, so in this case it would be c(0,2,1).

Many thanks Emma
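One way to do this without an explicit for loop, sketched under the assumption that the shift vector c(0, 2, 1) is given: shift each column down with sapply() (which still loops over the columns internally) and cut it back to the original number of rows. Values pushed past the last row are dropped; in this example those are only NAs.

```r
# Matrix A from the question
A <- cbind(c(1:5, NA, NA), c(1:4, NA, NA, NA), c(1:6, NA))

# Number of NAs to add at the top of each column
shift <- c(0, 2, 1)

# Prepend shift[j] NAs to column j, then keep only the first nrow(A) values
B <- sapply(seq_len(ncol(A)), function(j) {
  head(c(rep(NA, shift[j]), A[, j]), nrow(A))
})
B
```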



Re: [R] Sorting data from one column with strings

2010-11-04 Thread jim holtman

try sqldf:

 x
  Sample_no Species Nitrogen Carbon
1 1 Cod 15.2  -19.0
2 2 Haddock 14.8  -20.2
3 3 Cod 15.6  -18.5
4 4 Cod 13.2  -20.1
5 5 Haddock 14.3  -18.8
 require(sqldf)
 sqldf("select Species, avg(Nitrogen) Nitrogen, avg(Carbon) Carbon from x 
 group by Species")
  Species Nitrogen Carbon
1 Cod 14.7  -19.2
2 Haddock 14.55000  -19.5


On Thu, Nov 4, 2010 at 8:28 AM, Ramsvatn Silje silje.ramsv...@uit.no wrote:

 Hello,

 I have tried to find this out some other way, but without success, so I have to
 try this list.
 I assume this should be quite simple.

 I have a dataset with 4 columns, Sample_no, Species, Nitrogen,
 Carbon in csv format. In the species column I have many different
 species with a varying number of obs per species.

 Eg

 Sample_no  Species  Nitrogen  Carbon
 1          Cod      15.2      -19.0
 2          Haddock  14.8      -20.2
 3          Cod      15.6      -18.5
 4          Cod      13.2      -20.1
 5          Haddock  14.3      -18.8
 Etc..

 And I want to calculate the mean, standard deviation etc. per species for the
 observations Nitrogen and Carbon, and later do plots and stats with
 the different species. I will in the end have many species, so I need it to
 be automatic; I can't enter code for every species separately.

 Can anyone help me with this? Or if this is the wrong list to send this
 question to, where do I send it?

 Thank you very much in advance.


 Best regards

 Silje Ramsvatn

 PhD-candidate
 University of Tromsø
 Norway





-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?


Re: [R] Loop

2010-11-04 Thread Petr PIKAL

Hi

r-help-boun...@r-project.org wrote on 04.11.2010 15:49:31:

 Hi all, 
 
 I understand that for most of you this is a piece of cake, but I am a complete 
 newbie in this, so any example would be greatly appreciated, as would any 
 hint on how to get around in R. Frankly, I sometimes find the help files 
 rather confusing.

OK. Instead of 
   w1 <- table(lit$W1)
   w1 <- as.data.frame(w1)
   write.table(w1,file="w1.csv",sep=";",row.names=T, dec=".")
   w1 <- w1[order(w1$Freq, decreasing=TRUE),]
   w1 <- head(w1, 20)

Suppose you have a data frame or matrix, and you want the 5 most common 
values from each column:

# prepare matrix
x <- sample(1:20, 100, replace=T)
mat <- matrix(x, ncol=10)

#apply user defined function for each column
apply(mat, 2, function(x) head(sort(table(x), decreasing=T),5))
 [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 5091 5135 5174 5133 5199 5097 5165 5157 5134  5068
[2,] 5073 5111 5143 5064 5113 5078 5102 5157 5131  5065
[3,] 5058 5092 5115 5051 5079 5064 5088 5128 5076  5063
[4,] 5056 5073 5114 5047 5059 5044 5037 5064 5071  5063
[5,] 5047 5064 5072 5041 5057 5041 5035 5058 5032  5061

If you want to do it in a loop (which can sometimes be quicker) and save the 
results to a list, make a list

lll <- vector("list", 10)

and fill it with your results

for (i in 1:10) lll[[i]] <- head(sort(table(mat[,i]), decreasing=T),5)

and now you can call values from this lll list simply by

lll[5]
[[1]]

   9   15   13    6   16 
5199 5113 5079 5059 5057 

lll[[5]]

   9   15   13    6   16 
5199 5113 5079 5059 5057

or even

lll[[5]][3]
  13 
5079

without the need to write individual files or paste together letters and 
numbers etc. 

There should be an R-intro document in your installation, and it is worth 
reading. It is not that big; you can get through it in less than a month if 
you read more than 3 pages per day.

Regards
Petr



 
 M
 
 -Original Message-
 From: Petr PIKAL [mailto:petr.pi...@precheza.cz] 
 Sent: Thursday, November 04, 2010 3:40 PM
 To: Matevž Pavlič
 Cc: r-help@r-project.org
 Subject: Re: [R] Loop
 
 Hi
 
 r-help-boun...@r-project.org napsal dne 04.11.2010 14:21:38:
 
  Hi David,
  
  I am still having troubles with that loop ...
  
  This code gives me (kinda) the name of the column/field in a data 
frame. 
 Filed
  names are form W1-W10. But there is a space between W and a number --
 W 10,
  and column (field) names do not contain numbers. 
  
  for(i in 1:10)
  {
  vari - paste(W,i)
  }
  vari
  
  [1] W 10
  
  Now as i understand than i would call different columns to R with
  
  w-lit[[vari]]
  
  Or am i wrong again?
  
  Then I would probably need another loop to create the names of the
 variables 
  on R, i.e. w1 to w10. Is that a general idea for the procedure?
 
 Beware of such loops. Instead of littering your workspace with 
files/objects 
 constructed by some paste(whatever, i) solution you can save results in 
list 
 or data.frame or matrix and simply use basic subsetting procedures or 
lapply/
 sapply functions.
 
 I must say I never used such paste(...) construction yet and I work with 
R for
 quite a long time.
 
 Regards
 Petr
 
 
  
  
  Thank for the help, m
  
  -Original Message-
  From: David Winsemius [mailto:dwinsem...@comcast.net]
  Sent: Wednesday, November 03, 2010 10:41 PM
  To: Matevž Pavlič
  Cc: r-help@r-project.org
  Subject: Re: [R] Loop
  
  
  On Nov 3, 2010, at 5:03 PM, Matevž Pavlič wrote:
  
   Hi,
  
   Thanks for the help and the manuals. Will come very handy i am sure.
  
   But regarding the code i don't hink this is what i wantbasically 

   i
 
   would like to repeat bellow code :
  
   w1-table(lit$W1)
   w1-as.data.frame(w1)
  
  It appears you are not reading for meaning. Burns has advised you how 
  to
 
  construct column names and use them in your initial steps. The `$`
 function is
  quite limited in comparison to `[[` , so he was showing you a method
 that 
  would be more effective.  BTW the as.data.frame step is unnecessary,
 since the
  first thing write.table does is coerce an object to a data.frame. The 
  write.table name is misleading. It should be write.data.frame. You
 cannot 
  really write tables with write.table.
  
  You would also use:
  
file=paste(vari, csv, sep=.) as the file argument to write.table
  
   write.table(w1,file=w1.csv,sep=;,row.names=T, dec=.)
  
  What are these next actions supposed to do after the file is written?
  Are you trying to store a group of related w objects that will later be
  indexed in sequence? If so, then a list would make more sense.
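A sketch of the whole procedure collected in one list, along the lines David suggests. The data frame is a hypothetical stand-in for `lit`, three columns stand in for W1-W20, and the CSV files go to `tempdir()` only for illustration; substitute the real data and output directory.

```r
# Hypothetical stand-in for 'lit': three character columns W1..W3.
set.seed(1)
lit <- data.frame(W1 = sample(letters, 200, replace = TRUE),
                  W2 = sample(letters, 200, replace = TRUE),
                  W3 = sample(letters, 200, replace = TRUE),
                  stringsAsFactors = FALSE)

cols    <- paste("W", 1:3, sep = "")   # "W1" "W2" "W3"
out_dir <- tempdir()                   # illustration only

top20 <- lapply(cols, function(nm) {
  w <- as.data.frame(table(lit[[nm]]))        # columns Var1 and Freq
  w <- w[order(w$Freq, decreasing = TRUE), ]  # most frequent first
  w <- head(w, 20)
  # file name built with paste(), e.g. "w1.csv"
  fname <- paste(tolower(nm), "csv", sep = ".")
  write.table(w, file = file.path(out_dir, fname),
              sep = ";", row.names = TRUE, dec = ".")
  w                                            # keep the result in the list
})
names(top20) <- cols
```

The sorted top-20 tables stay available afterwards as `top20[["W1"]]`, `top20[["W2"]]`, and so on, so nothing has to be rebuilt from the files.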
  
  --
  David.
  
   w1 <- w1[order(w1$Freq, decreasing = TRUE), ]
   w1 <- head(w1, 20)
  
   20 times, where W1-20 (capital letters) are the fields in a data.frame
   called lit and w1-20 are the data.frames being created.
  
   Hope that explains it better,
  
   m
  
   -Original Message-
   From: Patrick Burns [mailto:pbu...@pburns.seanet.com]
   Subject: Re: [R] Loop
  
   If I understand properly, you'll want something like:

[R] Plotting a grid of directly specified colours

2010-11-04 Thread Peter Davenport

Dear R-help,

Could any of you direct me to a function for plotting a grid of colours,
directly specified by a matrix of hex colour codes? In other words, I'm
looking for a heatmap() or image()-like function to which I can specify the
colour of each grid location directly, rather than providing a numerical
matrix and a 1D colour scale (heatmap, image, levelplot, NeatMap...). I'm
surprised I haven't found anything simple with RSiteSearch, help.search or
the net.

I'd like to use this function to encode one variable as chroma and a
second as luminance (hcl colour space), so that the two variables can be
visualised in a single heatmap (the variables are fold-change and a q-value,
a significance measure). If anyone has any thoughts/warnings to offer on
this idea then I'd love to hear them (it must have been tried before, but
I've not come across any examples).

Best wishes and thank you,

Peter Davenport


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Matrix Manipulation

2010-11-04 Thread jim holtman

try this:

> x
     V2 V3 V4
[1,]  1  1  1
[2,]  2  2  2
[3,]  3  3  3
[4,]  4  4  4
[5,]  5 NA  5
[6,] NA NA  6
[7,] NA NA NA
> offset <- c(0,2,1)
> # add the control to the data and make two copies so we can offset
> x.new <- rbind(offset, x, x)
> result <- apply(x.new, 2, function(.col){
+     .col[seq(nrow(x) - .col[1L] + 2L, length = nrow(x))]
+ })
> result
 V2 V3 V4
  1 NA NA
  2 NA  1
  3  1  2
  4  2  3
  5  3  4
 NA  4  5
 NA NA  6
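For comparison, a sketch of a more explicit version of the same idea: each column is padded with `k[j]` leading NAs and then truncated back to the original number of rows. The function name `shift_down` is hypothetical, and `lapply()` is still a loop under the hood, just a compact one.

```r
# Hypothetical helper: shift column j of A down by k[j] rows, padding the
# top with NA and dropping values that fall off the bottom.
shift_down <- function(A, k) {
  stopifnot(length(k) == ncol(A), all(k >= 0))
  n <- nrow(A)
  cols <- lapply(seq_len(ncol(A)), function(j) {
    c(rep(NA, k[j]), A[, j])[seq_len(n)]
  })
  B <- matrix(unlist(cols), nrow = n)  # columns are refilled in order
  dimnames(B) <- dimnames(A)
  B
}

# Matrix A from the question below
A <- matrix(c(1:5, NA, NA,
              1:4, NA, NA, NA,
              1:6, NA), ncol = 3)
shift_down(A, c(0, 2, 1))  # reproduces matrix B from the question
```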


On Thu, Nov 4, 2010 at 11:47 AM, emj83 stp08...@shef.ac.uk wrote:

 Hi,

 Is there a quick way to go from this matrix:
 A
     [,1] [,2] [,3]
 [1,]    1    1    1
 [2,]    2    2    2
 [3,]    3    3    3
 [4,]    4    4    4
 [5,]    5   NA    5
 [6,]   NA   NA    6
 [7,]   NA   NA   NA

 to this matrix:
 B
     [,1] [,2] [,3]
 [1,]    1   NA   NA
 [2,]    2   NA    1
 [3,]    3    1    2
 [4,]    4    2    3
 [5,]    5    3    4
 [6,]   NA    4    5
 [7,]   NA   NA    6

 without using a loop?
 For example, using a vector which describes how many NAs are required from
 the top of the matrix, so in this case it would be c(0,2,1).

 Many thanks Emma






-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?


Re: [R] Matrix Manipulation

2010-11-04 Thread emj83


Many thanks, it worked a treat :-)

Emma

Re: [R] how to work with long vectors

2010-11-04 Thread jim holtman

Is this what you want:

> x
            id reads
1   Contig79:1     4
2   Contig79:2     8
3   Contig79:3    13
4   Contig79:4    14
5   Contig79:5    17
6   Contig79:6    20
7   Contig79:7    25
8   Contig79:8    27
9   Contig79:9    32
10 Contig79:10    33
11 Contig79:11    34
> x$percent <- x$reads / max(x$reads) * 100
> x
            id reads   percent
1   Contig79:1     4  11.76471
2   Contig79:2     8  23.52941
3   Contig79:3    13  38.23529
4   Contig79:4    14  41.17647
5   Contig79:5    17  50.00000
6   Contig79:6    20  58.82353
7   Contig79:7    25  73.52941
8   Contig79:8    27  79.41176
9   Contig79:9    32  94.11765
10 Contig79:10    33  97.05882
11 Contig79:11    34 100.00000


On Thu, Nov 4, 2010 at 11:46 AM, Changbin Du changb...@gmail.com wrote:
 Hi, dear R community,

 I have a data set like the one below. What I want to do is calculate the
 cumulative coverage. The following code works for a small data set (#rows =
 100), but when fed the whole data set, it is still running after 24 hours.
 Can someone give some suggestions for long vectors?

 id                reads
 Contig79:1    4
 Contig79:2    8
 Contig79:3    13
 Contig79:4    14
 Contig79:5    17
 Contig79:6    20
 Contig79:7    25
 Contig79:8    27
 Contig79:9    32
 Contig79:10    33
 Contig79:11    34

 matt <- read.table("/house/groupdirs/genetic_analysis/mjblow/ILLUMINA_ONLY_MICROBIAL_GENOME_ASSEMBLY/4083340/STANDARD_LIBRARY/GWZW.994.5.1129.trim_69.fastq.19621832.sub.sorted.bam.clone.depth",
 sep = "\t", skip = 0, header = F, fill = T)
 dim(matt)
 [1] 3384766       2

 matt_plot <- function(matt, outputfile) {
 names(matt) <- c("id", "reads")

  cover <- matt$reads


 #calculate the cumulative coverage.
 + cover_per <- function (data) {
 + output <- numeric(0)
 + for (i in data) {
 +           x <- (100*sum(ifelse(data >= i, 1, 0))/length(data))
 +           output <- c(output, x)
 +                 }
 + return(output)
 + }


  result <- cover_per(cover)


 Thanks so much!


 --
 Sincerely,
 Changbin
 --





-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?


Re: [R] how to work with long vectors

2010-11-04 Thread Henrique Dallazuanna

Try this:

rev(100 * cumsum(matt$reads > 1) / length(matt$reads))





-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O


Re: [R] Plotting a grid of directly specified colours

2010-11-04 Thread baptiste Auguié

Hi,

try this,

library(grid)
grid.raster(matrix(colors(),ncol=50),interp=F)

HTH,

baptiste
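A sketch of how the colour matrix for Peter's two-variable encoding might be built with `hcl()` and then drawn with `grid.raster()` (raster support needs R >= 2.11.0). The data are simulated, and the particular mapping (hue for the sign of the fold change, chroma for its magnitude capped at 3, luminance for the q-value) is an assumption chosen for illustration, not a recommendation.

```r
library(grid)

set.seed(42)
fc <- matrix(rnorm(120), nrow = 10)   # simulated log fold changes
q  <- matrix(runif(120), nrow = 10)   # simulated q-values in [0, 1]

hue    <- ifelse(fc > 0, 0, 240)      # assumption: reddish up, bluish down
chroma <- 100 * pmin(abs(fc) / 3, 1)  # |fold change| -> chroma, capped at 3
lum    <- 30 + 60 * q                 # small q (significant) -> darker
cols   <- matrix(hcl(h = hue, c = chroma, l = lum, fixup = TRUE),
                 nrow = nrow(fc))

grid.newpage()
grid.raster(cols, interpolate = FALSE)  # one uniform block per matrix cell
```

`fixup = TRUE` pulls colours that fall outside the displayable RGB range back inside it, which matters when chroma and luminance are driven independently.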



Re: [R] how to work with long vectors

2010-11-04 Thread Changbin Du

Thanks, Jim!

This is not what I want. What I want is to calculate, for each position, the
percentage of reads greater than or equal to the reads at that position. My
output looks like the following:
for row 1, all the reads are >= 4, so the cover_per is 100;
for row 2, 99% of the reads are >= 8, so the cover_per is 99.
> head(final)
  cover_per reads
1       100     4
2        99     8
3        98    13
4        97    14
5        96    17
6        95    20

I attached the input file with this email. This file has only 100 rows, very
small. My original data set has 3384766 rows.

> matt <- read.table("/house/groupdirs/genetic_analysis/mjblow/ILLUMINA_ONLY_MICROBIAL_GENOME_ASSEMBLY/4083340/STANDARD_LIBRARY/GWZW.994.5.1129.trim_69.fastq.19621832.sub.sorted.bam.clone.depth",
+ sep = "\t", skip = 0, header = F, fill = T)
> dim(matt)
[1] 3384766   2

Thanks so much for your time!

> matt <- read.table("/home/cdu/operon/dimer5_0623/matt_test.txt", sep = "\t",
+ skip = 0, header = F, fill = T)
> names(matt) <- c("id", "reads")
> dim(matt)
[1] 100   2
> cover <- matt$reads


> #calculate the cumulative coverage.
> cover_per <- function (data) {
+ output <- numeric(0)
+ for (i in data) {
+   x <- (100*sum(ifelse(data >= i, 1, 0))/length(data))
+   output <- c(output, x)
+ }
+ return(output)
+ }


> result <- cover_per(cover)
> head(result)
[1] 100  99  98  97  96  95

> final <- data.frame(result, cover)

> names(final) <- c("cover_per", "reads")
> head(final)
  cover_per reads
1       100     4
2        99     8
3        98    13
4        97    14
5        96    17
6        95    20









-- 
Sincerely,
Changbin
--

Changbin Du
DOE Joint Genome Institute
Bldg 400 Rm 457
2800 Mitchell Dr
Walnut Creek, CA 94598
Phone: 925-927-2856
Contig79:1  4
Contig79:2  8
Contig79:3  13
Contig79:4  14
Contig79:5  17
Contig79:6  20
Contig79:7  25
Contig79:8  27
Contig79:9  32
Contig79:10 33
Contig79:11 34
Contig79:12 36
Contig79:13 39
Contig79:14 40
Contig79:15 44
Contig79:16 49
Contig79:17 55
Contig79:18 56
Contig79:19 59
Contig79:20 60
Contig79:21 62
Contig79:22 64
Contig79:23 64
Contig79:24 68
Contig79:25 68
Contig79:26 68
Contig79:27 70
Contig79:28 73
Contig79:29 76
Contig79:30 77
Contig79:31 78
Contig79:32 78
Contig79:33 79
Contig79:34 80
Contig79:35 80
Contig79:36 84
Contig79:37 87
Contig79:38 87
Contig79:39 88
Contig79:40 88
Contig79:41 89

Re: [R] Plotting a grid of directly specified colours

2010-11-04 Thread Barry Rowlingson


I've kludged this kind of thing in the past. Create a matrix of
1:(nrow*ncol), and specify the col as your colour matrix.

Example:

m = matrix(c("red","green","blue","yellow","orange","black"),2,3)
mc = matrix(1:(nrow(m)*ncol(m)),nrow(m),ncol(m))
image(mc,col=m)
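A base-graphics alternative to the index-matrix trick, assuming R >= 2.11.0 where `rasterImage()` is available. It draws the same colour matrix `m` directly. One difference worth knowing: `as.raster()` keeps the printed orientation (m[1, 1] at the top left), whereas `image()` puts the rows of its matrix along the x axis with column 1 at the bottom.

```r
m <- matrix(c("red", "green", "blue", "yellow", "orange", "black"), 2, 3)

# Set up an empty plot whose user coordinates match the matrix size.
plot.new()
plot.window(xlim = c(0, ncol(m)), ylim = c(0, nrow(m)), asp = 1)

# Draw each matrix cell as one block of its own colour (no smoothing).
rasterImage(as.raster(m), xleft = 0, ybottom = 0,
            xright = ncol(m), ytop = nrow(m), interpolate = FALSE)
```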

Barry


Re: [R] how to work with long vectors

2010-11-04 Thread Changbin Du

Hi Henrique,

Thanks for the great help!

I compared the output from your code:
> te <- rev(100 * cumsum(matt$reads > 1) / length(matt$reads))
> te
  [1] 100  99  98  97  96  95  94  93  92  91  90  89  88  87  86  85  84  83
 [19]  82  81  80  79  78  77  76  75  74  73  72  71  70  69  68  67  66  65
 [37]  64  63  62  61  60  59  58  57  56  55  54  53  52  51  50  49  48  47
 [55]  46  45  44  43  42  41  40  39  38  37  36  35  34  33  32  31  30  29
 [73]  28  27  26  25  24  23  22  21  20  19  18  17  16  15  14  13  12  11
 [91]  10   9   8   7   6   5   4   3   2   1

with the output from my code:
> result
  [1] 100  99  98  97  96  95  94  93  92  91  90  89  88  87  86  85  84  83
 [19]  82  81  80  79  79  77  77  77  74  73  72  71  70  70  68  67  67  65
 [37]  64  64  62  62  60  59  58  57  56  56  54  53  52  51  51  49  48  47
 [55]  46  45  45  43  42  41  40  39  38  37  36  35  34  33  32  31  30  29
 [73]  28  27  27  27  24  24  22  21  20  19  19  19  19  15  14  14  12  11
 [91]  10   9   8   7   7   5   4   3   2   1

There are no ties in your output, but there are ties in the data set (see
below). Your code runs fast, but I think the results are not accurate.
Thanks so much for the great help!

> matt[c(1:35), ]
            id reads
1   Contig79:1     4
2   Contig79:2     8
;
;
22 Contig79:22    64
23 Contig79:23    64
24 Contig79:24    68
25 Contig79:25    68
26 Contig79:26    68

I also attached the testing file with this email. Thanks!
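A sketch of a vectorised version that respects ties, under the definition Changbin gives (percentage of positions whose read depth is >= the depth at each position) and assuming `reads` contains no NAs. `rank(..., ties.method = "min")` returns 1 plus the number of strictly smaller values, so `n - rank + 1` is the number of values >= each one. Because it is based on sorting, it should scale to the 3.4 million rows far better than the quadratic loop; the slow version is included only as a check.

```r
# Percentage of positions whose depth is >= the depth at each position.
cover_per_fast <- function(reads) {
  n <- length(reads)
  # rank(..., ties.method = "min") is 1 + the number of strictly smaller
  # values, so n - rank + 1 is the number of values >= reads[i].
  100 * (n - rank(reads, ties.method = "min") + 1) / n
}

# Quadratic reference version with the same definition, for checking only.
cover_per_slow <- function(reads) {
  100 * sapply(reads, function(r) sum(reads >= r)) / length(reads)
}

reads <- c(4, 8, 13, 64, 64, 68, 68, 68)
cover_per_fast(reads)
# [1] 100.0  87.5  75.0  62.5  62.5  37.5  37.5  37.5
all.equal(cover_per_fast(reads), cover_per_slow(reads))
# [1] TRUE
```

On the real data this would be `matt$cover_per <- cover_per_fast(matt$reads)`. The cumsum() one-liner gives the same numbers only when the reads are sorted in increasing order, contain no ties and all exceed 1, which is why it disagrees at the tied positions.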







-- 
Sincerely,
Changbin
--

Changbin Du
DOE Joint Genome Institute
Bldg 400 Rm 457
2800 Mitchell Dr
Walnut Creek, CA 94598
Phone: 925-927-2856
Contig79:1  4
Contig79:2  8
Contig79:3  13
Contig79:4  14
Contig79:5  17
Contig79:6  20
Contig79:7  25
Contig79:8  27
Contig79:9  32
Contig79:10 33
Contig79:11 34
Contig79:12 36
Contig79:13 39
Contig79:14 40
Contig79:15 44
Contig79:16 49
Contig79:17 55
Contig79:18 56
Contig79:19 59
Contig79:20 60
Contig79:21 62
Contig79:22 64
Contig79:23 64
Contig79:24 68
Contig79:25 68
Contig79:26 68
Contig79:27 70
Contig79:28 73
Contig79:29 76
Contig79:30 77
Contig79:31 78
Contig79:32 78
Contig79:33 79
Contig79:34 80
Contig79:35 80
Contig79:36 84
Contig79:37 87
Contig79:38 87
Contig79:39 88
Contig79:40 88
Contig79:41 89
Contig79:42 93
Contig79:43 94
Contig79:44 98
Contig79:45 99
Contig79:46 99
Contig79:47 102
Contig79:48 103
Contig79:49 108
Contig79:50 112
Contig79:51 112
Contig79:52 113
Contig79:53 116
Contig79:54 118
Contig79:55 120
Contig79:56 124
Contig79:57 124
Contig79:58 126
Contig79:59 128
Contig79:60 130
Contig79:61 133
Contig79:62 134
Contig79:63 136
Contig79:64 139
Contig79:65 144
Contig79:66 145
Contig79:67 146
Contig79:68 148
Contig79:69 149
Contig79:70 151
Contig79:71 156
Contig79:72 157
Contig79:73 158
Contig79:74 159
Contig79:75 159
Contig79:76 159
Contig79:77 160
Contig79:78 160
Contig79:79 161
Contig79:80 163

Re: [R] how to work with long vectors

2010-11-04 Thread Martin Morgan

On 11/04/2010 09:45 AM, Changbin Du wrote:
 Thanks, Jim!
 
 This is not what I want,  What I want is calculate the percentage of reads
 bigger or equal to that reads in each position.MY output is like the
 following:

Hi Changbin -- I might be repeating myself, but the Bioconductor
packages IRanges and GenomicRanges are designed to work with this sort
of data, and include 'coverage' functions that do what you're interested
in. Look into ?GRanges if interested.


http://bioconductor.org/help/bioc-views/release/BiocViews.html#___HighThroughputSequencing

Martin



-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793


Re: [R] how to work with long vectors

2010-11-04 Thread Changbin Du

Thanks Martin, I will try this.


Re: [R] how to work with long vectors

2010-11-04 Thread Phil Spector


Changbin -
   Does

100 * sapply(matt$reads,function(x)sum(matt$reads >= x))/length(matt$reads)

give what you want?

By the way, if you want to use a loop (there's nothing wrong with that),
then try to avoid the most common mistake that people make with loops in R:
having your result grow inside the loop.  Here's a better way to use a loop
to solve your problem:

cover_per_1 <- function(data){
   l = length(data)
   output = numeric(l)
   for(i in 1:l)output[i] = 100 * sum(ifelse(data >= data[i], 1, 0))/length(data)
   output
}

Using some random data, and comparing to your original cover_per function:

> dat = rnorm(1000)
> system.time(one <- cover_per(dat))
   user  system elapsed
  0.816   0.000   0.824

> system.time(two <- cover_per_1(dat))
   user  system elapsed
  0.792   0.000   0.805

Not that big a speedup, but it does increase quite a bit as the problem gets
larger.

There are two obvious ways to speed up your function:
   1)  Eliminate the ifelse function, since automatic coercion from
   logical to numeric does the same thing.
   2)  Multiply by 100 and divide by the length outside the loop:

cover_per_2 <- function(data){
   l = length(data)
   output = numeric(l)
   for(i in 1:l)output[i] = sum(data >= data[i])
   100 * output / l
}


> system.time(three <- cover_per_2(dat))
   user  system elapsed
  0.024   0.000   0.027

That makes the loop just about equivalent to the sapply solution:

> system.time(four <- 100*sapply(dat,function(x)sum(dat >= x))/length(dat))
   user  system elapsed
  0.024   0.000   0.026

- Phil Spector
 Statistical Computing Facility
 Department of Statistics
 UC Berkeley
 spec...@stat.berkeley.edu








On Thu, 4 Nov 2010, Changbin Du wrote:


HI, Dear R community,

I have a data set like the one below. What I want to do is calculate the
cumulative coverage. The following code works for a small data set (100 rows),
but when fed the whole data set it is still running after 24 hours.
Can someone give some suggestions for long vectors?

id          reads
Contig79:1      4
Contig79:2      8
Contig79:3     13
Contig79:4     14
Contig79:5     17
Contig79:6     20
Contig79:7     25
Contig79:8     27
Contig79:9     32
Contig79:10    33
Contig79:11    34

matt <- read.table("/house/groupdirs/genetic_analysis/mjblow/ILLUMINA_ONLY_MICROBIAL_GENOME_ASSEMBLY/4083340/STANDARD_LIBRARY/GWZW.994.5.1129.trim_69.fastq.19621832.sub.sorted.bam.clone.depth",
  sep = "\t", skip = 0, header = FALSE, fill = TRUE)
dim(matt)
[1] 3384766   2

matt_plot <- function(matt, outputfile) {
  names(matt) <- c("id", "reads")

  cover <- matt$reads

  # calculate the cumulative coverage
  cover_per <- function(data) {
    output <- numeric(0)
    for (i in data) {
      x <- (100 * sum(ifelse(data >= i, 1, 0)) / length(data))
      output <- c(output, x)   # the result grows on every iteration
    }
    return(output)
  }

  result <- cover_per(cover)
}


Thanks so much!


--
Sincerely,
Changbin
--


Re: [R] how to work with long vectors

2010-11-04 Thread Changbin Du

Thanks Phil, that is great! I WILL try this and let you know how it goes.







-- 
Sincerely,
Changbin
--

Changbin Du
DOE Joint Genome Institute
Bldg 400 Rm 457
2800 Mitchell Dr
Walnut Creek, CA 94598
Phone: 925-927-2856


[R] candisc plot subset of all groups

2010-11-04 Thread farful


Hello,

I'm doing CDA, and want to view a plot one group at a time. How can I plot
just one group?

Example:
library(candisc)
iris.mod <- lm(cbind(Petal.Length, Sepal.Length, Petal.Width, Sepal.Width) ~
Species, data=iris)
iris.can <- candisc(iris.mod, data=iris)
plot(iris.can)

In this example the plot shows all three groups - how do I get it to only
show one group?

Thanks!

[R] ANOVA table and lmer

2010-11-04 Thread James Booth



The following output results from fitting models using lmer and lm to
data arising from a split-plot experiment (data set #320 in A Handbook of
Small Data Sets by Hand et al., 1994). The data are given at the bottom of
this message. My question is why the sum of squares for variety (V) in the
ANOVA table generated from the lmer fit differs from the one generated by
the lm fit. The decomposition of the sum of squares should be the same
regardless of whether block is treated as random or fixed. Or am I
misinterpreting the ANOVA table from the lmer fit?


I noticed that other people have asked similar questions in the past, 
but I haven't seen a satisfactory explanation.


Jim Booth.

> B=factor(block)
> V=factor(variety)
> N=factor(nitrogen)
> Y=yield
> lmm.split=lmer(Y~V+N+V:N+(1|B)+(1|B:V)+(1|B:N))
> anova(lmm.split)
Analysis of Variance Table
    Df  Sum Sq Mean Sq F value
V    2   526.1   263.0  1.4853
N    3 20020.5  6673.5 37.6856
V:N  6   321.8    53.6  0.3028
> lm.split=lm(Y~B*V+B*N+V*N)
> anova(lm.split)
Analysis of Variance Table
Response: Y
          Df  Sum Sq Mean Sq F value    Pr(>F)
B          5 15875.3  3175.1 15.4114 1.609e-07 ***
V          2  1786.4   893.2  4.3354   0.02219 *
N          3 20020.5  6673.5 32.3926 1.540e-09 ***
B:V       10  6013.3   601.3  2.9188   0.01123 *
B:N       15  1788.2   119.2  0.5786   0.86816
V:N        6   321.7    53.6  0.2603   0.95103
Residuals 30  6180.6   206.0

> split
   block variety nitrogen yield
1      1       1        0   111
2      1       1        1   130
3      1       1        2   157
4      1       1        4   174
5      1       2        0   117
6      1       2        1   114
7      1       2        2   161
8      1       2        4   141
9      1       3        0   105
10     1       3        1   140
11     1       3        2   118
12     1       3        4   156
13     2       1        0    61
14     2       1        1    91
15     2       1        2    97
16     2       1        4   100
17     2       2        0    70
18     2       2        1   108
19     2       2        2   126
20     2       2        4   149
21     2       3        0    96
22     2       3        1   124
23     2       3        2   121
24     2       3        4   144
25     3       1        0    68
26     3       1        1    64
27     3       1        2   112
28     3       1        4    86
29     3       2        0    60
30     3       2        1   102
31     3       2        2    89
32     3       2        4    96
33     3       3        0    89
34     3       3        1   129
35     3       3        2   132
36     3       3        4   124
37     4       1        0    74
38     4       1        1    89
39     4       1        2    81
40     4       1        4   122
41     4       2        0    64
42     4       2        1   103
43     4       2        2   132
44     4       2        4   133
45     4       3        0    70
46     4       3        1    89
47     4       3        2   104
48     4       3        4   117
49     5       1        0    62
50     5       1        1    90
51     5       1        2   100
52     5       1        4   116
53     5       2        0    80
54     5       2        1    82
55     5       2        2    94
56     5       2        4   126
57     5       3        0    63
58     5       3        1    70
59     5       3        2   109
60     5       3        4    99
61     6       1        0    53
62     6       1        1    74
63     6       1        2   118
64     6       1        4   113
65     6       2        0    89
66     6       2        1    82
67     6       2        2    86
68     6       2        4   104
69     6       3        0    97
70     6       3        1    99
71     6       3        2   119
72     6       3        4   121


[R] map on irregular grids

2010-11-04 Thread Wolfgang Polasek

Hi all

   How can I find a function for plotting a polygon surface, something like
polgon3d(xc,yc,obs)?

xc, yc ... coordinates
obs ... observations
result: a persp plot with a grid net over the coordinates

W.Polasek


[R] matlab code into R

2010-11-04 Thread Marcelo Lima

Hello,

I'm trying to translate the following MATLAB code into R:

N = zeros(n-1);
for i=2:(n-1)
  N(1,i) = 1/(pi * (i-1));
end
for i=2:(n-2)
  for j=i:(n-1)
    N(i,j) = N(i-1,j-1);
  end;
end
for i=2:(n-1)
  for j=1:i
    N(i,j) = -N(j,i);
  end;
end


Any suggestions?


Thanks


Can I just add the following line to my calculation: N=1/(pi*(i-1))?

-- 
Marcelo Andrade de Lima
UNIFESP - Universidade Federal de São Paulo
Departamento de Bioquímica
Disciplina de Biologia Molecular
Rua Três de Maio 100, 4 andar - Vila Clementino, 04044-020
Lab +55 11 55764438 R.1188
Cell +55 11 92725274
ml...@unifesp.br


[R] Plotting a vector data

2010-11-04 Thread Nasrin Pak

Hi;
I have 30 data sets, and I managed to take the average of a variable in each
set and put the averages in a vector (it contains NaN values as well).
x <- matrix(list.files("C:/updated_CFL_Rad_files/2007/11", full = TRUE))
 for(i in 1:30) {
  radiation.data <- read.table(x[i], header = TRUE, sep = ",", quote = "",
    dec = ".")
  attach(radiation.data)
  names(radiation.data)
  mean.radiation[i] <- mean(PAR_avg, na.rm = TRUE)
 }
How can I plot this vector (mean.radiation[i]) against i?
I tried to do so but there was an error:
Error in plot.window(...) : need finite 'ylim' values
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf

-- 
Sincerely

Nasrin  Pak
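The error and warnings say that plot() found no finite y values, so every element of the vector was NA/NaN (or the vector was empty). The following is a minimal sketch of the task, assuming comma-separated files with a header row and a PAR_avg column. It preallocates the result vector, since the loop above assigns into mean.radiation[i] without creating mean.radiation first. It also avoids attach() and only plots when at least one mean is finite. mean_par and plot_means are hypothetical helper names.

```r
# Hypothetical helper: mean of PAR_avg in each file, one value per file
mean_par <- function(files) {
  out <- rep(NA_real_, length(files))          # preallocate the result
  for (i in seq_along(files)) {
    d <- read.csv(files[i])                    # assumes "," separator, "." decimal, header row
    out[i] <- mean(d$PAR_avg, na.rm = TRUE)    # NaN when the column is entirely NA
  }
  out
}

# Hypothetical helper: plot only the finite means against their file index
plot_means <- function(m) {
  ok <- is.finite(m)
  if (!any(ok)) stop("no finite means to plot; check the PAR_avg column")
  plot(which(ok), m[ok], type = "b", xlab = "file index i", ylab = "mean PAR_avg")
}

# usage with the folder from the question:
# files <- list.files("C:/updated_CFL_Rad_files/2007/11", full.names = TRUE)
# plot_means(mean_par(files))
```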


Re: [R] ggplot output

2010-11-04 Thread Dennis Murphy

Hi:

This isn't very difficult if you use a little imagination. We want three
separate plots of monthly means by variable with attached error bars. This
requires faceting, so we need to create a factor whose levels are the
variable names. We also need to generate enough data to summarize by mean
and standard deviation per month in order to generate the point/line/error
bar plots. Here's one attempt.

# Generate fake data: if you don't have an example, make one up! :)
# 20 obs. per month, simulate from normal distribution with a
# given vector of monthly means and standard deviations

## Set up time variable, monthly means and SDs ##
# I'm lazy: I just used the three-letter month abbreviations in month.abb
# as my 'time' variable

Month <- factor(rep(month.abb, each = 20), levels = month.abb)
# monthly means
mmeans <- c(12, 15, 18, 20, 24, 30, 26, 23, 18, 15, 12, 10)
# monthly sd's
msd <- rep(c(3, 5, 4, 3), each = 3)

# Generate data ###
# Simulate 240 observations for each of the three variables
C <- rnorm(240, m = rep(mmeans, each = 20), s = rep(msd, each = 20))
K <- rnorm(240, m = rep(mmeans+2, each = 20), s = rep(msd+1, each = 20))
S <- rnorm(240, m = rep(mmeans-2, each = 20), s = rep(msd+2, each = 20))
# Combine into a data frame
d <- data.frame(Month, C, K, S)

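library(ggplot2)   # ggplot2 versions of that time also attached plyr and reshape;
library(plyr)      # newer versions do not, so load them explicitly
library(reshape)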

## Process data: get monthly means/SDs for each variable ##
# Use ddply to get monthly means and sd's for each variable
# (Yes, there are more efficient ways to do this, but there are only three...)

md <- ddply(d, .(Month), summarise, C_avg = mean(C), C_stdev = sd(C),
  K_avg = mean(K), K_stdev = sd(K), S_avg = mean(S), S_stdev = sd(S))

# Melt the data from 'wide' to 'long' - the idea is to stack the C, K and S values
# and use their names as a factor variable. This is a very useful trick for
# faceting or grouping.
# grep() is used to select variable names that end in 'avg' or 'stdev'; the $
# sign in a regular expression anchors the match to the end of the name.
# Thanks to Kohske Takahashi for the clue to melting multiple groups of variables
# in a post on the ggplot2 list.

dmelt <- data.frame(
  melt(md, id = 'Month', measure = c(grep('avg$', names(md)))),
  sd = melt(md, id = 'Month', measure = c(grep('stdev$', names(md))))$value
)
# Some housecleaning: change value to Mean in names and create a new variable that
# only uses the variable name (C, K, S) as a factor level, to be used for labeling
# the facets. This reduces the amount of ggplot() code we need to write.

names(dmelt)[3] <- 'Mean'
dmelt$Variable <- substring(dmelt$variable, 1, 1)

# Should be straightforward - scale code is used to avoid overlapping labels in a
# confined graphics space.

g <- ggplot(dmelt, aes(x = Month, y = Mean))
g + geom_point() + geom_line(aes(group = 1)) +
geom_errorbar(aes(ymin = Mean - sd, ymax = Mean + sd), width = 0.4) +
facet_wrap(~ Variable, nrow = 1) +
scale_x_discrete(breaks = levels(Month),
 labels = substring(month.abb, 1, 1))

# Just for the heck of it, here's a monthly plot of each variable's means
# (no error bars) as an alternative:

g + geom_point(aes(colour = Variable), size = 3) +
 geom_line(aes(colour = Variable, group = Variable), size = 1)

Notice that by limiting the aesthetics in g, I was able to insert additional
aesthetics (ymin and ymax) into geom_errorbar() in the first plot and colour +
group in the second plot and still use the same g as a foundation.

HTH,
Dennis
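The end-of-name anchor used in the grep() calls above can be checked on its own. This is a small sketch with made-up column names.

```r
nm <- c("Month", "C_avg", "C_stdev", "avg_C")
grep("avg$", nm)                  # index of the name ending in "avg": 2
grep("avg$", nm, value = TRUE)    # "C_avg"
grep("avg", nm, value = TRUE)     # without the anchor: "C_avg" "avg_C"
```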

On Thu, Nov 4, 2010 at 7:32 AM, ashz a...@walla.co.il wrote:


 Dear Thierry,

 Your solution looks very elegant, but I cannot find a proper example.

 Can you provide me one?

 Thx


[R] Echo to file using Rscript

2010-11-04 Thread Bernd Wiese



I use R 2.12.0 on Windows XP.
For debugging and control I am trying to get a file which contains the
echo you see when code is pasted into the R GUI. This works, e.g., with the command:


R CMD BATCH --no-restore  D:\path\script.r

then a file called script.Rout is generated in the same folder. It 
contains the code and the corresponding output.

This does not work using:

Rscript --no-restore D:\Ketzin\Invers\V1\read_obs.r

A partial alternative is to include to the beginning of the code

sink("log.dat", type = c("output", "message"))

but the file does not contain the code, only the output. Is there a way
to generate something analogous to the *.Rout file, if possible only from
the command line, without the sink() command?


Best, Bernd
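One way to get Rout-like output from Rscript is source() with echo = TRUE. This prints each expression with a prompt before its output. From the command line, that could be wrapped as Rscript -e "source('D:/path/script.r', echo = TRUE)" > script.Rout 2>&1. This is a sketch that has not been tested on Windows XP, and the path is the placeholder from the question. The block below shows the same idea inside R, using capture.output() and a throwaway script.

```r
# Write a small throwaway script to a temporary file
script <- tempfile(fileext = ".R")
writeLines(c("x <- 1 + 1", "x"), script)

# echo = TRUE prints each expression (with a "> " prompt) before its output,
# similar to what R CMD BATCH writes to the .Rout file
out <- capture.output(source(script, echo = TRUE))
writeLines(out)
```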

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] matlab code into R

2010-11-04 Thread Phil Spector


Well, there's the obvious:

N = matrix(0,n-1,n-1)
for(i in 2:(n-1))
 N[1,i] = 1/(pi * (i-1))
for(i in 2:(n-2))
   for(j in i:(n-1))
  N[i,j] = N[i-1,j-1]
for(i in 2:(n-1))
   for(j in 1:i)
  N[i,j] = -N[j,i]

- Phil Spector
 Statistical Computing Facility
 Department of Statistics
 UC Berkeley
 spec...@stat.berkeley.edu
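This sketch reads the three loops as nested, the same way the translation above does. Every off-diagonal entry then ends up as N[i,j] = 1/(pi*(j-i)), and the diagonal stays 0, which gives a skew-symmetric Toeplitz matrix. The block builds that matrix with outer() and checks it against the loop version. make_N_loop and make_N_outer are hypothetical names, and n = 6 is an arbitrary test size. One caveat: in R, 2:(n-2) counts downwards when n < 4 (2:1 is c(2, 1)), whereas MATLAB's 2:1 is empty. The loop version is therefore only a faithful translation for n >= 4.

```r
# Loop translation from the thread, wrapped in a function (n >= 4 assumed)
make_N_loop <- function(n) {
  N <- matrix(0, n - 1, n - 1)
  for (i in 2:(n - 1)) N[1, i] <- 1 / (pi * (i - 1))
  for (i in 2:(n - 2))
    for (j in i:(n - 1)) N[i, j] <- N[i - 1, j - 1]
  for (i in 2:(n - 1))
    for (j in 1:i) N[i, j] <- -N[j, i]
  N
}

# Vectorised version: N[i, j] = 1 / (pi * (j - i)) off the diagonal, 0 on it
make_N_outer <- function(n) {
  d <- outer(seq_len(n - 1), seq_len(n - 1), function(i, j) j - i)
  N <- 1 / (pi * d)   # the diagonal is Inf here (division by zero) ...
  diag(N) <- 0        # ... so reset it to 0
  N
}

all.equal(make_N_loop(6), make_N_outer(6))
```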





[R] plotting time series for particular months!

2010-11-04 Thread govindas



Hi all, 

I have a matrix as given below...

year month day prec
1   1980    10   1 13.4
2   1980    10   2  0.0
3   1980    10   3  0.0
4   1980    10   4  0.0
5   1980    10   5  0.0
6   1980    10   6  0.0
7   1980    10   7  0.0
8   1980    10   8  0.0
9   1980    10   9  0.0
10  1980    10  10  0.0
11  1980    10  11  7.4
12  1980    10  12  5.4
13  1980    10  13  7.2
14  1980    10  14  0.0
15  1980    10  15  0.0
16  1980    10  16  0.0
17  1980    10  17 41.2
18  1980    10  18  0.0
19  1980    10  19  0.0
20  1980    10  20  0.0
21  1980    10  21  0.0
22  1980    10  22  0.0
23  1980    10  23  0.0
24  1980    10  24  0.0
25  1980    10  25  0.0
26  1980    10  26  0.0
27  1980    10  27  2.0
28  1980    10  28  0.0
29  1980    10  29  0.0
30  1980    10  30  0.0
31  1980    10  31  0.0
32  1980    11   1  0.0
33  1980    11   2  0.0
34  1980    11   3  0.0
35  1980    11   4  0.0
36  1980    11   5 12.4

The precipitation values extend from 1980 to 2005, but only for October,
November and December. I would like to plot just these 3 months for the given
time period (1980 - 2005). Is there a way to get these values on the x-axis,
i.e. I need a plot with its axis reading 1980, 1981, 1982 .. 2005 (if the
months are shown as well, that would be even more useful). For now, I get an
axis like 0, 500, ... 2000 (the plot is using the index values).

I tried changing freq as shown below, but it did not work!
ts.chn <- ts(chn.arr[1:2386,4], start=c(1980, 10), end=c(2005, 12), freq=365)
plot(ts.chn)

-- 
Regards,
Mahalakshmi
Graduate Student
#20, Department of Geography
Michigan State University
East Lansing, MI 48824
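A ts object assumes equally spaced observations. With freq = 365, start = c(1980, 10) therefore means the 10th day of 1980 rather than October. A series with only 92 days per year also cannot line up with a 365-day cycle. The following is a minimal sketch of another route: plot the values against their row index, so the January to September gaps do not stretch the axis, and label the axis with the year at the first day of each October. plot_ond is a hypothetical name. It assumes a data frame with year, month, day and prec columns, as in the table above.

```r
# Hypothetical helper: one bar per day, x axis labelled with the season's year
plot_ond <- function(df) {
  idx <- seq_len(nrow(df))
  plot(idx, df$prec, type = "h", xaxt = "n", xlab = "Year", ylab = "prec")
  first <- which(df$month == 10 & df$day == 1)   # first day of each Oct-Dec season
  axis(1, at = first, labels = df$year[first])
  invisible(first)
}

# usage with the data from the question (assuming those column names):
# plot_ond(as.data.frame(chn.arr))
```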

Re: [R] How to do bootstrap for the complex sample design?

2010-11-04 Thread Thomas Lumley

On Fri, Nov 5, 2010 at 3:51 AM, Tim Hesterberg timhesterb...@gmail.com wrote:
 Faye wrote:
Our survey is structured as follows: the area to be investigated is divided
into 6 regions; within each region, one urban community and one rural
community are randomly selected, then samples are randomly drawn from
each selected urban and rural community.

The problem is that in each urban/rural stratum we only have one sample.
In this case, how do we do the bootstrap?

 You are lucky that your sample size is 1.  If it were 2 you would
 probably have proceeded without realizing that the answers were wrong.

 Suppose you had two samples in each stratum.  If you proceed naturally,
 drawing bootstrap samples of size 2 from each stratum, this would
 underestimate variability by a factor of 2.

 In general the ordinary nonparametric bootstrap estimates of variability
 are biased downward by a factor of (n-1)/n -- exactly for the mean,
 approximately for other statistics.  In multiple-sample and stratified
 situations, the bias depends on the stratum sizes.
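For the mean, the (n-1)/n factor can be checked exactly, without resampling. The ideal bootstrap variance of the sample mean is sum((x - mean(x))^2) / n^2, while the usual unbiased estimate is var(x) / n. Resampling n - 1 observations instead gives sum((x - mean(x))^2) / (n * (n - 1)), which equals var(x) / n. This is a short sketch with made-up data.

```r
x <- c(2.1, 3.4, 1.9, 5.0, 4.2)    # made-up sample
n <- length(x)
ss <- sum((x - mean(x))^2)

usual_var    <- var(x) / n            # unbiased estimate of Var(mean)
boot_var_n   <- ss / n^2              # ideal bootstrap, resamples of size n
boot_var_nm1 <- ss / (n * (n - 1))    # ideal bootstrap, resamples of size n - 1

boot_var_n / usual_var     # (n - 1) / n = 0.8
boot_var_nm1 / usual_var   # 1

# Monte Carlo check of the size-n case (approximate)
set.seed(42)
means <- replicate(20000, mean(sample(x, n, replace = TRUE)))
var(means) / usual_var     # close to 0.8
```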

 Three remedies are:
 * draw bootstrap samples of size n-1
 * bootknife sampling - omit one observation (a jackknife sample), then
  draw a bootstrap sample of size n from that
 * bootstrap from a kernel density estimate, with kernel covariance equal
  to empirical covariance (with divisor n-1) / n.
 The latter two are described in
 Hesterberg, Tim C. (2004), Unbiasing the Bootstrap-Bootknife Sampling vs. 
 Smoothing, Proceedings of the Section on Statistics and the Environment, 
 American Statistical Association, 2924-2930.
 http://home.comcast.net/~timhesterberg/articles/JSM04-bootknife.pdf

 All three are undefined for samples of size 1.  You need to go to some
 other bootstrap, e.g. a parametric bootstrap with variability estimated
 from other data.


And the 'survey' package supplies the first option. (It also supplies
a bootstrap sample of size n that allows finite population
corrections, designed for situations with a large n and a high
sampling fraction, such as some business surveys.)

With a sample size of 1 per stratum there are no design-unbiased
estimators of the standard error, so as others have said you need
external data.

   -thomas


-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland


[R] glmnet_1.5 uploaded to CRAN

2010-11-04 Thread Trevor Hastie

This is a new version of glmnet, that incorporates some bug fixes and 
speedups.

* a new convergence criterion which offers 10x or more speedups for
 saturated fits (mainly affects logistic, Poisson and Cox)
* one can now predict directly from a cv.glmnet object - see the help files for
cv.glmnet
  and predict.cv.glmnet
* other new methods are deviance() for glmnet and coef() for cv.glmnet

Here is the description of the package.

glmnet is a package that fits the regularization path for linear, two- and
multi-class logistic regression
models, Poisson regression and the Cox model, with elastic net regularization
(tunable mixture of L1 and L2 penalties).
glmnet uses pathwise coordinate descent, and is very fast.

Some of the features of glmnet:

* by default it computes the path at 100 uniformly spaced (on the log scale) 
values of the regularization parameter
* glmnet is very fast, even for large data sets.
* recognizes and exploits sparse input matrices (as in the Matrix package).
Coefficient matrices are output in sparse matrix representation.
* penalty is (1-a)*||beta||_2^2 + a*||beta||_1, where a is between 0 and 1;
a=1 is the lasso penalty, a=0 is the ridge penalty.
   For many correlated predictors, a=.95 or thereabouts improves the
performance of the lasso.
* convenient predict, plot, print, and coef methods
* variable-wise penalty modulation allows each variable to be penalized by a
scalable amount; if zero that variable always enters
* glmnet uses a symmetric parametrization for multinomial, with constraints
enforced by the penalization.
* a comprehensive set of cross-validation routines is provided for all models
and several error measures
* offsets and weights can be provided for all models
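The following small sketch makes the endpoints of the mixing parameter concrete. It evaluates the penalty in the form given in the list above. In glmnet() itself the parameter is called alpha, and the package's help page writes the ridge term as (1-alpha)/2 * ||beta||_2^2, so values differ from this sketch by that factor. enet_penalty is a hypothetical name.

```r
# Elastic-net penalty in the form given above: (1 - a) * ||beta||_2^2 + a * ||beta||_1
enet_penalty <- function(beta, a) {
  stopifnot(a >= 0, a <= 1)
  (1 - a) * sum(beta^2) + a * sum(abs(beta))
}

beta <- c(0.5, -2, 0)
enet_penalty(beta, a = 1)     # 2.5: only the L1 (lasso) term remains
enet_penalty(beta, a = 0)     # 4.25: only the squared L2 (ridge) term remains
enet_penalty(beta, a = 0.95)  # mostly lasso, with a little ridge
```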


Examples of glmnet speed trials:
Newsgroup data: N=11,000, p= 0.75 Million, two class logistic. 100 values along 
lasso path.   Time = 2mins
14 Class cancer data: N=144, p=16K, 14 class multinomial, 100 values along 
lasso path. Time = 30secs

Authors: Jerome Friedman, Trevor Hastie, Rob Tibshirani.

See our paper http://www-stat.stanford.edu/~hastie/Papers/glmnet.pdf for 
implementation details,
and comparisons with other related software.



---
  Trevor Hastie   has...@stanford.edu  
  Professor, Department of Statistics, Stanford University
  Phone: (650) 725-2231 (Statistics)  Fax: (650) 725-8977  
  (650) 498-5233 (Biostatistics)   Fax: (650) 725-6951
  URL: http://www-stat.stanford.edu/~hastie  
   address: room 104, Department of Statistics, Sequoia Hall
   390 Serra Mall, Stanford University, CA 94305-4065  


Re: [R] ANOVA table and lmer

2010-11-04 Thread Mark Difford


Hi Jim,

 The decomposition of the sum of squares should be the same regardless of
 whether block is treated as random or fixed.

Should it? By whose reckoning? The models you are comparing are different.
Simple consideration of the terms listed in the (standard) ANOVA output
shows that this is so, so how could the sum-of-squares be the same?

 I noticed that other people have asked similar questions in the past, but
 I haven't seen a 
 satisfactory explanation.

Maybe, but it has been answered (by me, and surely by others). However,
the canonical reference would be Venables and Ripley's MASS (pp. 283--286).

The models you need to compare are the following:
##
library(MASS)   # oats data (columns B, V, N, Y)
library(nlme)   # lme()
library(lme4)   # lmer()
Aov.mod <- aov(Y ~ V * N + Error(B/V/N), data = oats)
Lme.mod <- lme(Y ~ V * N, random = ~1 | B/V/N, data = oats)
Lmer.mod <- lmer(Y~ V * N +(1|B)+(1|B:V)+(1|B:N), data = oats)

summary(Aov.mod)
anova(Lme.mod)
anova(Lmer.mod)

HTH, Mark Difford.
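The data from Jim's post give a quick way to see what the lmer table is reporting. The data are re-entered below, and the object name split_df is arbitrary. In the classical split-plot analysis, variety is tested against the block:variety mean square. That F ratio (893.2/601.3, about 1.4853) is the F value lmer reports. The lmer "Sum Sq" column is on a different scale. In the printed lmer table, each Mean Sq divided by its F value gives about the same number (263.0/1.4853 and 6673.5/37.6856 are both about 177.1). Those sums of squares therefore appear to be F values rescaled by a single residual variance estimate, rather than a classical decomposition of the total sum of squares.

```r
# Data from the original post (block, variety, nitrogen, yield)
split_df <- data.frame(
  block    = factor(rep(1:6, each = 12)),
  variety  = factor(rep(rep(1:3, each = 4), times = 6)),
  nitrogen = factor(rep(c(0, 1, 2, 4), times = 18)),
  yield    = c(111, 130, 157, 174, 117, 114, 161, 141, 105, 140, 118, 156,
                61,  91,  97, 100,  70, 108, 126, 149,  96, 124, 121, 144,
                68,  64, 112,  86,  60, 102,  89,  96,  89, 129, 132, 124,
                74,  89,  81, 122,  64, 103, 132, 133,  70,  89, 104, 117,
                62,  90, 100, 116,  80,  82,  94, 126,  63,  70, 109,  99,
                53,  74, 118, 113,  89,  82,  86, 104,  97,  99, 119, 121)
)

fit <- lm(yield ~ block * variety + block * nitrogen + variety * nitrogen,
          data = split_df)
tab <- anova(fit)

# Variety is a whole-plot factor, so its error term is the block:variety mean square
F_variety <- tab["variety", "Mean Sq"] / tab["block:variety", "Mean Sq"]
F_variety   # about 1.485, matching the F value in the lmer table
```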

[R] avoid a loop

2010-11-04 Thread cory n

Let's suppose I have userids and associated attributes...  columns a and b

a <- c(1,1,1,2,2,3,3,3,3)
b <- c("a","b","c","a","d","a","b","e","f")

so a unique list of a would be

id <- unique(a)

I want a matrix like this...

     [,1] [,2] [,3]
[1,]    3    1    2
[2,]    1    2    1
[3,]    2    1    4

Where element i,j is the number of items in b that id[i] and id[j] share...

So for example, in element [1,3] of the result matrix, I want to see
2.  That is, ids 1 and 3 share two common elements in b, namely "a"
and "b".

This is hard to articulate, so sorry for the terrible description
here.  The way I have solved it is to do a double loop, looping over
every member of the id column and comparing it to every other member
of id to see how many elements of b they share.  This takes forever.

Thanks

cn


Re: [R] avoid a loop

2010-11-04 Thread Sarah Goslee

Here's one possibility:

> library(ecodist)
> a <- c(1,1,1,2,2,3,3,3,3)
> b <- c("a","b","c","a","d","a","b","e","f")

> x <- crosstab(a, b, rep(1, length(a)))
> x
  a b c d e f
1 1 1 1 0 0 0
2 1 0 0 1 0 0
3 1 1 0 0 1 1
> x %*% t(x)
  1 2 3
1 3 1 2
2 1 2 1
3 2 1 4

Sarah





-- 
Sarah Goslee
http://www.functionaldiversity.org
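The same incidence-matrix idea also works in base R without ecodist. table(a, b) gives the id-by-attribute counts, and tcrossprod(m) computes m %*% t(m). The following is a minimal sketch. Converting the counts to 0/1 first is an assumption that an attribute listed twice for the same id should only count once.

```r
a <- c(1, 1, 1, 2, 2, 3, 3, 3, 3)
b <- c("a", "b", "c", "a", "d", "a", "b", "e", "f")

inc <- (unclass(table(a, b)) > 0) * 1   # 0/1 matrix: rows are ids, columns are attributes
shared <- tcrossprod(inc)               # same as inc %*% t(inc)
shared                                  # rows and columns are the ids 1, 2, 3
```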


Re: [R] avoid a loop

2010-11-04 Thread David Winsemius




Another way:

 sapply(1:3, function(y) {
 sapply(1:3,  function(x){
 length(intersect(b[a==y], b[a==x]) )
 } ) } )

     [,1] [,2] [,3]
[1,]    3    1    2
[2,]    1    2    1
[3,]    2    1    4






David Winsemius, MD
West Hartford, CT


Re: [R] Loop

2010-11-04 Thread Matevž Pavlič

Hi again,

Still don't quite get it...

Here's what I did:

mat <- read.csv("litologija.csv", dec = ".", sep = ";")
apply(mat, 2, function(x) head(sort(table(x),decreasing=T),10))

With that I get a table (list/matrix...) which gives the highest counts of
occurrences of each value in a table (at least I think so).
But the problem is that it does not tell which value occurs the most (has
the highest count).

If written like this:
apply(mat, 2, function(x) sort(table(x),decreasing=T))


I get the counts of occurrences sorted in decreasing order, together with
the value of that field, for each column:


$W2
x
         PEŠČEN       GRADUIRAN              IN            PROD              DO         GLINAST           PROD,
           1872            1542             552             519             458             214             175
       PREPEREL         MELJAST           GRUŠČ           GLINA               Z            MALO     GRANULIRANA
            174             132             114              62              53              47              45
           ZELO         PEŠČENA       ZAGLINJEN      KARBONATNI      SKRILAVCA,               S       SKRILAVCA
             40              34              31              26              26              25              25
     GRANULIRAN          PEČŠEN           VEZAN        ZAOBLJEN             GR.          DROBEN           SLABO
             24              17              17              17              15              12              12
         GRUŠČ,        MELJASTO          PEŠEEN           DOBRO           GRAN.      PEŠČENJAKA     HUDOURNIŠKI
             11              11              11              10              10               9               8
         MELJNA           PEŠČN      GIRADUIRAN        GLINAST,            GOST       GRADUTRAN         GRANUL.
              8               8               6               6               6               6               6
          PESEK        ZAMELJEN       GRADUIPAN        PREPEPEL           PŠČEN       GPADUIRAN      GRADUIRAN,
              6               6               5               5               5               4               4
       GRADURAN         POTOČNI         PREPERL          SAVSKI            CONA      GLINASTEGA        GRADUIRN
              4               4               4               4               3               3               3
       MELJAST,           PEČEN         PEŠČEN,          PLASTI                           DELNO          GLINA,
              3               3               3               3               2               2               2
       GLINASTO        GRADUIAN       GRADULRAN        GRDUIRAN          GRUŠČ.           KARB.     KONGLOMERAT
              2               2               2               2               2               2               2
   KONGLOMERAT,            MELJ        NEKOLIKO            OKER          PESEK,         PEŠČCEN         PEŠČEN.
              2               2               2               2               2               2               2
        PLASTEH             POD        PPEPEREL            RPOD          UMAZAN       ZAOBLJEN,               -
              2               2               2               2               2               2               1
        (GRUŠČ)    (KARBONATNI)         APNENCA    DROBNOZRNAT,      ENAKOMEREN       GBADUIRAN        GLIANAST
       GLINASTA      GPADUIRALN           GPUŠČ      GRADAUIRAN        GRADUIRA GRADUIRANPEŠČEN       GRADUIRAU


But the first version somehow loses the actual values and just gives 
the counts:
apply(mat, 2, function(x) head(sort(table(x), decreasing = TRUE), 10))

      VrtinaID ZapStev GlobinaOd GlobinaDo USCS Opis   W1   W2   W3   W4   W5   W6   W7  W8  W9  W10  W11  W12  W13  W14  W15
 [1,]       15    1248       282       290 2131   15 1820 1872 1677 1479 1441 1465 1261 769 848 1088 1490 1968 2459 2943 3408
 [2,]       11    1119       198       235 1305   13 1791 1542 1495 1334 1317 1247  829 652 783  660  606  603  381  381  301
 [3,]       11    1078       174       210  784   11  532  552  566  529  532  716  511 575 576  416  464  384  368  282  279
 [4,]       11     835       147       173  691   11  471  519  390  351  358  571  364 521 556  381  398  352  287  282  259
 [5,]       10     584       133       172  646   11  376  458  296  311  323  195  252 329 429  343  397  336  244  242  224
 [6,]       10     389       123       142  386   10  253  214  237  268  310  130  233 265 376  263  378  258  228  210  205
 [7,]       10     257       114       130  183   10  247  175  201  242  157  130  179 258 267  219  230  239  197  185  155
 [8,]        9     198       105       126  148    9  135  174  157  170  146  102  163 213 266  215  221  188  197  179  155
 [9,]        9     171       101        95   71    9  102  132  139  161  141   89  145 199 140  192  205  168  191  160  122
[10,]

Re: [R] avoid a loop

2010-11-04 Thread Dennis Murphy

Hi:

To mimic Sarah Goslee's reply within base R, either of these works:

crossprod(t(as.matrix(xtabs(~ a + b))))
crossprod(t(as.matrix(table(a, b))))
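A minimal sketch of the same idea on the toy data from the question quoted below, assuming each id lists a given attribute at most once. Base R's tcrossprod(m) computes m %*% t(m), so it should give the same matrix as crossprod(t(m)) without the explicit transpose:

```r
# Toy data from the question: user ids (a) and their attributes (b)
a <- c(1, 1, 1, 2, 2, 3, 3, 3, 3)
b <- c("a", "b", "c", "a", "d", "a", "b", "e", "f")

# ids x attributes matrix of counts (plain matrix, "table" class dropped)
m <- unclass(table(a, b))

# Element [i, j] = number of attributes that id i and id j have in common
shared <- tcrossprod(m)   # same as m %*% t(m) and crossprod(t(m))
print(shared)
```

For this data the result should match the matrix asked for in the question (3 1 2 / 1 2 1 / 2 1 4).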

HTH,
Dennis

On Thu, Nov 4, 2010 at 12:42 PM, cory n corynis...@gmail.com wrote:

 Let's suppose I have userids and associated attributes...  columns a and b

 a <- c(1,1,1,2,2,3,3,3,3)
 b <- c("a","b","c","a","d","a","b","e","f")

 so a unique list of a would be

 id <- unique(a)

 I want a matrix like this...

      [,1] [,2] [,3]
 [1,]    3    1    2
 [2,]    1    2    1
 [3,]    2    1    4

 Where element i,j is the number of items in b that id[i] and id[j] share...

 So for example, in element [1,3] of the result matrix, I want to see
 2.  That is, ids 1 and 3 share two common elements in b, namely a
 and b.

 This is hard to articulate, so sorry for the terrible description
 here.  The way I have solved it is to do a double loop, looping over
 every member of the id column and comparing it to every other member
 of id to see how many elements of b they share.  This takes forever.

 Thanks

 cn


Re: [R] avoid a loop

2010-11-04 Thread Joshua Wiley

And to wrap it up and help you choose, here are four functions based
on these emails (the first one is my own slight variant):

library(ecodist)
a <- sample(1:1000, 10^4, replace = TRUE)
b <- sample(letters[1:6], 10^4, replace = TRUE)

foo1 <- function() {
  x <- table(a, b)
  return(x %*% t(x))
}

foo2 <- function() {
  x <- crosstab(a, b, rep(1, length(a)))
  return(x %*% t(x))
}

foo3 <- function() {
  sapply(1:1000, function(y) {
    sapply(1:1000, function(x) {
      length(intersect(b[a == y], b[a == x]))
    })
  })
}

foo4 <- function() {crossprod(t(as.matrix(table(a, b))))}

> system.time(x1 <- foo1())
   user  system elapsed
  0.028   0.008   0.038
> system.time(x2 <- foo2())
   user  system elapsed
  0.076   0.008   0.087
## I got tired of waiting
> system.time(x3 <- foo3())
  menu-bar signals break
Timing stopped at: 104.951 1.336 110.909
> system.time(x4 <- foo4())
   user  system elapsed
  0.024   0.020   0.043

> all.equal(x1, x2, check.attributes = FALSE)
[1] TRUE
> all.equal(x1, x4, check.attributes = FALSE)
[1] TRUE

This suggests the speeds, from fastest to slowest, are:

foo1, foo4, foo2, foo3
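One thing worth checking before choosing: the table()-based versions (foo1, foo2, foo4) count an (id, attribute) pair as often as it appears, so x %*% t(x) multiplies counts, while foo3's intersect() counts distinct shared attributes. With sample(..., replace = TRUE) repeated pairs are likely, so the two kinds of answer need not agree. A minimal sketch, assuming distinct shared attributes are what is wanted, reduces the counts to a 0/1 incidence matrix first (the toy data is made up for illustration):

```r
# Made-up toy data: id 1 lists attribute "x" twice
a <- c(1, 1, 2, 2)
b <- c("x", "x", "x", "y")

counts <- unclass(table(a, b))       # ids x attributes; counts[1, "x"] is 2
with_mult <- tcrossprod(counts)      # products of counts, like foo1/foo2/foo4

incidence <- (counts > 0) * 1        # 1 if the id has the attribute at all
distinct <- tcrossprod(incidence)    # number of distinct shared attributes

# intersect()-based version, in the spirit of foo3
ids <- sort(unique(a))
loop <- sapply(ids, function(j)
  sapply(ids, function(i) length(intersect(b[a == i], b[a == j]))))

print(with_mult)   # diagonal entry for id 1 is 4, not 1
print(distinct)    # agrees with the intersect() loop
```

If every (id, attribute) pair is known to be unique, the two approaches coincide and the extra step is unnecessary.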

Cheers,

Josh

On Thu, Nov 4, 2010 at 12:42 PM, cory n corynis...@gmail.com wrote:
 [snip]




-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/


Re: [R] Loop

2010-11-04 Thread jim holtman

Is this closer to what you want, assuming that you want the values that
occur most frequently:

> apply(mat, 2, function(x) head(names(sort(table(x), decreasing = TRUE)), 5))
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] "1"  "14" "5"  "1"  "4"  "14" "6"  "18" "11" "19"
[2,] "3"  "3"  "13" "12" "3"  "11" "14" "9"  "18" "12"
[3,] "2"  "18" "20" "8"  "11" "12" "17" "14" "14" "7"
[4,] "5"  "11" "8"  "19" "5"  "18" "18" "15" "16" "10"
[5,] "18" "13" "11" "11" "17" "3"  "4"  "16" "8"  "16"



2010/11/4 Matevž Pavlič matevz.pav...@gi-zrmk.si:
 [snip]

[R] creating vectors with three variables out of three datasets

2010-11-04 Thread DomDom


Hi there,

I've got a problem with how to create a vector with three variables out of
three separate ASCII files.
These three ASCII files contain pixel information of the same image but
different bands, and I need a matrix of
vectors, with each vector containing the corresponding pixel values for each
band.

Up to now I've separately read the ASCII files into three matrices but
don't know how to put the corresponding pixel values together.

Looking forward to any help.
Thank you

Dominik


-- 
View this message in context: 
http://r.789695.n4.nabble.com/creating-vectors-with-three-variables-out-of-three-datasets-tp3027852p3027852.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] creating vectors with three variables out of three datasets

2010-11-04 Thread Erik Iverson




DomDom wrote:

[snip]


Perhaps rbind or cbind, see ?rbind.

It would be useful if you gave us a small, reproducible example
of the type of data you have and what you want to do with it; please
see the Posting Guide.






Re: [R] creating vectors with three variables out of three datasets

2010-11-04 Thread DomDom


Okay, sorry.
I've got three ASCII files with pixel values and no header information.

So if the first lines of the three ASCII files are:

ascii1: 11 12 13
ascii2: 14 15 16
ascii3: 17 18 19

I would like a new matrix with:
11,14,17;12,15,18;13,16,19;

thx


-- 
View this message in context: 
http://r.789695.n4.nabble.com/creating-vectors-with-three-variables-out-of-three-datasets-tp3027852p3027880.html
Sent from the R help mailing list archive at Nabble.com.
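A minimal sketch of the cbind() suggestion, assuming each ASCII file is a headerless, whitespace-separated grid and all three grids have the same dimensions. The file names in the comments are hypothetical; the small stand-in matrices reproduce the first lines from the example. Flattening each band row by row and binding the results as columns gives one row per pixel and one column per band:

```r
# Hypothetical file names; each file is a headerless grid of pixel values
# band1 <- as.matrix(read.table("band1.asc", header = FALSE))
# band2 <- as.matrix(read.table("band2.asc", header = FALSE))
# band3 <- as.matrix(read.table("band3.asc", header = FALSE))

# Stand-ins for the first line of each file, as in the example
band1 <- matrix(c(11, 12, 13), nrow = 1)
band2 <- matrix(c(14, 15, 16), nrow = 1)
band3 <- matrix(c(17, 18, 19), nrow = 1)

# All bands must cover the same pixel grid
stopifnot(identical(dim(band1), dim(band2)), identical(dim(band1), dim(band3)))

# t() then c() flattens row by row, so pixels keep the files' reading order
pixels <- cbind(band1 = c(t(band1)), band2 = c(t(band2)), band3 = c(t(band3)))
print(pixels)
```

Plain c(band1) (column-major order) would work just as well, as long as the same flattening is used for all three bands.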


[R] mgui

2010-11-04 Thread Luis Felipe Parra

Hello, I am using the mgui function in the following way:

mgui(graf_cuenta_margen_interfaz,
     title = c("Gráficas", "Histogramas valoración (No lineal) Cuenta de Margen"),
     exec = "Graficar",
     argText = list(fecha_adelante = "Fecha adelante"),
     closeOnExec = TRUE, output = NULL,,
     helps = list(fecha_adelante = paste("La valoración de cuantos días adelante se desea graficar. Las opciones son los días que se hayan escogido en las simulacion:", guiGetSafe(horizontes_text))))

As you can see, for the help text I build a string that uses a variable I
can modify:

helps = list(fecha_adelante = paste("La valoración de cuantos días adelante se desea graficar. Las opciones son los días que se hayan escogido en las simulacion:", guiGetSafe(horizontes_text)))

The problem is this: if I have already used this option in my program and
then modify the variable, the next time I use the option the variable does
not seem to be updated, even though I can see the change when I call
guiGetSafe(horizontes_text) in the R console. If, for example, I change the
variable without having used this option in my program before, the changes
DO appear. If I want the change in my variable to appear in the program
after having used the option, I have to close the program and open it again.

Does anybody know how I can get the changes in my variable incorporated in
the program, even though I have used the option before, without closing and
reopening it?

Thank you

Felipe Parra


[R] NA values

2010-11-04 Thread feder


Hi,
I tried to fit an exponential-family state-space model with the KFAS
package.
The problem is that my data set includes some NA observations, and it does
not seem to work.
Any suggestions?

Thanks in advance,

Federico

-- 
View this message in context: 
http://r.789695.n4.nabble.com/R-pkgs-New-package-for-multivariate-Kalman-filtering-smoothing-simulation-and-forecasting-tp903589p3027907.html
Sent from the R help mailing list archive at Nabble.com.


[R] count occurrence and distance of characters in string

2010-11-04 Thread Immanuel

Hello all,

I want to know how often a given character occurs in a string,
and the distance between every two occurrences (distance = the number of
other characters between them).

thanks
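A minimal sketch in base R, assuming "distance" means the number of characters strictly between two consecutive occurrences, and that the string and the character are plain character values (char_stats is a hypothetical helper name). gregexpr() returns the start position of every match, and diff() of those positions minus one gives the gaps:

```r
# Hypothetical helper: count a character and the gaps between its occurrences
char_stats <- function(s, ch) {
  # Start positions of every match; fixed = TRUE treats ch literally
  pos <- as.integer(gregexpr(ch, s, fixed = TRUE)[[1]])
  if (pos[1] == -1L) pos <- integer(0)  # gregexpr() returns -1 when there is no match
  list(count = length(pos),
       positions = pos,
       gaps = diff(pos) - 1L)            # characters strictly between neighbours
}

res <- char_stats("abcabca", "a")
res$count   # [1] 3
res$gaps    # [1] 2 2
```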


Re: [R] Loop

2010-11-04 Thread Matevž Pavlič

Hi Jim, 

Actually, this is better, but I am looking for both values: the count and 
the value that is being counted. 
Is there a way to just paste those two together?

Thanks, m
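A minimal sketch of pasting values and counts together, assuming the data are in a data frame read with read.csv() as above. top_counts is a hypothetical helper name, and the small data frame only stands in for the real file. sapply() iterates over the data frame's columns directly, instead of going through the matrix conversion that apply() performs:

```r
# Hypothetical helper: the n most frequent values of x, as "value: count"
top_counts <- function(x, n = 10) {
  tab <- sort(table(x), decreasing = TRUE)
  head(paste(names(tab), tab, sep = ": "), n)
}

# Small stand-in for the data read with read.csv()
mat <- data.frame(W1 = c("a", "b", "a", "c", "a", "b"),
                  W2 = c("x", "x", "y", "y", "y", "z"),
                  stringsAsFactors = FALSE)

# One column of "value: count" strings per data-frame column
print(sapply(mat, top_counts, n = 2))
```

If some column has fewer than n distinct values, the pieces have different lengths and sapply() returns a list instead of a matrix.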

-Original Message-
From: jim holtman [mailto:jholt...@gmail.com] 
Sent: Thursday, November 04, 2010 9:59 PM
To: Matevž Pavlič
Cc: Petr PIKAL; r-help@r-project.org
Subject: Re: [R] Loop

[snip]
