date:20100113

Re: [R] == returns wrong result? Why

2010-01-13 Thread Peter Ehlers


Trafim Vanishek wrote:

Dear all,

Does anybody know the probable reason why == gives FALSE when it should give
TRUE?
These two variables are of the same type, and everything works in the loop,
but then it stops when they are equal.

This is the output:

Rk[47] == RB[21]

[1] FALSE


Rk[47]

[1] 0.002842007


RB[21]

[1] 0.002842007

Thanks a lot.


What makes you think that Rk[47] and RB[21] are equal? You're only
showing 9-decimal print versions.

 -Peter Ehlers




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Peter Ehlers
University of Calgary
403.202.3921


Re: [R] == returns wrong result? Why

2010-01-13 Thread Stephan Kolassa


Hi Trafim,

take a look at FAQ 7.31.

HTH
Stephan
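
FAQ 7.31 is about floating-point representation: two doubles that print identically at R's default 7 significant digits can still differ in their last bits. A minimal sketch, using made-up values as stand-ins for Rk[47] and RB[21] (the poster's actual data is not shown in the thread):

```r
# Hypothetical stand-ins for Rk[47] and RB[21]
a <- 0.1 + 0.2
b <- 0.3

print(a)                       # both print as 0.3 at the default 7 digits
print(b)
a == b                         # FALSE: the stored doubles differ
print(c(a, b), digits = 17)    # extra digits reveal the difference
isTRUE(all.equal(a, b))        # TRUE: equal up to a small tolerance
abs(a - b) < 1e-9              # TRUE: explicit tolerance of your own choosing
```

Inside a loop condition, isTRUE(all.equal(...)) or an explicit tolerance is usually the safer test than ==.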



[R] Dynamic data.frame headers

2010-01-13 Thread Mattias Nyström

I would like to create a data.frame with dynamically created headers. I will later
fill it with percentiles. My percentiles vector is:
percentiles = c(0.05, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90,
0.95, 1.00)

From this vector I would like to have headers like:
p5, p10, p20, ..., p95, p100

Is it possible to create headers in such a way, something like p+100*c?


Re: [R] Dynamic data.frame headers

2010-01-13 Thread Hans Gardfjell

Hi Mattias,

Try this,

percentiles <- c(0.05, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90,
                 0.95, 1.00)
test <- data.frame(matrix(NA, 0, 12))
names(test) <- paste("p", percentiles*100, sep="")

test
 [1] p5   p10  p20  p30  p40  p50  p60  p70  p80  p90  p95  p100

Cheers, Hans
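
Since the data frame is to be filled with percentiles later, the same paste() idea can also name the result of quantile() directly. This is only a sketch: the data vector x is simulated for illustration, and round() is an added precaution against products such as 0.05 * 100 not being exactly 5.

```r
percentiles <- c(0.05, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90,
                 0.95, 1.00)
# round() guards against floating-point noise in percentiles * 100
headers <- paste("p", round(percentiles * 100), sep = "")

set.seed(1)                  # hypothetical data, for illustration only
x <- rnorm(200)
q <- quantile(x, probs = percentiles, names = FALSE)

# One-row data frame whose column names are the generated headers
test <- as.data.frame(t(q))
names(test) <- headers
test
```

Further rows (for example one per variable) could be appended with rbind(), provided they carry the same names.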


[R] convert factor data to numeric

2010-01-13 Thread Ahmet Temiz

Hello,

Could you give me a hint on how to convert data of factor type to numeric (float)?

Regards


Re: [R] convert factor data to numeric

2010-01-13 Thread Dimitris Rizopoulos


check the following:

http://cran.r-project.org/doc/FAQ/R-FAQ.html#How-do-I-convert-factors-to-numeric_003f


Best,
Dimitris



--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014


Re: [R] convert factor data to numeric

2010-01-13 Thread S Devriese

You could try as.numeric(), but without more details it is difficult to say
whether this will work. How did you end up with a factor (e.g. through import)?

Stephan
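
Whether as.numeric() alone works depends on what the factor holds: applied directly to a factor it returns the internal level codes, not the values the labels spell out. A small sketch with a made-up factor (the poster's data is not shown), following the approach in the R FAQ entry linked earlier in this thread:

```r
# Hypothetical factor whose labels are numbers stored as text
f <- factor(c("3.5", "10", "2.25", "10"))

as.numeric(f)                        # internal level codes, not the values
as.numeric(as.character(f))          # the values themselves
as.numeric(levels(f))[as.integer(f)] # same values, converting each level once
```

If the factor came from read.table(), checking the column for stray non-numeric entries is often worthwhile, since those are a common reason a numeric column is read as a factor.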


Re: [R] getting p values

2010-01-13 Thread S Ellison


 Duncan Murdoch murd...@stats.uwo.ca 12/01/2010 18:07:46 
 I need to get the p values for a table with 15000 entries of t values.
...
Put the t values into a vector, then use pt() in an appropriate way

... and don't forget any necessary correction for multiple comparisons;
see 
?p.adjust

Steve E
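
As a rough illustration of both suggestions together, the sketch below computes two-sided p values with pt() and then adjusts them with p.adjust(). The t values are simulated and the degrees of freedom (20) are an assumption; neither is given in the thread.

```r
set.seed(42)
df    <- 20                     # assumed degrees of freedom (not given in the thread)
tvals <- rt(15000, df = df)     # simulated stand-in for the 15000 t values

# Two-sided p values: twice the upper-tail probability of |t|
p <- 2 * pt(abs(tvals), df = df, lower.tail = FALSE)

p_bh   <- p.adjust(p, method = "BH")    # controls the false discovery rate
p_holm <- p.adjust(p, method = "holm")  # controls the family-wise error rate
summary(p_bh)
```

Which adjustment method is appropriate depends on the question being asked; ?p.adjust lists the available methods.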




Re: [R] expand.grid game

2010-01-13 Thread baptiste auguie

It did take me a good night's sleep to understand it. I was stuck with
the exact same question but I see now how the remaining balls are
shared among all 8 urns (therefore cases with 11, 12, 13, ... 17 balls
are also dealt with).

Thanks again,

baptiste
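
The counting argument quoted below can be checked numerically. The sketch assumes, from the wording of the quoted explanation, that the task is to count 8-digit numbers whose digits sum to 17; the helper count_by_digits() is a hypothetical name introduced here for an independent digit-by-digit count.

```r
# Closed form from the quoted explanation: all placements of the 16 free balls,
# minus those with too many balls in the first urn, minus those with 10 or
# more balls in one of the 7 other urns
by_formula <- choose(16 + 8 - 1, 8 - 1) - choose(7 + 8 - 1, 7) -
  7 * choose(6 + 8 - 1, 7)

# Independent check: build the numbers one digit at a time.
# ways[s + 1] holds the number of digit prefixes whose digits sum to s.
count_by_digits <- function(n_digits, total) {
  ways <- numeric(total + 1)
  first <- 1:min(9, total)                # first digit is 1..9
  ways[first + 1] <- 1
  for (pos in seq_len(n_digits - 1)) {    # remaining digits are 0..9
    new_ways <- numeric(total + 1)
    for (s in 0:total) {
      for (d in 0:9) {
        if (s + d <= total) {
          new_ways[s + d + 1] <- new_ways[s + d + 1] + ways[s + 1]
        }
      }
    }
    ways <- new_ways
  }
  ways[total + 1]
}

by_counting <- count_by_digits(8, 17)
c(by_formula, by_counting)   # both give 229713
```

The two subtracted terms cannot overlap here: 10 balls in each of two urns would need more than the 17 available, which is why no further correction terms appear.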


2010/1/12 Rolf Turner r.tur...@auckland.ac.nz:

 On 13/01/2010, at 9:19 AM, Greg Snow wrote:

 How trivial is probably subjective, I don't think it is much above
 trivial.  I would not have been surprised to see this question on an exam in
 my undergraduate (300 or junior level) probability course (the hard part was
 remembering the details from that class from over 20 years ago).  My
 favorite test question of all time came from that course: You have a deck
 of poker cards with the 3's removed (and jokers), you deal yourself 5 cards
 at random, what is the probability of getting a straight (not including
 straight flushes)?

 This problem is simpler.  Just think of the 8 places in the number as
 urns, and the 17 1's as balls to be put into the urns.  One ball has to go
 in the first urn, so you have 16 left; there are choose(16+8-1, 8-1) ways to
 distribute 16 indistinguishable balls among 8 distinguishable urns.  But that
 includes some solutions with more than 9 balls in an urn, which violates the
 digits restriction, so subtract off the illegal counts.  If we place 10
 balls in the first urn, then we have 7 remaining balls to distribute among
 the 8 urns, or choose(7+8-1, 7) ways.  If we place 1 ball in the first urn
 and 10 balls in one of the 7 other urns (hence the factor of 7), then there
 are choose(6+8-1, 7) ways to distribute the remaining 6 balls in the 8 urns.
 Not too complicated once you remember (or look up) the formula for urns and
 balls.

 Sorry to be a thicko --- but doesn't the foregoing solution *leave in* the
 possibility of putting all 17 balls in the first urn?  Or 3 balls in the
 first urn, 12 in the second, and the remaining 2 in any of the other six
 urns?  Etc.  I.e. don't more terms have to be subtracted?

        cheers,

                Rolf Turner


Re: [R] Recommended visualization for hierarchical data

2010-01-13 Thread Jim Lemon


On 01/13/2010 02:46 PM, Rex C. Eastbourne wrote:

On Tue, Jan 12, 2010 at 5:26 PM, Rex C. Eastbourne rex.eastbou...@gmail.com wrote:

Let's say I have data in the following schema that describes the number of
purchases a company has received from each County in the US:

State | County | Purchases
---
NJ | Mercer | 550
CA | Orange | 23


I would like to visualize what states contribute the most to the overall
total, and furthermore within those states, what Counties contribute the
most. What are some recommended R visualizations for this type of data? I
created a treemap using map.market from the portfolio library, like the
following:

http://zoonek2.free.fr/UNIX/48_R/g126.png

Although this is an attractive visual, I want something that makes it
easier to compare the relative sizes of components at a glance (hard with a
treemap because rectangles have different aspect ratios). Does anyone have a
recommended alternate visualization?

Thanks!

 

Just to clarify: I made up the above example for simplicity's sake to
illustrate what I meant by hierarchical data. My actual data is not
related to maps or geography, so a map-based visualization wouldn't work.

   

Hi Rex,
Have a look at the hierobarp function in the plotrix package. It 
produces nested bars that begin with the overall value.


Jim


Re: [R] Calculate the percentages of the numbers in every column.

2010-01-13 Thread Petr PIKAL

Hi


r-help-boun...@r-project.org wrote on 13.01.2010 01:36:31:

 tmp <- scan()
 0 2 1 0 1 0 2 1 2 3 0 0 0 0 1 0 0 2 3 1
 
 dat <- matrix(tmp, byrow=T, ncol=4)
 
 apply(dat, 2, function(x, min.val, max.val) {
     tmp <- table(x)/length(x)
     res <- rep(0, max.val - min.val + 1)
     res[as.numeric(names(tmp)) - min.val + 1] <- tmp
     res
 }, 0, 3)
 
 Should do it (but I bet there is a more elegant way).

I am not sure whether it is more elegant or efficient, but

library(reshape)  # provides melt()
dat.m <- melt(as.data.frame(dat))
xtabs(~value+variable, dat.m)/nrow(dat)

gives a similar result.

Regards
Petr
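
A base-R variant of the same idea, offered as a sketch rather than as either poster's method: turning each column into a factor with fixed levels makes table() report values that never occur in a column as 0, so no index arithmetic is needed. The column names A to D are taken from the original question; the range 0:3 is an assumption matching the example data.

```r
dat <- matrix(c(0, 2, 1, 0,
                1, 0, 2, 1,
                2, 3, 0, 0,
                0, 0, 1, 0,
                0, 2, 3, 1), byrow = TRUE, ncol = 4,
              dimnames = list(NULL, c("A", "B", "C", "D")))

# Fixed factor levels keep values that are absent from a column as 0
props <- apply(dat, 2, function(x) prop.table(table(factor(x, levels = 0:3))))
props
colSums(props)   # each column sums to 1
```

For data with a different range of values, the levels argument would need to be changed accordingly, for example to sort(unique(as.vector(dat))).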

 
 Regards,
 Simon Knapp
 
 On Wed, Jan 13, 2010 at 5:25 AM, Kelvin 6kelv...@gmail.com wrote:
 
  Dear friends,
 
  I have a table like this, I have A B C D ... levels, the first column
  you see is just the index, and there are different numbers in the
  table.
 
       A   B   C   D  ...
   1   0   2   1   0
   2   1   0   2   1
   3   2   3   0   0
   4   0   0   1   0
   5   0   2   3   1
  ...
 
  I want to calculate the frequencies or the percentages of the numbers
  in every column.
 
  How do I get a table like this, the first column is the levels of
  numbers, and the numbers inside the table are the percentages. All the
  percentages should add up to 1 in every column.
 
        A     B     C     D   ...
   0  0.2   0.3   0.1   0.1
   1  0.1   0.1   0.2   0.1
   2  0.1   0.2   0.2   0.2
   3  0.2   0.1   0.1   0
   ...
 
  Thanks for your help!
 
  Kelvin
 

Re: [R] R for windows 64 bit

2010-01-13 Thread alessia matano

Thanks to all of you,

so, if I have understood correctly, I could also use the option

And, say, --max-mem-size=8G would work

from outside R to increase the memory further, even though the machine has
6 GB of RAM. I had thought as much, but I was not sure!

Many thanks again for all your advice.
Best
alessia

2010/1/12 Uwe Ligges lig...@statistik.tu-dortmund.de:
 On 12.01.2010 21:07, Alexander Shenkin wrote:

 Hi Alessia,

 Note that, while your physical limit might be 6 GB, Windows memory
 management allows more memory than that to be allocated (aka Virtual
 Memory, or at least that's what they called it in XP).  Windows swaps
 out memory from RAM to the hard disk and back when necessary (please
 excuse the explanation if you already know all this).  For processing
 large vectors, this swapping might bring your system to a standstill.
 Regardless, the maximum memory for a windows process is larger than
 the physical RAM you have available.

 allie


 In this case 6 GB was the default (the physical maximum of this particular
 machine), and there was a bug in the *experimental* version of R that did not
 allow the memory size to be increased from within R using memory.limit(). It
 has already been fixed, thanks to Brian Ripley.

 Uwe Ligges


 On 1/12/2010 6:27 AM, alessia matano wrote:

 Fine, it worked. I will try in this way.

 Just the last question and I won't bother you further today. My
 machine right now has just 6 giga of RAM (it will be increased to 16
 in a few days), and I see that with this experimental version
 memory.limit is 6135.

 What is the command to increase the memory limit to the maximum I
 can use (5 GB?)? If I write memory.limit(5000) it still gives me
 the error:

 don't be silly! Your machine has a 4Gb address limit

 which is quite odd.

 Many thanks
 Best
 A.

 2010/1/12 alessia matano alexis@gmail.com:

 ok, perfect!
 I will try with it...many many thanks. Have you got there also the
 quantreg package, which has actually the same problem of sparseM
 (32bit version)?

 best
 alessia

 2010/1/12 Uwe Ligges lig...@statistik.tu-dortmund.de:


 On 12.01.2010 12:09, alessia matano wrote:

 I am sorry, I know it is an experimental version, and it was
 misleading of me to call it a new version.

 Therefore, I will wait for when they will be available officially,
 since it is just a few days.

 Or just use today my private repository I indicated in the other mail.

 Uwe Ligges



 However, I tried also to go to the cran pages and download them and
 insert into the library. For quantreg it worked, for sparseM it did
 not probably because it's a win32 version, as you said.



 2010/1/12 Prof Brian Ripley rip...@stats.ox.ac.uk:

 On Tue, 12 Jan 2010, alessia matano wrote:


 Dear all,

 I have just downloaded and set up this new version of R. I am now trying
 to install the packages I need, which are SparseM and quantreg. I
 downloaded the quantreg package and put it into the library directory, and
 it seems to work. However, when I try to do the same with SparseM I
 get the following error message:

 Loading required package: SparseM
 Error in inDL(x, as.logical(local), as.logical(now), ...) :
  unable to load shared library
 'C:/PROGRA~1/R/R-211~1.0DE/library/SparseM/libs/SparseM.dll':
  LoadLibrary failure:  %1 is not a valid Win32 application.


 Any help for it?

 Please do refer to the posting referred to in that thread (and
 Henrique,
 please do not post just the URL without the explanations).

 https://stat.ethz.ch/pipermail/r-devel/2010-January/056301.html

 You cannot mix 32-bit Windows binary packages with this experimental port
 (it is not a 'new version'): you need to install from the package sources.
 If that is too difficult for you, please do not try to use unsupported
 experimental builds (and Uwe Ligges may have some binary packages available
 for test in a few days).



 Thanks a lot
 alessia

 2010/1/11 Henrique Dallazuanna www...@gmail.com:

 Try this version (beta of development version):

 http://www.stats.ox.ac.uk/pub/RWin/Win64/R-2.11.0dev-win64.exe

 On Mon, Jan 11, 2010 at 2:29 PM, alessia matano alexis@gmail.com wrote:

 Dear all,

 Do you know if there is a particular version of R for 64-bit Windows
 that increases the amount of memory R can use?

 How should I increase the memory and, more importantly, set a higher
 maximum vector size? It still stops, saying: Could not allocate vector
 of size 145

 thanks to all
 alessia





 --
 Henrique Dallazuanna
 Curitiba-Paraná-Brasil
 25° 25' 40 S 49° 16' 22 O



[R] selection of multiple subscripts

2010-01-13 Thread e-letter

Readers,

For a data set 'x':

1 a
2 b
3 c
4 d
5 e
6 f
7 g
8 h
9 i

How to select multiple subscripts to plot? For example to plot values
1:3 and 9:10:

plot(x[1:3,1],x[,2])

and

plot(x[9:10,1],x[,2])

into one plot?

Yours,

rhelpatconference.jabber.org
r251


Re: [R] Illustrating kernel distribution in wheat ears

2010-01-13 Thread Carl-Göran CG . Pettersson

Hi,

Thanks a lot for your suggestions and the very detailed instructions, I needed 
them...
Everything worked fine also in the full dataset, up until the last suggestion 
(the box plots)

Here I also got an error message, but a different one from what you got. And no 
output...

Here are the last two command lines and the error message:

 q <- ggplot(spikes.long, aes(side, value))
 q + geom_boxplot() + facet_grid(~ cultivar)
Error in `[.data.frame`(plot$data, , setdiff(cond, names(df)), drop = FALSE) : 
  undefined columns selected

I used the same variable names and have done the steps suggested up to this 
point, but with a much bigger dataset than in the question sample.

Sorry to say, I don't understand the error message.
But the first two plot variants worked nicely and are usable for me.

All the best
/CG


From: Dennis Murphy [djmu...@gmail.com]
Sent: 11 January 2010 15:03
To: Carl-Göran CG. Pettersson
Cc: r-help@r-project.org
Subject: Re: [R] Illustrating kernel distribution in wheat ears

Hi:

It wasn't clear to me precisely what you wanted, but here are a couple of ideas 
in the hope that it will help.
I used ggplot2 for the graphics, so it requires some manipulation of your 
dataset from 'wide' format to 'long'.
I also add an indicator for side of the ear (odd is side one (L?), even is side 
2) and a variable I call 'loc' to
indicate the value associated with the splxx variable.

I read the data into a data frame called spikelets. The first step is to remove 
the rows of missing responses:

naind <- apply(spikelets[, -1], 1, function(x) all(is.na(x)))
spikelets2 <- spikelets[!naind, ]

Next, I use the melt() function (from the reshape package, which ggplot2
attaches when it is loaded) to convert the data frame from 'wide' to 'long' form:

library(ggplot2) # also attaches the reshape and plyr packages
spikes.long <- melt(spikelets2, id = 'cn')

The variable 'variable' contains the variable names as a vector (spl01, spl02, 
..., spl14)
Next, I create a variable called loc, which represents the numeric part of the 
spl variables, and then
create a variable side to distinguish one side of the awn from the other. 
'variable' is then removed...

spikes.long$loc <- as.numeric(substring(spikes.long$variable, 4))
spikes.long$side <- factor(2 - spikes.long$loc %% 2)
spikes.long$variable <- NULL

Now we're in a position to plot. The first is a scatterplot of the response by 
location, stratified by cultivar;
it contains color to distinguish sides.

# With color:
p <- qplot(loc, value, data = spikes.long, group = cn,
   colour = side)
p + facet_grid(cn ~ .)

The color is not terribly informative, so to get rid of it, remove the colour = 
side argument. One could
also merge the plots together and fit smooths to the different cultivars.

ggplot(spikes.long, aes(loc, value, colour = cn)) +
geom_point() + geom_smooth(se = FALSE)

I also came up with boxplot pairs by side for each cultivar, which is shown 
below:

q <- ggplot(spikes.long, aes(side, value))
q + geom_boxplot() + facet_grid(~ cultivar)

For some reason, I kept getting these messages from every ggplot2 call:

Error in recordGraphics(drawGTree(x), list(x = x), getNamespace("grid")) :
  invalid graphics state

but all of the plots rendered as expected.


HTH,
Dennis
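
The reshaping steps described above can also be sketched in base R, without melt(). This is an alternative illustration, not Dennis's code: it uses only the first four spikelet columns of three cultivars from the sample data, and stack() in place of melt(), so the value column is called 'values' rather than 'value'.

```r
# First four spikelet columns for three cultivars, taken from the sample data
spikelets <- data.frame(cn    = c("Lans", "Kranich", "Loyal"),
                        spl01 = c(1.8, 0.6, 1.1),
                        spl02 = c(3.1, 2.4, 2.7),
                        spl03 = c(3.5, 3.4, 3.6),
                        spl04 = c(3.8, 4.2, 3.7))

# stack() puts all spl columns into one 'values' column plus an 'ind' label
long <- stack(spikelets[, -1])
long$cn   <- rep(spikelets$cn, times = ncol(spikelets) - 1)
long$loc  <- as.numeric(substring(as.character(long$ind), 4))  # "spl03" -> 3
long$side <- factor(2 - long$loc %% 2)                         # odd -> 1, even -> 2
long
```

With the data in this form, boxplot(values ~ side + cn, data = long) would draw boxplot pairs by side for each cultivar using base graphics.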

2010/1/10 Carl-Göran CG. Pettersson cg.petters...@vpe.slu.se
Dear all

R2.10  WinXP

I have a dataset dealing with the way different wheat cultivars build their 
yield.
Wheat ears are organised in spikelets where the spikelets can be numbered from 
the bottom, with even numbers on one side and odd on the other.
I know how many kernels there were in each spikelet after some months spent 
counting them...

Now I want to illustrate the differences between the cultivars in how the 
kernels are distributed in the ears.
In the best of all possible worlds it would be possible to place histograms or 
boxplots on adjacent sides of vertical lines representing different cultivars.
I have done some experimenting using boxplot() but I am stuck and out of ideas 
right now.

All ideas are welcome!
/CG


Here is a sample dataset with the kernel counts for the first 14 spikelets:

cn        spl01 spl02 spl03 spl04 spl05 spl06 spl07 spl08 spl09 spl10 spl11 spl12 spl13 spl14
Lans        1.8   3.1   3.5   3.8   3.8   4.1   4.2   4.3   4.4   4.5   4.2   4.1   3.9   3.8
Kranich     0.6   2.4   3.4   4.2   4.5   4.7   4.9   4.9   4.8   4.7   4.4   4.1   4.1   3.9
Loyal       1.1   2.7   3.6   3.7   4.1   4.4   4.4   4.6   4.3   4.5   4.3   4.1   3.8   3.7
Boomer       NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
Oakley       NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
Hereford    0.6   2.3   3.3   3.6   3.9

[R] Re: selection of multiple subscripts

2010-01-13 Thread Petr PIKAL

Hi

See ?points or ?lines, which you would surely have found if you had bothered to
look at the ?plot help page.

Regards
Petr


Re: [R] selection of multiple subscripts

2010-01-13 Thread Duncan Murdoch


On 13/01/2010 7:36 AM, e-letter wrote:

Readers,

For a data set 'x':

1 a
2 b
3 c
4 d
5 e
6 f
7 g
8 h
9 i

How to select multiple subscripts to plot? For example to plot values
1:3 and 9:10:

plot(x[1:3,1],x[,2])

and

plot(x[9:10,1],x[,2])

into one plot?


Neither of those will work, because your x[,2] vector is longer than the 
other vector.


What you want is something like this:

plot(col2 ~ col1, data=x[c(1:3, 9:10),])

where col1 and col2 are the names of those two columns.

Duncan Murdoch
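
To make the row selection concrete, here is a small sketch. The data frame and the column names col1 and col2 are made up for illustration, with numeric values in place of the letters in the original data set.

```r
# Hypothetical 9-row data frame standing in for 'x'; col1/col2 are made-up names
x <- data.frame(col1 = 1:9,
                col2 = c(2.1, 3.4, 1.9, 5.0, 4.2, 6.3, 5.8, 7.1, 6.6))

rows <- c(1:3, 8:9)      # c() combines the two index ranges into one vector
sel  <- x[rows, ]        # rows 1, 2, 3, 8 and 9 in a single subset
sel

# Indexing past the last row (9:10 here) yields a row of NAs, not an error
x[c(1:3, 9:10), ]
```

Passing the subset to plot(col2 ~ col1, data = sel) then draws all five points in one plot.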


[R] How to do FMOLS and DOLS?

2010-01-13 Thread John Hust


Hi,

Can R do FMOLS (Fully Modified OLS) and DOLS (Dynamic OLS)?

I cannot find anything useful in the existing packages.

Thanks in advance!

Re: [R] selection of multiple subscripts

2010-01-13 Thread e-letter

On 13/01/2010, Duncan Murdoch murd...@stats.uwo.ca wrote:
 On 13/01/2010 7:36 AM, e-letter wrote:
 Readers,

 For a data set 'x':

 1 a
 2 b
 3 c
 4 d
 5 e
 6 f
 7 g
 8 h
 9 i

 How to select multiple subscripts to plot? For example to plot values
 1:3 and 9:10:

 plot(x[1:3,1],x[,2])

 and

 plot(x[9:10,1],x[,2])

 into one plot?

 Neither of those will work, because your x[,2] vector is longer than the
 other vector.

 What you want is something like this:

 plot(col2 ~ col1, data=x[c(1:3, 9:10),])

Thanks, I now understand the concatenate function would help but
forgot the syntax. Anyway I've just realised that the search database
for R yields no result for '?concatenate' which is surprising.


Re: [R] selection of multiple subscripts

2010-01-13 Thread e-letter

On 13/01/2010, e-letter inp...@gmail.com wrote:
 On 13/01/2010, Duncan Murdoch murd...@stats.uwo.ca wrote:
 On 13/01/2010 7:36 AM, e-letter wrote:
 Readers,

 For a data set 'x':

 1 a
 2 b
 3 c
 4 d
 5 e
 6 f
 7 g
 8 h
 9 i

 How to select multiple subscripts to plot? For example to plot values
 1:3 and 9:10:

 plot(x[1:3,1],x[,2])

 and

 plot(x[9:10,1],x[,2])

 into one plot?

 Neither of those will work, because your x[,2] vector is longer than the
 other vector.

 What you want is something like this:

 plot(col2 ~ col1, data=x[c(1:3, 9:10),])

 Thanks, I now understand the concatenate function would help but
 forgot the syntax. Anyway I've just realised that the search database
 for R yields no result for '?concatenate' which is surprising.

For the benefit of other novices: for the data set, the subscripts
should have read:

1:3

and

8:9

Alternatively, the data set should have included:

10 j

:)


[R] column width in .dbf files using write.dbf ... to be continued

2010-01-13 Thread Arnaud Mosnier

Dear UseRs,

I did not get any answer to my previous message (Is there a way to define
column widths manually when using the write.dbf function from the foreign
package?), so I tried to modify the write.dbf function to do what I want.

Here is my modified version :

write.dbfMODIF <- function (dataframe, file, factor2char = TRUE, max_nchar = 254,
    width = "d")
{
    # width = "d" keeps the default widths; a number sets the width of
    # every character column (capped at max_nchar)
    allowed_classes <- c("logical", "integer", "numeric", "character",
        "factor", "Date")
    if (!is.data.frame(dataframe))
        dataframe <- as.data.frame(dataframe)
    if (any(sapply(dataframe, function(x) !is.null(dim(x)))))
        stop("cannot handle matrix/array columns")
    cl <- sapply(dataframe, function(x) class(x[1L]))
    asis <- cl == "AsIs"
    cl[asis & sapply(dataframe, mode) == "character"] <- "character"
    if (length(cl0 <- setdiff(cl, allowed_classes)))
        stop("data frame contains columns of unsupported class(es) ",
            paste(cl0, collapse = ","))
    m <- ncol(dataframe)
    DataTypes <- c(logical = "L", integer = "N", numeric = "F",
        character = "C", factor = if (factor2char) "C" else "N",
        Date = "D")[cl]
    # Convert factors and dates to the representation stored in the file
    for (i in seq_len(m)) {
        x <- dataframe[[i]]
        if (is.factor(x))
            dataframe[[i]] <- if (factor2char)
                as.character(x)
            else as.integer(x)
        else if (inherits(x, "Date"))
            dataframe[[i]] <- format(x, "%Y%m%d")
    }
    precision <- integer(m)
    scale <- integer(m)
    dfnames <- names(dataframe)
    # Work out field width (precision) and decimals (scale) per column
    for (i in seq_len(m)) {
        nlen <- nchar(dfnames[i], "b")
        x <- dataframe[, i]
        if (is.logical(x)) {
            precision[i] <- 1L
            scale[i] <- 0L
        }
        else if (is.integer(x)) {
            rx <- range(x, na.rm = TRUE)
            rx[!is.finite(rx)] <- 0
            if (any(rx == 0))
                rx <- rx + 1
            mrx <- as.integer(max(ceiling(log10(abs(rx)))) + 3L)
            precision[i] <- min(max(nlen, mrx), 19L)
            scale[i] <- 0L
        }
        else if (is.double(x)) {
            precision[i] <- 19L
            rx <- range(x, na.rm = TRUE)
            rx[!is.finite(rx)] <- 0
            mrx <- max(ceiling(log10(abs(rx))))
            scale[i] <- min(precision[i] - ifelse(mrx > 0L, mrx + 3L, 3L), 15L)
        }
        else if (is.character(x)) {
            if (width == "d") {
                # default: width taken from the longest string in the column
                mf <- max(nchar(x[!is.na(x)], "b"))
                p <- max(nlen, mf)
                if (p > max_nchar)
                    warning(gettextf("character column %d will be truncated to %d bytes",
                        i, max_nchar), domain = NA)
                precision[i] <- min(p, max_nchar)
                scale[i] <- 0L
            } else {
                # user-supplied width
                if (width > max_nchar)
                    warning(gettextf("character column %d will be truncated to %d bytes",
                        i, max_nchar), domain = NA)
                precision[i] <- min(width, max_nchar)
            }
        }
        else stop("unknown column type in data frame")
    }
    if (any(is.na(precision)))
        stop("NA in precision")
    if (any(is.na(scale)))
        stop("NA in scale")
    invisible(.Call(DoWritedbf, as.character(file), dataframe,
        as.integer(precision), as.integer(scale), as.character(DataTypes)))
}


However, when I try to use this function, it cannot find the
DoWritedbf function that is called in the last lines (a function written in
C).

Is there a way to temporarily replace the original write.dbf function in the
foreign package with this one?

Thanks,

Arnaud

R version 2.10.0 (2009-10-26)
i386-pc-mingw32


[R] Plotting a linear step function without vertical lines

2010-01-13 Thread walter.djuric


Re: [R] selection of multiple subscripts

2010-01-13 Thread Duncan Murdoch


On 13/01/2010 8:09 AM, e-letter wrote:

On 13/01/2010, Duncan Murdoch murd...@stats.uwo.ca wrote:
 On 13/01/2010 7:36 AM, e-letter wrote:
 Readers,

 For a data set 'x':

 1 a
 2 b
 3 c
 4 d
 5 e
 6 f
 7 g
 8 h
 9 i

 How to select multiple subscripts to plot? For example to plot values
 1:3 and 9:10:

 plot(x[1:3,1],x[,2])

 and

 plot(x[9:10,1],x[,2])

 into one plot?

 Neither of those will work, because your x[,2] vector is longer than the
 other vector.

 What you want is something like this:

 plot(col2 ~ col1, data=x[c(1:3, 9:10),])

Thanks, I now understand the concatenate function would help but
forgot the syntax. Anyway I've just realised that the search database
for R yields no result for '?concatenate' which is surprising.


That's because there's no concatenate function in base R.  If you want 
to search for the word "concatenate", use ??concatenate.  You won't 
find the c() function, because it is called "combine", but you'll find 
several other ways to concatenate.
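
A minimal sketch of the row selection discussed above, assuming a data frame with ten rows; the column names col1 and col2 follow the plot() call in the quoted reply and the values are made up.

```r
# Toy data frame with 10 rows (hypothetical values).
x <- data.frame(col1 = 1:10, col2 = (1:10)^2)

# c() combines the two index ranges into one vector: 1 2 3 9 10
rows <- c(1:3, 9:10)
sel <- x[rows, ]

# Plot only the selected rows; skipped in non-interactive sessions.
if (interactive()) plot(col2 ~ col1, data = sel)
```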


Duncan Murdoch


[R] reading fifo with read.table hangs

2010-01-13 Thread Mads Jeppe Tarp-Johansen


To R-helpers,

Running
  R version 2.10.0 (2009-10-26)
  Linux ... 2.6.25.20-0.5-default #1 SMP 2009-08-14 01:48:11 +0200 x86_64 
x86_64 x86_64 GNU/Linux
  openSUSE 11.0 (X86-64)
and having difficulties reading a fifo from within R.

A short example that simply hangs is shown as 'SHORT SCRIPT' 
below. I expected R to print a data set read from the fifo with the 
numbers 0,1,...,7 and then exit gracefully. Any ideas why it does not?


A longer script that actually does the job in its 2nd clause is shown in 
'LONG SCRIPT' below ... I'm confused that the open call is needed. Any 
comments on this?


Regards MJ

--- SHORT SCRIPT BEGIN
#!/bin/bash

mkfifo chops
gawk 'BEGIN {for (i=0;i<8;i++){print i}}' > chops &

R --slave --no-save <<EOF
print ("Hello from R")
con.data <- read.table ("chops")
con.data
EOF

unlink chops
--- SHORT SCRIPT END


--- LONG SCRIPT BEGIN
#!/bin/bash

DO_1st=no
DO_2nd=yes
DO_3rd=yes

# 1 Hoped for this to work but fails
if [[ $DO_1st =~ [yY][eE][sS] ]] ; then

  echo With R 1
  mkfifo chops
  gawk 'BEGIN {for (i=0;i<8;i++){print i}}' > chops &

  R --slave --no-save <<EOF
print ("Hello from R 1")
con.data <- read.table ("chops")
con.data
EOF
  unlink chops

fi

# 2 Works but with an unexpected open call
if [[ $DO_2nd =~ [yY][eE][sS] ]] ; then

  echo With R 2
  mkfifo chops
  gawk 'BEGIN {for (i=0;i<8;i++){print i}}' > chops &

  R --slave --no-save <<EOF
print ("Hello from R 2")
theFifo <- fifo(description="chops", open="read")
open(theFifo)  # without this read.table raises error of no lines available
con.data <- read.table (theFifo)
close(theFifo)
con.data
EOF
  unlink chops

fi

# 3 Works - just for reference
if [[ $DO_3rd =~ [yY][eE][sS] ]] ; then

  echo With cat
  mkfifo chops
  gawk 'BEGIN {for (i=0;i<8;i++){print i}}' > chops &
  cat chops
  unlink chops

fi
--- LONG SCRIPT END


[R] Problem fitting a non-linear regression model with nls

2010-01-13 Thread Nathalie Yauschew-Raguenes


Hi,

I'm trying to fit a regression of the form:

formule <- y ~ Asym_inf  + Asym_sup * ( (1 / (1 + (n1 * (exp( (tmid1-x) 
/ scal1) )^(1/n1) ) ) ) - (1 / (1 + (n2 * (exp( (tmid2-x) / scal2) 
)^(1/n2) ) ) ) )

which is a sum of the generalized logistic model proposed by Richards.

with data such as these:

x <- c(88,113,128,143,157,172,184,198,210,226,240,249,263,284,302,340)
y <- 
c(0.04,0.16,1.09,2.65,2.46,2.43,1.88,2.42,1.51,1.70,1.92,1.35,0.89,0.34,0.13,0.10) 



I use the nls function to fit my data to the model.

nls(formule, data=cbind.data.frame(x,y), start=list(Asym_inf 
=min(y),Asym_sup =max(y)-min(y), 
n1=1,n2=1,tmid1=120,tmid2=250,scal1=11,scal2=30))


and it always ends with one of these errors (even if I change the 
initial values):
- Error in nls(formule, data = cbind.data.frame(x, y), start = 
list(Asym_inf =min(y),  : \n  step factor 0.000488281 reduced below 
'minFactor' of 0.000976562\n
- Error in nls(formule, data = cbind.data.frame(x, y), start = 
list(miny = min(y),  : \n  singular gradient\n
- Error in numericDeriv(form[[3]], names(ind), env) : \n  Missing value 
or an infinity produced when evaluating the model\n
- Error in nlsModel(formula, mf, start, wts) : \n  singular gradient 
matrix at initial parameter estimates\n
So it seems that I reach a local extremum each time. I know that most 
of the problem comes from the choice of the initial values of the 
parameters Asym_inf, Asym_sup, n1, n2, tmid1, tmid2, scal1 and scal2.


My question is: how can I estimate those initial values so that the nls 
fit works?


Thanks in advance

--
Nathalie YAUSCHEW-RAGUENES
Ph.D Student

Unité de Recherches Ecologie Fonctionnelle et Physique de l'Environnement 
(EPHYSE)
INRA, Centre de Bordeaux - Aquitaine
71 Av Edouard Bourlaux
33883 Villenave d'Ornon Cedex
France


Re: [R] plotting moving range control chart

2010-01-13 Thread Tom Hopper

I have been having the same problem as poster Hodgess, below. It appears
that her question was never answered, so I would like to share a solution
with the community.

The problem is the (apparent?) inability to produce moving range process
behavior (a.k.a. control) charts with individuals data in the package
qcc (v. 2.0). I have also struggled with the same limitation in package
IQCC (v. 1.0).

The package qAnalyst (v. 0.6.0) provides an option to produce a moving
range chart with individuals data. The example given in the qAnalyst manual
for function spc yields an individuals chart:

 # i-chart; the moving range used to estimate the st. dev. spans 2 points,
 # with testType=1
 data(rawWeight)
 ichart = spc(x = rawWeight$rawWeight, sg = 2, type = "i", name = "weight",
 testType = 1)
 plot(ichart)
 summary(ichart)

Changing type = "i" to type = "mr" yields the moving range chart:

 mrchart = spc(x = rawWeight$rawWeight, sg = 2, type = "mr", name =
 "weight", testType = 1)
 plot(mrchart)
 summary(mrchart)

In separate tests, I have confirmed that qAnalyst correctly computes natural
process limits (a.k.a. control limits) for X-bar and R charts, using the
average of the subgroup means. I have not yet checked the calculations for
the ImR or other charts.
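
For anyone who wants to check limits by hand, here is a minimal base-R sketch of the usual individuals and moving range (ImR) calculation. It is not how qAnalyst, qcc or IQCC compute things internally; it only applies the standard chart constants for a moving range of two points (2.66 for the individuals limits, 3.267 for the moving range upper limit) to made-up data.

```r
# Made-up individual measurements.
x <- c(10, 12, 11, 13, 12)

# Moving ranges between consecutive points, and their average.
mr <- abs(diff(x))
mr_bar <- mean(mr)

# Upper limit for the moving range chart (lower limit is 0 for n = 2).
ucl_mr <- 3.267 * mr_bar

# Natural process limits for the individuals chart.
x_limits <- mean(x) + c(-1, 1) * 2.66 * mr_bar
```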

An additional difference between these packages is that qAnalyst uses the
lattice library to generate output, while the other two packages appear to
use the (traditional) graphics library.

Regards,

Tom


On Tue, 10 Nov 2009 23:39:23 -0600, Erin Hodgess
<erinm.hodgess_at_gmail.com> wrote:

 Dear R People:

 I am using qcc for a quality control class.

 I have used qcc with type xbar.one for individuals but cannot determine
 how to plot a moving range control chart.

 Has anyone done that, please?

 Thanks,
 Erin

 --
 Erin Hodgess
 Associate Professor
 Department of Computer and Mathematical Sciences
 University of Houston - Downtown
 mailto: erinm.hodgess_at_gmail.com




[R] How can I store the results

2010-01-13 Thread Alex Roy

Dear R users,
I am running R code that produces 10 columns and
160 rows. I need to run the code 100 times and store the results of every
run in a single file.
How can I store them in a single file without overwriting
the earlier results?

Thanks

Alex


Re: [R] Problem fitting a non-linear regression model with nls

2010-01-13 Thread Gabor Grothendieck

You could try the brute force of the nls2 package; however, note that you
have 8 parameters and only 16 points, so you might look for a more
parsimonious model.  Plotting it, it seems somewhat Gaussian in shape,
so:

mod <- nls(y ~ a * dnorm(x, b, c), start = c(a = mean(y)/dnorm(0, 0,
sd(x)), b = mean(x), c = sd(x)))
matplot(x, cbind(y, fitted(mod)), type = c("p", "l"), pch = 20)




[R] wrong with using subset

2010-01-13 Thread Ahmet Temiz


Hello,

Is there something wrong with this expression?

subset(dfpr2_r,(as.numeric(as.character(dfpr2_r$pr2)))  0.2  (dfpr2_r$landa
 10))

It returns nothing.

regards

Re: [R] = returns wrong result? Why

2010-01-13 Thread Magnus Torfason


Yupp, FAQ 7.31 is definitely your friend here.

You might also want to take a look at these two very recent threads on 
this help list:


Strange behaviour of as.integer()
http://tolstoy.newcastle.edu.au/R/e9/help/10/01/index.html#547

Newbie question on precision
http://tolstoy.newcastle.edu.au/R/e9/help/10/01/index.html#718
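
In short, the point of FAQ 7.31 is that two doubles can print identically and still differ in their last bits. A minimal sketch with made-up values (not the poster's Rk and RB):

```r
a <- 0.1 + 0.2
b <- 0.3

# Exact comparison fails because of binary floating-point rounding.
exact_equal <- a == b

# Comparison up to a small tolerance succeeds.
near_equal <- isTRUE(all.equal(a, b))

# Printing with more digits reveals the difference.
shown <- format(c(a, b), digits = 17)
```

Here exact_equal is FALSE and near_equal is TRUE; isTRUE(all.equal(...)) is the usual replacement for == when testing doubles.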

Best,
Magnus

On 1/13/2010 3:25 AM, Stephan Kolassa wrote:


take a look at FAQ 7.31.

Trafim Vanishek wrote:

Does anybody know the probable reason why = gives false when it
should give true?



Re: [R] Dynamic file / url name with read.csv

2010-01-13 Thread ivan popivanov


A few packages have support for basic download from Yahoo Finance. If that's 
what you are trying to achieve - you may want to try quantmod (getSymbols 
function) or tseries (get.hist.quote function). If you want to do something not 
supported yet - first take a look at their source code.

Regards,
Ivan

 From: k...@csusb.edu
 Date: Tue, 12 Jan 2010 22:25:17 -0800
 To: r-help@r-project.org
 Subject: Re: [R] Dynamic file / url name with read.csv
 
 
 A few suggestions:
   Don't mix ' and "
   Use paste()
   Don't include an extraneous ;
 
 SymA <- "SPY"
 Sym1 <- 
 paste("http://ichart.finance.yahoo.com/table.csv?s=",SymA,"&ignore=.csv",sep="")
 Symbol <- read.csv(Sym1, stringsAsFactors=F)
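
The reason paste() is the right tool here, shown as a small sketch that builds the string without downloading anything: paste() returns the assembled string, whereas cat() only prints its arguments and returns NULL, so nothing useful reaches read.csv().

```r
SymA <- "SPY"

# paste() returns the URL as a character string.
Sym1 <- paste("http://ichart.finance.yahoo.com/table.csv?s=", SymA,
              "&ignore=.csv", sep = "")

# cat() writes to the console; its return value is NULL.
cat_result <- cat("s=", SymA, "\n", sep = "")
```

In current versions of R, paste0(...) is a shorthand for paste(..., sep = "").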
 
 
 On Jan 12, 2010, at 10:03 PM, B S wrote:
  
  Hi- 
  
  I would like to be able to change the value of SymA below and download a 
  file from the corresponding URL.  Hardcoded, this line works fine: 
  
  Symbol <- 
  read.csv("http://ichart.finance.yahoo.com/table.csv?s=SPY&ignore=.csv", 
  stringsAsFactors=F)
  
  However, when I incorporate using a variable for the ticker, it no longer 
  works.  
  
  SymA <- "SPY"
  Sym1 <- 
  cat('http://ichart.finance.yahoo.com/table.csv?s=",SymA,"&ignore=.csv",sep="";)
  Symbol <- read.csv(Sym1, stringsAsFactors=F)
  
  I know that the problem lies in the concatenation, but I've tried different 
  variations of cat() and toString() (and others) with SymA and Sym1 but 
  cannot seem to get a string together that will work.  Would appreciate any 
  suggestions for this simple problem?? 
  
  Thank you. 
  
  
  
  

Re: [R] Problem fitting a non-linear regression model with nls

2010-01-13 Thread Bert Gunter

 My question is how could I estimate those initial values so that the nls
 fitting works.

You can't. Your parameters are almost certainly nonidentifiable (which is
what Gabor told you more gracefully).

Just because you believe in a complex (often mechanistic) nonlinear model
and have some data does not assure that the model parameters can be
estimated. If you do not understand why this is so, consider fitting even a
simple 4 parameter logistic when the data do not level off at the top and/or
bottom end. There are then infinitely many solutions in which the parameters
trade off with one another to give essentially identical fits. That is
what the singular gradient message is trying to tell you.
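
A toy illustration of that trade-off, using a deliberately silly model rather than the poster's: in y = a * b * x only the product a * b matters, so the two columns of the gradient matrix are proportional and the matrix is rank-deficient. The parameter values below are arbitrary.

```r
x <- 1:10
a <- 2
b <- 3

# Gradient of the model a * b * x with respect to a and to b.
J <- cbind(d_a = b * x, d_b = a * x)

# Two parameters, but the gradient matrix has rank 1: it is singular.
jac_rank <- qr(J)$rank

# Different (a, b) pairs give exactly the same fitted curve.
same_fit <- isTRUE(all.equal(a * b * x, (a * 10) * (b / 10) * x))
```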

Bert Gunter
Genentech Nonclinical Statistics


Re: [R] wrong with using subset

2010-01-13 Thread Don MacQueen


I would suggest that first you look at the results of

  (as.numeric(as.character(dfpr2_r$pr2)))  0.2  (dfpr2_r$landa  10)

by itself. Does it give all FALSE ?

Then look at each of the parts separately. What are the results of

   (as.numeric(as.character(dfpr2_r$pr2)))  0.2
and
   dfpr2_r$landa  10

Are there any TRUE among the results?

Does
   as.numeric(as.character(dfpr2_r$pr2))
give what you expect?

-Don


--
--
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA
925-423-1062


Re: [R] wrong with using subset

2010-01-13 Thread Duncan Murdoch


On 13/01/2010 10:45 AM, Don MacQueen wrote:

I would suggest that first you look at the results of

   (as.numeric(as.character(dfpr2_r$pr2)))  0.2  (dfpr2_r$landa  10)

by itself. Does it give all FALSE ?
  


I'd guess the problem is using && instead of &. 
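
A small sketch of the difference, with made-up data; the column names pr2 and landa come from the original post, while the values and the direction of the comparisons are assumptions. The single & works element by element, which is what subset() needs. The double && is meant for single values: depending on the R version it either looks only at the first elements or signals an error when given longer vectors.

```r
# Made-up data; pr2 is stored as a factor, as in the original question.
dfpr2_r <- data.frame(pr2 = factor(c("0.1", "0.3", "0.5")),
                      landa = c(5, 12, 20))

pr2_num <- as.numeric(as.character(dfpr2_r$pr2))

# Element-wise condition: one TRUE/FALSE per row.
keep <- pr2_num > 0.2 & dfpr2_r$landa > 10

res <- subset(dfpr2_r, as.numeric(as.character(pr2)) > 0.2 & landa > 10)
```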


Duncan Murdoch


Re: [R] Ask about large data set

2010-01-13 Thread Magnus Torfason


On 1/12/2010 8:29 PM, Yi Du wrote:

Hi,

Is that okay to let R to read data set more than 1 rows and
use it to do some kernel density estimation? Thanks.

Yi


Why don't you just try it and see? Nothing bad will happen - the 
absolute worst case scenario is that R will hang.


But I can tell you that reading 1 rows should be a piece of cake on 
any decent computer. Different estimation techniques are different in 
terms of computational intensity. Trying it is the best approach. If you 
run into problems, you could come back with specific questions of 
optimization.
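
A quick way to try it, as a sketch on simulated data; the row count of 1e5 is an arbitrary stand-in, not the figure from the question.

```r
set.seed(42)
n <- 1e5                          # hypothetical number of rows
dat <- data.frame(value = rnorm(n))

# Time a default kernel density estimate on one column.
elapsed <- system.time(d <- density(dat$value))["elapsed"]

# density() evaluates the estimate on a fixed grid (512 points by default).
grid_size <- length(d$x)
```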


Best,
Magnus


Re: [R] Problem fitting a non-linear regression model with nls

2010-01-13 Thread Nathalie Yauschew-Raguenes


Actually, the data that I used are measurements of plant growth during
an entire year. It is usual to model the growth with logistic models.
I have already tried the simple logistic model (which works). But the
problem is that with this model the inflexion point occurs half-way up
or down the logistic curve.
That's why, despite the small number of measurements, I wanted to try the
generalized logistic model proposed by Richards.

So I will still try the nls2 package, just in case. And if it doesn't
work, I'll use a more parsimonious model as you two have suggested.
Thank you for your answers.

--
Nathalie YAUSCHEW-RAGUENES
Ph.D Student
Unité de Recherches Ecologie Fonctionnelle et Physique de l'Environnement


Re: [R] How can I store the results

2010-01-13 Thread Greg Snow

You could put all of your results into a single list, then just save the list.

Or, functions like write.table and write have an append argument; set it to 
TRUE and the information will be appended to the file rather than overwriting 
it.
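
A minimal sketch of the append approach, using three runs of random numbers in place of the real computation; out_file and the run column are hypothetical additions.

```r
out_file <- tempfile(fileext = ".csv")

for (run in 1:3) {
  # Stand-in for one run of the real code: 160 rows, 10 columns.
  res <- data.frame(run = run, matrix(rnorm(160 * 10), nrow = 160))

  # Write the header only once, then append without column names.
  write.table(res, out_file, sep = ",", row.names = FALSE,
              col.names = (run == 1), append = (run > 1))
}

all_runs <- read.csv(out_file)
```

The extra run column records which block of 160 rows came from which run.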

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111



Re: [R] How can I store the results

2010-01-13 Thread jim holtman

Collect the results in a list (one entry for each matrix) and then 'save'
the list.  When you 'load' it back in, you can easily reference each element
for further processing.
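
A minimal sketch of that approach, again with three runs of random numbers standing in for the real computation; the object and file names are hypothetical.

```r
# One list entry per run.
results <- vector("list", 3)
for (run in seq_along(results)) {
  results[[run]] <- matrix(rnorm(160 * 10), nrow = 160)
}

# Save the whole list to a single file.
rdata_file <- tempfile(fileext = ".RData")
save(results, file = rdata_file)

# Later (or in another session): restore it under the same name.
rm(results)
load(rdata_file)
second_run <- results[[2]]
```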





-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?


[R] R package dependencies

2010-01-13 Thread Colin Millar

Hi there,
 
My question relates to getting information about R packages.  In particular I 
would like to be able to find from within R:
  what are a package's dependencies
  what are a package's reverse dependencies
  does a package contain a dll
 
The reason I ask is:
 
The organisation that I work for is introducing a secure intranet operating on 
Windows PCs and laptops, and this requires that all software / executables / 
dlls are validated before they are combined to produce a generic PC build.
 
I would like to maximise the packages available to our staff and so for the 
packages that we have listed as business needs, I would like to include all 
reverse dependencies of this collection that do not have dlls.
 
I hope this makes sense (the question not the reason).
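
One way this could be approached in current versions of R, offered as a sketch and restricted to packages already installed locally. It relies on tools::package_dependencies(), which postdates the R version mentioned in this thread, and on the assumption that an installed package with compiled code has a 'libs' directory; has_dll is a hypothetical helper name and "stats" is just an example package.

```r
ip <- installed.packages()      # metadata for the local library
fields <- c("Depends", "Imports", "LinkingTo")

# Direct dependencies of one package.
deps <- tools::package_dependencies("stats", db = ip, which = fields)

# Installed packages that depend on it (reverse dependencies).
rev_deps <- tools::package_dependencies("stats", db = ip, which = fields,
                                        reverse = TRUE)

# Compiled code lives in the 'libs' directory of the installed package.
has_dll <- function(pkg) dir.exists(file.path(find.package(pkg), "libs"))
dll_flags <- c(stats = has_dll("stats"), datasets = has_dll("datasets"))
```

For packages that are not yet installed, the same function can be given the result of available.packages() instead, which needs access to a CRAN mirror.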
 
Kind regards,
Colin.


Re: [R] convert factor data to numeric

2010-01-13 Thread Peter Ehlers


S Devriese wrote:

On 01/13/2010 10:47 AM, Ahmet Temiz wrote:

hello

 could you give me a hint to convert data in factor type to numeric (float) ?

  regards

--
Open WebMail Project (http://openwebmail.org)



you could try as.numeric but without more details it is difficult to see
if this will work. How did you end up with a factor (e.g. through import)?


No, don't use as.numeric(). Do follow Dimitris' advice.
But the question of how you got the factor data is good; you
can usually avoid getting factors to begin with.

 -Peter Ehlers


Stephan





--
Peter Ehlers
University of Calgary
403.202.3921


[R] exporting data frame - write foreign inconsistencies

2010-01-13 Thread John Cullen

Hello List,

I have a data frame object (wa2) that I am exporting for use in
another statistics package. Using

library(foreign)
write.foreign(wa2, choose.files(), choose.files(), package='SPSS')

I noticed that there were several differences between the data sets as
seen within R (View(wa2)) and what was produced in SPSS.  Examining
the data file produced by write.foreign (before running the generated
SPSS syntax), I noticed the same inconsistencies.

I then used:

write.table(wa2, choose.files(), sep=",", col.names=TRUE,
row.names=FALSE, quote=TRUE, na="NA")

and the file generated using this method matched what was in the R object.
I'm trying to send this dataset to a colleague who will only use SPSS.
Any ideas why the two methods produce different data files?

--
sessionInfo()
R version 2.10.1 (2009-12-14)
i386-pc-mingw32

locale:
[1] LC_COLLATE=English_Canada.1252  LC_CTYPE=English_Canada.1252
[3] LC_MONETARY=English_Canada.1252 LC_NUMERIC=C
[5] LC_TIME=English_Canada.1252

attached base packages:
[1] tcltk stats graphics  grDevices utils datasets  methods
[8] base

other attached packages:
[1] Rcmdr_1.5-4    car_1.2-16     relimp_1.0-1   foreign_0.8-39

loaded via a namespace (and not attached):
[1] tools_2.10.1
--

Thanks in advance.

Sincerely;

John Cullen, M.Sc.
caninesinmotion.com


[R] Help, How can I boxplot mse and mtry using 20 5-fold cross-validation?

2010-01-13 Thread bbslover


 Hello,
   I am learning randomForest, and I want to draw boxplots of MSE against
mtry using 20 repetitions of 5-fold cross-validation (taking the median
value), but I have not found a good way to do it.

The randomForest package itself does not provide a cross-validation method.
The caret package does, but how can I get every value of mtry together with
its corresponding MSE?
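
The bookkeeping for this can be done by hand in base R. The sketch below is only an outline of repeated k-fold cross-validation: it swaps in lm() with a polynomial degree as the tuning value, purely as a stand-in for randomForest() and mtry, and uses simulated data. All names and values are hypothetical.

```r
set.seed(1)
n <- 100
dat <- data.frame(x = runif(n))
dat$y <- sin(2 * pi * dat$x) + rnorm(n, sd = 0.3)

tune_values <- 1:3          # stand-in for the candidate mtry values
reps <- 20                  # number of repetitions
k <- 5                      # folds per repetition

# One row per repetition, one column per tuning value.
mse <- matrix(NA_real_, nrow = reps, ncol = length(tune_values),
              dimnames = list(NULL, paste0("tune", tune_values)))

for (r in seq_len(reps)) {
  fold <- sample(rep(seq_len(k), length.out = n))   # random fold labels
  for (j in seq_along(tune_values)) {
    degree <- tune_values[j]
    fold_mse <- numeric(k)
    for (f in seq_len(k)) {
      train <- dat[fold != f, ]
      held_out <- dat[fold == f, ]
      fit <- lm(y ~ poly(x, degree), data = train)
      fold_mse[f] <- mean((held_out$y - predict(fit, held_out))^2)
    }
    mse[r, j] <- median(fold_mse)   # median over the 5 folds
  }
}

# One box per tuning value, 20 points in each.
if (interactive()) boxplot(mse, xlab = "tuning value", ylab = "CV MSE")
```

To use it with randomForest, the lm() line would be replaced by a randomForest() call with mtry set to the current tuning value.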



[R] Method for reduction of independent variables

2010-01-13 Thread rubystallion


Hello

I am currently investigating software code metrics for a variety of software
projects of a company to determine the worst parts of software products
according to specified quality characteristics. 
As the gathering of metrics correlates with effort, I would like to find a
subset of the metrics preserving significant predictive power for the
problem value while using the least amount of code metrics. 

I have the results of 25 metrics for 6 software projects for a combined 9355
individuals, i.e. software parts with metrics.
However, as many metrics only measure metric values above a predefined
limit, 58% of the responses for independent variables are 0.

Which method can I use to determine a reduced set of independent variables
with significant predictive power?
As I do not have a statistics background, I would also appreciate a simple
explanation of the chosen method and sensible choices for parameters, so
that I will be able to infer the reduced set of software metrics to keep.

Thank you in advance!

Johannes

Re: [R] convert factor data to numeric

2010-01-13 Thread Nathalie Yauschew-Raguenes


Hello,

I found a way to convert data of factor type to numeric:
data_numeric <- as.numeric(as.character(data_factor)).
It's tricky but works.
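
A short sketch of why the detour through as.character() is needed, using a made-up factor: as.numeric() applied directly to a factor returns the internal level codes, not the numbers shown in the labels.

```r
data_factor <- factor(c("10.5", "2", "33"))

# Internal integer codes of the levels, not the values themselves.
codes <- as.numeric(data_factor)

# Convert the labels to text first, then to numbers.
data_numeric <- as.numeric(as.character(data_factor))

# Equivalent idiom that converts each level only once.
data_numeric2 <- as.numeric(levels(data_factor))[data_factor]
```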



--
Nathalie YAUSCHEW-RAGUENES
Ph.D Student

Unité de Recherches Ecologie Fonctionnelle et Physique de l'Environnement 
(EPHYSE)


[R] Applying function to parts of a matrix based on a factor

2010-01-13 Thread John Sorkin

R 2.9
Windows XP

I have a matrix, Data, which contains a factor Sex and a continuous variable 
Age.
I want to get mean age by sex. I know I can do this with two statements,
mean(Data[Data[,"Sex"]=="Male","Age"]) and  
mean(Data[Data[,"Sex"]=="Female","Age"])

I know this can be done in a single command, but I can't remember how. There is a 
function that allows another function to work within the levels of a factor, 
something like magicfunction(Data,Factor=Sex). N.B. I know the function I am 
looking for is not in the lapply, sapply etc. family.

Please put me out of my misery (and senior moment) and remind me what function 
I should be using. 



John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)


Re: [R] Applying function to parts of a matrix based on a factor

2010-01-13 Thread Doran, Harold

with(yourdataframe, tapply(age,sex,mean))


Re: [R] Applying function to parts of a matrix based on a factor

2010-01-13 Thread Dimitris Rizopoulos


try this:

with(Data, tapply(Age, Sex, mean))
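
A minimal reproducible sketch of the call above. The data frame is invented for illustration; only the column names Sex and Age come from the question.

```r
# Made-up data; column names follow the thread
Data <- data.frame(
  Sex = factor(c("Male", "Female", "Female", "Male", "Female")),
  Age = c(40, 30, 50, 60, 40)
)

# One mean per level of Sex, returned as a named vector
means <- with(Data, tapply(Age, Sex, mean))
print(means)
```

The result has one element per factor level, so means["Female"] picks out a single group.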


I hope it helps.

Best,
Dimitris





--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014


Re: [R] How can I store the results

2010-01-13 Thread Gavin Simpson

On Wed, 2010-01-13 at 15:59 +0100, Alex Roy wrote:
 Dear R users,
 I am running R code which gives me 10 columns and
 160 rows. I need to run the code 100 times, and each time I need to store
 the results in a single file.
 How can I store them in a single file without overwriting
 the results?

In a list?

results <- vector(mode = "list", length = 100)
for(i in seq_along(results)) {
    ## do something
    ##
    ## store result for iteration i
    results[[i]] <- something
}

results will now contain 100 matrices of dim 160x10.
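
If the end goal really is a single file, one possible follow-up is to stack the list and write it out once. This is only a sketch: the function one_run() is a hypothetical stand-in for the real computation, and the run column and the temporary CSV file are assumptions, not part of the original question.

```r
# Hypothetical stand-in for the real computation: a 160 x 10 matrix per run
one_run <- function(i) matrix(i, nrow = 160, ncol = 10)

n_runs <- 100
results <- vector(mode = "list", length = n_runs)
for (i in seq_along(results)) {
  results[[i]] <- one_run(i)
}

# Stack all runs, adding a column that records which run each row came from
stacked <- do.call(rbind, lapply(seq_along(results), function(i) {
  cbind(run = i, as.data.frame(results[[i]]))
}))

# Write everything to one file (a temporary file here)
out_file <- tempfile(fileext = ".csv")
write.csv(stacked, out_file, row.names = FALSE)
```

Writing once at the end avoids the overwriting problem entirely; save(results, file = ...) would be an alternative if the file only needs to be read back into R.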

HTH

G

 
 Thanks
 
 Alex
 
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%


[R] Simulation numbers from a probability table

2010-01-13 Thread Kelvin

Dear friends,

If I have a table like this, where the first row (A B C D ...) gives the
levels of the variable, the first column (0 1 2 4 ...) gives the possible
values, and the entries are the probabilities of each value occurring:

     A     B     C     D     ...
0    0.2   0.3   0.1   0.05
1    0.1   0.1   0.2   0.2
2    0.02  0.2   0     0.1
4    0.3   0.01  0.01  0.4
...

How can I use R to run a simulation and get a table like this, where the
first row (A B C D ...) again gives the levels of the variable and the
entries are values simulated from the probability table above?

A  B  C  D  ...
0  4  2  0
2  2  0  1
0  1  4  1
2  2  0  0
...


Thanks for help!


Kelvin


Re: [R] optimization challenge

2010-01-13 Thread Greg Snow

WOW, your results give about half the variance of my best optim run (possibly 
due to my suboptimal use of optim).

Can you describe a little what the algorithm is doing?

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


 -Original Message-
 From: Albyn Jones [mailto:jo...@reed.edu]
 Sent: Tuesday, January 12, 2010 5:31 PM
 To: Greg Snow
 Cc: r-help@r-project.org
 Subject: Re: [R] optimization challenge
 
 Greg
 
 Nice problem: I wasted my whole day on it :-)
 
 I was explaining my plan for a solution to a colleague who is a
 computer scientist, and he pointed out that I was trying to re-invent the
 wheel known as dynamic programming.  Here is my code; apparently it is
 called bottom-up dynamic programming.  It runs pretty quickly, and
 returns (what I hope is :-) the optimal sum of squares and the
 cut-points.
 
 function(X=bom3$Verses,days=128){
 # find optimal BOM reading schedule for Greg Snow
 # minimize variance of quantity to read per day over 128 days
 #
   N = length(X)
   Nm1 = N-1
   SSQ <- matrix(NA,nrow=days,ncol=N)
   Cuts <- list()
 #
 #  SSQ[i,j]: the ssqs about the overall mean for the optimal partition
 #   for i days on the chapters 1 to j
 #
   M = sum(X)/days
   CS = cumsum(X)
   SSQ[1,] = (CS-M)^2
   Cuts[[1]] = as.list(1:N)
 #
   for(m in 2:days){
     Cuts[[m]] = list()
     #for(i in 1:(m-1)) Cuts[[m]][[i]] = Cuts[[m-1]][[i]]
     for(n in m:N){
       CS = cumsum(X[n:1])[n:1]
       SSQ1 = (CS-M)^2
       j = (m-1):(n-1)
       TS = SSQ[m-1,j]+(SSQ1[j+1])
       SSQ[m,n] = min(TS)
       k = min(which((min(TS)== TS)))+m-1
       Cuts[[m]][[n]] = c(Cuts[[m-1]][[k-1]],n)
     }
   }
   list(SSQ=SSQ[days,N],Cuts=Cuts[[days]][[N]])
 }
 
 $SSQ
 [1] 11241.05
 
 $Cuts
   [1]   2   4   7   9  11  13  15  16  17  19  21  23  25  27  30  31  34  37
  [19]  39  41  44  46  48  50  53  56  59  60  62  64  66  68  70  73  75  77
  [37]  78  80  82  84  86  88  89  91  92  94  95  96  97  99 100 103 105 106
  [55] 108 110 112 113 115 117 119 121 124 125 126 127 129 131 132 135 137 138
  [73] 140 141 142 144 145 146 148 150 151 152 154 156 157 160 162 163 164 166
  [91] 167 169 171 173 175 177 179 181 183 185 186 188 190 192 193 194 196 199
 [109] 201 204 205 207 209 211 213 214 215 217 220 222 223 225 226 228 234 236
 [127] 238 239
 
 
 
 
 On Tue, Jan 12, 2010 at 11:33:36AM -0700, Greg Snow wrote:
  I have a challenge that I want to share with the group.
 
  This is not homework (but I may assign it as such if I teach the
 appropriate class again) and I have found one solution, so don't need
 anything urgent.  This is more for fun to see if others can find a
 better solution than I did.
 
  The challenge:
 
  I want to read a book in a given number of days.  I want to read an
 integer number of chapters each day (there are more chapters than
 days), no stopping part way through a chapter, and at least 1 chapter
 each day.  The chapters are very non-uniform in length (some very
 short, a few very long, many in between), so I would like to come up
 with a reading schedule that minimizes the variance of the length of
 the day's readings (read multiple short chapters on the same day; long
 chapters are the only one read that day).  I also want to read through
 the book in order (no skipping ahead to combine short chapters that are
 not naturally next to each other).
 
  My thought was that the optim function with method="SANN" would be an
 appropriate approach, but my first couple of tries did not give very
 good results.  I have since come up with an optim with SANN solution
 that gives what I consider good results (but I accept that better is
 possible).
 
  Below is a data frame with the lengths of the chapters for the book
 that originally sparked the challenge for me (but the general idea
 should work for any book).  Each row represents a chapter (in order)
 with 3 different measures of the length of the chapter.
 
  For this challenge I want to read the book in 128 days (there are 239
 chapters).
 
  I will post my solutions in a few days, but I want to wait so that my
 direction does not influence people from trying other approaches (if
 there is something better than optim, that is fine).
 
  Good luck for anyone interested in the challenge,
 
  The data frame:
 
  bom3 <- structure(list(Chapter = structure(1:239, .Label = c("1 Nephi 1",
  "1 Nephi 2", "1 Nephi 3", "1 Nephi 4", "1 Nephi 5", "1 Nephi 6",
  "1 Nephi 7", "1 Nephi 8", "1 Nephi 9", "1 Nephi 10", "1 Nephi 11",
  "1 Nephi 12", "1 Nephi 13", "1 Nephi 14", "1 Nephi 15", "1 Nephi 16",
  "1 Nephi 17", "1 Nephi 18", "1 Nephi 19", "1 Nephi 20", "1 Nephi 21",
  "1 Nephi 22", "2 Nephi 1", "2 Nephi 2", "2 Nephi 3", "2 Nephi 4",
  "2 Nephi 5", "2 Nephi 6", "2 Nephi 7", "2 Nephi 8", "2 Nephi 9",
  "2 Nephi 10", "2 Nephi 11", "2 Nephi 12", "2 Nephi 13", "2 Nephi 14",
  "2 Nephi 15", "2 Nephi 16", "2 Nephi 17", "2 Nephi 18", "2 Nephi 19",
  "2 Nephi 20", "2 Nephi 21", "2 Nephi 22", "2 Nephi 23", "2 Nephi 24",
  2

Re: [R] Applying function to parts of a matrix based on a factor

2010-01-13 Thread Heinz Tuechler


If your matrix were a data.frame, it could work like this:

df <- data.frame(age=1:100, sex=rep(1:2, 50))
with(df, by(age, sex, mean))

without the lapply, sapply etc. family.
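
On the "if your matrix were a data.frame" point, a hedged sketch of the conversion step: an R matrix holds a single type, so a matrix that mixes Sex and Age ends up as character. The matrix M below is invented for illustration, and aggregate() is shown as one more option outside the lapply/sapply family.

```r
# A matrix holds one type, so combining Sex and Age gives a character matrix
M <- cbind(Sex = c("Male", "Female", "Female", "Male"),
           Age = c(40, 30, 50, 60))

# Convert to a data frame with proper column types
df <- data.frame(Sex = factor(M[, "Sex"]),
                 Age = as.numeric(M[, "Age"]))

# Group means returned as a data frame, one row per level of Sex
agg <- aggregate(Age ~ Sex, data = df, FUN = mean)
print(agg)
```

Unlike tapply() or by(), aggregate() returns an ordinary data frame, which can be convenient for merging or further processing.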

h


Re: [R] Simulation numbers from a probability table

2010-01-13 Thread Tal Galili

If the trials are not connected, then I would consider melting the table
using melt() from the reshape package,
and then using lapply() with the function
random.function <- function(my.prob, number.of.observations = 10)
{
sum(rbinom(number.of.observations, 1, my.prob))
}


If the trials are connected by column,
then you could use
apply(the.data.table, 2, a.function)
on it, where a.function draws from a multinomial distribution (for which I don't
remember the function at the moment, but it can be searched).
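
For the column-wise case, a sketch using rmultinom() from the stats package, which draws multinomial counts. The probability table ptab is hypothetical: its columns are filled in so that each sums to 1, which the partial table in the question does not show, and the choice of 10 draws per column is likewise an assumption.

```r
set.seed(1)

# Hypothetical probability table: rows are the values 0, 1, 2, 4; columns are A..D
ptab <- cbind(A = c(0.2, 0.1, 0.4, 0.3),
              B = c(0.3, 0.1, 0.5, 0.1),
              C = c(0.1, 0.2, 0.3, 0.4),
              D = c(0.1, 0.2, 0.3, 0.4))
rownames(ptab) <- c(0, 1, 2, 4)

# For each column, count how often each value occurs in 10 draws
counts <- apply(ptab, 2, function(p) rmultinom(1, size = 10, prob = p))
rownames(counts) <- rownames(ptab)
print(counts)
```

Each column of counts sums to the number of draws. This gives counts per value rather than the individual simulated values asked for in the question.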


Best,
Tal.




Contact
Details:---
Contact me: tal.gal...@gmail.com |  972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
www.r-statistics.com/ (English)
--





[R] Ask for histogram

2010-01-13 Thread Yi Du

Hi,


I use a vector of data to draw a histogram, but it is different from the
graph produced by SAS. Can you check it for me, please?

b is a column vector of length 4332.

hist(b,probability=T,breaks=30,col='lightblue',ylim=c(0,1))
rug(b)

When I used rug, I found that there are fewer than 4332 records. I don't know
what I did wrong.

Thanks.

-- 
Yi Du


Re: [R] convert factor data to numeric

2010-01-13 Thread S Devriese

On 01/13/2010 05:41 PM, Peter Ehlers wrote:
 No, don't use as.numeric(). Do follow Dimitris' advice.
 But the question of how you got the factor data is good; you
 can usually avoid getting factors to begin with.
 


 

I know, slightly sloppy answer (see Dimitris' answer), but I hoped to
find out how he got the factor in the first place, because if it is an
import issue (and e.g. the decimal character is different from the locale's
decimal character) the FAQ answer might not work as expected.

Stephan


Re: [R] Ask for histogram

2010-01-13 Thread Steve Lianoglou

Hi,

On Wed, Jan 13, 2010 at 12:58 PM, Yi Du abraham...@gmail.com wrote:
 Hi,


 I use a vector of data to draw the histogram, but it is different from the
 graph by SAS. Can you check it for me please?

How are we supposed to check something without data, pictures, etc?
What do you want checking, exactly?

 b is a column vector of 4332

 hist(b,probability=T,breaks=30,col='lightblue',ylim=c(0,1))
 rug(b)

 When I used rug, I find the records are smaller than 4332. I don't know
 where I did wrong.

What do you mean? Is the histogram that you're getting surprising? Is
the result of adding a rug surprising?

Are you actually trying to count 4332 tick marks at the bottom of your
plot? What records are smaller than 4332?

Try to see what rug returns, e.g.:

r <- rug(b)

length(r) should be as long as your `b` vector

I'm not sure what you're asking, but hopefully some of the info I
threw at you is helpful. Please be a bit more specific with any follow
up if you still find anything confusing.
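
One possible reason for seeing fewer tick marks than observations, offered only as a guess since no data were posted: ticks for tied values are drawn on top of each other, and missing or infinite values cannot be drawn at all. The vector below is made up to show how one might count both cases.

```r
# Made-up data standing in for the poster's vector b
b <- c(1.2, 3.4, 3.4, NA, 5.6, Inf, 3.4)

n_total    <- length(b)                        # all entries
n_finite   <- sum(is.finite(b))                # entries that can be drawn
n_distinct <- length(unique(b[is.finite(b)]))  # distinct tick positions

print(c(total = n_total, finite = n_finite, distinct = n_distinct))
```

If n_distinct is well below n_total for the real data, overlapping ticks alone would explain the impression of missing records.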

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


Re: [R] optimization challenge

2010-01-13 Thread Albyn Jones

The key idea is that you are building a matrix that contains the
solutions to smaller problems which are sub-problems of the big
problem.  The first row of the matrix SSQ contains the solution for no
splits, i.e. SSQ[1,j] is just the sum of squares about the overall mean
for reading chapters 1 through j in one day.  The iteration then uses
row m-1 to construct row m, since if SSQ[m-1,j] (optimal reading of j
chapters in m-1 days) is part of the overall optimal solution, you
have already computed it, and so don't ever need to recompute it.

   TS = SSQ[m-1,j]+(SSQ1[j+1])

computes the vector of possible solutions for SSQ[m,n] (n chapters in m days),
breaking it into two pieces: chapters 1 to j in m-1 days, and chapters j+1 to
n in 1 day.  j is a vector in the function, and min(TS) is the minimum
over choices of j, i.e. SSQ[m,n].

At the end, SSQ[128,239] is the optimal value for reading all 239
chapters in 128 days.  That's just the objective function, so the rest
involves constructing the list of optimal cuts, ie which chapters are
grouped together for each day's reading.  That code uses the same
idea... constructing a list of lists of cutpoints.
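
The recursion described above can be sketched on a toy problem. This is a simplified rewrite for illustration, not the code posted earlier in the thread: the function name best_schedule, the back-pointer matrix `last`, and the seven-chapter vector are all assumptions introduced here.

```r
# Bottom-up dynamic programme: split x into `days` contiguous, non-empty blocks
# so that the sum of squared deviations of block totals from the mean is minimal.
best_schedule <- function(x, days) {
  N <- length(x)
  stopifnot(days >= 1, days <= N)
  M  <- sum(x) / days        # target amount per day
  cs <- c(0, cumsum(x))      # cs[j + 1] is the total of chapters 1..j
  # Cost of reading chapters from..to in a single day
  cost <- function(from, to) (cs[to + 1] - cs[from] - M)^2

  SSQ  <- matrix(Inf, nrow = days, ncol = N)          # SSQ[m, n]: best value, chapters 1..n in m days
  last <- matrix(NA_integer_, nrow = days, ncol = N)  # last chapter of day m-1 in that optimum
  SSQ[1, ] <- cost(1, seq_len(N))

  if (days >= 2) for (m in 2:days) {
    for (n in m:N) {
      j  <- (m - 1):(n - 1)                  # candidate last chapters of day m-1
      ts <- SSQ[m - 1, j] + cost(j + 1, n)   # chapters 1..j in m-1 days, then j+1..n in one day
      SSQ[m, n]  <- min(ts)
      last[m, n] <- j[which.min(ts)]
    }
  }

  # Walk back through the stored choices to get the final chapter of each day
  cuts <- integer(days)
  n <- N
  for (m in days:1) {
    cuts[m] <- n
    n <- last[m, n]
  }
  list(SSQ = SSQ[days, N], Cuts = cuts)
}

# Toy book: 7 chapters to be read in 3 days
res <- best_schedule(c(3, 1, 1, 5, 2, 2, 4), days = 3)
print(res)
```

For this toy vector the daily target is 6, and the best split reads chapters 1-3, 4-5 and 6-7 (totals 5, 7 and 6). Storing one back-pointer per cell instead of a full list of cut-points keeps the memory use at the size of the SSQ matrix.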

Statisticians should study a bit of data structures and algorithms!

albyn


Re: [R] Simulation numbers from a probability table

2010-01-13 Thread Peter Ehlers


Try this:

dat <- data.frame(x=11:14, pa=1:4/10, pb=4:1/10)
f <- function(numreps, data){
  pmat <- as.matrix(data[-1])
  x <- data[,1]
  result <- matrix(0, nrow=numreps, ncol=ncol(pmat))
  colnames(result) <- c("A", "B")
  for(i in seq_len(numreps)){
    result[i,] <- apply(pmat, 2, function(p) sample(x, 1, prob=p))
  }
  result
}
f(5, dat)
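
A possible variant of the same idea that avoids the explicit loop by drawing all replicates for a column in one call to sample(). The name f2 and the seed are choices made here for illustration; the data frame is the one from the example above.

```r
set.seed(42)
dat <- data.frame(x = 11:14, pa = 1:4/10, pb = 4:1/10)

# Draw all replicates for each probability column in a single sample() call
f2 <- function(numreps, data) {
  x <- data[[1]]
  res <- sapply(data[-1], function(p) sample(x, numreps, replace = TRUE, prob = p))
  # Force a matrix even when numreps is 1, keeping the column names
  matrix(res, nrow = numreps, dimnames = list(NULL, names(data)[-1]))
}

sim <- f2(1000, dat)
print(head(sim))
print(table(sim[, "pa"]) / nrow(sim))
```

With many replicates, the observed proportions in each column should be close to the probabilities in the corresponding column of the table.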

 -Peter Ehlers





--
Peter Ehlers
University of Calgary
403.202.3921


Re: [R] Method for reduction of independent variables

2010-01-13 Thread Daniel Malter

Hi, please read the posting guide. You are not likely to get an extensive
answer to your question from this list. Your question is a "please
solve/explain my statistical problem for me" question. There are two things
problematic with that: first, "statistical", and second, "please solve for
me".

First, the R-help list is mostly concerned with problems in implementing
analyses in R, not with the (choice of the) statistical approach per se
(there are few exceptions). Second, "please solve for me" questions are
generally frowned upon, unless you show a specific point at which you
are stuck and have to make a choice. That is, the list members want to see
that you have done your homework to the extent one can expect you to. To
ask the list to provide an introduction to data reduction methods without
having any background knowledge is, frankly, a waste of your and the list
members' time. There are books on the topic, which you can buy or borrow, and
certainly many online sources to give you a basic background. Or you can
start here: http://en.wikipedia.org/wiki/Dimension_reduction. If you want
your statistical questions answered and problems solved without reading
up on the matter yourself, your question is more suitable for a local
statistician at your institution or a paid service rather than this list.

Best,
Daniel 

-
cuncta stricte discussurus
-
-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of rubystallion
Sent: Wednesday, January 13, 2010 11:57 AM
To: r-help@r-project.org
Subject: [R] Method for reduction of independent variables


Hello

I am currently investigating software code metrics for a variety of software
projects of a company to determine the worst parts of software products
according to specified quality characteristics. 
As the gathering of metrics correlates with effort, I would like to find a
subset of the metrics preserving significant predictive power for the
problem value while using the least amount of code metrics. 

I have the results of 25 metrics for 6 software projects for a combined 9355
individuals, i.e. software parts with metrics.
However, as many metrics only measure metric values above a predefined
limit, 58% of the responses for independent variables are 0.

Which method can I use to determine a reduced set of independent variables
with significant predictive power?
As I do not have a statistics background, I would also appreciate a simple
explanation of the chosen method and sensible choices for parameters, so
that I will be able to infer the reduced set of software metrics to keep.

Thank you in advance!

Johannes

Re: [R] Help, How can I boxplot mse and mtry using 20 5-fold cross-validation?

2010-01-13 Thread Max Kuhn

In caret, see ?trainControl. Use returnResamp = "all"

Max

On Wed, Jan 13, 2010 at 9:47 AM, bbslover dlu...@yeah.net wrote:

  Hello,
   I am learning randomForest. I want to make a boxplot of MSE against mtry using 20 repeats of
 5-fold cross-validation (using the median value), but I have not found a good way to
 do it.

 The randomForest package itself does not contain a cross-validation method; the
 caret package does, but how can I get all the
 values of mtry together with the corresponding MSE?






-- 

Max


[R] Formula for normal distribution with know mean and standard error and n terms

2010-01-13 Thread Steve_Friedman


Hello,

I am searching for a method to calculate a normal distribution.

For example, this equation is used to calculate the normal curve when the
mean and standard deviation are known:
p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)




However, some of the literature I'm reading (I'm building an ecological
niche model for vegetation along several ecological gradients) reports the
standard error and the sample size n instead.  Is there an equivalent formula?
If so, how can I also normalize the p(x) term to be within the 0-1 range?
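
One possible reading, sketched under a stated assumption: if the reported standard error is the standard error of the mean, then the standard deviation is the standard error times sqrt(n), and the usual density can be evaluated with dnorm(). The numbers for mu, se and n below are invented, and rescaling by the peak density is just one way to get values between 0 and 1.

```r
# Hypothetical reported values: mean, standard error of the mean, sample size
mu <- 25
se <- 0.8
n  <- 36

# Assuming se is the standard error of the mean, sd = se * sqrt(n)
sigma <- se * sqrt(n)

# Normal density p(x) evaluated along a gradient
x <- seq(mu - 3 * sigma, mu + 3 * sigma, length.out = 101)
p <- dnorm(x, mean = mu, sd = sigma)

# Rescale so that the curve peaks at 1: divide by the density at the mean
p_scaled <- p / dnorm(mu, mean = mu, sd = sigma)

print(sigma)
print(range(p_scaled))
```

After rescaling, the curve equals exp(-(x - mu)^2 / (2 * sigma^2)), so it is 1 at the mean and falls towards 0 in the tails. If the papers report a different kind of standard error, this conversion does not apply.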


Thank you all
Steve


Steve Friedman Ph. D.
Spatial Ecological Analyst
Everglades and Dry Tortugas National Park
950 N Krome Ave (3rd Floor)
Homestead, Florida 33034

steve_fried...@nps.gov
Office (305) 224 - 4282
Fax (305) 224 - 4147

[R] [R-pkgs] New sp release

2010-01-13 Thread Roger Bivand

The sp package provides class definitions for spatial data, and utilities 
for spatial data handling and manipulation.


The release of sp version 0.9-56 introduces changes in the ways in which 
Polygon, Polygons, and SpatialPolygons objects are created, moving from R 
code to compiled C code. Because of these changes, it is possible that 
users will see changed output. The package maintainers have tested as far 
as possible, and a beta release has been checked by some users, without 
any problems coming to light.


Further details are given in:

https://stat.ethz.ch/pipermail/r-sig-geo/2010-January/007377.html

Should anyone see problems following this change, please contact me 
directly with a reproducible example.


--
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: roger.biv...@nhh.no

___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages


Re: [R] convert factor data to numeric

2010-01-13 Thread Rolf Turner



On 14/01/2010, at 6:00 AM, Nathalie Yauschew-Raguenes wrote:


Hello,

I found a way to convert factor data to numeric:
data_numeric <- as.numeric(as.character(data_factor)).
It's tricky but works.


Possibly even more ``tricky'' but more efficient is:

data_numeric <- as.numeric(levels(data_factor)[data_factor])

as has been pointed out quite a few times on this list.

cheers,

Rolf Turner

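A short sketch of the conversions discussed in this thread, using a made-up
factor. It also shows the common pitfall of calling as.numeric() directly on
a factor, and the variant described in ?factor, which converts the levels
once and then indexes them.

```r
# A factor whose labels are numbers stored as text (hypothetical data)
data_factor <- factor(c("10", "5", "2.5", "10"))

# Pitfall: this returns the internal integer codes, not the values
codes <- as.numeric(data_factor)

# Conversion via character, as in the first message
via_character <- as.numeric(as.character(data_factor))

# Conversion via the levels, as in the reply
via_levels <- as.numeric(levels(data_factor)[data_factor])

# Variant from ?factor: convert the (few) levels once, then index
via_levels_once <- as.numeric(levels(data_factor))[data_factor]
```

All three conversions give the same values; only the direct as.numeric() call
differs, because it reports the position of each level rather than its label.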

[R] Advantages of using SQLite for data import in comparison to csv files

2010-01-13 Thread Juliet Jacobson

Hello everybody out there using R,

I'm using R for the analysis of biological data and write up the results
using LaTeX, both on a notebook running Linux.
I've already tried two options for importing my data:
1. Import from a SQLite database
2. Import from individual csv files edited with sed, awk and sort.
Both methods work very well, since I don't need advanced
features like multi-user network access to the data.
My data sets are tables with up to 20 columns and 1000 rows, containing
mostly numerical values and strings. I might also have to handle
microarray data, but I'm not so sure about that yet. In addition, I need
to organise tags for a collection of photos, but this data is of course
not analysed with R.
I'm now beginning to work on a larger project and have to decide
whether it is better to use SQLite or csv files for handling my data.
I fear it might get difficult to switch between the two systems after
having accumulated the data, adapted software for backups and revision
control, written makefiles etc.
Could any of you give me a hint on the additional benefits of
importing data into R from a SQLite database, compared with the simpler
approach of organising the data in csv files? Is it, for example, possible
to select values within a certain range from a column of a csv file using
awk?

Thanks in advance,
Juliet Jacobson


[R] Rollapply

2010-01-13 Thread Pete B


Hi 

I would like to understand how to extend the function (FUN) I am using in
rollapply below.

##
With the following simplified data, test1 yields parameters for a rolling
regression

library(zoo)

data = data.frame(Xvar=c(70.67,70.54,69.87,69.51,70.69,72.66,72.65,73.36),
   Yvar =c(78.01,77.07,77.35,76.72,77.49,78.70,77.78,79.58))
data.z = zoo(data)   # zoo series built from the data frame above

test1 = rollapply(data.z, width=3, 
  FUN = function(z) coef(lm(z[,1]~z[,2], 
  data=as.data.frame(z))), by.column = FALSE, align = "right")

print(test1)

##

Rewriting this to call myfn1 gives test2 (and is consistent with test1
above)

myfn1 = function(mydata){
  dd = as.data.frame(mydata) 
  l = lm(dd[,1]~dd[,2], data=dd)
  c = coef(l)
}

test2 = rollapply(data.z, width=3, 
 FUN= myfn1, by.column = FALSE, align = "right")

print(test2)

##

I would like to be able to use the predict function to obtain a prediction
(and its std error) from the rolling regression I have just calculated.

My effort below issues a warning that 'newdata' had 1 row but variable(s)
found have 3 rows.
(if I run this outside of rollapply I don't get this warning) 

Also, I don't see the predicted value or its se with print(fm2[[1]]). Again,
if I run this outside of rollapply I am able to extract the predicted value.


Xpred=c(70.67)

myfn2 = function(mydata){
  dd = as.data.frame(mydata) 
  l = lm(dd[,1]~dd[,2], data=dd)
  c = coef(l)
  p = predict(l, data.frame(Xvar=Xpred),se=T)
  ret=c(l,c,p)
}

fm2 = rollapply(data.z, width=3, 
 FUN= myfn2, by.column = FALSE, align = "right")

print(fm2[[1]])


Any insights would be gratefully received.

Best regards

Pete

Re: [R] Ask for histogram

2010-01-13 Thread Don MacQueen


If I do

   b <- rnorm(4332)
   hist(b,probability=T,breaks=30,col='lightblue',ylim=c(0,1))
   rug(b)

The plot looks entirely reasonable.

As far as being different from SAS, perhaps SAS and R use different 
breakpoints, that is, different boundaries between the histogram bars.


-Don

At 11:58 AM -0600 1/13/10, Yi Du wrote:

Hi,


I use a vector of data to draw a histogram, but it differs from the
graph produced by SAS. Can you check it for me please?

b is a column vector of length 4332

hist(b,probability=T,breaks=30,col='lightblue',ylim=c(0,1))
rug(b)

When I use rug, I find that fewer than 4332 records are shown. I don't know
what I did wrong.

Thanks.

--
Yi Du




--
--
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA
925-423-1062

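A small sketch of the breakpoint issue suggested in the reply above. When
breaks is a single number, hist() treats it only as a suggestion, so the
actual number of bars can differ from 30. Passing an explicit vector of bin
boundaries fixes them exactly, which makes a comparison with another package
easier. The seed and the choice of 30 equal-width bins are assumptions made
for illustration.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
b <- rnorm(4332)

# With a single number, 'breaks' is only a suggestion
h_suggested <- hist(b, breaks = 30, plot = FALSE)

# With an explicit vector, hist() uses exactly these bin boundaries;
# floor() and ceiling() make sure every observation is covered
edges <- seq(floor(min(b)), ceiling(max(b)), length.out = 31)  # 31 edges, 30 bins
h_exact <- hist(b, breaks = edges, plot = FALSE)

n_bins_suggested <- length(h_suggested$counts)  # chosen by R, may differ from 30
n_bins_exact     <- length(h_exact$counts)      # one bin per pair of edges
```

The boundaries actually used are available as h_suggested$breaks and
h_exact$breaks, which can be compared with those reported by SAS.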

Re: [R] Ask for histogram

2010-01-13 Thread Yi Du

Thanks all, I fixed it.


-- 
Yi Du
Ph. D student in Economics
University of Missouri
Department of Economics
118 Professional Building
Columbia MO  65211
1-573-239-6467


[R] Operating on each row of data frame

2010-01-13 Thread Abhishek Pratap

Hi All

I have a data frame with 4 columns.

Column 1 : name

Column 2-4 : values

I would like to calculate the mean and standard error of the values in
columns 2-4 and store them in columns 5 and 6 respectively.



I have done the following, but it doesn't seem to work:

mean_N_SE <- function(x)
{

name <- x[1]
vals <- c(x[2:4])
temp_mean <- mean(vals)
SE <- sqrt(var(x)/length(x))

}

apply(d,1,mean_N_SE) where d = data frame.


Can someone help me with this.

Thanks!
-Abhi


Re: [R] merging issue.........

2010-01-13 Thread Pete B


Try the merge function
?merge

in1 = "id trait1
1 10.2
2 11.1
3 9.7
6 10.2
7 8.9
10 9.7
11 10.2"


in2 = "id trait2
1 9.8
2 10.8
4 7.8
5 9.8
6 10.1
12 10.2
13 10.1"


data1 = read.table(textConnection(in1), header=T)
data2 = read.table(textConnection(in2), header=T)

mymerge = merge(data1,data2,all.x=TRUE)
print(mymerge)




Re: [R] Rollapply

2010-01-13 Thread Gabor Grothendieck

See:

http://tolstoy.newcastle.edu.au/R/help/04/03/1446.html
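A possible sketch for the predict() part of the question, assuming the intent
is to regress Yvar on Xvar within each window (the posted formula uses column
positions instead). Two changes are made relative to myfn2: the formula names
the variables, so that predict() can match the Xvar column of newdata, and
the function returns a plain numeric vector rather than a list combining the
model object with other results. The name myfn3 is hypothetical.

```r
dat <- data.frame(Xvar = c(70.67, 70.54, 69.87, 69.51, 70.69, 72.66, 72.65, 73.36),
                  Yvar = c(78.01, 77.07, 77.35, 76.72, 77.49, 78.70, 77.78, 79.58))
Xpred <- 70.67

# Hypothetical replacement for myfn2: coefficients, prediction at Xpred,
# and the standard error of that prediction, as one numeric vector
myfn3 <- function(w) {
  dd  <- as.data.frame(w)
  fit <- lm(Yvar ~ Xvar, data = dd)
  p   <- predict(fit, newdata = data.frame(Xvar = Xpred), se.fit = TRUE)
  c(coef(fit), pred = unname(p$fit), se = unname(p$se.fit))
}

# Base R check on the first window of width 3
first_window <- myfn3(dat[1:3, ])

# With zoo installed, the same function can be passed to rollapply()
if (requireNamespace("zoo", quietly = TRUE)) {
  fm3 <- zoo::rollapply(zoo::zoo(dat), width = 3, FUN = myfn3,
                        by.column = FALSE, align = "right")
  print(fm3)
}
```

Each row of fm3 then holds the intercept, the slope, the predicted value and
its standard error for one window.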


Re: [R] Operating on each row of data frame

2010-01-13 Thread Pete B


Look at the apply function
?apply

x = data.frame(x1=c(1,2,3,4,5),x2=c(2,4,6,8,10),x3=c(1,3,5,7,9))
x$x5=apply(x[,1:3],1,mean)
x$x6=apply(x[,1:3],1,sd)   # use x1..x3 only, so that x5 is not included

print(x)



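A minimal sketch for the original question, which asked for the standard
error rather than the standard deviation. It assumes the usual definition
SE = sd / sqrt(n) with n the number of value columns. The data frame and its
column names are made up.

```r
# Hypothetical data frame: one name column followed by three value columns
d <- data.frame(name = c("a", "b", "c"),
                v1 = c(1, 2, 3), v2 = c(2, 4, 6), v3 = c(1, 3, 5))

# Work on the numeric columns only; apply() on the whole data frame would
# coerce every value to character because of the name column
vals <- as.matrix(d[, 2:4])

d$mean <- rowMeans(vals)
d$se   <- apply(vals, 1, sd) / sqrt(ncol(vals))   # SE = sd / sqrt(n)
```

The two new columns end up in positions 5 and 6, as requested in the post.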

Re: [R] R package dependencies

2010-01-13 Thread Gabor Grothendieck

See the dep function defined here:
http://tolstoy.newcastle.edu.au/R/e6/help/09/03/7159.html

On Wed, Jan 13, 2010 at 11:39 AM, Colin Millar c.mil...@marlab.ac.uk wrote:
 Hi there,

 My question relates to getting information about R packages.  In particular I 
 would like to be able to find from within R:
  what are a package's dependencies
  what are a package's reverse dependencies
  does a package contain a dll

 The reason I ask is:

 The organisation that I work for is introducing a secure intranet operating 
 on Windows PCs and laptops, and this requires that all software / executables 
 / dlls are validated before they are combined to produce a generic PC build.

 I would like to maximise the packages available to our staff and so for the 
 packages that we have listed as business needs, I would like to include all 
 reverse dependencies of this collection that do not have dlls.

 I hope this makes sense (the question not the reason).

 Kind regards,
 Colin.

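A rough sketch of how the three questions might be approached in current
versions of R, which provide tools::package_dependencies(). This is not the
dep function from the linked post. The installed packages are used as the
database here so that no network access is needed; available.packages()
would give the repository view instead. The check for compiled code rests on
the assumption that an installed package with a dll or shared object has a
"libs" directory, and has_compiled_code is a hypothetical helper name.

```r
# Installed packages as the database (no network access needed)
db <- installed.packages()

pkgs <- "stats"   # illustrative choice of package

# Forward dependencies (Depends, Imports, LinkingTo)
deps <- tools::package_dependencies(pkgs, db = db,
                                    which = c("Depends", "Imports", "LinkingTo"))

# Reverse dependencies among the installed packages
rev_deps <- tools::package_dependencies(pkgs, db = db,
                                        which = c("Depends", "Imports", "LinkingTo"),
                                        reverse = TRUE)

# Assumption: installed packages with compiled code have a 'libs' directory
has_compiled_code <- function(pkg) {
  dir.exists(system.file("libs", package = pkg))
}
```

Both results are lists with one character vector per requested package, so
they can be filtered with has_compiled_code() via sapply().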

[R] merging issue.........

2010-01-13 Thread karena


Hi, I have a question about merging two files.
For example, I have two files. The first file looks like this:

id   trait1
1    10.2
2    11.1
3    9.7
6    10.2
7    8.9
10   9.7
11   10.2

The second file looks like this:
id   trait2
1    9.8
2    10.8
4    7.8
5    9.8
6    10.1
12   10.2
13   10.1

Now I want to merge the two files by the variable id, keeping only
the ids which show up in the first file. If an id does not show up in
the second file, that doesn't matter; I can keep the missing values. So my
question is: how can I merge the two files and keep only the rows whose id
shows up in the first file?
I know how to do it in SAS, using the following code: 
merge data1(in=in1) data2(in=in2);
by id;
if in1;

but I really have no idea how to do it in R.

thank you in advance,

karean 

[R] counting the number of times a string appears

2010-01-13 Thread Jesse Sinclair

Hi all,

I have a vector of strings and need to count the number of times a string
appears in the vector.

eg:

  [1] spp6  spp10 spp6  spp6  spp4  spp2  spp9  spp10 spp5  spp2  spp2  spp3
 [13] spp4  spp3  spp6  spp10 spp6  spp4  spp9  spp3  spp6  spp1  spp10 spp8
 [25] spp2  spp10 spp9  spp7  spp1  spp3  spp8  spp6  spp3  spp8  spp6  spp5
 [37] spp5  spp9  spp3  spp1  spp4  spp5  spp9  spp3  spp3  spp5  spp4  spp9
 [49] spp3  spp7  spp7  spp2  spp6  spp5  spp7  spp4  spp8  spp9  spp2  spp6
 [61] spp3  spp3  spp2  spp6  spp3  spp5  spp6  spp6  spp4  spp1  spp1  spp1
 [73] spp10 spp8  spp1  spp6  spp1  spp5  spp8  spp9  spp5  spp6  spp9  spp10
 [85] spp2  spp6  spp10 spp1  spp2  spp3  spp5  spp8  spp2  spp7  spp4  spp7
 [97] spp2  spp6  spp2  spp6

Is it possible to create a vector of counts for each spp1-spp10?

Any help or ideas would be appreciated.

Cheers,
Jesse


Re: [R] Operating on each row of data frame

2010-01-13 Thread Abhishek Pratap

Thanks all for a very quick solution. It is actually good to know different
ways to do the same things. It expands my limited understanding of R :).

-A


Re: [R] counting the number of times a string appears

2010-01-13 Thread Greg Hirson


Jesse,

see ?table and try

table(stringVector)

Greg



--
Greg Hirson
ghir...@ucdavis.edu

Graduate Student
Agricultural and Environmental Chemistry

1106 Robert Mondavi Institute North
One Shields Avenue
Davis, CA 95616


Re: [R] counting the number of times a string appears

2010-01-13 Thread Rolf Turner



?table


Re: [R] Advantages of using SQLite for data import in comparison to csv files

2010-01-13 Thread Gabor Grothendieck

You could look at read.csv.sql in sqldf (http://sqldf.googlecode.com) as well.

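A small base R sketch of the range selection asked about in this thread,
done after reading the csv file rather than with awk or SQL. The file, the
column names and the range [2, 4] are all made up for illustration.

```r
# Write a small hypothetical table to a temporary csv file
csv_file <- tempfile(fileext = ".csv")
samples <- data.frame(id = 1:6,
                      weight = c(1.2, 3.4, 2.8, 5.1, 0.9, 3.0),
                      label = c("a", "b", "c", "d", "e", "f"))
write.csv(samples, csv_file, row.names = FALSE)

# Read the file and keep the rows whose 'weight' lies in [2, 4]
all_rows <- read.csv(csv_file)
in_range <- subset(all_rows, weight >= 2 & weight <= 4)
```

For tables of about 1000 rows, reading everything and filtering in R is
usually fast enough. The same selection is also possible with awk on the
command line, for example awk -F, 'NR == 1 || ($2 >= 2 && $2 <= 4)' file.csv,
as long as the fields contain no embedded commas.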

Re: [R] merging issue.........

2010-01-13 Thread Adrian Dusa

Hi Karean,

If your first object is called obj1 and the second is called obj2, then:

> merge(obj1, obj2, all.x=TRUE)
  id trait1 trait2
1  1   10.2    9.8
2  2   11.1   10.8
3  3    9.7     NA
4  6   10.2   10.1
5  7    8.9     NA
6 10    9.7     NA
7 11   10.2     NA

Hope this helps,
Adrian


-- 
Adrian Dusa
Romanian Social Data Archive
1, Schitu Magureanu Bd.
050025 Bucharest sector 5
Romania
Tel.:+40 21 3126618 \
 +40 21 3120210 / int.101
Fax: +40 21 3158391

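A self-contained version of the merge discussed in this thread, using the
data from the original question. The by = "id" argument is added here only to
make the join column explicit, and the second call is included for
comparison.

```r
obj1 <- data.frame(id = c(1, 2, 3, 6, 7, 10, 11),
                   trait1 = c(10.2, 11.1, 9.7, 10.2, 8.9, 9.7, 10.2))
obj2 <- data.frame(id = c(1, 2, 4, 5, 6, 12, 13),
                   trait2 = c(9.8, 10.8, 7.8, 9.8, 10.1, 10.2, 10.1))

# all.x = TRUE keeps every row of obj1 (like "if in1;" in SAS);
# ids missing from obj2 get NA in trait2
left <- merge(obj1, obj2, by = "id", all.x = TRUE)

# For comparison: the default keeps only ids present in both objects
inner <- merge(obj1, obj2, by = "id")
```

Using all.y = TRUE or all = TRUE instead would keep the rows of the second
object or of both objects, respectively.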

Re: [R] merging issue.........

2010-01-13 Thread Heinz Tuechler


Did you consider looking at the help page for merge?
h


Re: [R] counting the number of times a string appears

2010-01-13 Thread Adrian Dusa

Hi Jesse,

If your vector is called aa, then how about:

> table(aa)
aa
 spp1 spp10  spp2  spp3  spp4  spp5  spp6  spp7  spp8  spp9
    7     2    16     8    15     9     9    10     9    15

Hope this helps,
Adrian



-- 
Adrian Dusa
Romanian Social Data Archive
1, Schitu Magureanu Bd.
050025 Bucharest sector 5
Romania
Tel.:+40 21 3126618 \
 +40 21 3120210 / int.101
Fax: +40 21 3158391

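A short sketch of the table() approach from this thread on a made-up vector.
It also shows one way to get the counts in the order spp1 to spp10, with a
zero for species that do not occur, since table() on a character vector sorts
the names as text (so spp10 comes right after spp1).

```r
# Short hypothetical vector in the same style as the one in the question
spp <- c("spp6", "spp10", "spp6", "spp2", "spp10", "spp6", "spp1")

# table() counts each distinct string
counts <- table(spp)

# Fix the order, and include absent species with count 0, via factor levels
all_levels <- paste0("spp", 1:10)
counts_ordered <- table(factor(spp, levels = all_levels))

# A plain named integer vector, if a table object is not wanted
counts_vec <- setNames(as.vector(counts_ordered), names(counts_ordered))
```

Individual counts can then be looked up by name, for example
counts_vec[["spp6"]].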

Re: [R] Operating on each row of data frame

2010-01-13 Thread Stephan Kolassa


Hi,

does this do what you want?

d <- cbind(d,apply(d[,c(2,3,4)],1,mean),apply(d[,c(2,3,4)],1,sd))

HTH,
Stephan



[R] a question about deleting rows

2010-01-13 Thread karena


I have a file like this:
id   n1   n2   n3   n4   n5   n6
1  3 47 8 102
2  4 12 4 3 10
3  7 00 0 0 8
4  1010 0 2 3
5  1110 0 0 5

What I want to do is delete a row only if n2=0 and n3=0 and n4=0 and n5=0.
How can I do that?

thank you,

karena 
-- 
View this message in context: 
http://n4.nabble.com/a-question-about-deleting-rows-tp1013403p1013403.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] counting the number of times a string appears

2010-01-13 Thread Jesse Sinclair

This is great all.

It works perfectly. Thank-you.

Cheers,
Jesse

On Wed, Jan 13, 2010 at 14:27, Adrian Dusa dusa.adr...@gmail.com wrote:

 Hi Jesse,

 If your vector is called aa, then how about:

  table(aa)
 aa
  spp1 spp10  spp2  spp3  spp4  spp5  spp6  spp7  spp8  spp9
7 216 815 9 910 915

 Hope this helps,
 Adrian


 On Thursday 14 January 2010, Jesse Sinclair wrote:
  Hi all,
 
  I have a vector of strings and need to count the number of times a string
  appears in the vector.
 
  eg:
 
   [1] spp6  spp10 spp6  spp6  spp4  spp2  spp9  spp10 spp5  spp2  spp2
  spp3
   [13] spp4  spp3  spp6  spp10 spp6  spp4  spp9  spp3  spp6  spp1  spp10
   spp8
 
   [25] spp2  spp10 spp9  spp7  spp1  spp3  spp8  spp6  spp3  spp8  spp6
   spp5
 
   [37] spp5  spp9  spp3  spp1  spp4  spp5  spp9  spp3  spp3  spp5  spp4
   spp9
 
   [49] spp3  spp7  spp7  spp2  spp6  spp5  spp7  spp4  spp8  spp9  spp2
   spp6
 
   [61] spp3  spp3  spp2  spp6  spp3  spp5  spp6  spp6  spp4  spp1  spp1
   spp1
 
   [73] spp10 spp8  spp1  spp6  spp1  spp5  spp8  spp9  spp5  spp6  spp9
  spp10
   [85] spp2  spp6  spp10 spp1  spp2  spp3  spp5  spp8  spp2  spp7  spp4
   spp7
 
   [97] spp2  spp6  spp2  spp6
 
  Is it possible to create a vector of counts for each spp1-spp10?
 
  Any help or ideas would be appreciated.
 
  Cheers,
  Jesse
 
[[alternative HTML version deleted]]
 


 --
 Adrian Dusa
 Romanian Social Data Archive
 1, Schitu Magureanu Bd.
 050025 Bucharest sector 5
 Romania
 Tel.:+40 21 3126618 \
 +40 21 3120210 / int.101
 Fax: +40 21 3158391



Re: [R] merging issue.........

2010-01-13 Thread karena


thank you very much!
-- 
View this message in context: 
http://n4.nabble.com/merging-issue-tp1013356p1013433.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] optimization challenge

2010-01-13 Thread Ravi Varadhan

Greg - thanks for posting this interesting problem.

Albyn - thanks for posting a solution.  Now, I have some questions: (1) is
the algorithm guaranteed to find a best solution? (2) can there be
multiple solutions (it seems like there can be more than 1 solution
depending on the data)?, and (3) is there a good reference for this and
similar algorithms?

Thanks & Best,
Ravi.


---

Ravi Varadhan, Ph.D.

Assistant Professor, The Center on Aging and Health

Division of Geriatric Medicine and Gerontology 

Johns Hopkins University

Ph: (410) 502-2619

Fax: (410) 614-9625

Email: rvarad...@jhmi.edu

Webpage:
http://www.jhsph.edu/agingandhealth/People/Faculty_personal_pages/Varadhan.html

 





-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Albyn Jones
Sent: Wednesday, January 13, 2010 1:19 PM
To: Greg Snow
Cc: r-help@r-project.org
Subject: Re: [R] optimization challenge

The key idea is that you are building a matrix that contains the
solutions to smaller problems which are sub-problems of the big
problem.  The first row of the matrix SSQ contains the solution for no
splits, ie SSQ[1,j] is just the sum of squares about the overall mean
for reading chapters 1 through j in one day.  The iteration then uses
row m-1 to construct row m, since if SSQ[m-1,j] (optimal reading of j
chapters in m-1 days) is part of the overall optimal solution, you
have already computed it, and so don't ever need to recompute it.

   TS = SSQ[m-1,j]+(SSQ1[j+1])

computes the vector of possible solutions for SSQ[m,n] (n chapters in m
days), breaking it into two pieces: chapters 1 to j in m-1 days, and
chapters j+1 to n in 1 day.  j is a vector in the function, and min(TS)
is the minimum over choices of j, ie SSQ[m,n].

At the end, SSQ[128,239] is the optimal value for reading all 239
chapters in 128 days.  That's just the objective function, so the rest
involves constructing the list of optimal cuts, ie which chapters are
grouped together for each day's reading.  That code uses the same
idea... constructing a list of lists of cutpoints.
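
The recurrence described above can be sketched in a self-contained form. This is not the code from the thread: the function name optimal_schedule, the back-pointer matrix (used here in place of the list of lists of cutpoints) and the small test vector are all assumptions made for illustration.

```r
# Split X (verses per chapter) into `days` contiguous blocks, minimising the
# sum of squared deviations of the daily totals from the mean daily amount.
optimal_schedule <- function(X, days) {
  N <- length(X)
  stopifnot(days >= 1, days <= N)
  M  <- sum(X) / days                 # target amount per day
  CS <- c(0, cumsum(X))               # CS[j + 1] = sum(X[1:j])
  # Cost of reading chapters first..last in a single day
  cost <- function(first, last) (CS[last + 1] - CS[first] - M)^2

  SSQ  <- matrix(Inf, nrow = days, ncol = N)          # SSQ[m, n]: best value for n chapters in m days
  back <- matrix(NA_integer_, nrow = days, ncol = N)  # last chapter read on day m - 1
  SSQ[1, ] <- (CS[-1] - M)^2

  for (m in seq_len(days)[-1]) {
    for (n in m:N) {
      j    <- (m - 1):(n - 1)         # candidate end points of the first m - 1 days
      TS   <- SSQ[m - 1, j] + cost(j + 1, n)
      best <- which.min(TS)           # first minimum; ties are possible
      SSQ[m, n]  <- TS[best]
      back[m, n] <- j[best]
    }
  }

  # Walk the back-pointers to recover the last chapter read on each day
  cuts <- integer(days)
  n <- N
  for (m in days:1) {
    cuts[m] <- n
    if (m > 1) n <- back[m, n]
  }
  list(SSQ = SSQ[days, N], Cuts = cuts)
}

res <- optimal_schedule(c(3, 1, 4, 1, 5, 9, 2, 6), days = 3)
print(res)
```

Because every entry of row m is the minimum over all ways of extending an optimal entry of row m-1, the value returned is optimal for this objective. which.min() keeps only the first minimiser, so other partitions with the same value may exist.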

statisticians should study a bit of data structures and algorithms!

albyn

On Wed, Jan 13, 2010 at 10:45:11AM -0700, Greg Snow wrote:
 WOW, your results give about half the variance of my best optim run
(possibly due to my suboptimal use of optim).
 
 Can you describe a little what the algorithm is doing?
 
 -- 
 Gregory (Greg) L. Snow Ph.D.
 Statistical Data Center
 Intermountain Healthcare
 greg.s...@imail.org
 801.408.8111
 
 
  -Original Message-
  From: Albyn Jones [mailto:jo...@reed.edu]
  Sent: Tuesday, January 12, 2010 5:31 PM
  To: Greg Snow
  Cc: r-help@r-project.org
  Subject: Re: [R] optimization challenge
  
  Greg
  
  Nice problem: I wasted my whole day on it :-)
  
  I was explaining my plan for a solution to a colleague who is a
  computer scientist, he pointed out that I was trying to re-invent the
  wheel known as dynamic programming.  here is my code, apparently it is
  called bottom up dynamic programming.  It runs pretty quickly, and
  returns (what I hope is :-) the optimal sum of squares and the
  cut-points.
  
  function(X=bom3$Verses,days=128){
  # find optimal BOM reading schedule for Greg Snow
  # minimize variance of quantity to read per day over 128 days
  #
  N = length(X)
  Nm1 = N-1
  SSQ <- matrix(NA,nrow=days,ncol=N)
  Cuts <- list()
  #
  #  SSQ[i,j]: the ssqs about the overall mean for the optimal partition
  #   for i days on the chapters 1 to j
  #
  M = sum(X)/days
  CS = cumsum(X)
  SSQ[1,]= (CS-M)^2
  Cuts[[1]]= as.list(1:N)
  #
  for(m in 2:days){
      Cuts[[m]]=list()
      #for(i in 1:(m-1)) Cuts[[m]][[i]] = Cuts[[m-1]][[i]]
      for(n in m:N){
          CS = cumsum(X[n:1])[n:1]       # CS[i] = sum(X[i:n])
          SSQ1 = (CS-M)^2                # cost of reading chapters i..n in one day
          j = (m-1):(n-1)                # last chapter read in the first m-1 days
          TS = SSQ[m-1,j]+(SSQ1[j+1])
          SSQ[m,n] = min(TS)
          k = min(which((min(TS)== TS)))+m-1
          Cuts[[m]][[n]] = c(Cuts[[m-1]][[k-1]],n)
      }
  }
  list(SSQ=SSQ[days,N],Cuts=Cuts[[days]][[N]])
  }
  
  $SSQ
  [1] 11241.05
  
  $Cuts
    [1]   2   4   7   9  11  13  15  16  17  19  21  23  25  27  30  31  34  37
   [19]  39  41  44  46  48  50  53  56  59  60  62  64  66  68  70  73  75  77
   [37]  78  80  82  84  86  88  89  91  92  94  95  96  97  99 100 103 105 106
   [55] 108 110 112 113 115 117 119 121 124 125 126 127 129 131 132 135 137 138
   [73] 140 141 142 144 145 146 148 150 151 152 154 156 157 160 162 163 164 166
   [91] 167 169 171 173 175 177 179 181 183 185 186 188 190 192 193 194 196 199
  [109] 201 204 205 207 209 211 213 214 215 217 220 222 223 225 226 228 234 236
  [127] 238 239
  
  
  
  
  On Tue, Jan 12, 2010 at 11:33:36AM -0700, Greg Snow wrote:
   I have a challenge

[R] Updated comparison table for SAS-SPSS Add-ons and R Functions

2010-01-13 Thread Muenchen, Robert A (Bob)

Hi All,

I have substantially expanded the table that compares SAS and SPSS
add-on modules to somewhat equivalent R packages. This new version is
at:
http://r4stats.com/add-on-modules 
and I would very much appreciate any feedback you might have on it.

The site http://r4stats.com is the replacement to
http://RforSASandSPSSusers.com and includes the support files for both
R for SAS and SPSS Users and the new R for Stata Users, due out in
March from Springer. I'll phase the older site out eventually and change
the URL to point to the new one.

Thanks,
Bob

=
  Bob Muenchen (pronounced Min'-chen), Manager  
  Research Computing Support
  Voice: (865) 974-5230  
  Email: muenc...@utk.edu
  Web:   http://oit.utk.edu/research, 
  News:  http://oit.utk.edu/research/news.php
=


Re: [R] Updated comparison table for SAS-SPSS Add-ons and R Functions

2010-01-13 Thread Barry Rowlingson

On Wed, Jan 13, 2010 at 11:53 PM, Muenchen, Robert A (Bob) muenc...@utk.edu
 wrote:

 Hi All,

 I have substantially expanded the table that compares SAS and SPSS
 add-on modules to somewhat equivalent R packages. This new version is
 at:
 http://r4stats.com/add-on-modules
 and I would very much appreciate any feedback you might have on it.

 The site http://r4stats.com is the replacement to
 http://RforSASandSPSSusers.com and includes the support files for both
 R for SAS and SPSS Users and the new R for Stata Users, due out in
 March from Springer. I'll phase the older site out eventually and change
 the URL to point to the new one.


Maybe the first thing you should do is a global search and replace of 'SPSS'
with 'PASW'

 http://www.spss.com/software/product-name-guide/

Barry


Re: [R] a question about deleting rows

2010-01-13 Thread Steve Taylor

yourdataframe = subset(yourdataframe, !(n2==0 & n3==0 & n4==0 & n5==0))
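
As a runnable illustration, the sketch below rebuilds the table from the question (the object names x, kept, kept2 and cols are assumptions) and drops the rows in which n2 to n5 are all zero. The rowSums() variant is an alternative that may be easier to extend when many columns are involved.

```r
x <- data.frame(id = 1:5,
                n1 = c(3, 4, 7, 10, 11),
                n2 = c(4, 1, 0, 1, 1),
                n3 = c(7, 2, 0, 0, 0),
                n4 = c(8, 4, 0, 0, 0),
                n5 = c(10, 3, 0, 2, 0),
                n6 = c(2, 10, 8, 3, 5))

# Drop rows where n2, n3, n4 and n5 are all zero
kept <- subset(x, !(n2 == 0 & n3 == 0 & n4 == 0 & n5 == 0))

# Same result: keep rows with at least one non-zero value in the chosen columns
cols  <- c("n2", "n3", "n4", "n5")
kept2 <- x[rowSums(x[, cols] != 0) > 0, ]

print(kept)
```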



Re: [R] a question about deleting rows

2010-01-13 Thread jim holtman

Try this:

> x
  id n1 n2 n3 n4 n5 n6
1  1  3  4  7  8 10  2
2  2  4  1  2  4  3 10
3  3  7  0  0  0  0  8
4  4 10  1  0  0  2  3
5  5 11  1  0  0  0  5
> delete <- with(x, n2 == 0 & n3 == 0 & n4 == 0 & n5 == 0)
> delete
[1] FALSE FALSE  TRUE FALSE FALSE
> x[!delete,]
  id n1 n2 n3 n4 n5 n6
1  1  3  4  7  8 10  2
2  2  4  1  2  4  3 10
4  4 10  1  0  0  2  3
5  5 11  1  0  0  0  5







-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?


Re: [R] optimization challenge

2010-01-13 Thread Aaron Mackey

FYI, in bioinformatics, we use dynamic programming algorithms in similar
ways to solve similar problems of finding guaranteed-optimal partitions in
streams of data (usually DNA or protein sequence, but sometimes numerical
data from chip-arrays).  These path optimization algorithms are often
called Viterbi algorithms; a web search for that term should provide
multiple references.

The solutions are not necessarily unique (there may be multiple
paths/partitions with identical integer maxima in some systems) and there is
much research on whether the optimal solution is actually the one you want
to work with (for example, there may be a fair amount of probability mass
within an area/ensemble of suboptimal solutions that overall have greater
posterior probabilities than does the optimal solution singleton).  See
Chip Lawrence's PNAS paper for more erudite discussion, and references
therein: www.pnas.org/content/105/9/3209.abstract

-Aaron

P.S. Good to see you here Albyn -- I enjoyed your stat. methods course at
Reed back in 1993, which started me down a somewhat windy road to
statistical genomics!

--
Aaron J. Mackey, PhD
Assistant Professor
Center for Public Health Genomics
University of Virginia
amac...@virginia.edu



Re: [R] Updated comparison table for SAS-SPSS Add-ons and R Functions

2010-01-13 Thread Muenchen, Robert A (Bob)

From: b.rowling...@googlemail.com [mailto:b.rowling...@googlemail.com] On 
Behalf Of Barry Rowlingson
Sent: Wednesday, January 13, 2010 7:03 PM
To: Muenchen, Robert A (Bob)
Cc: r-help@r-project.org
Subject: Re: [R] Updated comparison table for SAS-SPSS Add-ons and R Functions

Maybe the first thing you should do is a global search and replace of 'SPSS' 
with 'PASW'

 http://www.spss.com/software/product-name-guide/

Barry

One of the things I updated was to *remove* the now-obsolete PASW! Since IBM 
bought the company, they did away with that and renamed things IBM SPSS 
 See the list at:
http://spss.com/software/statistics/ 
They still have some old web pages to clean up as you point out.

Cheers, 
Bob


[R] Error: object of type 'closure' is not subsettable

2010-01-13 Thread Matthew Walker


Hi everyone,

Would somebody please explain (or point me to a reference that explains) 
the following error:


Error: object of type 'closure' is not subsettable

I was trying to use rep() to replicate a function:

> example_function <- function() { return(TRUE) }
> rep(example_function, 3)
Error: object of type 'closure' is not subsettable

But I just cannot understand this error.  I can combine functions using 
c without any problems:


> c(example_function, example_function)
[[1]]
function ()
{
   return(TRUE)
}

[[2]]
function ()
{
   return(TRUE)
}

What am I doing wrong when I use rep()?

Thanks in advance,

Matthew Walker


[R] package spam for R64-devel

2010-01-13 Thread Julian Ramirez

Dear Uwe and all,

First of all, I want to congratulate you for your dedication in providing
and maintaining R for 64bit operating systems. I tried the 64bit version of
R, under a windows server 2003 system. It seems to work properly, but I am
concerned since I need to use the package fields, which depends on the
package spam, which seems to have a check error. I know 64bit versions of
R and its packages are just starting to roll, but I wonder if there's a
possibility of making the spam package work on 64bit R. From what I saw
in the log file (
http://www.statistik.tu-dortmund.de/~ligges/CRAN/bin/windows64/contrib/r-devel/check/spam-check.log)
it seems to be a problem with tests.

Is it possible to run the R CMD check for the spam package with the
--no-tests flag? By the way, the fields package was built using the
--no-tests flag

Many thanks for any help you might be able to provide,


Julian Ramirez
Research Assistant
International Centre for Tropical Agriculture, CIAT
Colombia


Re: [R] Error: object of type 'closure' is not subsettable

2010-01-13 Thread Gabor Grothendieck

See ?rep where it says that the argument must be a vector.  Try
   rep(list(sin), 3)
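
A function (closure) is not a vector, which is why rep() cannot replicate it directly, whereas c() returns a list when given functions. A minimal sketch of the suggestion, using the function from the question; the names fs, fs2 and results are chosen here for illustration:

```r
example_function <- function() TRUE

# Wrap the function in a list; rep() then repeats the list element
fs <- rep(list(example_function), 3)

# replicate() with simplify = FALSE builds the same kind of list
fs2 <- replicate(3, example_function, simplify = FALSE)

# Call each stored function in turn
results <- vapply(fs, function(f) f(), logical(1))
print(results)
```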




Re: [R] optimization challenge

2010-01-13 Thread Albyn Jones

Hi Aaron!  It's always nice to see a former student doing well.

Thanks for the notes and references, too!

albyn


Re: [R] Formula for normal distribution with know mean and standard error and n terms

2010-01-13 Thread GlenB




steve_fried...@nps.gov wrote:
 
 I am searching for a method to calculate a normal distribution.
 
 For example this equation is used to calculate the normal curve when the
 mean and standard deviation are known.
 p(x) = (1/(σ*sqrt(2π))) * exp(-(x-μ)^2/(2σ^2))
 
 However, some of the literature I'm reading (I'm building an ecological
 niche model for vegetation along several ecological gradients) report the
 standard error instead and n sample size.  Is there an equivalent formula
 ?
 If so, how can I also normalize the p(x) term to be within the 0-1 range?
 

What you have there (p) is a density rather than the distribution.

note that p(x) is NOT a probability, so it doesn't lie between 0 and 1 

(integrals of p(x).dx are probabilities and do lie between 0 and 1)

The function to compute p is dnorm. Try ?dnorm in R.

if you're given the standard error of a mean (which I'll call se) and n, 
then sigma = sqrt(n)*se

(because se = sigma/sqrt(n) ).

If it's the standard error of something other than the mean you'll need to
give more details.
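
The conversion above can be sketched as follows. The values of se, n and mu are hypothetical. Rescaling the curve by its peak is only one possible way to obtain a 0-1 index (an assumption, not something suggested in the thread), and the rescaled curve is no longer a density.

```r
# Hypothetical reported values: standard error of the mean, sample size, mean
se <- 0.5
n  <- 25
mu <- 10

sigma <- sqrt(n) * se                 # because se = sigma / sqrt(n)

x    <- seq(mu - 4 * sigma, mu + 4 * sigma, length.out = 201)
dens <- dnorm(x, mean = mu, sd = sigma)   # density values, not probabilities

# Probabilities come from areas under the density
area     <- integrate(dnorm, mu - 8 * sigma, mu + 8 * sigma,
                      mean = mu, sd = sigma)$value
p_within <- pnorm(mu + sigma, mu, sigma) - pnorm(mu - sigma, mu, sigma)

# Optional rescaling so the curve peaks at 1
scaled <- dens / max(dens)

print(c(sigma = sigma, area = area, p_within = p_within))
```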


-- 
View this message in context: 
http://n4.nabble.com/Formula-for-normal-distribution-with-know-mean-and-standard-error-and-n-terms-tp1013280p1013552.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] FW: Problems connecting with MySQL using odbcDriverConnect (RODBC package) on Linux

2010-01-13 Thread Orvalho Augusto

Thanks for solving this and sharing it with us.

But, why don't you use the RMySQL, which connects to MySQL without the
need of ODBC?

Caveman


On Wed, Jan 13, 2010 at 1:48 AM, Marcus, Jeffrey
jeffrey.mar...@nuance.com wrote:
 I think I figured this out. I should not have put the Driver name in
 braces. Changing it from {MySQL} to MySQL seems to work.
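
 To make the fix concrete, here is a small sketch that assembles the DSN-less connection string with the driver name written without braces. The helper build_conn_string is hypothetical, and the server, database and credential values are the placeholders from the original post. The actual connection call is left as a comment because it needs RODBC and a reachable server.

```r
# Hypothetical helper: assemble a DSN-less ODBC connection string
build_conn_string <- function(driver, server, database, uid, pwd) {
  parts <- c(Driver = driver, Server = server, Database = database,
             Uid = uid, Pwd = pwd)
  paste0(names(parts), "=", parts, collapse = ";")
}

# Driver name as registered in odbcinst.ini, without braces
conn_string <- build_conn_string("MySQL", "my-server", "my_database",
                                 "my_username", "my_password")
print(conn_string)

# With RODBC installed and the driver registered, the string would be used as:
# library(RODBC)
# conn <- odbcDriverConnect(connection = conn_string)
```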

 -Original Message-
 From: Marcus, Jeffrey
 Sent: Tuesday, January 12, 2010 6:09 PM
 To: 'r-help@r-project.org'
 Subject: Problems connecting with MySQL using odbcDriverConnect (RODBC
 package) on Linux

 I am sure I'm doing something wrong here but not sure what.

 Our system administrator recently installed UnixODBC and the MyODBC
 driver on a Linux box running Linux version 2.6 x86_64.

 I have an .odbc.ini file in my home directory with following lines:

 [mydb]
 Description = MySQL server on my-server
 Driver=/usr/lib64/libmyodbc3.so
 SERVER=my-server

 I can successfully do the following:

 library(RODBC)
 channel <- odbcConnect("mydb")
 sqlQuery(channel, "show databases")

 And in general, I have no problems using odbcConnect to connect to the
 mydb DSN.

 However, for various reasons I want to make a DSN-less connection
 using odbcDriverConnect. However, everything I've tried generated a
 data source not found message (see below for details)

  After reading through various documents, I tried doing following.

 (1) Put an odbcinst.ini file in my home directory with following lines
 [MySQL]
 Description     = ODBC for MySQL
 Driver=/usr/lib64/libmyodbc3.so
 Setup           = /usr/lib/libodbcmyS.so
 FileUsage       = 1

 (2) Install it with odbcinst -i -f. This seems to work as when I type
 odbcinst -j I get

 DRIVERS: /home/jmarcus/odbcinst.ini
 SYSTEM DATA SOURCES: /home/jmarcus/odbc.ini
 USER DATA SOURCES..: /home/jmarcus/.odbc.ini


 (2) Set the environment variable to point to this file:

 bash-3.2$  ODBCSYSINI=/home/jmarcus
 bash-3.2$ export ODBCSYSINI

 (3) Start R

 Note that R has inherited environment variable
 Sys.getenv("ODBCSYSINI")

     ODBCSYSINI
 /home/jmarcus

 (4) Try to connect to the MySQL server

   conn <- odbcDriverConnect(connection="Driver={MySQL};Server=my-server;Database=my_database;Uid=my_username;Pwd=my_password")

 This generates following:

 Warning messages:
 1: In odbcDriverConnect(connection = "Driver={MySQL};Server=my-server;Database=my_database;Uid=my_username;Pwd=my_password") :
  [RODBC] ERROR: state IM002, code 0, message [unixODBC][Driver Manager]Data source name not found, and no default driver specified
 2: In odbcDriverConnect(connection = "Driver={MySQL};Server=my-server;Database=my_database;Uid=my_username;Pwd=my_password") :
  ODBC connection failed


 Can anyone see what I'm doing wrong? Thanks.

  Jeff





-- 
OpenSource Software Consultant
CENFOSS (www.cenfoss.co.mz)
SP Tech (www.sptech.co.mz)
email: orvaq...@cenfoss.co.mz
cell: +258828810980


[R] installing RCurl when libcurl is in non-standard location

2010-01-13 Thread Janet Young


Hi,

I'm struggling to install RCurl for 32-bit linux and am hoping for  
some suggestions.


I obtained RCurl_1.3-1.tar.gz from CRAN today, and am using a very  
recent version of R:

R version 2.10.1 Patched (2010-01-12 r50970).

I'm not the sysadmin for this system (disclaimer: my sysadmin skills  
are not very good, I'm afraid).  curl is available centrally on the  
system but it's a little old (7.12.3 - looks from some older r-help  
posts like this is too old for RCurl). Therefore I installed libcurl  
7.19.7 in a non-standard location (because I'm not the sysadmin), and  
I think I'm pointing R towards this new libcurl OK, but I'm not 100%  
sure about that. The output of locate (see below) makes me a little  
suspicious, but the output of the R CMD INSTALL makes it seem like the  
new libcurl I installed IS being used.
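
A quick way to double-check which libcurl is being picked up is to compare the version strings reported by the two curl-config scripts from within R. This is a minimal sketch using base R only; `parse_libcurl_version` is a hypothetical helper, and the two version strings are copied from the shell output further down in this message.

```r
# Hypothetical helper: turn a line such as "libcurl 7.19.7" into a
# comparable version object.
parse_libcurl_version <- function(version_line) {
  numeric_version(sub("^libcurl\\s+", "", trimws(version_line)))
}

new_curl <- parse_libcurl_version("libcurl 7.19.7")  # ~/traskdata/bin_linux/curl-config
old_curl <- parse_libcurl_version("libcurl 7.12.3")  # /usr/bin/curl-config
newer <- new_curl > old_curl

# Which curl-config does R itself find first on the PATH?
# Returns "" if none is found.
found <- Sys.which("curl-config")
```

Where curl-config exists, `system2("curl-config", "--version", stdout = TRUE)` run inside R should return the line to feed into the helper, which shows the version R's own environment sees rather than the one seen by the login shell.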


I've included various output below that I hope will help in figuring  
this out. Is there anything else that would be useful to know? I can  
also ask our sysadmin for help if that makes more sense than asking  
you all via r-help.


Thanks very much in advance for any ideas,

Janet Young

---

[2] zork20:/home/jayoung uname -a
Linux zork20 2.6.12-1.1381_FC3smp #1 SMP Fri Oct 21 04:03:26 EDT 2005 i686 athlon i386 GNU/Linux

[3] zork20:/home/jayoung setenv MAKE gmake
[4] zork20:/home/jayoung which gmake
/usr/bin/gmake
[5] zork20:/home/jayoung gmake -version
GNU Make 3.80
Copyright (C) 2002  Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
[6] zork20:/home/jayoung which curl-config
/home/jayoung/traskdata/bin_linux/curl-config
[7] zork20:/home/jayoung curl-config --version
libcurl 7.19.7
[8] zork20:/home/jayoung locate curl-config
/usr/bin/curl-config
/usr/share/man/man1/curl-config.1.gz
[16] zork20:/home/jayoung /usr/bin/curl-config --version
libcurl 7.12.3
[9] zork20:/home/jayoung locate libcurl
/usr/lib/libcurl.so.3
/usr/lib/libcurl.so
/usr/lib/libcurl.a
/usr/lib/libcurl.so.3.0.0
/usr/share/man/man3/libcurl-multi.3.gz
/usr/share/man/man3/libcurl-easy.3.gz
/usr/share/man/man3/libcurl-errors.3.gz
/usr/share/man/man3/libcurl-share.3.gz
/usr/share/man/man3/libcurl-tutorial.3.gz
/usr/share/man/man3/libcurl.3.gz
[10] zork20:/home/jayoung ls ~/traskdata/lib_linux/libcu*
/home/jayoung/traskdata/lib_linux/libcurl.a
/home/jayoung/traskdata/lib_linux/libcurl.la*
/home/jayoung/traskdata/lib_linux/libcurl.so@
/home/jayoung/traskdata/lib_linux/libcurl.so.3@
/home/jayoung/traskdata/lib_linux/libcurl.so.3.0.0*
/home/jayoung/traskdata/lib_linux/libcurl.so.4@
/home/jayoung/traskdata/lib_linux/libcurl.so.4.0.0*
/home/jayoung/traskdata/lib_linux/libcurl.so.4.1.1*
[11] zork20:/home/jayoung printenv LD_LIBRARY_PATH
/home/btrask/traskdata/lib_linux:/home/jayoung/traskdata/bin_linux/qt/lib:/home/btrask/traskdata/lib_linux/R/library/RSPerl/libs:/home/btrask/traskdata/lib_linux/R/lib
[14] zork20:/home/jayoung/source_codes/R/other_packages R CMD INSTALL RCurl_1.3-1.tar.gz --configure-args='--libdir=/home/btrask/traskdata/lib_linux --includedir=/home/btrask/traskdata/include'

* installing to library ‘/home/btrask/traskdata/lib_linux/R/library’
* installing *source* package ‘RCurl’ ...
checking for curl-config... /home/jayoung/traskdata/bin_linux/curl-config

checking for gcc... gcc
checking for C compiler default output file name... a.out
checking whether the C compiler works... yes
checking whether we are cross compiling... no
checking for suffix of executables...
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ANSI C... none needed
checking how to run the C preprocessor... gcc -E
Version has a libidn field
Version has CURLOPT_URL
Version has CURLINFO_EFFECTIVE_URL
Version has CURLINFO_RESPONSE_CODE
Version has CURLINFO_TOTAL_TIME
Version has CURLINFO_NAMELOOKUP_TIME
Version has CURLINFO_CONNECT_TIME
Version has CURLINFO_PRETRANSFER_TIME
Version has CURLINFO_SIZE_UPLOAD
Version has CURLINFO_SIZE_DOWNLOAD
Version has CURLINFO_SPEED_DOWNLOAD
Version has CURLINFO_SPEED_UPLOAD
Version has CURLINFO_HEADER_SIZE
Version has CURLINFO_REQUEST_SIZE
Version has CURLINFO_SSL_VERIFYRESULT
Version has CURLINFO_FILETIME
Version has CURLINFO_CONTENT_LENGTH_DOWNLOAD
Version has CURLINFO_CONTENT_LENGTH_UPLOAD
Version has CURLINFO_STARTTRANSFER_TIME
Version has CURLINFO_CONTENT_TYPE
Version has CURLINFO_REDIRECT_TIME
Version has CURLINFO_REDIRECT_COUNT
Version has CURLINFO_PRIVATE
Version has CURLINFO_HTTP_CONNECTCODE
Version has CURLINFO_HTTPAUTH_AVAIL
Version has CURLINFO_PROXYAUTH_AVAIL
Version has CURLINFO_OS_ERRNO
Version has CURLINFO_NUM_CONNECTS
Version has CURLINFO_SSL_ENGINES
No CURLINFO_COOKIELIST enumeration value.
No CURLINFO_LASTSOCKET enumeration value.
No CURLINFO_FTP_ENTRY_PATH enumeration value.
No CURLINFO_REDIRECT_URL enumeration

Re: [R] apply a function down each column

2010-01-13 Thread Laetitia Schmid

Thank you very much! It now works perfectly. I even extended it so that
it can be applied to the whole dataset:


data <- read.delim("mhc_data.txt", stringsAsFactors=FALSE)

lettermatch <- function(a, b) {
  tb <- merge(as.data.frame(table(strsplit(a, ""))),
              as.data.frame(table(strsplit(b, ""))), by="Var1")
  sum(apply(tb[-1], 1, min))
}

output <- matrix(ncol=(ncol(data)-1), nrow=nrow(data)/2)
sim <- rep(0, nrow(data)/2)

for (y in 2:(ncol(data))) {
  for (x in 1:(nrow(data)/2)) {
    a <- data[(2*x-1),y]  # odd rows
    b <- data[(2*x),y]    # even rows
    sim[x] <- lettermatch(a,b)
  }
  output[,y-1] <- sim
}
colnames(output) <- c(names(data[2:length(names(data))]))
rownames(output) <- c(1:(nrow(data)/2))

output
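
For comparison, the same odd-row/even-row pairing can be written without explicit loops. This is only a sketch under stated assumptions: `count_shared` is a hypothetical stand-in for lettermatch() that counts shared letters directly (and returns 0 for empty strings), and `toy` is made-up data in the same layout as the real file, not the actual mhc_data.txt.

```r
# Hypothetical alternative to lettermatch(): for every letter present in
# both strings, add the smaller of its two counts.
count_shared <- function(a, b) {
  ca <- strsplit(a, "")[[1]]
  cb <- strsplit(b, "")[[1]]
  common <- intersect(ca, cb)
  sum(vapply(common,
             function(ch) min(sum(ca == ch), sum(cb == ch)),
             numeric(1)))
}

# Made-up data: first column holds labels, remaining columns hold strings.
toy <- data.frame(Individuals = c("C1", "M1", "C2", "M2"),
                  Seq1 = c("AATT", "AGGG", "CCGG", "CCGG"),
                  Seq2 = c("CTTT", "", "AACT", "CGTT"),
                  stringsAsFactors = FALSE)

odd  <- seq(1, nrow(toy), by = 2)  # rows 1, 3, ...
even <- seq(2, nrow(toy), by = 2)  # rows 2, 4, ...

# One column of results per sequence column, one row per pair.
pairs <- sapply(toy[-1], function(col)
  mapply(count_shared, col[odd], col[even], USE.NAMES = FALSE))
pairs
```

sapply() keeps the column names of the data frame, so the colnames() step from the loop version is not needed here.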

Laetitia



On 12.01.2010 at 18:31, Peter Ehlers wrote:


Laetitia,

I was just responding to your comment that R complains
about a syntax error. But I realize now that 2x would
probably cause an unexpected symbol error.

Here's what I get when I run your loop; what do you get?


> for (x in 1:(nrow(dat)-1)) {
+  a <- as.character(dat[(2x-1),1])
Error: unexpected symbol in:
"for (x in 1:(nrow(dat)-1)) {
 a <- as.character(dat[(2x"
> b <- as.character(dat[(2x),1])
Error: unexpected symbol in " b <- as.character(dat[(2x"
> lettermatch(a,b)
Error in strsplit(a, "") : object 'a' not found
> }
Error: unexpected '}' in "}"




and here's what I get when I fix the obvious syntax
error:


> for (x in 1:(nrow(dat)-1)) {
+  a <- as.character(dat[(2*x-1),1])
+  b <- as.character(dat[(2*x),1])
+  lettermatch(a,b)
+ }
Error in fix.by(by.x, x) : 'by' must specify valid column(s)




That leaves two problems:
1) you're looking at the wrong column in dat[,1]; that
   should be dat[,2], etc.
2) that error message indicates that your index variable (x)
   gets to invalid values.

Try this:

for (x in 1:(nrow(dat)/2)) {
  a <- dat[(2*x-1),2]  # odd rows
  b <- dat[(2*x),2]    # even rows
  print(lettermatch(a,b))
}

You don't need the as.character() if you have character data.
Always do a str(dat) before you do any analysis.
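
The point about str() can be illustrated with a small sketch. The two-row data frames below are made up for illustration; `stringsAsFactors = TRUE` mimics the default that read.delim() had in R versions before 4.0.0.

```r
# Made-up two-row extract, once with factor and once with character columns.
dat_factor <- data.frame(Seq1 = c("AATT", "AGGG"), stringsAsFactors = TRUE)
dat_char   <- data.frame(Seq1 = c("AATT", "AGGG"), stringsAsFactors = FALSE)

str(dat_factor)  # Seq1 is reported as a Factor
str(dat_char)    # Seq1 is reported as chr

# strsplit() only accepts character vectors, so the factor column fails ...
factor_result <- try(strsplit(dat_factor$Seq1[1], ""), silent = TRUE)
# ... while the character column can be split directly.
char_result <- strsplit(dat_char$Seq1[1], "")[[1]]
```

With factor columns, wrapping the value in as.character() before strsplit() avoids the error, which is why the call is only needed when str() reports factors.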

 -Peter Ehlers

Laetitia Schmid wrote:

Dear Peter,
thank you for the suggestion.
Unfortunately the star did not help. Did it work for you? For me it  
seems incomplete somehow.

Laetitia


From: Peter Ehlers [ehl...@ucalgary.ca]
Sent: Tuesday, January 12, 2010 09:54 AM
To: Laetitia Schmid
Cc: Steve Lianoglou; r-help@r-project.org
Subject: Re: [R] apply a function down each column

See inline below.

Laetitia Schmid wrote:

Dear Steve,
my solution looks like it would work, but it does not.
I attached a text file with an extract of my data. Maybe you can try it
yourself. I want to compare C1 with M1, C2 with M2, C3 with M3, ... for
each column.
I do not really know what the problem is. R complains about a syntax error.
The function I am applying counts the common strings between the two.
Greg Hirson helped me to write it.

lettermatch <- function(a, b) {
  tb <- merge(as.data.frame(table(strsplit(a, ""))),
              as.data.frame(table(strsplit(b, ""))), by="Var1")
  sum(apply(tb[-1], 1, min))
}

For example for the second column I tried:

for (x in 1:(nrow(dat)-1)) {
a <- as.character(dat[(2x-1),1])


Shouldn't that be 2*x-1??

 -Peter Ehlers


b <- as.character(dat[(2x),1])
lettermatch(a,b)
}

or

a <- as.character(dat[seq(1, nrow(dat), by=2),2])
b <- as.character(dat[seq(2, nrow(dat), by=2), 2])
all.results <- lettermatch(a,b)

With dat <- read.delim("data_lgs.txt", stringsAsFactors=FALSE) I can
leave out the as.character() in the code above.

Laetitia

IndividualsSeq1Seq2Seq3Seq4
C1AATTCCGGCTTT
M1
C2AATTCCGGCTTT
M2AGGGAACTCCGGCGTT
C3AGGGAACTCCGGCGTT
M3AGGGAACTCCGGCGTT
C4AATTCCGGCCTT
M4AAATCGGGCTTT
C5AGGGACTTCCCGCTTT
M5AGGGCTTTCCTT
C6AGGGCTTTCCTT
M6AAAGCCTTCTTT
C7AAAGACCCCCCGGTTT
M7AAGGAACCCCGG
C8AATTCCGGCCTT
M8AATTCCGGCCTT
C9
M9
C11AGGGAAACCGGGGGTT
M11AATTCCGGCCTT



On 11.01.2010 at 15:18, Steve Lianoglou wrote:


Hi,

On Mon, Jan 11, 2010 at 8:41 AM, Laetitia Schmid laeti...@gmt.su.se wrote:

Hello World,
I have a function that makes pairwise comparisons between two
strings. I would like to apply this function to my data (which
consists of columns with different strings) in the way that it
compares the first with the second entry, and then the third with the
fourth, and then the fifth with the sixth, and so on down each column...

So (2x-1) and (2x) would be the different entries to be compared!

dat= my data:

for the first column: compare dat[(2x-1),1] with

Re: [R] Help, How can I boxplot mse and mtry using 20 5-fold cross-validation?

2010-01-13 Thread bbslover


Thanks, Max.
You are so responsive; every time, you give me a lot of help. On my
learning path you are my guide, though we do not know each other.
 
best wishes
 
kevin



On 2010-01-14, Max Kuhn [via R] ml-node+1013265-480375...@n4.nabble.com wrote:
-----Original Message-----
From: Max Kuhn [via R] ml-node+1013265-480375...@n4.nabble.com
Sent: Thursday, January 14, 2010
To: bbslover dlu...@yeah.net
Subject: Re: [R] Help, How can I boxplot mse and mtry using 20 5-fold cross-validation?

In caret, see ?trainControl. Use returnResamp = "all"

Max 
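
The suggestion above could look roughly like the following sketch. It assumes the caret and randomForest packages are installed, that `x` and `y` are the user's own predictors and numeric response, and that the mtry values shown are placeholders; `run_rf_cv` and `mse_by_mtry` are hypothetical helper names, not caret functions. The fitting function is only defined here, not run.

```r
# Sketch: 20 repeats of 5-fold cross-validation, keeping every resample.
# Requires caret and randomForest when actually called.
run_rf_cv <- function(x, y, mtry_values = c(2, 4, 6)) {
  ctrl <- caret::trainControl(method = "repeatedcv", number = 5,
                              repeats = 20, returnResamp = "all")
  caret::train(x, y, method = "rf", trControl = ctrl,
               tuneGrid = data.frame(mtry = mtry_values))
}

# The resample table has one row per fold and per mtry value, with an RMSE
# column; squaring it gives the per-fold MSE, grouped here by mtry.
mse_by_mtry <- function(resample) {
  split(resample$RMSE^2, resample$mtry)
}

# Intended use (not run here):
# fit <- run_rf_cv(x, y)
# boxplot(mse_by_mtry(fit$resample), xlab = "mtry", ylab = "MSE")
```

boxplot() draws the median of each group as the central line, which matches the request to summarise by the median value.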

On Wed, Jan 13, 2010 at 9:47 AM, bbslover [hidden email] wrote: 

 
  Hello, 
   I am learning randomForest. I now want to boxplot MSE against mtry using 20 
 repetitions of 5-fold cross-validation (using the median value), but I have no 
 good method for doing it, only a poor one. 
 
 The randomForest package itself does not include a cross-validation method. 
 The caret package does, but how can I get all the values of mtry together 
 with the corresponding MSE? 
 
 
 



-- 

Max 


[R] Bootstrap for correlation coefficient

2010-01-13 Thread Roslina Zakaria

I have the following code:
 
## to check correlation between the simulated uniform data
x2 <- uni[,1] ; x2[1:10]
y2 <- uni[,2] ; y2[1:10]
result2 <- boot(cbind(x2,y2), f, 20)
# get 95% confidence interval 
boot.ci(result2, type="bca")
cor.test(x2,y2, method="pearson", conf.level=0.95)
 
part of my data:
 
> x2 <- uni[,1] ; x2[1:10]
 [1] 0.63933145 0.71677785 0.02181925 0.15913391 0.61021930 0.72878176 
0.22237891 0.28178186 0.75503612 0.54928692
> y2 <- uni[,2] ; y2[1:10]
 [1] 0.65754240 0.49263876 0.01352257 0.19195681 0.65759797 0.89813660 
0.24582441 0.12900017 0.78982501 0.68676534

## Result
> result2 <- boot(cbind(x2,y2), f, 20)
> result2
ORDINARY NONPARAMETRIC BOOTSTRAP

Call:
boot(data = cbind(x2, y2), statistic = f, R = 20)

Bootstrap Statistics :
    original   bias    std. error
t1* 0.891797 -0.005272889  0.01198383
 
Not sure about this:
 
> boot.ci(result2, type="bca")
Error in bca.ci(boot.out, conf, index[1], L = L, t = t.o, t0 = t0.o, h = h,  : 
  estimated adjustment 'a' is NA

> cor.test(x2,y2, method="pearson", conf.level=0.95)
    Pearson's product-moment correlation
data:  x2 and y2 
t = 51.7391, df = 689, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0 
95 percent confidence interval:
 0.8754420 0.9061121 
sample estimates:
 cor 
0.891797

My question is: when I try to find the confidence interval, why does it give me
this message?
And how do I get the p-value from the bootstrap?
 
Thank you so much
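
A commonly reported cause of the "estimated adjustment 'a' is NA" error is that the number of bootstrap replicates is smaller than the number of observations; here R = 20, while df = 689 implies 691 observations. The sketch below is not the poster's data or statistic: the toy data are simulated, and `f` is an assumed form of the statistic function, since the original `f` is not shown. The p-value part uses a permutation test, which is a different technique from the bootstrap, because boot() does not report a p-value by itself.

```r
library(boot)

set.seed(42)                   # simulated toy data, not the poster's `uni`
n  <- 100
x2 <- runif(n)
y2 <- x2 + rnorm(n, sd = 0.2)

# Statistic in the form boot() expects: data first, resampled row indices second.
f <- function(d, i) cor(d[i, 1], d[i, 2])

# Use many more replicates than observations so the BCa adjustment
# can be estimated.
result2 <- boot(cbind(x2, y2), f, R = 2000)
ci <- boot.ci(result2, type = "bca")
ci$bca[4:5]                    # lower and upper 95% limits

# Permutation p-value for the null hypothesis of no correlation.
obs  <- cor(x2, y2)
perm <- replicate(1999, cor(x2, sample(y2)))
p_value <- (sum(abs(perm) >= abs(obs)) + 1) / (length(perm) + 1)
```

With the real data, the same call with R well above 691 would be the first thing to try before looking for other causes.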


  
