date:20100904

Re: [R] Package wavelets

2010-09-04 Thread Marize Simões


Hi,

In the DWT decomposition, when I generate the output, the levels go
from 0 to 15. I would like to know how I can visualise in the output
only the levels that concern me, for example levels 7 to 14. I would
like to be able to choose which levels I visualise.
Is this possible with dwt?

Marize Simões
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Package-wavelets-tp2526023p2526505.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Query regarding Windows based statistical software development using R as programming language

2010-09-04 Thread Soumen Pal

Hi,

I am a beginner in R. I have a query as below:

Is it possible to develop a Windows based statistical software
(user-friendly) like SPSS using R as a programming language?

Otherwise, is it possible to use R code directly (no command-line
execution) in Windows based programming language such as Visual Basic?
Please help me, if possible, with some link to study materials related to
such topic.

-- 
Thanks & Regards,

Soumen Pal


Re: [R] How to generate integers from uniform distribution with fixed mean

2010-09-04 Thread Yi

Sorry I forgot to talk about the range.

But as an example, range (17,23) works.

In your codes, mean is not exactly 20 and the samples are not integer.
However, what I want is integers with mean 20 exactly.

Any tips?

Thanks

On Thu, Sep 2, 2010 at 12:16 AM, Barry Rowlingson 
b.rowling...@lancaster.ac.uk wrote:

  On Thu, Sep 2, 2010 at 7:17 AM, Yi liuyi.fe...@gmail.com wrote:
  Hi, folks,
 
  runif (n,min,max) is the typical code for generate R.V from uniform dist.
 
  But what if we need to fix the mean as 20, and we want the values to be
  integers only?

  It's not clear what you want. Uniformly random integers with expected
 mean 20 - but what range? Any range centred on 20 will work, for
 example you could use sample() with replacement. To see the
 distribution, use sample()

  table(sample(17:23, 10000, TRUE))

  which gives a uniform distribution of integers from 17 to 23, so the
 mean is 20.0057 for 10000 samples.

  Is that what you want?

 Barry



Re: [R] How to generate integers from uniform distribution with fixed mean

2010-09-04 Thread Barry Rowlingson

On Sat, Sep 4, 2010 at 8:07 AM, Yi liuyi.fe...@gmail.com wrote:
 Sorry I forgot to talk about the range.

 But as an example, range (17,23) works.

 In your codes, mean is not exactly 20 and the samples are not integer.

 The samples *are* integers. sample(17:23, 10000, TRUE) returns integers.

 However, what I want is integers with mean 20 exactly.

 Any tips?


 Well, something will have to go. You can't have a random uniform
sample of integers within a given range and have an exact mean every
time.

 Suppose your range was -1 to 1, so possible values -1,0,1, and you
want integer mean 0. The only way to do that is to have equal numbers
of -1s and +1s in your sample, and the number of zeros is irrelevant -
you could have 5000 zeroes and the mean would still be 0 if you had 25
-1s and 25 +1s - that's clearly not a uniform distribution, and you'll
have to impose certain conditions if that's what you want.

 By extension, as long as you have an odd number of integers in your
sample and you want the mean to be the median value (so in the 17:23
example, mean of 20) it is sufficient to generate the same number of
17s as 23s, the same number of 18s as 22s, the same number of 19s as
21s, and as many 20s as you like.
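Barry's symmetric construction can be sketched in R as follows. This is a minimal illustration, not code from the thread: the helper name mirror_sample() is hypothetical, and it assumes a range with an odd number of values so that the target mean is the middle value. Each uniform draw is paired with its mirror image around 20, so the sample mean is exactly 20. Each value is still marginally uniform on 17:23, but the values are no longer independent.

```r
# Hypothetical helper for the "same number of 17s as 23s" idea:
# every uniform draw is paired with its mirror image around the centre,
# so the sample sum is always centre * length(x).
mirror_sample <- function(n_pairs, lo = 17L, hi = 23L) {
  centre <- (lo + hi) %/% 2L                  # 20 for 17:23 (assumes odd-length range)
  half <- sample(lo:hi, n_pairs, replace = TRUE)
  x <- c(half, 2L * centre - half)            # 17 <-> 23, 18 <-> 22, 19 <-> 21, 20 <-> 20
  x[sample.int(length(x))]                    # shuffle so the pairs are not adjacent
}

set.seed(1)
x <- mirror_sample(50)
mean(x)        # exactly 20
table(x)
```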

 Not exactly sure of the maths for non-median means, you'd have to
pick fewer values on one side to cancel out the extra weight on the
other. But given that this 'distribution' is going to be weird in many
ways, perhaps you should answer the question: Why?

Barry


Re: [R] How to generate integers from uniform distribution with

2010-09-04 Thread Ted Harding

There is still ambiguity (and I think some misunderstanding)
in your query! First, Barry's code does yield integers as the
values in the sample. As a smaller illustrative example:

  x <- sample(17:23, 20, TRUE)

will give results like

  x
  # [1] 21 17 23 21 17 17 19 18 17 17 17 22 20 23 20 20 18 20 19 20

which are all integers.

Secondly, in general, the mean of the sampled numbers will not be 20
exactly, even though their *expected* mean is 20:

  mean(x)
  # [1] 19.3

Barry gave an example of a sample size so large that the mean would
very probably be extremely close to 20 (20.0057 when he did it).
This will of course vary from sample to sample:

  mean(sample(17:23, 10000, TRUE))
  # [1] 19.9991
  mean(sample(17:23, 10000, TRUE))
  # [1] 20.031
  mean(sample(17:23, 10000, TRUE))
  # [1] 20.0207
  mean(sample(17:23, 10000, TRUE))
  # [1] 19.9819

You say: However, what I want is integers with mean 20 exactly.
This is ambiguous. On the one hand, Barry's procedure samples
integers from (17,18,19,20,21,22,23) with equal probability,
a distribution which has mean exactly 20 *as the distribution
which is being sampled from*, although the mean of the values
in any particular sample will very probably not be exactly 20.
So, in that sense, Barry's procedure does give you a *method*
of sampling integers which has mean 20 exactly.

On the other hand, a possible interpretation of what you say
is that you want every sample to be such that, after you have
obtained the sample (say 'x'), then mean(x) = 20 exactly (as
opposed to what you will get from Barry's code, where the mean
will be close to, but almost never equal to, 20).

If that is what you want, then it is more tricky to achieve.
You are then effectively sampling from the conditional distribution:
X1, X2, ... , Xn uniformly distributed on (17:23) conditional on
X1 + X2 + ... + Xn = 20*n.

This can be done, but before working out how to do it one would
need to be assured that this really is what you mean!
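If the second reading is the intended one, rejection sampling is one simple, if inefficient, way to draw from that conditional distribution. You draw X1, ..., Xn i.i.d. uniform on 17:23 and keep the draw only when the sum is exactly 20*n. The accepted samples then follow the conditional distribution exactly. The sketch below is not a method from the thread, and the function name rconditional_mean() is hypothetical. The acceptance rate falls roughly like 1/sqrt(n), so for large n a smarter scheme would be needed.

```r
# Rejection sampler: draw X1..Xn i.i.d. uniform on 'values' and accept the
# draw only if sum(X) == target_mean * n. Accepted samples follow the
# conditional distribution of the i.i.d. draws given that sum.
rconditional_mean <- function(n, values = 17:23, target_mean = 20,
                              max_tries = 1e6) {
  target_sum <- target_mean * n
  for (k in seq_len(max_tries)) {
    x <- sample(values, n, replace = TRUE)
    if (sum(x) == target_sum) return(x)
  }
  stop("no sample with the required sum found; increase max_tries")
}

set.seed(42)
x <- rconditional_mean(20)
mean(x)   # exactly 20 by construction
```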

Ted.



E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 04-Sep-10   Time: 08:56:41
-- XFMail --


[R] Function try and Results of a program

2010-09-04 Thread Evgenia


Hello, users.

Dear users,

I have a function f to simulate data from a model (the example below is
only used to show my problem):

f <- function(n, mean1) {
  a <- matrix(rnorm(n, mean1, sd = 1), ncol = 5)
  b <- matrix(runif(n), ncol = 5)
  data <- rbind(a, b)
  out <- data
  out
}

I want to simulate 1000 datasets (here only 5), so I use

S <- list()
for (i in 1:5) {
  S[[i]] <- f(n = 10, mean1 = 0)
}

I have a very complicated function for estimating a model, which I
want to apply to each one of the simulated datasets above:

fun <- function(data) {
  data <- as.matrix(data)
  sink("Example.txt", append = TRUE)
  cat("\n***\nEstimation\n\nDataset Sim : ", i)
  d <- data %*% t(data)
  s <- solve(d)
  print(s)
  out <- list(s, d)
  out
}

results <- list()
for (i in 1:5) {
  tmp <- try(fun(data = S[[i]]))
  results[[i]] <- ifelse(is(tmp, "try-error"), NA, tmp)
}

My problem is that results contains only the first element of the list
returned by fun (i.e. only s), although tmp gives me both s and d.

Thanks 

Evgenia


-- 
View this message in context: 
http://r.789695.n4.nabble.com/Function-try-and-Results-of-a-program-tp2526621p2526621.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] R program google search

2010-09-04 Thread Duncan Temple Lang

Hi there

One way to use Google's search service from R is

library(RCurl)
library(RJSONIO)  # or library(rjson)

val = getForm("http://ajax.googleapis.com/ajax/services/search/web",
              q = "Google search AJAX", v = "1.0")
results = fromJSON(val)

Google requests that you provide your Google API key:

  val = getForm("http://ajax.googleapis.com/ajax/services/search/web",
                q = "Google search AJAX", v = "1.0",
                key = "my google api key")

Similarly, you should provide header information to identify your application,
e.g.

xx = getForm("http://ajax.googleapis.com/ajax/services/search/web",
             q = "Google search AJAX", v = "1.0",
             .opts = list(useragent = "RGoogleSearch", verbose = TRUE))




  D.

On 9/3/10 10:33 PM, Waverley @ Palo Alto wrote:
 My question is how to use R to program google search.
 I found this information:
 The SOAP Search API was created for developers and researchers
 interested in using Google Search as a resource in their
 applications.  Unfortunately google no longer supports that.  They
 are supporting the AJAX Search API.  What about R?
 
 Thanks.
 
 
 
 On Fri, Sep 3, 2010 at 2:23 PM, Waverley @ Palo Alto
 waverley.paloa...@gmail.com wrote:
 Hi,

 Can someone help as how to use R to program google search in the R
 code?  I know that other languages can allow or have the google search
 API

 If someone can give me some links or sample code I would greatly appreciate.

 Thanks.

 --
 Waverley @ Palo Alto

 
 



[R] return from .Call()

2010-09-04 Thread raje...@cse.iitm.ac.in


Hi,

I have a .Call in my R function in a loop that repeats a certain number of 
times. Each time, the .Call returns a list. So, when I say something like,

y <- func()

would y be a list of lists (as many as the number of loops)?

[R] limit on read.socket?

2010-09-04 Thread raje...@cse.iitm.ac.in


Hi,

I have the following piece of code,

repeat {
  ss <- read.socket(sockfd)
  if (ss == "") break
  output <- paste(output, ss)
}

but somehow, output is not receiving all the data that is coming through the
socket. My suspicion is on the if statement: what happens if a white space
occurs in between the string arriving over the socket?

Re: [R] how to free memory? (gc() doesn't work for me)

2010-09-04 Thread jim holtman

Seems to work for me:

> x <- matrix(0, 10000, 10000)
> object.size(x)
800000112 bytes
> gc()
            used  (Mb) gc trigger  (Mb)  max used  (Mb)
Ncells    174104   4.7     741108  19.8    741108  19.8
Vcells 101761938 776.4  113632405 867.0 102762450 784.1
> rm(x)
> gc()
          used (Mb) gc trigger  (Mb)  max used  (Mb)
Ncells  174202  4.7     741108  19.8    741108  19.8
Vcells 1761954 13.5   90905923 693.6 102762450 784.1
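Here is a scaled-down version of the same check, sketched with a 1000 x 1000 matrix (about 8 MB) instead of the 800 MB one. The helper vcells_used() is a hypothetical convenience wrapper around gc(). Note that gc() reports the memory R itself is using. Whether the operating system's view of the R process shrinks afterwards may depend on the platform's memory allocator.

```r
# Number of vector cells (8-byte units) R currently has in use.
vcells_used <- function() gc()["Vcells", "used"]

x <- matrix(0, 1000, 1000)          # 1e6 doubles, roughly 8 MB
print(object.size(x), units = "Mb")
before <- vcells_used()
rm(x)                               # drop the only reference ...
after <- vcells_used()              # ... and let gc() reclaim it
cat("Vcells released:", before - after, "\n")
```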


On Sat, Sep 4, 2010 at 12:46 AM, Hyunchul Kim
hyunchul.kim@gmail.com wrote:
 Hi, all

 I have a huge object that use almost all of available memory.

 R> rm(a_huge_object)
 R> gc()

 doesn't free memory and ?gc doesn't show anything.

 Are there any suggestion?

 Thanks in advance,

 Regards,

 Hyunchul





-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?


[R] Levels in returned data.frame after subset

2010-09-04 Thread Ulrik Stervbo

Dear List,

When I subset a data.frame, the levels are not re-adjusted (see
example). Why is this? Am I missing out on some basic stuff here?

Thanks
Ulrik


> m <- data.frame(gender = c("M", "M", "F"), ht = c(172, 186.5, 165), wt =
+ c(91, 99, 74))
> dim(m)
[1] 3 3

> levels(m$gender)
[1] "F" "M"

> s <- subset(m, m$gender == "M")
> dim(s)
[1] 2 3

> levels(s$gender)
[1] "F" "M"

> cat <- sapply(s, is.factor); s[cat] <- lapply(s[cat], factor)
> dim(s)
[1] 2 3

> levels(s$gender)
[1] "M"
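Subsetting keeps the full level set because the levels are part of the factor's definition. This is useful, for example, when tabulating categories that have zero counts. Below is a sketch of two ways to drop the unused levels. Passing stringsAsFactors = TRUE is an addition here: it reproduces the factor column that R 2.x created by default, since from R 4.0.0 onwards strings stay character unless you ask for factors.

```r
m <- data.frame(gender = c("M", "M", "F"), ht = c(172, 186.5, 165),
                wt = c(91, 99, 74), stringsAsFactors = TRUE)

s <- subset(m, gender == "M")
levels(s$gender)                 # "F" "M": the unused level is kept

# Option 1: re-create the factor, which keeps only the levels that occur
# (what the sapply/lapply line above does for every factor column).
s1 <- s
s1$gender <- factor(s1$gender)

# Option 2: droplevels(), available in later R versions, does the same
# for all factor columns of a data frame at once.
s2 <- droplevels(s)
levels(s2$gender)                # "M"
```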


Re: [R] return from .Call()

2010-09-04 Thread Barry Rowlingson

On Sat, Sep 4, 2010 at 10:17 AM, raje...@cse.iitm.ac.in
raje...@cse.iitm.ac.in wrote:

 Hi,

 I have a .Call in my R function in a loop that repeats a certain number of 
 times. Each time, the .Call returns a list. So, when I say something like,

 y <- func()

 would y be a list of lists (as many as the number of loops)?

No, it'll be the last thing evaluated or the result of a return()
call. Why haven't you tried this?

Try a simple example:

func = function(){
  for(i in 1:10){
    z = list(a = 1, b = 2)
  }
}

and see what comes back. My suspicion is it's not going to be a list of lists.

 If you want a list of lists then you'll have to put the list together
yourself from the returns of the .Call, something like (not tested,
but looks okay):

func = function(){
  ret = list()
  for(i in 1:10){
    ret[[i]] = list(i, i*2, i*3)
  }
  return(ret)
}
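You can check Barry's point directly. In R, a for loop evaluates to NULL, so a function whose last expression is the loop returns NULL (invisibly), not a list of lists. The sketch below also shows lapply(), which collects one value per iteration. Here list(a = i, b = 2 * i) is a hypothetical stand-in for whatever the .Call() returns.

```r
# A function ending in a for loop returns NULL, whatever the loop body built.
func_loop <- function() {
  for (i in 1:10) {
    z <- list(a = 1, b = 2)
  }
}
is.null(func_loop())   # TRUE

# lapply() gathers one result per iteration into a list of lists.
# The inner list(...) stands in for the value of a .Call() (hypothetical).
func_collect <- function(n = 10) {
  lapply(seq_len(n), function(i) list(a = i, b = 2 * i))
}
y <- func_collect()
length(y)              # 10
y[[3]]$b               # 6
```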

Barry


[R] tail.matrix returns matrix, while tail.mts return vector

2010-09-04 Thread mat


Hi

I have a few problems with tail/head when applied to multiple time
series. I'm not sure whether I have misunderstood the function or
whether this is unexpected behaviour.


When head(a, n) is applied to a data frame or matrix, it returns a
data frame or matrix with the first n obs of *each* variable. When applied
to an mts object, it returns the first n obs of the *first* variable only, not
of all... The same holds for tail(). See:


head(freeny)
###mts object
head(EuStockMarkets)
#is equivalent to:
head(EuStockMarkets[,1])

I guess it comes from the absence of a head method for mts. Does it seem
reasonable to also have a head.mts, or did I misunderstand something?


Thanks


Re: [R] Levels in returned data.frame after subset

2010-09-04 Thread Ista Zahn

Hi Ulrik

On Sat, Sep 4, 2010 at 12:52 PM, Ulrik Stervbo ulrik.ster...@gmail.com wrote:
 Dear List,

 When I subset a data.frame, the levels are not re-adjusted (see
 example). Why is this? Am I missing out on some basic stuff here?

Only that this issue has come up many times before, and that this list
is archived and searchable. Try

RSiteSearch("subset drop levels", restrict = c("Rhelp10", "Rhelp08", "Rhelp02"))


-Ista






-- 
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org


Re: [R] tail.matrix returns matrix, while tail.mts return vector

2010-09-04 Thread Ista Zahn

Hi Mat,
You might be able to use the matrix method to get what you want.

head.matrix(EuStockMarkets)
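A few version-independent ways to take the first rows of every series in an mts object are sketched below. How head() itself treats time series has changed across R releases, so the comments describe these particular calls only. EuStockMarkets ships with R's datasets package.

```r
# All four series, first 6 rows, as a plain matrix:
h1 <- head(unclass(EuStockMarkets))   # without the mts class, head() uses the matrix method
h2 <- EuStockMarkets[1:6, ]           # row indexing also returns a plain matrix

# Keeping the time-series attributes (start, frequency) with window():
h3 <- window(EuStockMarkets, end = time(EuStockMarkets)[6])
dim(h1); dim(h2); dim(h3)             # each 6 x 4
```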

-Ista





-- 
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org


[R] Luis Miguel Delgado Gomez/BBK is out of the office.

2010-09-04 Thread Luis Miguel Delgado Gomez



I will be out of the office from 03/09/2010 and will not be back until
11/10/2010.

I will reply to your message when I return.

[R] A basic question in model/formula specification

2010-09-04 Thread telm8


Hi,

I am currently trying to fit a multinomial logit model on my data. I have
tried to search for some example, and this is the one that I followed and
worked. 

http://www.ats.ucla.edu/stat/r/dae/mlogit.htm

However, I am having difficulties finding out the meaning of the model
specified in the following line:

mlogit.model <- mlogit(brand ~ 1 | female + age, data = mldata, reflevel = 1)

The main issue is the "|". I found out that it means "multi-part formula", but
I have no idea what it means mathematically in this particular case. Can
anyone enlighten me?

Many thanks
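For reference, here is one standard reading of that formula, based on the mlogit package's documented formula interface rather than on anything in this thread. Parts of an mlogit formula are separated by "|". The first part holds alternative-specific variables with generic coefficients. The second part holds individual-specific variables whose coefficients differ by alternative. So brand ~ 1 | female + age has no alternative-specific regressors, and female and age each get one coefficient per brand. Brand 1 (reflevel = 1) is the baseline, with its coefficients fixed at zero. For person i and brand j:

```latex
P(\text{brand}_i = j) =
  \frac{\exp(\alpha_j + \beta_j\,\text{female}_i + \gamma_j\,\text{age}_i)}
       {\sum_{k} \exp(\alpha_k + \beta_k\,\text{female}_i + \gamma_k\,\text{age}_i)},
  \qquad \alpha_1 = \beta_1 = \gamma_1 = 0,

\log\frac{P(\text{brand}_i = j)}{P(\text{brand}_i = 1)}
  = \alpha_j + \beta_j\,\text{female}_i + \gamma_j\,\text{age}_i .
```

This should be the same baseline-category multinomial logit that nnet::multinom(brand ~ female + age) fits.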



-- 
View this message in context: 
http://r.789695.n4.nabble.com/A-basic-question-in-model-formula-specification-tp2526765p2526765.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Function try and Results of a program

2010-09-04 Thread David Winsemius



Two problems:
One: the misguided use of unmatched sink calls, resulting in an
accumulation of diversions of the R output. If you run that at the
console, you need to type sink() five times to get any response back
from the console.


Two: the misguided use of ifelse when you should be using if () {}
else {} to test a single condition and execute a conditional
assignment. ifelse is for working with vectors, not with lists.


Suggestions:
use the append = TRUE parameter to sink and unsink at the end of that  
function


I'm not sure about how you are using the test for error but since you  
did not construct any errors I cannot really be too sure. If it is  
working  for you then use this instead:


if (is(tmp, "try-error")) { results[[i]] <- NA } else { results[[i]] <- tmp }
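Here is a self-contained sketch of the corrected loop. It assumes the simplified f() and fun() from the question, with the sink() calls left out. It also shows why ifelse() kept only s. With a single FALSE test, ifelse() returns a result of length 1, so only the first element of the two-element list survives. inherits() is used here as a base-R alternative to is().

```r
# Simulate one dataset: rows from N(mean1, 1) stacked on rows from U(0, 1).
f <- function(n, mean1) {
  a <- matrix(rnorm(n, mean1, sd = 1), ncol = 5)
  b <- matrix(runif(n), ncol = 5)
  rbind(a, b)
}

# Simplified estimation step (sink() omitted); returns both matrices.
fun <- function(data) {
  data <- as.matrix(data)
  d <- data %*% t(data)
  s <- solve(d)
  list(s = s, d = d)
}

set.seed(123)
S <- lapply(1:5, function(i) f(n = 10, mean1 = 0))

results <- vector("list", length(S))
for (i in seq_along(S)) {
  tmp <- try(fun(data = S[[i]]), silent = TRUE)
  if (inherits(tmp, "try-error")) {
    results[[i]] <- NA
  } else {
    results[[i]] <- tmp
  }
}

# The pitfall: ifelse() shapes its result like the test (length 1 here),
# so only the first list element is kept.
length(ifelse(FALSE, NA, list(1, 2)))   # 1
```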

--
David.






David Winsemius, MD
West Hartford, CT


[R] Decision Tree in Python or C++?

2010-09-04 Thread noclue_



Has anybody used a decision tree in Python or C++ (or written their own
decision tree implementation in Python or C++)? My goal is to run a decision
tree on 8 million obs as the training set and score 7 million in the test set.

I am testing 'rpart' package on a 64-bit-Linux + 64-bit-R environment. But 
it seems that rpart is either not stable or running out of memory very 
quickly. (Is it because R is passing everything as copy instead of as object
reference?)

Any idea would be greatly appreciated!  

Have a nice weekend!
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Decision-Tree-in-Python-or-C-tp2526810p2526810.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Function try and Results of a program

2010-09-04 Thread Evgenia


David,

your suggestion about try works perfectly for me.

I still have a problem with sink. Could you explain your suggestion in
more detail?

Thanks a lot

Evgenia
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Function-try-and-Results-of-a-program-tp2526621p2526822.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Function try and Results of a program

2010-09-04 Thread David Winsemius



On Sep 4, 2010, at 12:41 PM, Evgenia wrote:



David,

your suggestion about try works  perfect for me.

I still have a problem with sink. Could you explain me better your
suggestion?



When you sink to a file, you will continue sending console output to  
that file until you issue sink(). And every time you do it it creates  
an extra layer of redirection (the help page calls these diversions)  
that will need to be undone to get back to regular console behavior.


?sink # yes, one needs to R~all~TM

If you wanted a record of what that function was doing you would need  
to:


a) initialize the file with append=FALSE outside the loop (not sure if
you need to do that, but it does help to get rid of earlier failed
efforts as well);
b) open the sink file with append=TRUE inside the function;
c) cat() the two matrices separately, since lists cannot be cat()-ed; and
d) unsink with sink() at the end of the function.


 sink("example.txt", append=FALSE); cat("\n "); sink() # blank line to
initialize


 fun <- function(data){ data <- as.matrix(data)
   sink("example.txt", append=TRUE); cat("\nEstimate : ", i, "\n")
   d <- data %*% t(data); cat("d= \n", d, "\n")
   s <- solve(d);   cat("s= \n", s, "\n")
   out <- list(s=s, d=d);  sink()
   return(out)
 }
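Here is a variant of the function above, sketched under two assumptions: the log goes to a temporary file instead of example.txt, and the dataset number i is passed as an argument instead of being read from the global loop variable. fun_logged is a hypothetical name. on.exit(sink()) closes the diversion even when solve() fails, which prevents the pile-up of sinks described above. print() keeps a matrix's row/column layout, whereas cat() flattens it to a vector.

```r
log_file <- tempfile(fileext = ".txt")   # stand-in for "example.txt"

fun_logged <- function(data, i, file = log_file) {
  data <- as.matrix(data)
  sink(file, append = TRUE)
  on.exit(sink())            # undo this diversion however the function exits
  cat("\nEstimate : ", i, "\n")
  d <- data %*% t(data)
  s <- solve(d)              # may fail for a singular d
  print(s)
  list(s = s, d = d)
}

set.seed(1)
dat <- rbind(matrix(rnorm(10), ncol = 5), matrix(runif(10), ncol = 5))
res <- fun_logged(dat, i = 1)
bad <- try(fun_logged(matrix(0, 2, 2), i = 2), silent = TRUE)
sink.number()                # 0: no diversions left open
readLines(log_file)
```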



View this message in context: 
http://r.789695.n4.nabble.com/Function-try-and-Results-of-a-program-tp2526621p2526822.html
Sent from the R help mailing list archive at Nabble.com.

--

David Winsemius, MD
West Hartford, CT


#
An unfortunate effect of Nabble use is that it leads one to believe  
that the entire world sees your earlier postings:

#-
f <- function(n, mean1) {
  a <- matrix(rnorm(n, mean1, sd = 1), ncol = 5)
  b <- matrix(runif(n), ncol = 5)
  data <- rbind(a, b)
  out <- data
  out
}

I want to simulate 1000 datasets (here only 5), so I use

S <- list()
for (i in 1:5) {
  S[[i]] <- f(n = 10, mean1 = 0)
}

I have a very complicated function for estimating a model, which I
want to apply to each one of the simulated datasets above:

fun <- function(data) {
  data <- as.matrix(data)
  sink("Example.txt", append = TRUE)
  cat("\n***\nEstimation\n\nDataset Sim : ", i)
  d <- data %*% t(data)
  s <- solve(d)
  print(s)
  out <- list(s, d)
  out
}

results <- list()
for (i in 1:5) {
  tmp <- try(fun(data = S[[i]]))
  results[[i]] <- ifelse(is(tmp, "try-error"), NA, tmp)
}

My problem is that results contains only the first element of the list
returned by fun (i.e. only s), although tmp gives me both s and d.


Re: [R] What solve() does?

2010-09-04 Thread Paul Johnson

On Wed, Sep 1, 2010 at 5:36 AM, Petar Milin pmi...@ff.uns.ac.rs wrote:
 Hello!
 Can anyone explain me what solve() function does: Gaussian elimination or
 iterative, numeric solve? In addition, I would need both the Gaussian
 elimination and iterative solution for the course. Are the two built in R?

 Thanks!

 PM

Hello, Petar:

I think you are assuming that solve uses an elementary linear algebra
paper and pencil procedure, but I don't think it does.  In a digital
computer, those things are not precise, and I think the folks here
will even say you shouldn't use solve to get an inverse, but I can't
remember all of the details.

To see how solve works ...

Let me show you a trick I just learned. Read

?solve

notice it is a generic method, meaning it does not actually do the
calculations for you. Rather, there are specific implementations for
different types of cases. To find the implementations, run

methods(solve)

I get:

> methods(solve)
[1] solve.default solve.qr

Then if you want to read HOW solve does what it does (which I think
was your question), run this:

> solve.default

or

> solve.qr

In that code, you will see the chosen procedure depends on the linear
algebra libraries you make available.  I'm no expert on the details,
but it appears QR decomposition is the preferred method.  You can read
about that online or in numerical algebra books.
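One detail to add: by default, solve.default() passes a square system to LAPACK's dgesv routine. That routine uses an LU factorisation with partial pivoting, which is a form of Gaussian elimination. QR decomposition is what solve.qr() and qr.solve() use. For teaching, the two textbook methods Petar asked about can be written out directly, as sketched below. gauss_solve and jacobi_solve are hypothetical names. These sketches are much slower and less robust than LAPACK. Jacobi iteration only converges for suitable matrices, such as strictly diagonally dominant ones.

```r
# Gaussian elimination with partial pivoting, then back substitution.
gauss_solve <- function(A, b) {
  n <- nrow(A)
  Ab <- cbind(A, b)                                # augmented matrix [A | b]
  for (k in seq_len(n - 1)) {
    p <- which.max(abs(Ab[k:n, k])) + k - 1        # largest pivot in column k
    if (p != k) Ab[c(k, p), ] <- Ab[c(p, k), ]     # swap rows k and p
    for (r in (k + 1):n) {
      Ab[r, ] <- Ab[r, ] - Ab[r, k] / Ab[k, k] * Ab[k, ]
    }
  }
  x <- numeric(n)
  for (k in n:1) {
    idx <- seq_len(n)[seq_len(n) > k]              # columns right of the diagonal
    x[k] <- (Ab[k, n + 1] - sum(Ab[k, idx] * x[idx])) / Ab[k, k]
  }
  x
}

# Jacobi iteration: x_new = (b - R x) / diag(A), where R is A with a zero diagonal.
jacobi_solve <- function(A, b, tol = 1e-10, max_iter = 1000) {
  D <- diag(A)
  R <- A
  diag(R) <- 0
  x <- numeric(length(b))
  for (it in seq_len(max_iter)) {
    x_new <- drop(b - R %*% x) / D
    if (max(abs(x_new - x)) < tol) return(x_new)
    x <- x_new
  }
  warning("Jacobi iteration did not converge")
  x
}

A <- matrix(c(4, 1, 2,
              1, 5, 1,
              2, 1, 6), nrow = 3, byrow = TRUE)    # strictly diagonally dominant
b <- c(1, 2, 3)
cbind(solve = solve(A, b), gauss = gauss_solve(A, b), jacobi = jacobi_solve(A, b))
```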



-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas


[R] Please explain do.call in this context, or critique to stack this list faster

2010-09-04 Thread Paul Johnson

I've been doing some consulting with students who seem to come to R
from SAS.  They are usually pre-occupied with do loops and it is tough
to persuade them to trust R lists rather than keeping 100s of named
matrices floating around.

Often it happens that there is a list with lots of matrices or data
frames in it and we need to stack those together.  I thought it
would be a simple thing, but it turns out there are several ways to
get it done, and in this case, the most elegant way using do.call is
not the fastest, but it does appear to be the least prone to
programmer error.

I have been staring at ?do.call for quite a while and I have to admit
that I just need some more explanations in order to interpret it.  I
can't really get why this does work

do.call( rbind, mylist)

but it does not work to do

sapply ( mylist, rbind).
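One way to see the difference: do.call(f, lst) makes a single call to f, with the elements of lst as its arguments. So do.call(rbind, mylist) is the same as rbind(mylist[[1]], mylist[[2]], ...), written out for however many elements the list has. lapply() and sapply() instead call f once per element, as rbind(mylist[[1]]), rbind(mylist[[2]]), and so on, and each of those calls has only one data frame to bind. A small sketch with a made-up toy list l:

```r
l <- list(data.frame(x = 1:2, y = c(0.5, 1.5)),
          data.frame(x = 3L, y = 2.5))

# One call, with all list elements as arguments:
a <- do.call(rbind, l)
b <- rbind(l[[1]], l[[2]])
all.equal(a, b)          # TRUE
nrow(a)                  # 3

# One call per element: nothing gets stacked.
per_element <- lapply(l, rbind)
length(per_element)      # still a list of 2 data frames

# Reduce() stacks pairwise, like an explicit for loop over the list.
r <- Reduce(rbind, l)
all.equal(r, a)          # TRUE
```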

Anyway, here's the self contained working example that compares the
speed of various approaches.  If you send yet more ways to do this, I
will add them on and then post the result to my Working Example
collection.

## stackMerge.R
## Paul Johnson pauljohn at ku.edu
## 2010-09-02


## rbind is neat,but how to do it to a lot of
## data frames?

## Here is a test case

df1 <- data.frame(x=rnorm(100),y=rnorm(100))
df2 <- data.frame(x=rnorm(100),y=rnorm(100))
df3 <- data.frame(x=rnorm(100),y=rnorm(100))
df4 <- data.frame(x=rnorm(100),y=rnorm(100))

mylist <- list(df1, df2, df3, df4)

## Usually we have done a stupid
## loop  to get this done

resultDF <- mylist[[1]]
for (i in 2:4) resultDF <- rbind(resultDF, mylist[[i]])

## My intuition was that this should work:
## lapply( mylist, rbind )
## but no! It just makes a new list

## This obliterates the columns
## unlist( mylist )

## I got this idea from code in the
## complete function in the mice package
## It uses brute force to allocate a big matrix of 0's and
## then it places the individual data frames into that matrix.

m <- 4
nr <- nrow(df1)
nc <- ncol(df1)
dataComplete <- as.data.frame(matrix(0, nrow = nr*m, ncol = nc))
for (j in  1:m) dataComplete[(((j-1)*nr) + 1):(j*nr), ] <- mylist[[j]]



## I searched a long time for an answer that looked better.
## This website is helpful:
## http://stackoverflow.com/questions/tagged/r
## I started to type in the question and 3 plausible answers
## popped up before I could finish.

## The terse answer is:
shortAnswer <- do.call(rbind,mylist)

## That's the right answer, see:

shortAnswer == dataComplete
## But I don't understand why it works.

## More importantly, I don't know if it is fastest, or best.
## It is certainly less error prone than dataComplete

## First, make a bigger test case and use system.time to evaluate

phony <- function(i){
  data.frame(w=rnorm(1000), x=rnorm(1000),y=rnorm(1000),z=rnorm(1000))
}
mylist <- lapply(1:1000, phony)


### First, try the terse way
system.time( shortAnswer <- do.call(rbind, mylist) )


### Second, try the complete way:
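m <- 1000
## nr and nc must describe the new 1000 x 4 frames in mylist, not the
## 100 x 2 df1, or the row blocks below will not line up.
nr <- nrow(mylist[[1]])
nc <- ncol(mylist[[1]])

system.time(
   dataComplete <- as.data.frame(matrix(0, nrow = nr*m, ncol = nc))
 )

system.time(
   for (j in  1:m) dataComplete[(((j-1)*nr) + 1):(j*nr), ] <- mylist[[j]]
)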


## On my Thinkpad T62 dual core, the shortAnswer approach takes about
## three times as long:


##  system.time( bestAnswer <- do.call(rbind,mylist) )
##    user  system elapsed
##  14.270   1.170  15.433

##  system.time(
## +    dataComplete <- as.data.frame(matrix(0, nrow = nr*m, ncol = nc))
## +  )
##    user  system elapsed
##   0.000   0.000   0.006

##  system.time(
## + for (j in  1:m) dataComplete[(((j-1)*nr) + 1):(j*nr), ] <- mylist[[j]]
## + )
##    user  system elapsed
##   4.940   0.050   4.989


## That makes the do.call way look slow, and I said hey,
## our stupid for loop at the beginning may not be so bad.
## Wrong. It is a disaster.  Check this out:


##  resultDF <- phony(1)
##  system.time(
## + for (i in 2:1000) resultDF <- rbind(resultDF, mylist[[i]])
## +)
##    user  system elapsed
## 159.740   4.150 163.996
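One more way that might be worth adding to the collection, offered as a sketch rather than a benchmark result: stack the frames column by column with unlist(). It assumes every frame has the same column names in the same order and plain atomic (e.g. numeric) columns; factor columns would need checking. stackByColumn and makeFrame are hypothetical names. It skips much of the per-frame bookkeeping inside rbind.data.frame, so it may well be faster, but time it on your own data before relying on that.

```r
## Stack a list of data frames one column at a time.
## Assumes identical column names/order and atomic columns in every frame.
stackByColumn <- function(lst) {
  cols <- names(lst[[1]])
  out <- lapply(cols, function(nm) unlist(lapply(lst, `[[`, nm), use.names = FALSE))
  names(out) <- cols
  as.data.frame(out)
}

## Test data in the spirit of phony() above, but fewer frames.
makeFrame <- function(i) {
  data.frame(w = rnorm(1000), x = rnorm(1000), y = rnorm(1000), z = rnorm(1000))
}
set.seed(123)
mylist <- lapply(1:200, makeFrame)

system.time(byCol <- stackByColumn(mylist))
system.time(shortAnswer <- do.call(rbind, mylist))

## Same values; row names are ignored in the comparison.
all.equal(byCol, shortAnswer, check.attributes = FALSE)
```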


-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas


Re: [R] Please explain do.call in this context, or critique to stack this list faster

2010-09-04 Thread Erik Iverson


On 09/04/2010 01:37 PM, Paul Johnson wrote:

I've been doing some consulting with students who seem to come to R
from SAS.  They are usually pre-occupied with do loops and it is tough
to persuade them to trust R lists rather than keeping 100s of named
matrices floating around.

Often it happens that there is a list with lots of matrices or data
frames in it and we need to stack those together.  I thought it
would be a simple thing, but it turns out there are several ways to
get it done, and in this case, the most elegant way using do.call is
not the fastest, but it does appear to be the least prone to
programmer error.

I have been staring at ?do.call for quite a while and I have to admit
that I just need some more explanations in order to interpret it.  I
can't really get why this does work

do.call( rbind, mylist)


do.call is *constructing* a function call from the list of arguments,
mylist.

It is shorthand for

rbind(mylist[[1]], mylist[[2]], mylist[[3]]) assuming mylist has
3 elements.
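A small check of that explanation, using three tiny illustrative data frames (dfA, dfB and dfC are made-up names): do.call() builds the single call rbind(dfA, dfB, dfC) and evaluates it, so its result should be identical to writing that call out by hand. as.call() is used here only to make the constructed call explicit.

```r
dfA <- data.frame(x = 1:2, y = c(0.5, 1.5))
dfB <- data.frame(x = 3:4, y = c(2.5, 3.5))
dfC <- data.frame(x = 5:6, y = c(4.5, 5.5))
mylist <- list(dfA, dfB, dfC)

byHand    <- rbind(mylist[[1]], mylist[[2]], mylist[[3]])
viaDoCall <- do.call(rbind, mylist)
identical(byHand, viaDoCall)

## Build the same call explicitly: the function name followed by the list elements.
cl <- as.call(c(list(as.name("rbind")), mylist))
identical(eval(cl), viaDoCall)
```

If the list has names, do.call() passes them on as argument names, and rbind() then uses them when constructing row names, so a named list can give row labels such as a.1, a.2 instead of 1, 2.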




but it does not work to do

sapply ( mylist, rbind).


That's because sapply is calling rbind once for each item
in mylist, not what you want to do to accomplish your goal.


It might help to use a debugging technique to watch when
rbind gets called, and see how many times it gets called
and with what arguments using those two approaches.
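That debugging suggestion can be tried without special tools by wrapping rbind() in a function that counts its calls. countingRbind and nCalls are hypothetical names used only for this demonstration, and the data are three tiny made-up frames.

```r
nCalls <- 0
countingRbind <- function(...) {
  nCalls <<- nCalls + 1                     # record one more call
  cat("rbind() called with", length(list(...)), "argument(s)\n")
  rbind(...)
}

mylist <- list(data.frame(x = 1:2), data.frame(x = 3:4), data.frame(x = 5:6))

nCalls <- 0
bySapply <- sapply(mylist, countingRbind)   # one call per list element
callsSapply <- nCalls

nCalls <- 0
byDoCall <- do.call(countingRbind, mylist)  # one call with all three frames
callsDoCall <- nCalls

c(sapply = callsSapply, do.call = callsDoCall)
```

sapply() hands rbind() one data frame at a time, so each call just returns its single argument and nothing is ever stacked; do.call() makes one call that sees all the frames together.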





Re: [R] Save data as .pdf or .JPG

2010-09-04 Thread Paul Johnson

On Wed, Sep 1, 2010 at 7:56 AM, khush  bioinfo.kh...@gmail.com wrote:
 Hi all ,

 I have the following script to plot some data.

 plot( c(1,1100), c(0,15), type='n', xlab='', ylab='', ylim=c(0.1,25) ,
 las=2)
 axis (1, at = seq(0,1100,50), las =2)
 axis (2, at = seq(0,25,1), las =2)

 When I source(script.R), I get the image on screen, but I do not want to
 use a screenshot to save the image. How can I save the output in .pdf or
 .jpg format?

 Thank you
 Khushwant

Hi!  This is one of the things that is difficult for newcomers. I've
written down a pretty thorough answer:

http://pj.freefaculty.org/R/Rtips.html#5.2
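A minimal sketch of the usual approach: open a file device such as pdf() or png(), draw the plot, then close the device with dev.off() so the file is actually written. The file names, sizes and the wrapper drawPlot() are illustrative choices; jpeg() works like png() if a .jpg is really needed, though PNG usually suits line graphics better.

```r
## Khush's plot, wrapped in a function so it can be sent to any device.
drawPlot <- function() {
  plot(c(1, 1100), c(0, 15), type = "n", xlab = "", ylab = "",
       ylim = c(0.1, 25), las = 2)
  axis(1, at = seq(0, 1100, 50), las = 2)
  axis(2, at = seq(0, 25, 1), las = 2)
}

pdfFile <- file.path(tempdir(), "myplot.pdf")
pdf(pdfFile, width = 7, height = 5)        # size in inches
drawPlot()
dev.off()                                  # close the device, writing the file

pngFile <- file.path(tempdir(), "myplot.png")
if (capabilities("png")) {                 # png() needs a suitable graphics backend
  png(pngFile, width = 800, height = 600)  # size in pixels
  drawPlot()
  dev.off()
}
```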


pj
-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas


Re: [R] Save data as .pdf or .JPG

2010-09-04 Thread lanczos

On Sat, 2010-09-04 at 13:57 -0500, Paul Johnson wrote:
On Wed, Sep 1, 2010 at 7:56 AM, khush  bioinfo.kh...@gmail.com wrote:
  Hi all ,
 
  I have following script to plot some data.
 
  plot( c(1,1100), c(0,15), type='n', xlab='', ylab='', ylim=c(0.1,25) ,
  las=2)
  axis (1, at = seq(0,1100,50), las =2)
  axis (2, at = seq(0,25,1), las =2)
 
  When I source(script.R), I got the image on interface but I do not want to
  use screenshot option to save the image? How can save the output to .pdf or
  .jpg format?
 
  Thank you
  Khushwant

 Hi!  This is one of the things that is difficult for newcomers. I've
 written down a pretty thorough answer:

 http://pj.freefaculty.org/R/Rtips.html#5.2


 pj

Very nice text, indeed. My 2 cents: for graphs I strongly recommend the
.png format; .jpg is primarily designed for photographs.

Have a nice time

Tomas


Re: [R] What solve() does?

2010-09-04 Thread David Winsemius



On Sep 4, 2010, at 2:29 PM, Petar Milin wrote:


Thank you so much! This is very useful!
Any thoughts about how to run Gaussian elimination?


Do some searching?

RSiteSearch("gaussian elimination", restrict = c("Rhelp10", "Rhelp08",
  "Rhelp02", "functions"))


 returns (among other things) a link to a John Fox post from 2005:

http://finzi.psych.upenn.edu/R/Rhelp02/archive/49950.html
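For teaching purposes, Gaussian elimination is short enough to write directly in R. The sketch below is not the code from the post linked above; gaussElim is a hypothetical name. It uses partial pivoting and back substitution, and compares its answer with solve() on a standard 3 x 3 textbook system.

```r
## Gaussian elimination with partial pivoting, then back substitution.
gaussElim <- function(A, b) {
  n <- nrow(A)
  Ab <- cbind(A, b)                             # augmented matrix [A | b]
  for (k in seq_len(n - 1)) {
    p <- which.max(abs(Ab[k:n, k])) + k - 1     # row with the largest pivot
    if (Ab[p, k] == 0) stop("matrix is singular")
    if (p != k) Ab[c(k, p), ] <- Ab[c(p, k), ]  # swap rows k and p
    for (i in (k + 1):n) {
      m <- Ab[i, k] / Ab[k, k]
      Ab[i, ] <- Ab[i, ] - m * Ab[k, ]          # zero out the entry below the pivot
    }
  }
  if (Ab[n, n] == 0) stop("matrix is singular")
  x <- numeric(n)
  for (i in n:1) {                              # back substitution
    s <- if (i < n) sum(Ab[i, (i + 1):n] * x[(i + 1):n]) else 0
    x[i] <- (Ab[i, n + 1] - s) / Ab[i, i]
  }
  x
}

A <- matrix(c( 2,  1, -1,
              -3, -1,  2,
              -2,  1,  2), nrow = 3, byrow = TRUE)
b <- c(8, -11, -3)
gaussElim(A, b)   # up to rounding, 2 3 -1
solve(A, b)
```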

--
David.


Best,
PM

On 04/09/10 20:23, Paul Johnson wrote:
On Wed, Sep 1, 2010 at 5:36 AM, Petar Milin pmi...@ff.uns.ac.rs wrote:



Hello!
Can anyone explain to me what the solve() function does: Gaussian
elimination or an iterative numerical solve? In addition, I need both
Gaussian elimination and an iterative solution for a course. Are both
built into R?


Thanks!




PM
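For the iterative side of the question, the Jacobi method is about the simplest example. This is a sketch with hypothetical names (jacobiSolve, tol, maxit). Convergence is guaranteed only for special matrices, for example strictly diagonally dominant ones like the one used below, so treat it as a teaching tool rather than a replacement for solve().

```r
## Jacobi iteration for A x = b: x_new = (b - R x) / diag(A),
## where R holds the off-diagonal part of A.
jacobiSolve <- function(A, b, x0 = rep(0, length(b)), tol = 1e-10, maxit = 1000) {
  D <- diag(A)                          # diagonal entries as a vector
  R <- A - diag(D, nrow = length(D))    # off-diagonal part
  x <- x0
  for (it in seq_len(maxit)) {
    xNew <- drop((b - R %*% x) / D)
    if (max(abs(xNew - x)) < tol) return(list(x = xNew, iterations = it))
    x <- xNew
  }
  warning("no convergence after ", maxit, " iterations")
  list(x = x, iterations = maxit)
}

A <- matrix(c(10, -1,  2,
              -1, 11, -1,
               2, -1, 10), nrow = 3, byrow = TRUE)  # strictly diagonally dominant
b <- c(6, 25, -11)
res <- jacobiSolve(A, b)
res$x
solve(A, b)
```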


Hello, Petar:

I think you are assuming that solve uses an elementary linear algebra
paper-and-pencil procedure, but I don't think it does.  In a digital
computer, those things are not precise, and I think the folks here
will even say you shouldn't use solve to get an inverse, but I can't
remember all of the details.

To see how solve works ...

Let me show you a trick I just learned. Read

?solve

and notice it is a generic function, meaning it does not actually do
the calculations itself. Rather, there are specific implementations
(methods) for different classes of argument. To find the
implementations, run

methods(solve)

I get:

> methods(solve)
[1] solve.default solve.qr

Then if you want to read HOW solve does what it does (which I think
was your question), run this:

> solve.default

or

> solve.qr


In that code, you will see the chosen procedure depends on the linear
algebra libraries you make available.  I'm no expert on the details,
but it appears QR decomposition is the preferred method.  You can read
about that online or in numerical algebra books.
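For reference, the current ?solve page says solve.default() is an interface to the LAPACK routines DGESV and ZGESV, which use an LU factorization with partial pivoting, i.e. Gaussian elimination in matrix form; a QR-based solve is what qr.solve() does. The comparison below uses an illustrative, well-conditioned matrix. The three routes should agree to rounding error, although forming the explicit inverse is generally the most expensive and least accurate of them.

```r
## A small symmetric, diagonally dominant (hence well-conditioned) matrix.
A <- matrix(c(4, 1, 0, 0,
              1, 4, 1, 0,
              0, 1, 4, 1,
              0, 0, 1, 4), nrow = 4)
b <- c(1, 2, 3, 4)

xLU  <- solve(A, b)          # solve.default: LAPACK LU with partial pivoting
xQR  <- qr.solve(A, b)       # QR decomposition route
xInv <- solve(A) %*% b       # explicit inverse: works, but usually avoided

all.equal(xLU, xQR)
all.equal(xLU, drop(xInv))
```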








David Winsemius, MD
West Hartford, CT


[R] non-zero exit status error when install GenomeGraphs

2010-09-04 Thread chen chao

Hi,

I am trying to install the GenomeGraphs package from Bioconductor, but it
fails with a non-zero exit status. From the error message, it seems that
there is a shared library problem. Any suggestions on fixing it? Thanks so much.

> sessionInfo()
R version 2.10.1 (2009-12-14)
x86_64-unknown-linux-gnu

locale:
 [1] LC_CTYPE=en_US.iso885915       LC_NUMERIC=C
 [3] LC_TIME=en_US.iso885915        LC_COLLATE=en_US.iso885915
 [5] LC_MONETARY=C                  LC_MESSAGES=en_US.iso885915
 [7] LC_PAPER=en_US.iso885915       LC_NAME=C
 [9] LC_ADDRESS=C                   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.iso885915 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_2.10.1
> source("http://bioconductor.org/biocLite.R")
Warning messages:
1: In safeSource() : Redefining 'biocinstall'
2: In safeSource() : Redefining 'biocinstallPkgGroups'
3: In safeSource() : Redefining 'biocinstallRepos'
> biocLite("GenomeGraphs")
Using R version 2.10.1, biocinstall version 2.5.11.
Installing Bioconductor version 2.5 packages:
[1] GenomeGraphs
Please wait...

Warning in install.packages(pkgs = pkgs, repos = repos, ...) :
  argument 'lib' is missing: using
'/cchome/cchen1/R/x86_64-unknown-linux-gnu-library/2.10'
trying URL 'http://www.bioconductor.org/packages/2.5/bioc/src/contrib/GenomeGraphs_1.6.0.tar.gz'
Content type 'application/x-gzip' length 585078 bytes (571 Kb)
opened URL
==
downloaded 571 Kb

* installing *source* package 'GenomeGraphs' ...
** R
** data
** inst
** preparing package for lazy loading
Error in dyn.load(file, DLLpath = DLLpath, ...) :
  unable to load shared library
'/apps/rhel5/x86_64/R/R-2.10.1//lib64/R/library/XML/libs/XML.so':
  libxmlsec1.so.1: cannot open shared object file: No such file or directory
Error : .onLoad failed in 'loadNamespace' for 'XML'
Error : package 'biomaRt' could not be loaded
ERROR: lazy loading failed for package 'GenomeGraphs'
* removing
'/userhom2/3/cchen1/R/x86_64-unknown-linux-gnu-library/2.10/GenomeGraphs'

The downloaded packages are in
'/tmp/Rtmp3wsJxw/downloaded_packages'
Warning message:
In install.packages(pkgs = pkgs, repos = repos, ...) :
  installation of package 'GenomeGraphs' had non-zero exit status




-- 
Chen, Chao
Psychiatry
University of Chicago
924 E 57th St, Chicago, IL 60637
U. S. A.
MOE Key Laboratory of Contemporary Anthropology and Center for
Evolutionary Biology,
School of Life Sciences and Institutes of Biomedical Sciences,
Fudan University
220# Handan Road, Shanghai (200433)
P.R.China

[[alternative HTML version deleted]]


Re: [R] non-zero exit status error when install GenomeGraphs

2010-09-04 Thread David Winsemius

(Caveat: I am not a bioc user.) The error messages suggest that you  
are missing dependencies. I looked at the documentation for  
GenomeGraphs and it does not list any dependencies, but I have no way  
of knowing how careful or knowledgeable the authors may or may not  
have been when they composed that document. The fact that you are
posting to the wrong mailing list and are not including what version  
of linux (although there is a hint it may be RedHat5) you are running  
suggests you could be fairly new at this.


Is biocLite the correct function for installing a bioc package? It  
appears it may be, but I'm wondering if there is an argument for  
dependencies as there is in install.packages() that you need to set to  
TRUE? If biocLite has a '...' in its argument list (and the error
message suggests that it does) then you may get better results with  
the same call with an addition of dependencies=TRUE.


Or you could first install the packages that are reported missing: XML  
and biomaRt, and then try again as you did before.
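Before reinstalling anything, it may help to see which of the two packages named in the errors actually loads. A minimal check, assuming a reasonably current R in which requireNamespace() returns FALSE instead of stopping when loading fails. The underlying error is about the system library libxmlsec1.so.1 that XML.so was linked against, so reinstalling XML from source on the machine where it fails is one plausible fix; that is an assumption, not something confirmed in this thread.

```r
## Which of the packages named in the error messages can be loaded?
deps <- c("XML", "biomaRt")
loadable <- vapply(deps, requireNamespace, logical(1), quietly = TRUE)
loadable

## Packages that fail to load are the ones to reinstall before retrying
## GenomeGraphs, e.g. install.packages("XML") for the CRAN package.
needReinstall <- deps[!loadable]
needReinstall
```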


Links to the bioc mailing lists can be found here:

http://www.bioconductor.org/help/index.html

--
David.




David Winsemius, MD
West Hartford, CT


Re: [R] Please explain do.call in this context, or critique to stack this list faster

2010-09-04 Thread Joshua Wiley

To echo what Erik said, the second argument of do.call(), args, takes a
list of arguments that it passes to the specified function.  Since
rbind() can bind any number of data frames, all the data frames in
mylist are rbind()ed in a single call.

These two calls should take about the same time (except for time saved typing):

rbind(mylist[[1]], mylist[[2]], mylist[[3]], mylist[[4]]) # 1
do.call(rbind, mylist) # 2

On my system using:

set.seed(1)
dat <- rnorm(10^6)
df1 <- data.frame(x=dat, y=dat)
mylist <-  list(df1, df1, df1, df1)

They do take about the same time (I started two instances of R and ran
both calls but switched the order because R has a way of being faster
the second time you do the same thing).

[1] Order: 1, 2
   user  system elapsed
   0.60    0.14    0.75
   user  system elapsed
   0.41    0.14    0.54
[1] Order: 2, 1
   user  system elapsed
   0.56    0.21    0.76
   user  system elapsed
   0.41    0.14    0.55

Using the for loop is much slower in your later example because
rbind() is getting called over and over, plus you are incrementally
increasing the size of the object containing your results.
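The usual alternative to growing a result inside the loop is to store each piece in a preallocated list and call rbind() once at the end. A minimal sketch, where makePiece() is a hypothetical stand-in for whatever produces each data frame:

```r
makePiece <- function(i) data.frame(id = i, x = rnorm(10))   # hypothetical piece
n <- 200

## Growing the result: every rbind() copies everything accumulated so far.
set.seed(1)
grown <- makePiece(1L)
for (i in 2:n) grown <- rbind(grown, makePiece(i))

## Preallocate a list, fill it, then bind once.
set.seed(1)
pieces <- vector("list", n)
for (i in seq_len(n)) pieces[[i]] <- makePiece(i)
stacked <- do.call(rbind, pieces)

all.equal(grown, stacked, check.attributes = FALSE)
```

For a few hundred pieces the difference is small, but the copying in the growing version increases roughly with the square of the number of pieces, which fits the very long timing Paul reported for 1000 frames.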

 Often it happens that there is a list with lots of matrices or data
 frames in it and we need to stack those together

For my own curiosity, are you reading in a bunch of separate data
files or are these the results of various operations that you
eventually want to combine?

Cheers,

Josh


Re: [R] How to generate integers from uniform distribution with

2010-09-04 Thread Ted Harding

On 04-Sep-10 19:27:54, Yi wrote:
 Enh, I see.
 It totally makes sense.
 Thank you for your perfect explanation.
 Enjoy the long weekend~
 Yi

You're welcome! Earlier I tried an experiment with rejection
sampling, which seems to work well for the case where you want
the mean of the sampled values to be exactly the mean of the range
being sampled from. The number of tries, even for a large sample,
was lower than I had anticipated. Example (sample size = 20000,
sampled range is {-3,-2,-1,0,1,2,3}, mean = 0, therefore require
that the sum of the sample = 0):

  S <- (-10) ; n <- 0
  P <- ((-3):3)
  while((S != 0)){
    x <- sample(P,20000,replace=TRUE,prob=c(1,1,1,1,1,1,1))
    S <- sum(x) ; n <- (n+1)
  }
  n

  hist(x)

To get your case of sampling the integers from (17:23) with
sample mean always exactly 20, simply add 20 to the result x
of the above loop.
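The same loop can be wrapped into a small function for the 17:23 case. This is a sketch under stated assumptions: sampleFixedMean and maxTries are hypothetical names, target * n must be an integer (otherwise no integer sample can hit the target), and the number of tries grows quickly as the target moves away from the middle of the range.

```r
## Rejection sampling: draw n integers uniformly from lo:hi and redraw
## until the sample sum (hence the mean) equals the target exactly.
sampleFixedMean <- function(n, lo = 17L, hi = 23L, target = 20, maxTries = 1e5) {
  total <- target * n                       # required sum of the sample
  stopifnot(lo < hi, total == round(total), lo <= target, target <= hi)
  for (tries in seq_len(maxTries)) {
    x <- sample(lo:hi, n, replace = TRUE)   # uniform over the integers lo..hi
    if (sum(x) == total) return(list(x = x, tries = tries))
  }
  stop("no sample with mean ", target, " after ", maxTries, " tries")
}

set.seed(2010)
res <- sampleFixedMean(1000)
mean(res$x)    # exactly 20 by construction
res$tries
```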

I found that I got values of n like:
  126, 43, 403, 811, 385, 568, 590, 1758, 317, 456, 643, ...
with every run being completed well within 2 seconds.

Conditioning the value of the sum to be equal to the central
value (0) of the range is also conditioning it to be
equal to the most probable value of the sum, so the runs will
on average be shortest. Conditioning on a different mean
(say mean = +1 for a sample of size 20000, so sum = 20000)
would take much, much longer (see below).

One can use the Normal approximation to the distribution of
the sum to estimate how long it might take. One value sampled
from ((-3):3) has mean 0 and variance 4.66.., so the sum of
20000 has mean 0 and variance 93333.33, hence the probability
that the sum will be 0 is approximated by
  pnorm(0.5,0,sqrt(93333.33)) - pnorm(-0.5,0,sqrt(93333.33))
  =  0.001305845
so the probability of success with one sample of 20000 is
about 1/766 (which is consistent with the above results for n).
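The Normal approximation can be checked against the exact distribution of the sum for a moderate sample size. The sketch below assumes, as in the example, draws uniform on -3:3; sumDist and approxHit are hypothetical names. It uses the population variance of one draw, mean(P^2) = 4; the 4.66.. used above corresponds to dividing sum(P^2) by 6 rather than 7, which makes the estimated probabilities slightly smaller but does not change the qualitative picture.

```r
## Exact distribution of the sum of n independent draws, each uniform on the
## consecutive integers P, built by repeated convolution.
sumDist <- function(P, n) {
  step <- rep(1 / length(P), length(P))
  d <- 1                                  # distribution of an empty sum
  for (i in seq_len(n)) {
    out <- numeric(length(d) + length(step) - 1)
    for (k in seq_along(step)) {
      idx <- k:(k + length(d) - 1)
      out[idx] <- out[idx] + step[k] * d  # shift by k - 1 and accumulate
    }
    d <- out
  }
  names(d) <- (n * min(P)):(n * max(P))
  d
}

## Normal approximation with continuity correction for P(sum == s).
approxHit <- function(P, n, s) {
  mu <- n * mean(P)
  sdSum <- sqrt(n * (mean(P^2) - mean(P)^2))
  pnorm(s + 0.5, mu, sdSum) - pnorm(s - 0.5, mu, sdSum)
}

P <- (-3):3
exact <- sumDist(P, 100)["0"]
approx <- approxHit(P, 100, 0)
c(exact = unname(exact), approx = approx)
```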

On the other hand, conditioning on the mean of x being 1,
i.e. on the sum being 20000, the chance of success is
  pnorm(20000.5,0,sqrt(93333.33)) - pnorm(19999.5,0,sqrt(93333.33))
which R computes as zero! Hence you have practically no chance
of achieving this within any reasonable time. Indeed, of course,
the SE of the mean is sqrt((sum(P^2)/6)/20000) = 0.01527525,
so you are aiming at a point which is about 65 SEs from the mean.

The numbers are more reasonable if, instead of conditioning on
the mean, you condition on the sum (not too far from 0), e.g.
with sample size 20000 as before:

1) Sum must be 50, Prob(success) =
  pnorm(50.5,0,sqrt(93333.33)) - pnorm(49.5,0,sqrt(93333.33))
  = 0.001288472 ~= 1/776

2) Sum must be 100, Prob(success) =
  pnorm(100.5,0,sqrt(93333.33)) - pnorm(99.5,0,sqrt(93333.33))
  = 0.001237729 ~= 1/808

3) Sum must be 200, Prob(success) =
  pnorm(200.5,0,sqrt(93333.33)) - pnorm(199.5,0,sqrt(93333.33))
  = 0.001053971 ~= 1/949

4) Sum must be 500, Prob(success) =
  pnorm(500.5,0,sqrt(93333.33)) - pnorm(499.5,0,sqrt(93333.33))
  = 0.0003421745 ~= 1/2922

and so on. So even aiming at 500 it would on average only take
about 3000 tries to hit it. After that it rapidly becomes less likely.

Ted.



E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 04-Sep-10   Time: 21:53:58
-- XFMail --


Re: [R] Decision Tree in Python or C++?

2010-09-04 Thread Wensui Liu

For Python, please check
http://onlamp.com/pub/a/python/2006/02/09/ai_decision_trees.html

On Sat, Sep 4, 2010 at 11:21 AM, noclue_ tim@netzero.net wrote:


 Has anybody used a decision tree in Python or C++ (or written their own
 decision tree implementation in Python or C++)?  My goal is to run a decision
 tree on 8 million observations as the training set and score 7 million in the test set.

 I am testing 'rpart' package on a 64-bit-Linux + 64-bit-R environment. But
 it seems that rpart is either not stable or running out of memory very
 quickly. (Is it because R is passing everything as copy instead of as object
 reference?)

 Any idea would be greatly appreciated!

 Have a nice weekend!
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Decision-Tree-in-Python-or-C-tp2526810p2526810.html
 Sent from the R help mailing list archive at Nabble.com.





-- 
==
WenSui Liu
wens...@paypal.com
statcompute.spaces.live.com
==


Re: [R] Please explain do.call in this context, or critique to stack this list faster

2010-09-04 Thread David Winsemius


Paul;

There is another group of functions that resemble do.call in that
they apply a function across the elements of a list or vector.
They are somewhat more tolerant in that dyadic operators can be used
as the function argument, whereas do.call is really just expanding the
second argument into a single call. The one that is _most_ similar is Reduce().


?Reduce

A somewhat smaller example than yours...
> df1 <- data.frame(x=rnorm(5),y=rnorm(5))
> df2 <- data.frame(x=rnorm(5),y=rnorm(5))
> df3 <- data.frame(x=rnorm(5),y=rnorm(5))
> df4 <- data.frame(x=rnorm(5),y=rnorm(5))
>
> mylist <-  list(df1, df2, df3, df4)
> Reduce(rbind, mylist)
             x           y
1  -0.40175483 -0.96187409
2   0.76629538  0.92201312
3   2.44535842  0.90634825
4   0.57784258 -2.12756145
5  -1.62083235 -0.96310011
6   0.02625574  1.17684408
7   1.52412427 -0.26432372
<snipped remaining rows>

> do.call("+", list(1:3))
[1] 1 2 3
> do.call("+", list(a=1:3, b=3:5))
[1] 4 6 8
> do.call("+", list(a=1:3, b=3:5, cc=7:9))
Error in `+`(a = 1:3, b = 3:5, cc = 7:9) :
  operator needs one or two arguments
> Reduce("+", list(a=1:3, b=3:5, cc=7:9))
[1] 11 14 17


Reduce() can also return its intermediate results, with accumulate = TRUE:

> Reduce("+", 1:10)
[1] 55
> Reduce("+", 1:10, accumulate=TRUE)
 [1]  1  3  6 10 15 21 28 36 45 55
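The difference between the two is easiest to see by counting calls. do.call() makes one call with all the arguments, while Reduce() folds from the left, calling the function on two arguments at a time; so Reduce(rbind, mylist) amounts to rbind(rbind(rbind(df1, df2), df3), df4), the same nested pattern as the growing for loop in Paul's script. countingSum and nCalls are hypothetical names used only for this demonstration.

```r
nCalls <- 0
countingSum <- function(...) {
  nCalls <<- nCalls + 1    # record one more call
  sum(...)
}

nCalls <- 0
viaDoCall <- do.call(countingSum, as.list(1:10))   # countingSum(1, 2, ..., 10)
callsDoCall <- nCalls

nCalls <- 0
viaReduce <- Reduce(countingSum, 1:10)             # countingSum(countingSum(1, 2), 3), ...
callsReduce <- nCalls

c(doCall = callsDoCall, Reduce = callsReduce)
```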



On Sep 4, 2010, at 4:41 PM, Joshua Wiley wrote:


To echo what Erik said, the second argument of do.call(), arg, takes a
list of arguments that it passes to the specified function.  Since
rbind() can bind any number of data frames, each dataframe in mylist
is rbind()ed at once.

These two calls should take about the same time (except for time  
saved typing):


rbind(mylist[[1]], mylist[[2]], mylist[[3]], mylist[[4]]) # 1
do.call(rbind, mylist) # 2

On my system using:

set.seed(1)
dat - rnorm(10^6)
df1 - data.frame(x=dat, y=dat)
mylist -  list(df1, df1, df1, df1)

They do take about the same time (I started two instances of R and ran
both calls but swithed the order because R has a way of being faster
the second time you do the same thing).

[1] Order: 1, 2
  user  system elapsed
  0.600.140.75
  user  system elapsed
  0.410.140.54
[1] Order: 2, 1
  user  system elapsed
  0.560.210.76
  user  system elapsed
  0.410.140.55

Using the for loop is much slower in your later example because
rbind() is getting called over and over, plus you are incrementally
increasing the size of the object containing your results.


Often it happens that there is a list with lots of matrices or data
frames in it and we need to stack those together


For my own curiosity, are you reading in a bunch of separate data
files or are these the results of various operations that you
eventually want to combine?

Cheers,

Josh

On Sat, Sep 4, 2010 at 11:37 AM, Paul Johnson pauljoh...@gmail.com wrote:

I've been doing some consulting with students who seem to come to R
from SAS.  They are usually pre-occupied with do loops and it is tough
to persuade them to trust R lists rather than keeping 100s of named
matrices floating around.

Often it happens that there is a list with lots of matrices or data
frames in it and we need to stack those together.  I thought it
would be a simple thing, but it turns out there are several ways to
get it done, and in this case, the most elegant way using do.call is
not the fastest, but it does appear to be the least prone to
programmer error.

I have been staring at ?do.call for quite a while and I have to admit
that I just need some more explanations in order to interpret it.  I
can't really get why this does work

do.call( rbind, mylist)

but it does not work to do

sapply ( mylist, rbind).

Anyway, here's the self contained working example that compares the
speed of various approaches.  If you send yet more ways to do this, I
will add them on and then post the result to my Working Example
collection.

## stackMerge.R
## Paul Johnson pauljohn at ku.edu
## 2010-09-02


## rbind is neat, but how to do it to a lot of
## data frames?

## Here is a test case

df1 <- data.frame(x=rnorm(100),y=rnorm(100))
df2 <- data.frame(x=rnorm(100),y=rnorm(100))
df3 <- data.frame(x=rnorm(100),y=rnorm(100))
df4 <- data.frame(x=rnorm(100),y=rnorm(100))

mylist <- list(df1, df2, df3, df4)

## Usually we have done a stupid
## loop to get this done

resultDF <- mylist[[1]]
for (i in 2:4) resultDF <- rbind(resultDF, mylist[[i]])

## My intuition was that this should work:
## lapply( mylist, rbind )
## but no! It just makes a new list

## This obliterates the columns
## unlist( mylist )

## I got this idea from code in the
## complete function in the mice package
## It uses brute force to allocate a big matrix of 0's and
## then it places the individual data frames into that matrix.

m <- 4
nr <- nrow(df1)
nc <- ncol(df1)
dataComplete <- as.data.frame(matrix(0, nrow = nr*m, ncol = nc))
for (j in 1:m) dataComplete[(((j-1)*nr) + 1):(j*nr), ] <- mylist[[j]]




## I searched a long time for an answer that looked better.
## This website is

Re: [R] Please explain do.call in this context, or critique to stack this list faster

2010-09-04 Thread Gabor Grothendieck

On Sat, Sep 4, 2010 at 2:37 PM, Paul Johnson pauljoh...@gmail.com wrote:
 I've been doing some consulting with students who seem to come to R
 from SAS.  They are usually pre-occupied with do loops and it is tough
 to persuade them to trust R lists rather than keeping 100s of named
 matrices floating around.

 Often it happens that there is a list with lots of matrices or data
 frames in it and we need to stack those together.  I thought it

This has nothing specifically to do with do.call, but note that
R is faster at handling matrices than data frames.  Below
we see that rbind-ing 4 data frames takes over 100 times as
long as rbind-ing matrices with the same data:

 mylist <- list(iris[-5], iris[-5], iris[-5], iris[-5])
 L <- lapply(mylist, as.matrix)

 library(rbenchmark)
 benchmark(
+ df = do.call(rbind, mylist),
+ mat = do.call(rbind, L),
+ order = "relative", replications = 250
+ )
  test replications elapsed relative user.self sys.self user.child sys.child
2  mat          250    0.01        1      0.02     0.00         NA        NA
1   df          250    1.06      106      1.03     0.01         NA        NA
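Following that observation, one possible matrix route for Paul's problem is sketched below: convert each piece with as.matrix(), stack once, and convert back to a data frame at the end. This assumes every column shares one type (all numeric here); with mixed column types, as.matrix() would coerce everything to character.

```r
# Four copies of the numeric columns of iris, as in the benchmark above
mylist <- list(iris[-5], iris[-5], iris[-5], iris[-5])

# Convert each piece to a numeric matrix, stack them with a single rbind()
# call, then convert back to a data frame once at the end
L <- lapply(mylist, as.matrix)
stacked <- as.data.frame(do.call(rbind, L))

dim(stacked)     # 600 4
names(stacked)   # the four numeric column names from iris
```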

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


Re: [R] Please explain do.call in this context, or critique to stack this list faster

2010-09-04 Thread Dennis Murphy

Hi:

Here's my test:

l <- vector('list', 1000)
for(i in seq_along(l)) l[[i]] <- data.frame(x=rnorm(100),y=rnorm(100))
system.time(u1 <- do.call(rbind, l))
   user  system elapsed
   0.49    0.06    0.60
resultDF <- data.frame()
system.time(for (i in 1:1000) resultDF <- rbind(resultDF, l[[i]]))
   user  system elapsed
  10.34    0.06   10.53
identical(u1, resultDF)
[1] TRUE

The problem with the second approach, which is really kind of an FAQ
by now, is that repeated application of rbind as a standalone function
results in 'Spaceballs: the search for more memory!' The base
object gets bigger as the iterations proceed: something new is being
added, so more memory is needed to hold both the old and new objects.
This is an inefficient time killer because, as the loop proceeds,
increasingly more time is invested in finding new memory.
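A small sketch contrasting the two patterns on a hypothetical list of 200 data frames (timings are omitted because they depend on the machine): growing a data frame inside the loop, versus filling a pre-allocated list and binding once at the end. Both give the same stacked values; only the second avoids re-copying the accumulated result on every iteration.

```r
# Hypothetical list of 200 small data frames (the size is arbitrary)
set.seed(1)
l <- lapply(1:200, function(i) data.frame(x = rnorm(10), y = rnorm(10)))

# Growing pattern: every iteration copies everything accumulated so far
grow <- data.frame()
for (i in seq_along(l)) grow <- rbind(grow, l[[i]])

# Collect-then-bind pattern: the loop only fills a pre-allocated list
pieces <- vector("list", length(l))
for (i in seq_along(l)) pieces[[i]] <- l[[i]]   # stand-in for real per-item work
once <- do.call(rbind, pieces)                  # a single rbind() call at the end

identical(grow$x, once$x)   # TRUE
identical(grow$y, once$y)   # TRUE
dim(once)                   # 2000 2
```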

Interestingly, this doesn't scale linearly: if we make a list of 1 100 x 2
data frames, I get the following:

 l <- vector('list', 1)
 for(i in seq_along(l)) l[[i]] <- data.frame(x=rnorm(100),y=rnorm(100))
 system.time(u1 <- do.call(rbind, l))
   user  system elapsed
  55.56   30.62   88.11
 dim(u1)
[1] 100   2
 str(u1)
'data.frame':   100 obs. of  2 variables:
 $ x: num  -0.9516 -0.6948 0.0523 2.5798 -0.0862 ...
 $ y: num  1.466 0.165 1.375 0.571 -1.099 ...
 rm(u1)
 rm(resultDF)
 resultDF <- data.frame()
# go take a shower and come back
 system.time(for (i in 1:10) resultDF <- rbind(resultDF, l[[i]]))
   user  system elapsed
 977.33 121.41 1130.26
 dim(resultDF)
[1] 100   2

This time, neither do.call nor iterative rbind did very well.

One common way around this is to pre-allocate memory and then to
populate the object using a loop, but a somewhat easier solution here
turns out to be ldply() in the plyr package. The following is the same
idea as do.call(rbind, l), only faster:

 system.time(u3 <- ldply(l, rbind))
   user  system elapsed
   6.07    0.01    6.09
 dim(u3)
[1] 100   2
 str(u3)
'data.frame':   100 obs. of  2 variables:
 $ x: num  -0.9516 -0.6948 0.0523 2.5798 -0.0862 ...
 $ y: num  1.466 0.165 1.375 0.571 -1.099 ...

HTH,
Dennis

On Sat, Sep 4, 2010 at 11:37 AM, Paul Johnson pauljoh...@gmail.com wrote:

 I've been doing some consulting with students who seem to come to R
 from SAS.  They are usually pre-occupied with do loops and it is tough
 to persuade them to trust R lists rather than keeping 100s of named
 matrices floating around.

 Often it happens that there is a list with lots of matrices or data
 frames in it and we need to stack those together.  I thought it
 would be a simple thing, but it turns out there are several ways to
 get it done, and in this case, the most elegant way using do.call is
 not the fastest, but it does appear to be the least prone to
 programmer error.

 I have been staring at ?do.call for quite a while and I have to admit
 that I just need some more explanations in order to interpret it.  I
 can't really get why this does work

 do.call( rbind, mylist)

 but it does not work to do

 sapply ( mylist, rbind).

 Anyway, here's the self contained working example that compares the
 speed of various approaches.  If you send yet more ways to do this, I
 will add them on and then post the result to my Working Example
 collection.

 ## stackMerge.R
 ## Paul Johnson pauljohn at ku.edu
 ## 2010-09-02


 ## rbind is neat, but how to do it to a lot of
 ## data frames?

 ## Here is a test case

 df1 <- data.frame(x=rnorm(100),y=rnorm(100))
 df2 <- data.frame(x=rnorm(100),y=rnorm(100))
 df3 <- data.frame(x=rnorm(100),y=rnorm(100))
 df4 <- data.frame(x=rnorm(100),y=rnorm(100))

 mylist <- list(df1, df2, df3, df4)

 ## Usually we have done a stupid
 ## loop  to get this done

 resultDF <- mylist[[1]]
 for (i in 2:4) resultDF <- rbind(resultDF, mylist[[i]])

 ## My intuition was that this should work:
 ## lapply( mylist, rbind )
 ## but no! It just makes a new list

 ## This obliterates the columns
 ## unlist( mylist )

 ## I got this idea from code in the
 ## complete function in the mice package
 ## It uses brute force to allocate a big matrix of 0's and
 ## then it places the individual data frames into that matrix.

 m <- 4
 nr <- nrow(df1)
 nc <- ncol(df1)
 dataComplete <- as.data.frame(matrix(0, nrow = nr*m, ncol = nc))
 for (j in 1:m) dataComplete[(((j-1)*nr) + 1):(j*nr), ] <- mylist[[j]]



 ## I searched a long time for an answer that looked better.
 ## This website is helpful:
 ## http://stackoverflow.com/questions/tagged/r
 ## I started to type in the question and 3 plausible answers
 ## popped up before I could finish.

 ## The terse answer is:
 shortAnswer <- do.call(rbind,mylist)

 ## That's the right answer, see:

 shortAnswer == dataComplete
 ## But I don't understand why it works.

 ## More importantly, I don't know if it is fastest, or best.
 ## It is certainly less error prone than dataComplete

 ## First, make a bigger test case and use system.time to evaluate

 phony <- function(i){

Re: [R] non-zero exit status error when install GenomeGraphs

2010-09-04 Thread Martin Morgan

 On 09/04/2010 01:38 PM, David Winsemius wrote:
 (Caveat: I am not a bioc user.) The error messages suggest that you
 are missing dependencies. I looked at the documentation for
 GenomeGraphs and it does not list any dependencies, but I have no way
 of knowing how careful
 packageDescription("GenomeGraphs")$Depends
[1] "methods, biomaRt, grid"

and likewise for biomaRt. Also at
http://bioconductor.org/packages/release/bioc/html/GenomeGraphs.html for
the current (release) version, or replacing 'release' with '2.5' (from
the original poster's biocLite invocation) for the version relevant to
R-2.10
http://bioconductor.org/packages/release/bioc/html/GenomeGraphs.html
 or knowledgeable the authors may or may not have been when they
 composed that document. The fact that you are 
Not sure what 'that document' means, but if it's the package vignette or
reference manual then that's not the appropriate place for stating
package dependencies -- like all R packages, this information belongs in
the package DESCRIPTION file, with dependencies enforced at installation
time. Also the Bioconductor build system only makes available packages
that do not produce errors on R CMD build and R CMD check, i.e.,
packages that have fully specified their dependencies. And the build
system is versioned, so the original poster is getting (Bioconductor)
packages that are appropriate for their system (though CRAN packages
come from CRAN and so are not versioned in sync with R).
 posting to the wrong mailing list and are not including what version
 of linux (although there is a hint it may be RedHat5) you are running
 suggests you could be fairly new at this.

 Is biocLite the correct function for installing a bioc package? It
 appears it may be, but I'm wondering if there is an 
Yes it is. It does install dependencies, to the same extent that
install.packages() does; biocLite is a wrapper around install.packages
that inserts the appropriate (for the R version) Bioconductor
repositories in front of CRAN repositories.

Likely the original poster has an installed version of the XML package,
but it is not installed correctly or the installation has become
compromised in some way, e.g., by removing or updating libxml in the
operating system. Your advice -- install (or otherwise troubleshoot) XML
-- is likely part of the right solution, but since XML comes from the R
repository and is therefore only known to build with the current version
of R, it makes sense for the original poster to update their version of
R first.

Martin
 argument for dependencies as there is in install.packages() that you
 need to set to TRUE? If biocLite has a ,... in its argument list (and
 the error message suggests that it does) then you may get better
 results with the same call with an addition of dependencies=TRUE.

 Or you could first install the packages that are reported missing: XML
 and biomaRt, and then try again as you did before.

 Links to the bioc mailing lists can be found here:

 http://www.bioconductor.org/help/index.html



[R] Linear Logistic Regression - Understanding the output (and possibly the test to use!)

2010-09-04 Thread stats

Hi, I know asking which test to use is frowned upon on this list... so 
please do read on for at least a couple of sentences...


I have some multivariate data split as follows:

Tumour Site (one of 5 categories) #
Chemo Schedule (one of 3 cats) ##
Cycle (one of 3 cats*) ##
Dose (one of 3 cats*) #

*These are actually integers, but for all our other analyses so far we 
have grouped them into logical bands of categories.


The dependent variable is Reaction or No Reaction.

I have individually analysed each of the independent variables against 
Reaction/No Reaction using ChiSq and Fisher Tests. Those marked ## 
produced p values less than 0.05, and those marked # produced p values 
close to 0.05.


We believe that Cycle is the crucial piece of data - the others just 
appear to be different because there are more early cycles in certain 
groups than others.


SO - I believe what I need to do is a Linear Logistic Regression on the 
4 independent variables. And I'm expecting it to show that the tumour 
site, schedule and dose don't matter, only the cycle matters. Done a lot 
of reading and I'm clueless!!


I think I want to do something like:

glm (reaction ~ site + sched + cycle + dose, data=mydata, family=poisson)


I am then expecting to see some very long output with lots of numbers... 
...my question is TWO fold -


1. is glm the right thing to use before I waste my time

and 2. how do I interpret the result! (I kind of expect a lecture here 
as I'm really looking for a nice snappy 'p<0.05 means this variable is 
the one having the influence' type answer and I suspect I'm going to be 
told that's not possible...!)


To be clear, the example given in the docs is:

 library(MASS)
 data(anorexia)
 anorex.1 <- glm(Postwt ~ Prewt + Treat + offset(Prewt), family = gaussian,
                 data = anorexia)



The output of anorex.1 is:

Call:  glm(formula = Postwt ~ Prewt + Treat + offset(Prewt), family = gaussian,
    data = anorexia)

Coefficients:
(Intercept)        Prewt    TreatCont      TreatFT
    49.7711      -0.5655      -4.0971       4.5631

Degrees of Freedom: 71 Total (i.e. Null);  68 Residual
Null Deviance:      4525
Residual Deviance: 3311     AIC: 490



and the output of summary(anorex.1) is:

Call:
glm(formula = Postwt ~ Prewt + Treat + offset(Prewt), family = gaussian,
    data = anorexia)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-14.1083   -4.2773   -0.5484    5.4838   15.2922

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  49.7711    13.3910   3.717 0.000410 ***
Prewt        -0.5655     0.1612  -3.509 0.000803 ***
TreatCont    -4.0971     1.8935  -2.164 0.033999 *
TreatFT       4.5631     2.1333   2.139 0.036035 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for gaussian family taken to be 48.69504)

    Null deviance: 4525.4  on 71  degrees of freedom
Residual deviance: 3311.3  on 68  degrees of freedom
AIC: 489.97

Number of Fisher Scoring iterations: 2



---
Can someone either point me to a decent place that would explain what 
this means, or provide me with some pointers? i.e. which of the variables 
has the influence on the outcome in the anorexia data?


Please don't shout!! Happy to be pointed to a reference, but would prefer 
one in plain English, not some stats mumbo jumbo!


Calum


Re: [R] Query regarding Windows based statistical software development using R as programming language

2010-09-04 Thread Greg Snow

I am not sure how best to answer your question since the phrases 
user-friendly and like SPSS do not belong in the same sentence in my mind 
(unless separated by a word along the lines of unlike).  And Windows Based 
Programming Language feels a bit like an oxymoron.

But since the R Commander package exists (there are other tools as well, JGR, 
R-PLUS, others) and provides a menu/dialog interface to R, and since the RExcel 
project integrates this into MS Excel so that the user can use the power of R 
without ever leaving Excel and realizing that they are using a superior tool, I 
expect the answer is Yes.

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
 project.org] On Behalf Of Soumen Pal
 Sent: Friday, September 03, 2010 10:21 PM
 To: r-help@r-project.org
 Subject: [R] Query regarding Windows based statistical software
 development using R as programming language
 
 Hi,
 
 I am a beginner in R. I have a query as below:
 
 Is it possible to develop a Windows based statistical software
 (user-friendly) like SPSS using R as a programming language?
 
 Otherwise, is it possible to use R code directly (no command-line
 execution) in Windows based programming language such as Visual Basic?
 Please help me, if possible, with some link to study materials related
 to
 such topic.
 
 --
 Thanks & Regards,
 
 Soumen Pal
 
   [[alternative HTML version deleted]]
 

Re: [R] Levels in returned data.frame after subset

2010-09-04 Thread Greg Snow

The advantage of computers is that they do exactly what they are told.
The disadvantage of computers is that they do exactly what they are told.

R is a set of instructions to the computer; those instructions are a 
combination of those from the original programmers and from you.  Who should 
make important decisions about the structure of your data?  A group of 
(admittedly brilliant) programmers who have never seen your data nor know what 
questions you are trying to answer, or you (who hopefully know more about your 
data and questions)?

I don't claim to be more intelligent/knowledgeable than the programmers of R, 
but I am grateful that they have/had sufficient humility to allow for the 
possibility that I may actually know something about my data and questions that 
they don't (or maybe they are just too lazy to do my job for me, but that is 
also appropriate).

In your example below, why do you care what the levels of gender are after the 
subset?  Why waste time/effort dropping the levels for a column that by 
definition only has one value?
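If you do want the unused levels gone, it has to be asked for explicitly. A minimal sketch based on Ulrik's example quoted below; stringsAsFactors = TRUE is an addition so that gender is a factor on current R versions too (the default became FALSE in R 4.0.0), and droplevels() is a base R function in current versions.

```r
# Ulrik's data; stringsAsFactors = TRUE keeps gender a factor on every R version
m <- data.frame(gender = c("M", "M", "F"), ht = c(172, 186.5, 165),
                wt = c(91, 99, 74), stringsAsFactors = TRUE)

s <- subset(m, gender == "M")
levels(s$gender)            # "F" "M": subsetting keeps the full set of levels

# Option 1: re-create one factor column; factor() drops unused levels
s$gender <- factor(s$gender)
levels(s$gender)            # "M"

# Option 2: drop unused levels in every factor column of the result
s2 <- droplevels(subset(m, gender == "M"))
levels(s2$gender)           # "M"
```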

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
 project.org] On Behalf Of Ulrik Stervbo
 Sent: Saturday, September 04, 2010 6:53 AM
 To: r-help@r-project.org
 Subject: [R] Levels in returned data.frame after subset
 
 Dear List,
 
 When I subset a data.frame, the levels are not re-adjusted (see
 example). Why is this? Am I missing out on some basic stuff here?
 
 Thanks
 Ulrik
 
 
  m <- data.frame(gender = c("M", "M", "F"), ht = c(172, 186.5, 165),
  wt = c(91, 99, 74))
  dim(m)
 [1] 3 3
 
  levels(m$gender)
 [1] "F" "M"
 
  s <- subset(m, m$gender == "M")
  dim(s)
 [1] 2 3
 
  levels(s$gender)
 [1] "F" "M"
 
  cat <- sapply(s, is.factor); s[cat] <- lapply(s[cat], factor)
  dim(s)
 [1] 2 3
 
  levels(s$gender)
 [1] "M"
 

Re: [R] Linear Logistic Regression - Understanding the output (and possibly the test to use!)

2010-09-04 Thread David Winsemius

On Sep 4, 2010, at 6:53 PM, st...@wittongilbert.free-online.co.uk wrote:

Hi, I know asking which test to use is frowned upon on this list...
so please do read on for at least a couple of sentences...

I have some multivariate data split as follows:

Tumour Site (one of 5 categories) #
Chemo Schedule (one of 3 cats) ##
Cycle (one of 3 cats*) ##
Dose (one of 3 cats*) #

*These are actually integers, but for all our other analyses so far
we have grouped them into logical bands of categories.

The dependent variable is Reaction or No Reaction.

I have individually analysed each of the independent variables
against Reaction/No Reaction using ChiSq and Fisher Tests. Those
marked ## produced p values less than 0.05, and those marked #
produced p values close to 0.05.

We believe that Cycle is the crucial piece of data - the others just
appear to be different because there are more early cycles in
certain groups than others.

SO - I believe what I need to do is a Linear Logistic Regression on
the 4 independent variables. And I'm expecting it to show that the
tumour site, schedule and dose don't matter, only the cycle matters.
Done a lot of reading and I'm clueless!!

I think I want to do something like:

glm (reaction ~ site + sched + cycle + dose, data=mydata,
family=poisson)

I am then expecting to see some very long output with lots of
numbers... ...my question is TWO fold -

1. is glm the right thing to use before I waste my time

Yes, but if your outcome variable is binomial then the family argument
should be binomial. (And if you thought it should be poisson,
then why below did you use gaussian???)
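A minimal sketch of a logistic regression for data shaped like Calum's. The data frame is simulated, so the column names follow his formula but the values are random and say nothing about his study; in this simulation only cycle is made to affect the reaction rate. drop1() with test = "Chisq" gives a likelihood-ratio test for removing each term while the others stay in the model, which is closer to the "does this variable matter once the others are accounted for" question than separate one-at-a-time tests.

```r
# Hypothetical simulated data: the column names follow Calum's formula,
# but the values are random, not real study results
set.seed(2010)
n <- 300
mydata <- data.frame(
  site  = factor(sample(paste0("S", 1:5), n, replace = TRUE)),
  sched = factor(sample(paste0("Sch", 1:3), n, replace = TRUE)),
  cycle = factor(sample(c("early", "mid", "late"), n, replace = TRUE)),
  dose  = factor(sample(c("low", "mid", "high"), n, replace = TRUE))
)
# In this simulation only cycle changes the chance of a reaction
p_react <- c(early = 0.5, mid = 0.3, late = 0.1)[as.character(mydata$cycle)]
mydata$reaction <- rbinom(n, size = 1, prob = p_react)   # 1 = Reaction

# A 0/1 outcome calls for family = binomial (logistic regression)
fit <- glm(reaction ~ site + sched + cycle + dose,
           family = binomial, data = mydata)
summary(fit)   # coefficients are on the log-odds scale, with Wald z tests

# Likelihood-ratio test for dropping each term, with the others kept in
drop1(fit, test = "Chisq")
```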

and 2. how do I interpret the result!

Result? What result? I do not see any description of your data, nor any
code.

(I kind of expect a lecture here as I'm really looking for a nice
snappy 'p<0.05 means this variable is the one having the influence'
type answer and I suspect I'm going to be told that's not possible...!)

I think you need to consult a statistician or someone who has taken
the time to read that statistical mumbo jumbo you don't want to
learn. This mailing list is not set up to be a tutorial site.

(Re your request below: Some years ago I saw one of those programmed
learning texts by Kleinbaum on logistic regression. Maybe you could
read it and see if it makes your consulting sessions go more smoothly.)

http://www.bookfinder.com/search/?author=kleinbaumtitle=logistic+regressionlang=enisbn=submit=Begin+searchnew_used=*destination=uscurrency=USDmode=basicst=srac=qr

I have a couple of Kleinbaum's (et al) other texts and find them to be
well written and reasoned, so I suspect the citation above would be as
accessible as any.

To be clear the example given in the docs is:

library(MASS)

[snipped an example that was not relevant to logistic regression]

---
Can someone either point me to a decent place that would explain
what this means, or provide me with some pointers? i.e. which of the
variables has the influence on the outcome in the anorexia data?

Please don't shout!! Happy to be pointed to a reference, but would
prefer one in plain English, not some stats mumbo jumbo!

Calum

David Winsemius, MD
West Hartford, CT


[R] How to use prediction

2010-09-04 Thread James


Hi, I have a question regarding the usage of prediction in R:

I have an input data set X and an output data set Y. I can build up the
correlation between them using kcca() from kernlab, but once I have that
correlation, how can I predict the output Y1 for a new input X1?

I read about gausspr(), but I don't know how to use the result of
kcca() as parameters for gausspr().

Any replies are appreciated!

Thanks a lot,

James.



-- 
View this message in context: 
http://r.789695.n4.nabble.com/How-to-use-prediction-tp2527030p2527030.html
Sent from the R help mailing list archive at Nabble.com.


[R] How can I fixe convergence=1 in optim

2010-09-04 Thread Sally Luo

Hi R users,

I am using the optim function to maximize a log likelihood function.  My
code is as follows:

p <- optim(c(-0.2392925, 0.4653128, -0.8332286, 0.0657, -0.0031, -0.00245,
             3.366, 0.5885, -0.8,
             0.0786, -0.00292, -0.00081, 3.266, -0.3632, -0.49, 0.1856,
             0.00394, -0.00193, -0.889, 0.5379, -0.63,
             0.213, 0.00338, -0.00026, -0.8912, -0.3023, -0.56), f,
           method = "BFGS", hessian = TRUE, y = y, X = X, W = W)
After I ran the code, I got the following results:

 p
$par
 [1]  2.235834e-02  1.282826e-01 -3.786014e-01  7.422526e-02  3.037931e-02 -2.570156e-03  3.365872e+00  2.618893e-01 -1.987859e-06
[10]  7.970083e-02  2.878574e-03 -1.391019e-03  3.265966e+00 -4.153697e-01 -3.185684e-03  1.833200e-01 -7.247683e-03 -3.156813e-03
[19] -8.889219e-01  6.208612e-01  2.678643e-04  2.183787e-01  2.715062e-02  2.943905e-04 -8.913260e-01 -5.100482e-01 -3.477559e-04

$value
[1] -932.1423

$counts
function gradient
    1439      100

$convergence
[1] 1
$message
NULL

$hessian  (I omitted the approximation results for the hessian here to save space)

The error code 1 for convergence shown above means that the iteration limit
maxit had been reached.  How can I fix this problem and achieve convergence
for my optimization problem?  Can I increase the number of maxit so that
convergence might occur?
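A sketch of the two control settings involved, since Sally's f is not shown: it uses the Rosenbrock test function from the ?optim examples plus a hypothetical normal log-likelihood. The iteration limit is set through control = list(maxit = ...). Also, optim() minimises by default, so if f returns the log-likelihood itself (rather than its negative) it needs control = list(fnscale = -1) to be maximised. Reaching maxit can also point to a badly scaled or ill-posed problem, so a larger limit is not guaranteed to help.

```r
# Rosenbrock "banana" function, the test problem used in the ?optim examples
fr <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2

# A deliberately tiny iteration limit: BFGS stops early, convergence is 1
short <- optim(c(-1.2, 1), fr, method = "BFGS", control = list(maxit = 5))
short$convergence

# Raising maxit via the control list (BFGS defaults to 100) lets it finish
long <- optim(c(-1.2, 1), fr, method = "BFGS", control = list(maxit = 1000))
long$convergence          # 0 means successful completion
long$par                  # close to the true minimum at c(1, 1)

# optim() minimises by default. To maximise a log-likelihood directly, set
# fnscale = -1 (or have f return the negative log-likelihood instead).
# Hypothetical data: a normal sample; parameters are (mean, log(sd))
set.seed(1)
x <- rnorm(200, mean = 3, sd = 2)
loglik <- function(theta) {
  sum(dnorm(x, mean = theta[1], sd = exp(theta[2]), log = TRUE))
}
mle <- optim(c(0, 0), loglik, method = "BFGS",
             control = list(fnscale = -1, maxit = 500))
mle$convergence
c(mean = mle$par[1], sd = exp(mle$par[2]))   # near mean(x) and the ML sd
```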

Thanks for your help.  If more information is needed, please let me know.

Maomao

[[alternative HTML version deleted]]


Re: [R] Function Gini or Ineq

2010-09-04 Thread Karen Kotschy

Hi Marcio

You might like to look at some equivalents from the field of ecology, for 
which there are existing functions. Have a look at the function 
diversity in the package vegan. This provides the Simpson diversity 
index, which is the complement of the Gini coefficient (Gini = 1 - 
Simpson). See attached paper by Stirling (2007).

I'm not sure what you want to do with your weightings, but you could have 
a look at Rao's quadratic entropy index: this is a weighted diversity 
index (in ecology usually weighted by the abundance of the species, which 
are the objects for which diversity is measured). You can get this from 
the function divc in the package ade4. There are also some other 
weighted diversity indices in the package FD (functional diversity).
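For the inequality (Lorenz-curve) Gini coefficient asked about in the original question, the CRAN package ineq provides Gini() with a capital G, which may explain why gini() was not found. Failing that, a base-R sketch of the standard mean-absolute-difference formula is below; gini_coef is a hypothetical helper name, and this coefficient is a different quantity from the Gini-Simpson diversity index discussed above. It should agree with ineq's uncorrected Gini().

```r
# Hypothetical helper: Gini coefficient as the mean absolute difference over
# all ordered pairs of values, divided by twice the mean
gini_coef <- function(x) {
  x <- as.numeric(x)
  n <- length(x)
  sum(abs(outer(x, x, "-"))) / (2 * n^2 * mean(x))
}

x <- c(541, 1463, 2445, 3438, 4437, 5401, 6392, 8304, 11904, 22261)
gini_coef(x)

gini_coef(rep(5, 10))      # 0: every value equal
gini_coef(c(0, 0, 0, 1))   # 0.75: one of four units holds everything
```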

HTH
Karen


On Fri 03Sep10, Mestat wrote:
 
 Hi listers,
 Is it necessary to install a package in order to use the GINI or INEQ
 functions?
 If I use the following commands, R tells me that it didn't find the GINI
 function.
 
 x <- c(541, 1463, 2445, 3438, 4437, 5401, 6392, 8304, 11904, 22261)
 G <- gini(x)
 
 Thanks in advance,
 Marcio
 -- 
 View this message in context: 
 http://r.789695.n4.nabble.com/Function-Gini-or-Ineq-tp2525852p2525852.html
 Sent from the R help mailing list archive at Nabble.com.
 

Re: [R] Query regarding Windows based statistical software development using R as programming language

2010-09-04 Thread C.H.

Have a look at Deducer.

http://www.deducer.org/manual.html



On Sat, Sep 4, 2010 at 12:20 PM, Soumen Pal soumen.4...@gmail.com wrote:
 Hi,

 I am a beginner in R. I have a query as below:

 Is it possible to develop a Windows based statistical software
 (user-friendly) like SPSS using R as a programming language?

 Otherwise, is it possible to use R code directly (no command-line
 execution) in Windows based programming language such as Visual Basic?
 Please help me, if possible, with some link to study materials related to
 such topic.

 --
 Thanks & Regards,

 Soumen Pal

        [[alternative HTML version deleted]]





-- 
CH Chan


Re: [R] How can I fixe convergence=1 in optim

2010-09-04 Thread David Winsemius



On Sep 4, 2010, at 4:18 PM, Sally Luo wrote:




The error code 1 for convergence shown above means that the iteration limit
maxit had been reached.  How can I fix this problem and achieve convergence
for my optimization problem?  Can I increase the number of maxit so that
convergence might occur?


I am wondering how you expect us to guess at the answer? You are the
one who knows what f is and you are the one who has the option of
increasing maxit. If the question is how to increase maxit, then the
answer is perhaps as easy as:


?optim

--

David Winsemius, MD
West Hartford, CT


Re: [R] how to free memory? (gc() doesn't work for me)

2010-09-04 Thread Hyunchul Kim

Hi, all

Thank you for your comments.
I think I misunderstood what gc() does, because gc() works as you
posted.

I posted my question because memory in use did not go down after gc() in
the few system memory monitoring tools that I tested.

Regards,

Hyunchul

On Sat, Sep 4, 2010 at 8:50 PM, jim holtman jholt...@gmail.com wrote:

 Seems to work for me:

  x <- matrix(0,1,1)
  object.size(x)
 80112 bytes
  gc()
             used  (Mb) gc trigger  (Mb)  max used  (Mb)
 Ncells    174104   4.7     741108  19.8    741108  19.8
 Vcells 101761938 776.4  113632405 867.0 102762450 784.1
  rm(x)
  gc()
           used (Mb) gc trigger  (Mb)  max used  (Mb)
 Ncells  174202  4.7     741108  19.8    741108  19.8
 Vcells 1761954 13.5   90905923 693.6 102762450 784.1
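A small sketch, using an arbitrarily sized hypothetical matrix, of reading R's own accounting from the matrix that gc() returns: the Vcells "used" count drops by roughly the size of the removed object. Whether an operating-system monitor shows the same drop depends on the platform's memory allocator, which may keep freed pages for later reuse by R rather than handing them back to the OS.

```r
# Hypothetical object: 2000 x 2000 doubles, i.e. 4e6 Vcells (roughly 30 MB).
x <- matrix(0, nrow = 2000, ncol = 2000)
print(object.size(x), units = "Mb")

# gc() returns a matrix with rows "Ncells" and "Vcells"; column "used" counts cells.
used_with_x <- gc()["Vcells", "used"]
rm(x)
used_after_rm <- gc()["Vcells", "used"]

# Cells released inside R's heap (each Vcell is 8 bytes).
freed_cells <- used_with_x - used_after_rm
freed_cells
freed_cells * 8 / 1024^2   # freed memory in MB, as R sees it
```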


 On Sat, Sep 4, 2010 at 12:46 AM, Hyunchul Kim
 hyunchul.kim@gmail.com wrote:
  Hi, all
 
  I have a huge object that uses almost all of the available memory.
 
  R rm(a_huge_object)
  R gc()
 
  doesn't free memory and ?gc doesn't show anything.
 
  Are there any suggestions?
 
  Thanks in advance,
 
  Regards,
 
  Hyunchul
 
 



 --
 Jim Holtman
 Cincinnati, OH
 +1 513 646 9390

 What is the problem that you are trying to solve?



Re: [R] Please explain do.call in this context, or critique to stack this list faster

2010-09-04 Thread Hadley Wickham

 One common way around this is to pre-allocate memory and then to
 populate the object using a loop, but a somewhat easier solution here
 turns out to be ldply() in the plyr package. The following is the same
 idea as do.call(rbind, l), only faster:

 system.time(u3 <- ldply(l, rbind))
   user  system elapsed
   6.07    0.01    6.09

I think all you want here is rbind.fill:

 system.time(a <- rbind.fill(l))
   user  system elapsed
  1.426   0.044   1.471

 system.time(b <- do.call(rbind, l))
   user  system elapsed
     98      60     162

 all.equal(a, b)
[1] TRUE

This is considerably faster than do.call + rbind because I spent a lot
of time working out how to do this most efficiently. You can see the
underlying code at http://github.com/hadley/plyr/blob/master/R/rbind.r
It's relatively straightforward except for ensuring that the output
columns are the same type as the input columns. This is a good
example where optimised R code is much faster than C code.
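To make the do.call() part of the subject line concrete: do.call(rbind, l) builds the single call rbind(l[[1]], l[[2]], ..., l[[n]]) and evaluates it. The sketch below uses a hypothetical list of small, all-numeric data frames to check that equivalence, and also shows the pre-allocate-then-fill loop mentioned in the quoted message, in base R without plyr. It assumes every data frame has the same numeric columns; mixed or missing columns are what rbind.fill() deals with, and this sketch does not. No timings are claimed here.

```r
# Hypothetical input: 200 small data frames with the same numeric columns.
set.seed(42)
l <- lapply(1:200, function(i) data.frame(id = i, x = rnorm(3), y = runif(3)))

# do.call(rbind, l) is the same as writing out rbind(l[[1]], ..., l[[200]]).
b <- do.call(rbind, l)
small <- do.call(rbind, l[1:3])
identical(small, rbind(l[[1]], l[[2]], l[[3]]))

# Pre-allocation: size the result once, then fill it block by block in a loop.
n_rows <- vapply(l, nrow, integer(1))
last <- cumsum(n_rows)
first <- last - n_rows + 1
out <- matrix(NA_real_, nrow = sum(n_rows), ncol = ncol(l[[1]]),
              dimnames = list(NULL, names(l[[1]])))
for (i in seq_along(l)) {
  out[first[i]:last[i], ] <- as.matrix(l[[i]])
}
a <- as.data.frame(out)

# Same values; only the column type of id (now double) and row names differ.
all.equal(as.matrix(a), as.matrix(b), check.attributes = FALSE)
```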

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/


Re: [R] How can I fixe convergence=1 in optim

2010-09-04 Thread Peng, C


To change the default maximum number of iterations (mxit = 100 for
derivative-based algorithms), add mxit = whatever number you want.

In most cases, you need a very good initial value! This is a real challenge
in using optim(). Quite often, if the initial values are not well chosen,
optim() can give you nonsense estimates even if the algorithm converges after
a number of iterations.
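One way to act on that advice, sketched here on a made-up two-parameter objective with one local and one global minimum, is to run optim() from several random starting values and compare the results. A convergence code of 0 from a single start only says that the algorithm stopped normally, not that it found the global optimum.

```r
# Hypothetical toy objective: a local minimum near p[1] = 1 and the
# global minimum near p[1] = -1 (the 0.3 * p[1] term tilts the two apart).
f <- function(p) (p[1]^2 - 1)^2 + p[2]^2 + 0.3 * p[1]

set.seed(123)
starts <- matrix(runif(2 * 20, min = -3, max = 3), ncol = 2)  # 20 random starts
fits <- lapply(seq_len(nrow(starts)), function(i)
  optim(starts[i, ], f, method = "BFGS", control = list(maxit = 500)))

values <- vapply(fits, function(fit) fit$value, numeric(1))
best <- fits[[which.min(values)]]
best$par
table(round(values, 2))   # how many starts ended at each optimum
```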

-- 
View this message in context: 
http://r.789695.n4.nabble.com/How-can-I-fixe-convergence-1-in-optim-tp2527034p2527087.html
Sent from the R help mailing list archive at Nabble.com.


[R] simple ts() object question

2010-09-04 Thread StatWM


Dear Community,

Say I have an annual ts() object sampled from 1960 to 1969, like:

ta <- ts(1:10, start=1960, frequency=1)

How can I extract the value from the year 1965?

I mean, not by:

ta[6]

but by something like:
 
ta[1965]

where I'm directly referring to the year of the observation?

Thank you in advance!

-- 
View this message in context: 
http://r.789695.n4.nabble.com/simple-ts-object-question-tp2527085p2527085.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] simple ts() object question

2010-09-04 Thread Gabor Grothendieck

Use window.ts

 ta <- ts(1:10, start = 1960)
 window(ta, start = 1965, end = 1965)
Time Series:
Start = 1965
End = 1965
Frequency = 1
[1] 6


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


Re: [R] simple ts() object question

2010-09-04 Thread Joshua Wiley

Hi,

There is probably an easier way, but this will work:

 ta[time(ta)==1965]

With your data, I get:

 ta[time(ta)==1965]
[1] 6
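Comparing time(ta) == 1965 works here because annual times are whole numbers. For monthly or quarterly series, time() returns fractional floating-point values such as 1965 + 5/12, so an exact == comparison can fail. A sketch of a hypothetical helper, ts_at() (not part of base R), that compares with a tolerance instead, next to the window() approach from the other reply:

```r
# Hypothetical helper, not part of base R: values of a ts at given time points.
ts_at <- function(x, when, tol = 1e-8) {
  tt <- as.numeric(time(x))
  idx <- vapply(when, function(w) {
    hit <- which(abs(tt - w) < tol)
    if (length(hit) == 1L) hit else NA_integer_
  }, integer(1))
  as.numeric(x)[idx]   # NA for time points not in the series
}

ta <- ts(1:10, start = 1960, frequency = 1)
ts_at(ta, 1965)            # 6
ts_at(ta, c(1960, 1969))   # 1 and 10

# Monthly series starting January 1965: June 1965 is at time 1965 + 5/12.
m <- ts(1:24, start = c(1965, 1), frequency = 12)
ts_at(m, 1965 + 5/12)      # 6
window(m, start = c(1965, 6), end = c(1965, 6))   # same value, kept as a ts
```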

HTH,

Josh


-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/


Re: [R] How can I fixe convergence=1 in optim

2010-09-04 Thread Ben Bolker

Peng, C cpeng.usm at gmail.com writes:

 
 
 To change the default maximum number of iterations (mxit =100 for derivative
 based algorithm), add mxit = whatever number you want.
 

  that's maxit

i.e.

optim(...,control=list(maxit=...))
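To make the correction concrete: the iteration limit goes inside control. Arguments given to optim() outside control are passed on to fn through ..., so a bare mxit = 500 would be handed to the objective function instead. A misspelled name inside control is not an error either: optim() issues a warning about the unknown name and keeps the default limit. A sketch with a hypothetical toy objective, sq_dist():

```r
# Hypothetical toy objective with its minimum at c(1, 2).
sq_dist <- function(theta) sum((theta - c(1, 2))^2)

# Capture the warning produced by the misspelled control name.
w <- NULL
fit_typo <- withCallingHandlers(
  optim(c(0, 0), sq_dist, method = "BFGS", control = list(mxit = 500)),
  warning = function(cnd) {
    w <<- conditionMessage(cnd)
    invokeRestart("muffleWarning")
  }
)
w   # the message names the unrecognised entry

# The correctly spelled entry is accepted silently.
fit_ok <- optim(c(0, 0), sq_dist, method = "BFGS", control = list(maxit = 500))
fit_ok$convergence
```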
