Re: [R] Completely Off Topic: Link to IOM report on use of "-omics" tests in clinical trials
Thanks, I had totally missed this controversy, but from a quick read of the summary the impact on open source analysis was unclear. Can you explain the punchline? I think many users of R have concluded the biggest problem in most analyses is first getting the data and then verifying any results you derive, both issues that sound related to your post. (The jumble below is illustrative of what hotmail has been doing with plain text; getting plain data without all the formatting junk is a recurring problem LOL.)

> Date: Mon, 26 Mar 2012 22:38:56 +0100
> From: iaingallagher@btopenworld.com
> To: gunter.berton@gene.com; r-help@r-project.org
> Subject: Re: [R] Completely Off Topic: Link to IOM report on use of "-omics" tests in clinical trials
>
> I followed this case while it was ongoing.
>
> It was a very interesting example of basic mistakes but also (for me) of journal politicking.
>
> Keith Baggerly and Kevin Coombes wrote a great paper - "Deriving Chemosensitivity from Cell Lines: Forensic Bioinformatics and Reproducible Research in High-Throughput Biology" in The Annals of Applied Statistics (2009, Vol. 3, No. 4, 1309–1334) which explains some of the background and investigative work they had to do to bring those mistakes to light.
>
> Best
>
> iain
>
> ----- Original Message -----
> From: Bert Gunter <gunter.berton@gene.com>
> To: r-help@r-project.org
> Cc:
> Sent: Monday, 26 March 2012, 19:12
> Subject: [R] Completely Off Topic: Link to IOM report on use of "-omics" tests in clinical trials
>
> Warning: This has little directly to do with R, although R and related
> tools (e.g. Sweave and other reproducible research tools) have a
> natural role to play.
>
> The IOM report:
>
> http://www.iom.edu/Reports/2012/Evolution-of-Translational-Omics.aspx
>
> that arose out of the Duke Univ. genomics testing scandal has been
> released. My thanks to Keith Baggerly for forwarding this. I believe
> that many R users in the medical research community will find this
> interesting, and I hope I do not venture too far out of line by
> passing on the link to readers of this list. It **will** have an
> important impact on so-called Personalized Health Care (which I guess
> affects all of us), and open source analytical (statistical)
> methodology is a central issue.
>
> For those interested, try the summary first.
>
> Best to all,
> Bert
>
> --
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
> Internal Contact Info:
> Phone: 467-7374
> Website:
> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] The Future of R | API to Public Databases
LOL, I remember posting about this in the past. The US gov agencies vary but most are quite good. The big problem appears to be people who push proprietary or commercial standards for which only one effective source exists. Some formats, like Excel and PDF, come to mind, and there is a disturbing trend towards their adoption in some places where raw data is needed by many. The best thing to do is contact the information provider and let them know you want raw data, not images or stuff that works in limited commercial software packages. Often data sources are valuable and the revenue model impacts availability. If you are just arguing over different open formats, it is usually easy for someone to write some conversion code and publish it - CSV to JSON would not be a problem, for example (see the sketch after this thread). Data of course are quite variable and there is nothing wrong with giving the provider his choice.

Date: Sat, 14 Jan 2012 10:21:23 -0500
From: ja...@rampaginggeek.com
To: r-help@r-project.org
Subject: Re: [R] The Future of R | API to Public Databases

Web services are only part of the problem. In essence, there are at least two facets: 1. downloading the data using some protocol; 2. mapping the data to a common model. Having #1 makes the import/download easier, but it really becomes useful when both are included. I think #2 is the harder problem to address. Software can usually be written to handle #1 by making a useful abstraction layer. #2 means that data has consistent names and meanings, and this requires people to agree on common definitions and a common naming convention. RDF (Resource Description Framework) and its related technologies (SPARQL, OWL, etc.) are one of the many attempts to address this. While this effort would benefit R, I think it's best if it's part of a larger effort. Services such as DBpedia and Freebase are trying to unify many data sets using RDF. The task view and package ideas are great ideas. I'm just adding another perspective.

Jason

On 01/13/2012 05:18 PM, Roy Mendelssohn wrote:

Hi Benjamin: What would make this easier is if these sites used standardized web services, so it would only require writing once. data.gov is the worst example; they spun their own, weak service. There is a lot of environmental data available through OPeNDAP, and that is supported in the ncdf4 package. My own group has a service called ERDDAP that is entirely RESTful, see: http://coastwatch.pfel.noaa.gov/erddap and http://upwell.pfeg.noaa.gov/erddap We provide R (and Matlab) scripts that automate the extract for certain cases, see: http://coastwatch.pfeg.noaa.gov/xtracto/ We also have a tool called the Environmental Data Connector (EDC) that provides a GUI from within R (and ArcGIS, Matlab and Excel) that allows you to subset data that is served by OPeNDAP, ERDDAP, and certain Sensor Observation Service (SOS) servers, and have it read directly into R. It is freely available at: http://www.pfeg.noaa.gov/products/EDC/ We can write such tools because the service is either standardized (OPeNDAP, SOS) or is easy to implement (ERDDAP).

-Roy

On Jan 13, 2012, at 1:14 PM, Benjamin Weber wrote:

Dear R Users - R is a wonderful software package. CRAN provides a variety of tools to work on your data. But R is not apt to utilize all the public databases in an efficient manner. I observed the most tedious part with R is searching and downloading the data from public databases and putting it into the right format. I could not find a package on CRAN which offers exactly this fundamental capability.
Imagine R is the unified interface to access (and analyze) all public data in the easiest way possible. That would create a real impact, would put R a big leap forward and would enable us to see the world with different eyes. There is a lack of a direct connection to the API of these databases, to name a few:

- Eurostat
- OECD
- IMF
- Worldbank
- UN
- FAO
- data.gov
- ...

The ease of access to the data is the key to information processing with R. How can we handle the flow of information noise? R has to give an answer to that with an extensive API to public databases. I would love your comments and ideas as a contribution to a vital discussion.

Benjamin

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

** The contents of this message do not reflect any position of the U.S. Government or NOAA. **
Roy Mendelssohn
Supervisory Operations Research Analyst
NOAA/NMFS
Environmental Research Division
Southwest Fisheries Science Center
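[ editor's aside on the CSV-to-JSON remark above ] A minimal sketch of such a conversion, assuming the jsonlite package (rjson and RJSONIO expose similar toJSON functions); the file names are placeholders:

----------------------------------------------------------------------
# Convert a CSV file with a header row into a JSON array of records.
library(jsonlite)

df <- read.csv("data.csv", stringsAsFactors = FALSE)   # parse CSV into a data frame
json <- toJSON(df, pretty = TRUE)                      # one JSON object per row
writeLines(as.character(json), "data.json")
----------------------------------------------------------------------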
Re: [R] HELP!! - PHP calling R to execute a r-code file (*.r)
Date: Fri, 30 Dec 2011 16:04:08 -0600
From: xiuquan.w...@gmail.com
To: r-help@r-project.org
Subject: [R] HELP!! - PHP calling R to execute a r-code file (*.r)

Hi, I have met a tough problem when using PHP to call R to generate some plots. I tested it okay on my personal computer with WinXP. But when I was trying to update to my server (Win2003 server), I found it did not work. Below are the details:

[ my text ] I've run into lots of problems like this. Generally, first check the PHP error log file - I have no idea where it is on your machine - and see if you can get your script to dump output somewhere, possibly with an absolute path so you know where to look for it LOL (see the sketch after this thread). Often the change in user creates unexpected problems with file permissions and libraries and paths. You need to check the specific directories for permissions, not just the top level. I would also point out that there is Rapache available as well as Rserve. Curious if people are using R in any other unique situations server side. We have a java webserver which I use to invoke R via bash scripts and generate rather complicated files. These can take very long to generate, but if you have a flexible caching system it can be easy to reuse output files or even generate them ahead of time. Starting R or any other process is not instantaneous, and often image generation is quite time consuming. There are a lot of issues making it work well in a server setting in real time. Scale up has also been an issue: the Apache threading or process model is quite expensive if you care about performance. We were able to use a netty front end and so far that has worked very well. PHP AFAIK is not thread safe, however.

1. r-code file (E:/mycode.r):
--
jpeg("E:/mytest.jpg")
plot(1:10)
dev.off()
--

2. php code:
---
exec("R CMD BATCH --vanilla --slave --no-timing E:/mycode.r");
exit;
---

3. results:
for WinXP: the image can be generated successfully.
for Server (Win2003): it can not be generated.

BTW, I have added a user "everyone" with full control permission to the E disk. [[elided Hotmail spam]] Thanks.

All the best,
Xiuquan Wang

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
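[ editor's aside ] On the "dump output somewhere with an absolute path" advice above, a minimal sketch of a diagnostic preamble you could put at the top of mycode.r so the server-side R process reports what it can actually see; the log path is a placeholder:

----------------------------------------------------------------------
# Log the runtime environment to an absolute path so the web server's
# R process can tell you who it runs as and what it can reach.
log <- file("E:/r_diag.log", open = "wt")
writeLines(c(
  paste("user:", Sys.info()[["user"]]),
  paste("cwd:", getwd()),
  paste("R version:", R.version.string),
  paste("E:/ writable:", file.access("E:/", mode = 2) == 0),
  paste("libPaths:", paste(.libPaths(), collapse = "; "))
), log)
close(log)
----------------------------------------------------------------------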
Re: [R] Convert CSV file to FASTA
Date: Wed, 31 Aug 2011 01:36:51 -0700
From: oliviacree...@gmail.com
To: r-help@r-project.org
Subject: [R] Convert CSV file to FASTA

Hi there, I have large Excel files which I can save as CSV files. Each Excel file contains two columns. One contains the chromosome number and the second contains a DNA sequence. I need to convert this into a fasta file that looks like this:

>chromosomenumber
CGTCGAGCGTCGAGCGGAGCG

Can anyone show me an R script to do this?

[ my text ] If you can post a few lines of your csv, someone can probably give you a bash script to do it. It may be possible in R, but sed/awk probably work better (see the sketch after this thread for an R version). IIRC, fasta is just a name line followed by sequence. If your csv looks like "name,XX" it may be possible to change the comma to a space and use awk with something like print ">" $1 "\n" $2 etc.

Many thanks x

-- View this message in context: http://r.789695.n4.nabble.com/Convert-CSV-file-to-FASTA-tp3780498p3780498.html Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
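[ editor's aside ] For the R-only route, a minimal sketch assuming a two-column CSV with a header row; the file names are placeholders:

----------------------------------------------------------------------
# Read a two-column CSV (chromosome, sequence) and write FASTA records.
d <- read.csv("chromosomes.csv", stringsAsFactors = FALSE)
fasta <- paste0(">", d[[1]], "\n", d[[2]])   # ">name" line, then the sequence
writeLines(fasta, "chromosomes.fasta")
----------------------------------------------------------------------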
Re: [R] Is R the right choice for simulating first passage times of random walks?
Top posting cuz hotmail decided not to highlight...

Personally I would tend to use java or C++ for the inner loops, but you could of course later make an R package out of that. This is especially true if your code will be used elsewhere in a performance critical system. For example, I wrote some C++ code for dealing with graphs - nothing fancy, but it let me play with some data structure ideas and I could then build it into standalone programs or perhaps an R package. Many of these things slow down due to memory incoherence or IO long before you use up the processor. With C++, in principle anyway, you have a lot of control over these things. Once you have your results and want to analyze them, that's when I would use R. Dumping simulation samples to a text file is easy and also lets you use other things like sed/grep or vi to explore as needed. (That said, the inner loop here can be vectorized in R; see the sketch after this message.)

From: paulepan...@users.sourceforge.net
To: r-help@r-project.org
Date: Thu, 28 Jul 2011 02:00:13 +0200
Subject: Re: [R] Is R the right choice for simulating first passage times of random walks?

Dear R folks,

Am Donnerstag, den 28.07.2011, 01:36 +0200 schrieb Paul Menzel:

I need to simulate first passage times for iterated partial sums. The related papers are for example [1][2]. As a start I want to simulate how long a simple random walk stays negative, which should show that it behaves like n^(-1/2). My code looks like this.

----- code -----
n = 10      # number of simulations
length = 10 # length of iterated sum
z = rep(0, times = length + 1)
for (i in 1:n) {
    x = c(0, sign(rnorm(length)))
    s = cumsum(x)
    for (i in 1:length) {
        if (s[i] < 0 && s[i + 1] >= 0) {
            z[i] = z[i] + 1
        }
    }
}
plot(1:length(z), z/n)
curve(x**(-0.5), add = TRUE)
----- code -----

Of course the program above is not complete, because it only checks for the first passage from negative to positive. `if (s[2] > 0) {}` should be added before the for loop. This code already runs for over half an hour on my system¹. Reading about the for loop [3], it says to try to avoid loops and I probably should use a matrix where every row is a sample. Now my first problem is that there is no matrix equivalent for `cumsum()`. Can I use matrices to avoid the for loop? I mean the inner for loop. Additionally I wonder if `cumsum` is really faster or if I should sum the elements by myself and check after every step if the walk gets non-negative/0. With a length of 100 this should save some cycles. On the other hand, adding numbers should be really fast, and adding checks in between could potentially be slower. My second question is, is R the right choice for such simulations? It would be great when R can also give me a confidence interval(?) and also try to fit a curve through the result and give me the rule of correspondence(?) [4]. Do you have pointers for those? I glanced at simFrame [5] and read `?simulate` but could not understand it right away and think that this might be overkill. Do you have any suggestions?

Thanks, Paul

¹ AMD Athlon(tm) X2 Dual Core Processor BE-2350, 2,1 GHz
[1] http://www-stat.stanford.edu/~amir/preprints/irw.ps
[2] http://arxiv.org/abs/0911.5456
[3] http://cran.r-project.org/doc/manuals/R-intro.html#Repetitive-execution
[4] https://secure.wikimedia.org/wikipedia/en/wiki/Function_(mathematics)
[5] http://finzi.psych.upenn.edu/R/library/simFrame/html/runSimulation.html
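[ editor's aside ] On the vectorization question above: there is a matrix route after all - apply cumsum over the columns of a matrix of increments and count the sign changes without any inner loop. A minimal sketch; the quantities mirror Paul's code, but the names are placeholders:

----------------------------------------------------------------------
# Each column of x is one simulated walk of 'len' steps.
n   <- 1000   # number of simulations
len <- 100    # length of each iterated sum

x <- matrix(sign(rnorm(n * len)), nrow = len, ncol = n)
s <- rbind(0, apply(x, 2, cumsum))     # cumulative sums, one walk per column

# Crossings from negative to non-negative at each step, counted across walks
cross <- (s[-nrow(s), ] < 0) & (s[-1, ] >= 0)
z <- rowSums(cross)

plot(1:len, z / n)
curve(x^(-0.5), add = TRUE)
----------------------------------------------------------------------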
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Is there an R program that produces optimal solution/mix of multiple samples' varying volumes and values
Date: Mon, 25 Jul 2011 11:39:22 -0700
From: lukescore...@gmail.com
To: r-help@r-project.org
Subject: [R] Is there an R program that produces optimal solution/mix of multiple samples' varying volumes and values

Sorry about the lengthy subject line. Does anyone know of an R program that can look at several sources' varying available volumes/amounts and their individual sets of values, compared to a target range/curve for these values, to find the optimal mixture(s) of these possible sources for the desired curve, and for a specified amount? I hope that makes sense to a reader.

[ my text ] Well, whatever you are talking about, you need to write some error as a function of your fit parameters and decide if you can minimize it analytically or not. If you can do it analytically, then there are probably matrix functions you want. If not, there are several optimizers that should come up on google (see the sketch after this thread). If you are trying to express some data in terms of a nice basis set, that may be different from fitting an unknown collection of things. For example, if you have sin/cos curves fft may work, for polynomials something else, etc. Often when people ask questions like this, they try to fit to a collection of things they think may work, and then the optimizer gets stuck since it can't optimize a and b for a*x + b*x etc., so make sure your error function has a unique best fit. Generally coding mistakes can be addressed on the list if you get that far.

Thanks for your time. Luke

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
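[ editor's aside ] A minimal sketch of the write-an-error-function-and-minimize-it advice, fitting non-negative mixture weights of several source profiles to a target curve with optim; all data and names here are made up:

----------------------------------------------------------------------
# Fit non-negative weights w so that sources %*% w approximates the target.
set.seed(1)
sources <- matrix(runif(50 * 3), nrow = 50)            # 3 candidate source profiles
target  <- sources %*% c(0.5, 0.3, 0.2) + rnorm(50, sd = 0.01)

err <- function(w) sum((target - sources %*% w)^2)     # squared-error objective

fit <- optim(par = rep(1/3, 3), fn = err,
             method = "L-BFGS-B", lower = 0)           # constrain weights >= 0
fit$par                                                # recovered mixture weights
----------------------------------------------------------------------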
Re: [R] Life Cycle Assessment with R.
Date: Mon, 25 Jul 2011 19:03:08 +0100
From: jbustosm...@yahoo.es
To: r-help@r-project.org
Subject: [R] Life Cycle Assessment with R.

Hello everyone, there's something really important about climate change and how many institutions around the globe are looking for software solutions in order to achieve their (and everyone's) needs to improve life conditions across the planet. Currently there are many commercial software packages working with this important topic, named Life Cycle Assessment, monitoring carbon emissions, but as many of you may know, commercial software controlling or managing our planet could be another big mistake. To sum up briefly, it could be a good idea to create an R package doing Life Cycle Assessment (if one has not been created) in order to gain a better understanding and to make these important decisions about global warming and how we as humanity control how the Carbon Footprint is measured, for commercial or non-commercial purposes. Does anyone know of people working on Life Cycle Assessment (carbon emissions) with R? Or if there's someone interested in doing a package about it, please let me know! I explain this here because of the R philosophy.

Best wishes!
José Bustos
Chilean Biostatistician
www.aespro.cl

[ my text below ] Well, generally R packages are more general purpose tools than specific applications such as this - although there may be an iphone save-the-world app LOL. I have no idea, but usually the issue here is getting the required data in an open form. Many govt agencies think excel is open and that you would not want to do an analysis that wasn't supported in such popular software. This comes up with financial data all the time; I even asked for account information in csv and I got a reply, "thank you for asking about exporting to excel," and a detailed explanation that was completely irrelevant. Modelling of complicated systems is often, well, complicated, and predictions about the future involve assumptions that are often made to suit the needs of the immediate analyst. On topics which involve money or emotion, getting unbiased analysis is impossible and getting data can be very difficult. This list is not designed for advocacy or even discussion of analysis results, but the ability to get data in a form usable by R may be of more general interest to those seeking help on R.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] xml2-config issues
Date: Fri, 22 Jul 2011 20:06:34 -0600
From: abmathe...@gmail.com
To: r-help@r-project.org
Subject: [R] xml2-config issues

I'm trying to install the XML package on Ubuntu 10.10, and I keep getting a warning message that XML could not be installed and had non-zero exit status. How can I fix this problem?

install.packages()
Loading Tcl/Tk interface ... done
--- Please select a CRAN mirror for use in this session ---
Installing package(s) into '/home/amathew/R/i686-pc-linux-gnu-library/2.13' (as 'lib' is unspecified)
trying URL 'http://streaming.stat.iastate.edu/CRAN/src/contrib/XML_3.4-0.tar.gz'
Content type 'application/x-gzip' length 896195 bytes (875 Kb)
opened URL
==
downloaded 875 Kb

* installing *source* package 'XML' ...
checking for gcc... gcc
checking for C compiler default output file name... a.out
checking whether the C compiler works... yes
checking whether we are cross compiling... no
checking for suffix of executables...
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking how to run the C preprocessor... gcc -E
No ability to remove finalizers on externalptr objects in this verison of R
checking for sed... /bin/sed
checking for pkg-config... /usr/bin/pkg-config
checking for xml2-config... no
Cannot find xml2-config
ERROR: configuration failed for package 'XML'
* removing '/home/amathew/R/i686-pc-linux-gnu-library/2.13/XML'

[ my text, hotmail won't highlight original ] You probably need to get libxml2, and then you should be able to run xml2-config from the command line to verify it is there. This is a specific pkg-config (man pkg-config for details). For some non-R things yum or apt-get have not had the most recent versions, but generally your package manager should be just fine (see the sketch after this thread).

The downloaded packages are in '/tmp/Rtmp2V4huR/downloaded_packages'
Warning message:
In install.packages() : installation of package 'XML' had non-zero exit status

Thank You, Abraham

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
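[ editor's aside ] A minimal sketch of that check run from within R; the apt package name is from memory for Debian/Ubuntu systems:

----------------------------------------------------------------------
# Verify xml2-config is on the PATH before retrying the install.
if (Sys.which("xml2-config") == "") {
  message("xml2-config not found - install the libxml2 development headers,")
  message("e.g. on Ubuntu: sudo apt-get install libxml2-dev")
} else {
  install.packages("XML")
}
----------------------------------------------------------------------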
Re: [R] Cox model approximations (was comparing SAS and R survival....)
From: thern...@mayo.edu
To: abouesl...@gmail.com
Date: Fri, 22 Jul 2011 07:04:15 -0500
CC: r-help@r-project.org
Subject: Re: [R] Cox model approximations (was comparing SAS and R survival)

For time scales that are truly discrete, Cox proposed the exact partial likelihood. I call that the exact method and SAS calls it the discrete method. What we compute is precisely the same; however they use a clever algorithm which is faster. To make things even more confusing, Prentice introduced an exact marginal likelihood which is not implemented in R, but which SAS calls the exact method.

Data is usually not truly discrete, however. More often ties are the result of imprecise measurement or grouping. The Efron approximation assumes that the data are actually continuous but we see ties because of this; it also introduces an approximation at one point in the calculation which greatly speeds up the computation; numerically the approximation is very good. In spite of the irrational love that our profession has for anything branded with the word exact, I currently see no reason to ever use that particular computation in a Cox model. I'm not quite ready to remove the option from coxph, but certainly am not going to devote any effort toward improving that part of the code.

The Breslow approximation is less accurate, but is the easiest to program and therefore was the only method in early Cox model programs; it persists as the default in many software packages because of history. Truth be told, unless the number of tied deaths is quite large, the difference in results between it and the Efron approx will be trivial.

The worst approximation, and the one that can sometimes give seriously strange results, is to artificially remove ties from the data set by adding a random value to each subject's time.

[ my text ] Care to elaborate on this at all? First of course I would agree that doing anything to the data, or making up data, and then handing it to an analysis tool that doesn't know you manipulated it can be a problem (often called interpolation or something with a legitimate name LOL). However, it is not unreasonable to do a sensitivity analysis by adding noise and checking the results. Presumably adding noise to remove things the algorithm doesn't happen to like would work, but you would need to take many samples and examine stats of how you broke the ties (see the sketch after this thread). Now if the model is bad to begin with, or the data is so coarsely binned that you can't get much out of it, then ok. I guess in this case, having not thought about it too much, ties would be most common either with lots of data, or if hazards spiked over time scales similar to your measurement precision, or if the measurement resolution is not comparable to the hazard rate. In the latter 2 cases of course the approach is probably quite limited. Consider turning exponential curves into step functions, for example.

Terry T

--- begin quote ---
I didn't know precisely the specificities of each approximation method. I thus came back to section 3.3 of Therneau and Grambsch, Extending the Cox Model. I think I now see things more clearly. If I have understood correctly, both the discrete option and the exact functions assume truly discrete event times in a model approximating the Cox model. Cox partial likelihood cannot be exactly maximized, or even written, when there are some ties, am I right? In my sample, many of the ties (those within a single observation of the process) are due to the fact that continuous event times are grouped into intervals.
So I think the logistic approximation may not be the best for my problem, despite the estimates on my real data set (shown in my previous post) do give [[elided Hotmail spam]] I was thinking about distributing the events uniformly in each interval. What do you think about this option? Can I expect a better approximation than directly applying the Breslow or Efron method to the grouped event data? Finally, it becomes a model problem more than a computational or algorithmic one, I guess.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
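[ editor's aside ] To make the ties discussion concrete, a minimal sketch with survival::coxph comparing the tie-handling options on coarsely grouped data, with the jitter-and-refit sensitivity check described above; all data are simulated:

----------------------------------------------------------------------
library(survival)
set.seed(42)

n <- 200
x <- rnorm(n)
time <- ceiling(rexp(n, rate = exp(0.5 * x)) * 4) / 4   # grouping creates ties
status <- rep(1, n)                                     # all events observed

# Tie-handling methods discussed above (true coefficient is 0.5)
coef(coxph(Surv(time, status) ~ x, ties = "efron"))
coef(coxph(Surv(time, status) ~ x, ties = "breslow"))

# Sensitivity check: jitter the ties away, refit many times, look at the spread
coefs <- replicate(100, {
  tj <- time + runif(n, 0, 1e-4)
  coef(coxph(Surv(tj, status) ~ x))
})
quantile(coefs, c(0.025, 0.5, 0.975))
----------------------------------------------------------------------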
Re: [R] Different result of multiple regression in R and SPSS
From: dwinsem...@comcast.net
To: seoulseoulse...@gmail.com
Date: Tue, 19 Jul 2011 18:45:47 -0400
CC: r-help@r-project.org
Subject: Re: [R] Different result of multiple regression in R and SPSS

On Jul 19, 2011, at 6:29 PM, J. wrote:

Thanks for the answer. However, I am still curious about which result I should use? The result from R or the one from SPSS?

It is becoming apparent that you do not know how to use the results from either system. The progress of science would be safer if you got some advice from a person who knows what they are doing.

Why are the results from the two programs different?

Different parameterizations. If I had to guess, I would bet that the gender coefficient in R is exactly twice that of the one from SPSS. They are probably both correct in the context of their respective codings.

[ my text ] I guess I would also suggest, again: run some samples with known data sets and see what you get (RSSWKDSASWYG). You would want to do this anyway if you want to ensure your real data is being used reasonably. You still need to have some way to check your opinion from the expert mentioned above, and known data will help there too (see the sketch after this thread). A factor of 2 often shows up from just looking at pictures once you have some intuition. I've often been wrong on intuition, but chasing it down and proving it helps you learn a lot :)

--
David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
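[ editor's aside ] A minimal sketch of the factor-of-two-from-parameterization guess: with treatment coding the gender coefficient is the full between-group difference; with sum (effect) coding it is half of that. Data are simulated, and which coding SPSS uses depends on the procedure:

----------------------------------------------------------------------
set.seed(1)
gender <- factor(rep(c("F", "M"), each = 50))
y <- 2 + 1.0 * (gender == "M") + rnorm(100)   # true group difference = 1.0

# Treatment coding: coefficient is the full difference (about 1.0)
coef(lm(y ~ gender, contrasts = list(gender = contr.treatment)))

# Sum (effect) coding: coefficient is half the difference (about 0.5)
coef(lm(y ~ gender, contrasts = list(gender = contr.sum)))
----------------------------------------------------------------------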
Re: [R] Very slow optim(): solved
Date: Thu, 14 Jul 2011 12:44:18 -0800
From: toshihide.hamaz...@alaska.gov
To: r-h...@stat.math.ethz.ch
Subject: Re: [R] Very slow optim(): solved

After Googling and trial and error, the major cause of the slow optimization was not the functions, but the data setup. Originally, I was using a data.frame for the likelihood calculation. Then I changed the data.frame to vector and matrix for the same likelihood calculation. Now convergence takes ~14 sec instead of 25 min. Certainly, I didn't know this simple change makes such a huge computational difference.

[ my text ] Thanks. Can you pass along any additional details, like the google links you found, or comment on the resulting limitation (were you CPU limited converting data formats, or did this cause memory problems leading to VM thrashing)? I've often had C++ code that turns out to be IO limited when I expected I was doing real complicated computations; it never hurts to go beyond the usual suspects LOL. (A small demonstration of the data.frame-versus-matrix effect follows this thread.)

Toshihide Hamachan Hamazaki, 濱崎俊秀 PhD
Alaska Department of Fish and Game: アラスカ州漁業野生動物課
Division of Commercial Fisheries: 商業漁業部
333 Raspberry Rd. Anchorage, AK 99518
Phone: (907)267-2158 Cell: (907)440-9934

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ben Bolker
Sent: Wednesday, July 13, 2011 12:21 PM
To: r-h...@stat.math.ethz.ch
Subject: Re: [R] Very slow optim()

Hamazaki, Hamachan (DFG toshihide.hamazaki at alaska.gov writes:

Dear list, I am using the optim() function to MLE ~55 parameters, but it is very slow to converge (~25 min), whereas I can do the same in ~1 sec using ADMB, and ~10 sec using MS EXCEL Solver. Are there any tricks to speed up? Are there better optimization functions?

There's absolutely no way to tell without knowing more about your code. You might try method="CG":

Method 'CG' is a conjugate gradients method based on that by Fletcher and Reeves (1964) (but with the option of Polak-Ribiere or Beale-Sorenson updates). Conjugate gradient methods will generally be more fragile than the BFGS method, but as they do not store a matrix they may be successful in much larger optimization problems.

If ADMB works better, why not use it? You can use the R2admb package (on R-Forge) to wrap your ADMB calls in R code, if you prefer that workflow.

Ben

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
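[ editor's aside ] The data.frame-versus-matrix effect is easy to demonstrate. A minimal sketch timing repeated element access of the kind a likelihood loop does; the timings will vary by machine:

----------------------------------------------------------------------
n <- 1e5
df <- data.frame(a = rnorm(n), b = rnorm(n))
m  <- as.matrix(df)

# Repeated element access, likelihood-loop style:
system.time(for (i in 1:1e4) df[i, "a"] + df[i, "b"])  # data.frame: slow
system.time(for (i in 1:1e4) m[i, 1] + m[i, 2])        # matrix: much faster
----------------------------------------------------------------------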
Re: [R] Very slow optim()
Hamazaki, Hamachan (DFG toshihide.hamazaki at alaska.gov writes:

Dear list, I am using the optim() function to MLE ~55 parameters, but it is very slow to converge (~25 min), whereas I can do the same in ~1 sec using ADMB, and ~10 sec using MS EXCEL Solver. Are there any tricks to speed up? Are there better optimization functions?

There's absolutely no way to tell without knowing more about your code. You might try method="CG":

[ my text ] I guess the first thing to do is look at the task manager and see if it is memory or CPU limited. In the absence of gross algorithmic differences, the other stuff that you may be doing could be a big deal. I saw similar performance issues in my own C++ code where the fast result was obtained just by sorting the input data to stop VM thrashing. A sort on large data is normally considered slow and wasteful if the later algorithm doesn't require it, but memory coherence can be a big deal.

Method 'CG' is a conjugate gradients method based on that by Fletcher and Reeves (1964) (but with the option of Polak-Ribiere or Beale-Sorenson updates). Conjugate gradient methods will generally be more fragile than the BFGS method, but as they do not store a matrix they may be successful in much larger optimization problems.

If ADMB works better, why not use it? You can use the R2admb package (on R-Forge) to wrap your ADMB calls in R code, if you prefer that workflow.

Ben

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Using t tests
( after getting confirmation of lack of posts, try again, LOL )

From: marchy...@hotmail.com
To: r-help@r-project.org
Subject: RE: [R] Using t tests
Date: Sun, 10 Jul 2011 10:13:51 -0400

( sorry if this is a repost, but I meant to post to the list and never received any indication it was sent to the list; thanks for asking for comments about the approach to data analysis )

From: marchy...@hotmail.com
To: j...@bitwrit.com.au; gwanme...@aol.com
CC: r-help@r-project.org
Subject: RE: [R] Using t tests
Date: Sun, 10 Jul 2011 07:35:32 -0400

Date: Sat, 9 Jul 2011 18:40:43 +1000
From: j...@bitwrit.com.au
To: gwanme...@aol.com
CC: r-help@r-project.org
Subject: Re: [R] Using t tests

On 07/08/2011 07:22 PM, gwanme...@aol.com wrote:

Dear Sir, I am doing some work on a population of patients. About half of them are admitted into hospital with albumin levels less than 33. The other half have albumin levels greater than 33, so I stratify them into 2 groups, x and y respectively. I suspect that the average length of stay in hospital for the group of patients (x) with albumin levels less than 33 is greater than those with albumin levels greater than 33 (y). What command function do I use (assuming that I will be using the chi square test) to show that the length of stay in hospital of those in group x is statistically significantly different from those in group y?

Hi Ivo, just to make things even more complicated for you, Mark's suggestion that the length_of_stay measure is unlikely to be normally distributed might lead you to look into a non-parametric test like the Wilcoxon (aka

( please correct any of the following which is wrong, but note that the discussion is more interesting and useful with details of your goals ) I'm curious why people still jump to setting arbitrary cutoff points, in this case based on what you happen to have sampled, rather than try to find a functional relationship between the two parametric variables (see the sketch after this thread). Generally the thing that separates likely cause from noise is smoothness, or something you can at least rationalize in terms of physical mechanisms. If your question relates to the reproducibility of a given result ("well, this experiment showed hi and low were significantly different on hospital stays, maybe the next experiment will show the same"), you'd probably like to consider the data in relation to possible causes. I'm not sure your disease process would know about your median test results when patients walk in. BTW, what is terminating the hospital stay: cure, death, or insurance exhaustion? This sounds like you are just trying to reproduce something that is already in the literature: the cutoff is on the low side of normal and often hypoprotein is suspected of being bad, so the higher group would usually be expected to do better, no? Although I suppose this could have something to do with dehydration etc., but the point of course is that data interpretation is difficult to do in a vacuum.

Mann-Whitney in your case) test. You will have to split your length_of_stay measure into two like this (assume your data frame is named losdf):

albumin_hilo <- albumin < 33
wilcox.test(losdf$length_of_stay[albumin_hilo], losdf$length_of_stay[!albumin_hilo])

or if you use wilcox_test in the coin package:

albumin_hilo <- albumin < 33
wilcox_test(length_of_stay ~ albumin_hilo, losdf)

Do remember that the chi-square test is used for categorical variables, for instance if you dichotomized your length_of_stay into less than 10 days or 10 days and over.
Jim

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
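[ editor's aside ] On the functional-relationship point in the interjection above, a minimal sketch treating albumin as continuous rather than dichotomized; the data are simulated, and losdf and the variable names follow Jim's example:

----------------------------------------------------------------------
set.seed(1)
losdf <- data.frame(albumin = runif(200, 20, 45))
losdf$length_of_stay <- rpois(200, lambda = exp(4 - 0.06 * losdf$albumin))

# Length of stay as a smooth function of albumin, not hi/lo groups
fit <- glm(length_of_stay ~ albumin, family = poisson, data = losdf)
summary(fit)

# Visual check of the fitted relationship
ord <- order(losdf$albumin)
plot(length_of_stay ~ albumin, data = losdf)
lines(losdf$albumin[ord], fitted(fit)[ord])
----------------------------------------------------------------------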
Re: [R] problem loading rgdal with Rapache, problem solved due to libexpat with apache.
Well, I'm not sure it is really an rgdal problem as much as having a stale .so, and I didn't want to go wreck the thing to capture the exact error message; the point was that the apache build seems to have come with an incompatible expat .so. In the apache/lib dir, I apparently had a much different lib, now renamed with an xxx suffix, than that which was in /lib64. If I dump the symbols with objdump, the one rgdal complained about, IIRC XML_StopParser, does not appear in either output, but they do grep differently as shown below:

-rw-r--r-- 1 root root 605314 May  1 07:09 libexpat.a
-rwxr-xr-x 1 root root    805 May  1 07:09 libexpat.la
lrwxrwxrwx 1 root root     17 May  1 07:09 libexpat.so -> libexpat.so.0.5.0
lrwxrwxrwx 1 root root     17 May  1 07:09 libexpat.so.0 -> libexpat.so.0.5.0
-rwxr-xr-x 1 root root 143144 Jul  6 16:29 libexpat.so.0.5.0
-rwxr-xr-x 1 root root 398422 Jul  6 16:27 libexpat.soxxx.0.5.0

[marchywka@351915-www1 lib]$ grep XML_StopParser libexpat*
Binary file libexpat.so matches
Binary file libexpat.so.0 matches
Binary file libexpat.so.0.5.0 matches
[marchywka@351915-www1 lib]$

libexpat.so.0.5.0:
    linux-vdso.so.1 => (0x2b4a1fdd2000)
    libc.so.6 => /lib64/libc.so.6 (0x0035aca0)
    /lib64/ld-linux-x86-64.so.2 (0x0035ac60)
libexpat.soxxx.0.5.0:
    linux-vdso.so.1 => (0x7fff8bdfc000)
    libc.so.6 => /lib64/libc.so.6 (0x2b143b19e000)
    /lib64/ld-linux-x86-64.so.2 (0x0035ac60)

To: r-help@r-project.org
Subject: Re: [R] problem loading rgdal with Rapache, problem solved due to libexpat with apache.

You should provide at least the output of sessionInfo() and the messages issued by rgdal on load. rgdal has recently been updated to try to address issues of interference between applications, and it would help very much to know your intentions, platform, and the versions of the software you are using.

Roger

Mike Marchywka wrote:

Has anyone had problems with Rapache that don't show up on command line execution of R? I just ran into this loading rgdal in an Rapache page and having a problem with loading a shared object. The final complaint was that XML_StopParser was undefined - this was surprising since grep -l showed it in the expat lib and it worked fine from the command line. Finally, I noted the search path had apache/lib ahead of the other places where expat exists. The libexpat there, although apparently having the same version, so.0.5.0, did not grep for this symbol. Anyway, copying one into the other fixed the problem and the page works fine, but I am curious if anyone has thoughts on what could have caused this. Sorry I don't have specific output, but I thought you may remember if you ran into it and it is not worth trying to replicate. Thanks.

---
Roger Bivand
Department of Economics
NHH Norwegian School of Economics
Helleveien 30
N-5045 Bergen, Norway

-- View this message in context: http://r.789695.n4.nabble.com/problem-loading-rgdal-with-Rapache-problem-solved-due-to-libexpat-with-apache-tp3650119p3652639.html Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] superimposing network graphs
Date: Wed, 6 Jul 2011 13:10:14 +0200
To: r-help@r-project.org
CC: but...@uci.edu
Subject: [R] superimposing network graphs

Dear all, I have an undirected network (g), representing all the sexual relationships that ever existed in a model community. I also have a directed edgelist (e) which is a subset of the edgelist of g. e represents the transmission pathway of HIV. Now I would like to superimpose the picture of the sexual relationships with arrows in a different colour, to indicate where in the network HIV was transmitted.

[ my text ] If you can't find an R answer, I've found that dot from graphviz is really helpful. It should not be too hard to generate the dot source from your raw data file or from R, but others may have a more R-centric answer. Package sna may be helpful for an R-only approach (see also the sketch after this thread).

Any ideas on how to do this? Many thanks, Wim

Wim Delva MD, PhD
International Centre for Reproductive Health, Ghent University, Belgium - www.icrh.org
South African Centre for Epidemiological Modelling and Analysis, Stellenbosch University, South Africa - www.sacema.com
epi update: www.sacemaquarterly.com
Tel: +27 21 808 27 79 (work) Cell: +27 72 842 82 33

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
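[ editor's aside ] A sketch of one R-only route with the igraph package. The graph here is a toy stand-in, and the superimposing is done with edge colour and width; for true arrowheads you would plot the transmission subset as a directed graph on the same layout:

----------------------------------------------------------------------
library(igraph)
set.seed(1)

g <- erdos.renyi.game(30, 0.08)      # toy stand-in for the relationship network
E(g)$color <- "grey"                 # all relationships drawn in grey

trans <- sample(ecount(g), 5)        # pretend these edges carried transmission
E(g)$color[trans] <- "red"           # highlight the transmission pathway
E(g)$width <- ifelse(E(g)$color == "red", 3, 1)

plot(g)
----------------------------------------------------------------------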
[R] problem loading rgdal with Rapache, problem solved due to libexpat with apache.
Has anyone had problems with Rapache that don't show up on command line execution of R? I just ran into this loading rgdal in an Rapache page and having a problem with loading a shared object. The final complaint was that XML_StopParser was undefined - this was surprising since grep -l showed it in the expat lib and it worked fine from the command line. Finally, I noted the search path had apache/lib ahead of the other places where expat exists. The libexpat there, although apparently having the same version, so.0.5.0, did not grep for this symbol. Anyway, copying one into the other fixed the problem and the page works fine, but I am curious if anyone has thoughts on what could have caused this. Sorry I don't have specific output, but I thought you may remember if you ran into it and it is not worth trying to replicate. Thanks.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] wavelets
From: jdnew...@dcn.davis.ca.us
Date: Mon, 4 Jul 2011 00:45:41 -0700
To: tyagi...@gmail.com; r-help@r-project.org
Subject: Re: [R] wavelets

Study the topic more carefully, I suppose. My understanding is that wavelets do not in themselves compress anything, but because they sort out the interesting data from the uninteresting data, it can be easy to toss the uninteresting data (lossy data compression). Perhaps you should understand better what your Matlab library is doing.

---
Jeff Newmiller
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
Sent from my phone. Please excuse my brevity.

user123 tyagi...@gmail.com wrote:

I'm new to the topic of wavelets. When I tried to use the mra function in the wavelets package, the data is not getting compressed. E.g., if the original data has 500 values, the output data also has the same. However in MATLAB, depending on the level of decomposition, the data gets compressed. How do I implement this in R?

[ my text ] Can you post some code? You can always compress into one value of course by turning bytes into a single char string; what you want is entropy. I posted some example code before and I remember it took effort to not get the subsampling. mra is probably multi-resolution analysis and I'd suppose you want all the samples. You probably need paper and pencil however at this point. (A hand-rolled illustration of thresholding follows this thread.)

-- View this message in context: http://r.789695.n4.nabble.com/wavelets-tp3642973p3642973.html Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
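[ editor's aside ] To make the sort-then-toss point concrete, a minimal one-level Haar transform written out by hand, so no package API is assumed; zeroing the small detail coefficients is where the lossy compression actually happens:

----------------------------------------------------------------------
set.seed(1)
x <- sin(seq(0, 4 * pi, length.out = 512)) + rnorm(512, sd = 0.05)

# One Haar level: pairwise averages (approximation) and differences (detail)
odd  <- x[seq(1, 512, by = 2)]
even <- x[seq(2, 512, by = 2)]
a <- (odd + even) / sqrt(2)   # approximation coefficients
d <- (odd - even) / sqrt(2)   # detail coefficients

d[abs(d) < 0.05] <- 0         # "compression": discard small detail (lossy)
cat("nonzero coefficients kept:", sum(a != 0) + sum(d != 0), "of 512\n")

# Invert the transform to see what was lost
x_rec <- numeric(512)
x_rec[seq(1, 512, by = 2)] <- (a + d) / sqrt(2)
x_rec[seq(2, 512, by = 2)] <- (a - d) / sqrt(2)
plot(x, type = "l"); lines(x_rec, col = "red")
----------------------------------------------------------------------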
Re: [R] Protecting R code
Put it on Rapache or another server, but this seems like a waste depending on what you are doing. Server side is the only good way, but making C++ may be an interesting test.
Sent from my Verizon Wireless BlackBerry

-----Original Message-----
From: Vaishali Sadaphal vaishali.sadap...@tcs.com
Date: Mon, 4 Jul 2011 16:48:13
To: spencer.gra...@prodsyse.com
Cc: r-help@r-project.org; b.rowling...@lancaster.ac.uk
Subject: Re: [R] Protecting R code

Hey All, thank you so much for the quick replies. Looks like translation to C/C++ is the only robust option. Do you think there exists any ready-made R to C translator?

Thanks -- Vaishali
Vaishali Paithankar Sadaphal
Tata Consultancy Services
Mailto: vaishali.sadap...@tcs.com
Website: http://www.tcs.com
Experience certainty. IT Services Business Solutions Outsourcing

From: Spencer Graves spencer.gra...@prodsyse.com
To: Barry Rowlingson b.rowling...@lancaster.ac.uk
Cc: Vaishali Sadaphal vaishali.sadap...@tcs.com, r-help@r-project.org
Date: 07/04/2011 08:42 PM
Subject: Re: [R] Protecting R code

Hello:

On 7/4/2011 7:41 AM, Barry Rowlingson wrote:

On Mon, Jul 4, 2011 at 8:47 AM, Vaishali Sadaphal vaishali.sadap...@tcs.com wrote:

Hi All, I need to give my R code to my client to use. I would like to protect the logic/algorithms that have been coded in R. This means that I would not like anyone to be able to read the code.

At some point the R code has to be run, which means it has to be read by an interpreter that can handle R code. Which means, unless you rewrite the interpreter, the R code must exist as such. Even if you could compile R into C code into machine code and distribute a .exe file, it's still possible in theory to reverse-engineer it and get something like the original back - the original logic, if not the original names of the variables and functions. You could rewrite the interpreter to only run encrypted, signed code that requires a decryption key, but you still have to give the user the decryption key at some point in order to get the plaintext code. Again, it's an obfuscation problem of hiding the key somewhere, and hence is going to fail. It all depends on how much expense you want to go to in order to make the expense of circumventing your solution more than it's worth. Tell me how much that is, and I will tell you the solution. For total security[1], you need to run the code on servers YOU control, and only give access via a network API. You can do this with Rserve or any of the HTTP-based systems like Rapache.

An organization I know that encrypted R code started with making it available only on their servers. This was maybe four years ago. I'm not sure what they do now, but I think they have since lost their major proponents of R internally and have probably translated all the code they wanted to sell into a compiled language in a way that didn't require R at all.

Spencer

Barry

[1] Except of course servers can be hacked or socially-engineered into. For total security, disconnect your machine from the network and from any power supply.
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to fit ARMA model
From: bbol...@gmail.com
Date: Fri, 1 Jul 2011 13:16:45 +0000
Subject: Re: [R] How to fit ARMA model

UnitRoot akhussanov at gmail.com writes:

Hello, I am having some problems with fitting an ARMA model to my time series data (randomly generated numbers). The thing is I have tried many packages [tseries, fseries, FitARMA etc.] and all of them give very different results. I would appreciate it if someone could post here what the best package is for my purpose. Also, after having done the fitting, I would like to check the model's adequacy. How can I do this? Thanks.

It's hard to say without more detail -- we don't know what your purpose is (beyond the general one of fitting an ARMA model to data). I would say that when in doubt, if there is functionality in 'core' R -- the base and recommended packages -- it is likely to be the most stable and well tested. (Not always true -- there are some very good contributed packages, and sometimes the functions in R are missing some advanced features -- but a good rule of thumb.) So try ?arima. Another rule of thumb is that Venables and Ripley 2002 is a good starting point (although again not necessarily as extensive as specialized topics) for "how do I do xxx in R?" See Chapter 14.

[ my text ] I had to check my memory, but IIRC wiki or some other non-authoritative source suggested that this is not much different than trying to design or analyze an IIR (infinite impulse response) filter that is presumed excited with white noise. In such a case, the power spectrum can be interpreted in terms of the zeros and poles (zeros of the denominator) of the transfer function, giving you some graphical info to help develop intuition. It is not hard to make your own tests and sanity checks if you are just getting started. (Hand typed, as my 'dohs machine is busy running malware scans and this one R install has a few issues.)

x <- runif(8192)
x1 <- c(x[-1], x[1])
x2 <- c(x1[-1], x1[1])
y <- x + x1 + x2
plot(log(abs(fft(y))))

This makes a Bode plot (if I have the name right) with zeros at the roots. Extension to the IIR case should be obvious. As suggested in the help for ?arima, ?tsdiag is a generic function to plot time series diagnostics (a short example follows this thread).

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
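[ editor's aside ] Following the ?arima pointer above, a minimal sketch using only base R: simulate an ARMA series, fit it, and run the residual diagnostics; the order is chosen to match the simulation:

----------------------------------------------------------------------
set.seed(1)
y <- arima.sim(model = list(ar = 0.6, ma = 0.4), n = 1000)

fit <- arima(y, order = c(1, 0, 1))   # ARMA(1,1): one AR term, one MA term
fit                                   # coefficients should be near 0.6 / 0.4
tsdiag(fit)                           # residual ACF and Ljung-Box checks
----------------------------------------------------------------------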
Re: [R] Running R from windows command prompt
To: lig...@statistik.tu-dortmund.de
CC: r-help@r-project.org
Subject: Re: [R] Running R from windows command prompt

Thanks for your help. I tried the way you mentioned for my first question, but I am not getting any results. Can you please explain in detail the process through which I can run R code from the windows command prompt?

[ my text ] While your problem likely has nothing to do with windohs, I would suggest you go get cygwin (see google) and use that. I, and probably others, have lots of scripts that work there and on linux. Bash scripts are a better way to proceed if you want to do more than a test case, and they integrate with lots of other existing things. (A minimal sketch of both answers follows this thread.)

2011/6/28 Uwe Ligges lig...@statistik.tu-dortmund.de

On 28.06.2011 11:54, siddharth arun wrote:

1. I have an R program in a file, say functions.R. I load the functions.R file into R using source("functions.R") and then call functions f1(), f2() etc. which are declared and defined within the functions.R file. I also need to load a couple of R libraries using library() before I can use f1(), f2() etc. My question is: can I achieve all this (i.e. calling functions f1() and f2()) from the windows prompt without opening the R environment? If yes, then how?

Put all the code into a file, e.g. foo.R, and run "R CMD BATCH foo.R" from the windows command shell.

2. Also, is there any way to scan strings directly? It seems like the scan() function only scans numerical values. Is there any way to scan strings?

Yes, it is called scan(), which is for arbitrary data, including character(). See ?scan. Otherwise, you may want to look into ?readLines.

Uwe Ligges

--
Siddharth Arun,
4th Year Undergraduate student
Industrial Engineering and Management, IIT Kharagpur

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
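[ editor's aside ] A minimal sketch of both of Uwe's answers; the file names are placeholders:

----------------------------------------------------------------------
# Contents of a file run.R, executed non-interactively from the Windows
# command prompt with:  R CMD BATCH --vanilla run.R
source("functions.R")            # loads f1(), f2() defined there
f1()
f2()

# Reading strings: scan() handles character data via the 'what' argument
words <- scan("names.txt", what = character())
lines <- readLines("names.txt")  # or read whole lines instead
----------------------------------------------------------------------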
Re: [R] Time-series analysis with treatment effects - statistical approach
Date: Thu, 23 Jun 2011 15:41:25 -0700 From: jmo...@student.canterbury.ac.nz To: r-help@r-project.org Subject: Re: [R] Time-series analysis with treatment effects - statistical approach Mike Marchywka wrote: I discovered a way to do repetitive tasks that can be concisely specified using something called a computer. Now that's funny :) well, there is a point to that, which is that with cheap computation you can do different analyses than you did in the past. These were not controlled tests. It was a field experiment testing the effects that various pavement designs have on underlying soil moisture. Two designs incorporated a porous pavement surface course, while two others were based on standard impervious concrete pavement... the control was just bare, exposed soil. As you can see from the graph, the control responds quickly to rainfall events, but dries out quickly as well due to evaporation. The porous pavement allows for quick infiltration of precipitation, while the impervious pavement eventually allows infiltration of rainfall, but it's delayed. My objective is to be able to differentiate between the pavement treatments, such that I can state with statistical confidence that porous pavement affects underlying soil moisture differently than impervious pavement. I think this is obvious just looking at it, but I wanted to be able to back it up with stats. What I'd done previously is to average by week. But as I mentioned, I thought that an anova table with 104 rows relating to each week was a poor way of analyzing the data. That being said, it effectively allows me to check for treatment-related differences. I don't think we've mentioned R in the past few posts, but I guess pointing people to useful things that R can do is not too big a problem, and if you have ever dealt with analysis for the sake of rationalization you can appreciate that it is a huge problem :) Generally you'd like to have reproducible results, and if you don't have IID data (stationary parameters of the population you wish to characterize) you are not even asking a good question about the system. It may be helpful as a quick check of something, but otherwise difficult to interpret -- do your results mean these things are different in the desert? You appear to have data points from a bunch of different situations. After-the-fact selection is often helpful, but generally stats people frown on that as backing up anything (unless it supports the sponsor's opinion LOL). http://www.itl.nist.gov/div898/handbook/prc/section4/prc432.htm I guess I'd either go with a dynamic model, or convert into dollars and then see if you have clinically and statistically significant differences in things of relevance. Thanks for the suggestions to date. Maybe the more I explain what I'm trying to achieve, the more focussed the suggestions will be. The vaguer the question, the broader the response, right? Thanks again, Justin -- View this message in context: http://r.789695.n4.nabble.com/Time-series-analysis-with-treatment-effects-statistical-approach-tp3615856p3621179.html Sent from the R help mailing list archive at Nabble.com.
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Access R functions from web
From: orvaq...@gmail.com To: r-help@r-project.org Subject: [R] Access R functions from web I need a way to send R objects and call R functions from the web. Is there any project close or similar to that? I want to be able to send an HTTP request from an existing application with some data, and obtain a plot from R. This should be a faq, but it can take a while to find: see Rserve and Rapache. I have been using Rapache on red hat and debian and it works nicely. Also, the goog visualization API works in some limited testing, but apparently a lot of that requires flash (which I either did not have or had turned off LOL). Thanks in advance Caveman
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
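For the Rserve route, a minimal sketch (the plot file name is made up; RS.connect/RS.eval are from the RSclient package, and any other Rserve client -- Java, PHP, C++ -- talks to the same daemon):

# server side: start the Rserve daemon (default port 6311)
library(Rserve)
Rserve()

# client side: here another R session, via the RSclient package
library(RSclient)
con <- RS.connect()                                       # localhost:6311 by default
RS.eval(con, { png("plot.png"); plot(1:10); dev.off() })  # plot rendered server-side
RS.close(con)

The HTTP-request-in, plot-out pattern maps more directly onto Rapache, where an R script runs per request and writes the image into the response.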
Re: [R] Is R a solution to my problem?
From: jdnew...@dcn.davis.ca.us Date: Fri, 24 Jun 2011 02:13:29 -0700 To: ikoro...@gmail.com; R-help@r-project.org Subject: Re: [R] Is R a solution to my problem? Almost any tool can be bent to perform almost any task, but that combination sounds like a particularly difficult way to get started. There are a variety of purpose-built software development environments for serving dynamically-constructed web pages, including C++ libraries. I guess Rapache and Rserve would be worth looking at if you think it is a match with R. However, discussions of such topics do not belong on this mailing list. Give Google a chance. --- Jeff Newmiller DCN: jdnew...@dcn.davis.ca.us Research Engineer (Solar/Batteries/Software/Embedded Controllers) --- Sent from my phone. Please excuse my brevity. Igor Korot ikoro...@gmail.com wrote: Hi, ALL, My name is Igor and I am a Software Developer. Recently I wrote a program for a company using C++. The problem I am facing right now is that this program (demo) has to run using a web interface. I.e. the potential user goes to the website, clicks on a link, and the program executes somewhere on the server. Would that be possible with an R extension / add-on? Thank you.
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R-help
I'm surprised the pdf made it through the mailing list, but many will not open these, and I have just cleaned malware off my 'dohs system. Can you explain what you want to do in simple text? Subject: [R] R-help Hi, Please assist me to code the attached pdf in R. Your help will be greatly appreciated. Edward Actuarial Science Student
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Time-series analysis with treatment effects - statistical approach
From: rvarad...@jhmi.edu To: marchy...@hotmail.com; jmo...@student.canterbury.ac.nz; r-help@r-project.org Subject: RE: [R] Time-series analysis with treatment effects - statistical approach Date: Thu, 23 Jun 2011 02:59:19 +0000 If you have any specific features of the time series of soil moisture in mind, you could either model them or directly estimate them and test for differences among the 4 treatments. If you do not have any such specific considerations, you might want to consider some nonparametric approaches such as functional data analysis; in particular, functional principal components analysis (fPCA) might be relevant. You could also consider semiparametric methods. For example, take a look at the SemiPar package. Ravi. I guess, just playing with it while waiting for other code to finish, I'd be curious if you had any controlled tests such as impulse response -- what did each treatment do when you held it at constant temperature, humidity and illumination in still air after a single burst of rain? If you were pursuing the model approach, a quick look suggests qualitative rather than just quantitative effects -- in one case it looks like linear or biphasic dry-out dynamics, while others seem to just fall off a cliff. Objective of course matters too: if you are trying to sell this to farmers, maybe a plot of moisture for each treatment against control would help. I just did that after averaging over sensors, and it may be a reasonable analysis for cost effectiveness if you can translate moisture into dollars. Now you would still need to put error bars on comparisons and use words carefully etc., but that approach may be more important than getting at dynamics. I dunno. Consider that in fact maybe all you care about is peaks; if one day too dry kills the crop, then that is what you want to focus the analysis on, etc. From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] on behalf of Mike Marchywka [marchy...@hotmail.com] Sent: Wednesday, June 22, 2011 9:31 PM To: jmo...@student.canterbury.ac.nz; r-help@r-project.org Subject: Re: [R] Time-series analysis with treatment effects - statistical approach [...] - - - - - - Mike Marchywka | V.P. Technology 415-264-8477 marchy...@phluant.com Online Advertising and Analytics for Mobile http://www.phluant.com
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Time-series analysis with treatment effects - statistical approach
Subject: [R] Time-series analysis with treatment effects - statistical approach Hello all R listers, I'm struggling to select an appropriate statistical method for my data set. I have collected soil moisture measurements every hour for 2 years. There are 75 sensors taking these automated measurements, spread evenly across 4 treatments and a control. I'm not interested in being able to predict future soil moisture trends, but rather in knowing whether the treatment affected the soil moisture response overall. In particular, it would be interesting to inspect treatment-related response within defined periods. For example, a visual inspection of my data suggests that soil moisture is equivalent across treatments during the wet winter months, but during the dry summer months a treatment effect appears. Any help on this topic would be very much appreciated. I've looked far and wide through the academic literature for similar experimental designs, but have not had any success as yet. Can you post your data? This makes it more interesting for potential responders and makes it more likely you get a relevant answer or observation. I guess the stats people would just suggest something like anova, and you could just compare moisture means among the treatments and controls (see the sketch after this message), but presumably things are not stationary and it may be easy to get confusing results. http://en.wikipedia.org/wiki/Analysis_of_variance Usually there is some analysis plan made up a priori, I would imagine, or at least there is prior literature in your field. I guess citeseer or google scholar/books may be useful if you have some key words or author names. The data you have sounds to an engineer just like any other discrete-time signal, and maybe the digital signal processing literature would be of interest. Presumably you have some models to test, and more details depend on the specifics of your competing hypotheses. Kalman filtering maybe? Cheers, Justin Dr. Justin Morgenroth New Zealand School of Forestry Christchurch, New Zealand -- View this message in context: http://r.789695.n4.nabble.com/Time-series-analysis-with-treatment-effects-statistical-approach-tp3615856p3615856.html Sent from the R help mailing list archive at Nabble.com.
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
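For reference, a minimal sketch of the one-way ANOVA idea (the column names moisture and treatment are hypothetical, and this deliberately ignores the time structure the rest of the thread worries about):

# hypothetical long format: one row per sensor, averaged over some window
df <- data.frame(moisture  = runif(75),
                 treatment = rep(c("control", "IP", "IP+", "PP", "PP+"),
                                 each = 15))
fit <- aov(moisture ~ treatment, data = df)
summary(fit)    # F-test: any treatment effect at all?
TukeyHSD(fit)   # pairwise treatment comparisons

The non-stationarity objection raised in the thread is exactly that this layout treats every reading as exchangeable, which hourly soil moisture is not.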
Re: [R] Time-series analysis with treatment effects - statistical approach
Date: Wed, 22 Jun 2011 17:21:52 -0700 From: jmo...@student.canterbury.ac.nz To: r-help@r-project.org Subject: Re: [R] Time-series analysis with treatment effects - statistical approach Hi Mike, here's a sample of my data so that you get an idea what I'm working with. Thanks, data helps make statements easier to test :) I'm quite busy at the moment, but I will try to look during dead time. http://r.789695.n4.nabble.com/file/n3618615/SampleDataSet.txt SampleDataSet.txt Also, I've uploaded an image showing a sample graph of daily soil moisture by treatment. The legend shows IP, IP+, PP, PP+, which are the 4 treatments. Also, I've included precipitation to show the soil moisture response to precip. Personally I'd try to write a simple physical model or two and see which one(s) fit best. It shouldn't be too hard to find sources and sinks of water and write a differential equation with a few parameters. There are probably online lecture notes that cover this or related examples. You probably suspect a mode of action for the treatments; see if that is consistent with the observed dynamics. You may need to go get temperature and cloud data, but it may or may not be worth it. http://r.789695.n4.nabble.com/file/n3618615/MeanWaterPrecipColour2ndSeasonOnly.jpeg I have used ANOVA previously, but I don't like it for 2 reasons. The first is that I have to average away all of the interesting variation. But mainly, there are a number of assumptions that go into ANOVA to make it useful. If you are just drawing samples from populations of identical independent things, great, but here I would look at things related to non-stationary statistics of time series. It becomes quite cumbersome to do a separate ANOVA for each day (700+ days) or even each week (104 weeks). I discovered a way to do repetitive tasks that can be concisely specified using something called a computer. Writing loops is pretty easy; don't give up due to cumbersomeness. Also, you could try a few simple things like plotting difference charts (plot treatment minus control, for example; see the sketch after this message). If you approach this purely empirically, there are time series packages, and maybe the econ/quant financial analysts would have some thoughts that wouldn't be well known in your field. Thanks for your help, -Justin -- View this message in context: http://r.789695.n4.nabble.com/Time-series-analysis-with-treatment-effects-statistical-approach-tp3615856p3618615.html Sent from the R help mailing list archive at Nabble.com.
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
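The difference-chart suggestion as a minimal sketch (made-up daily means; real data would first average the 75 sensors within each treatment):

set.seed(1)
day     <- 1:100
control <- 20 + cumsum(rnorm(100))   # toy daily mean moisture, control plot
PP      <- 22 + cumsum(rnorm(100))   # toy daily mean, one porous treatment
plot(day, PP - control, type = "l",
     xlab = "day", ylab = "PP minus control moisture")
abline(h = 0, lty = 2)               # zero line = no treatment effect

Subtracting the control removes much of the shared rainfall-driven variation, which is why the difference is often easier to look at than the raw series.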
Re: [R] different results from nls in 2.10.1 and 2.11.1
Date: Mon, 20 Jun 2011 11:25:54 +0200 From: lig...@statistik.tu-dortmund.de To: p.e.b...@dunelm.org.uk CC: r-help@r-project.org; pb...@astro.uni-bonn.de Subject: Re: [R] different results from nls in 2.10.1 and 2.11.1 Since one is a 32-bit and the other a 64-bit platform, and therefore the compiler is also different, you can get different numerical results easily, even with identical versions of R (and most of us do not have outdated R installations around). I just tried your example on 3 different systems with R-2.13.0 and all told me singular convergence... Uwe Ligges On 18.06.2011 15:44, Philip Bett wrote: Hi, I've noticed I get different results fitting a function to some data on my laptop than when I do it on my computer at work. I guess the best all-around approach is to dump the results from your FisherAvgdPdf function and get some idea what trajectory the fit takes in the different cases. This is presumably not just an issue with R and could have something to tell you about your data vs the model. Even if it converged without incident, you'd probably want to look at more details. You apparently are trying to fit a histogram to a pdf, and it is not too hard to just plot the fit over the histogram and spot-check parameter space. Consider for example,

plot(h$mids, h$density, type = "b")
points(h$mids, FisherAvgdPdf(acos(h$mids), 3.527e-8, 3.198), pch = 20)
points(h$mids, FisherAvgdPdf(acos(h$mids), 3.527e-8, 3.298), pch = 13)
points(h$mids, FisherAvgdPdf(acos(h$mids), 4.527e-8, 3.198), pch = 14)

and then find various measures of how good/bad these are, etc.

e <- h$density - FisherAvgdPdf(acos(h$mids), 3.527e-8, 3.198)
sum(e * e)
e <- h$density - FisherAvgdPdf(acos(h$mids), 3.527e-8, 3.298)
sum(e * e)

The error surface as a function of the parameters seems smooth, etc.,

foo <- function(x, y) {
  e <- h$density - FisherAvgdPdf(acos(h$mids), x, y)
  sum(e * e)
}
x <- (2 + (1:100) * .02) / 1e8
y <- 3 + ((1:100) / 100)
df <- data.frame(x = 0, y = 0, z = 0)
for (i in x) {
  for (j in y) {
    gg <- data.frame(i, j, foo(i, j))
    colnames(gg) <- colnames(df)
    df <- rbind(df, gg)
  }
}
sel <- which(df$x != 0)
library(scatterplot3d)
scatterplot3d(df$x[sel], df$y[sel], df$z[sel])

Here's a code snippet of what I do:

##--
require(circular) ## for the Bessel function I.0
## Data:
dd <- c(0.9975948929787, 0.9093316197395, 0.7838819026947, 0.9096108675003,
  0.8901804089546, 0.2995955049992, 0.9461286067963, 0.8248071670532,
  0.2442084848881, 0.2836948633194, 0.7353935241699, 0.5812761187553,
  0.8705610632896, 0.8744471669197, 0.7490273118019, 0.9947383403778,
  0.9154829382896, 0.8659985661507, 0.6448246836662, 0.8588128685951,
  0.7347437739372, -0.1645197421312, 0.970999121666, 0.8038327097893,
  0.9558997154236, 0.6846113204956, 0.6286814808846, 0.9201356172562,
  0.9422197341919, 0.3470877110958, 0.4154576957226, 0.0721184238791,
  0.14151956141, -0.6142936348915, -0.4688512086868, 0.6805665493011,
  0.3594025671482, 0.8991097211838, 0.7656877636909, 0.9282909035683,
  0.9454715847969, 0.9766132831573, 0.4316343963146, 0.62679708004,
  0.2093886137009, 0.3937581181526, 0.4254160523415, 0.8684504628181,
  0.3844584524632, 0.9578431844711, 0.956972181797, 0.4456568360329,
  0.9793710708618, 0.5825698971748, 0.929228246212, 0.9211971759796,
  0.9407976865768, 0.821156680584, 0.2048042863607, 0.6473184227943,
  0.9456319212914, 0.7021154165268, 0.9761978387833, 0.1485801786184,
  0.2195029109716, 0.5378785729408, 0.8304615020752, 0.8596342802048,
  0.950027525425, 0.9102076888084, 0.5108731985092, 0.7200184464455,
  0.3571084141731, 0.9765330553055, -0.143017962575, 0.8576183915138,
  0.1283493340015, -0.3226098418236, 0.7031792402267, 0.8708582520485,
  0.56754809618, 0.060470353812, 0.8015220761299, 0.7363410592079,
  0.671902179718, 0.8082517385483, 0.9468197822571, 0.9729647636414,
  0.7919752597809, 0.9539568424225, 0.4840737581253, 0.850653231144,
  0.5909016132355, 0.8414449691772, 0.9699150323868)
xlims <- c(-1, 1)
bw <- 0.05
b <- seq(xlims[1], xlims[2], by = bw); nb <- length(b)
h <- hist(dd, breaks = b, plot = FALSE)
FisherAvgdPdf <- function(theta, theta0, kappa) {
  A <- kappa / (2 * sinh(kappa))
  A * I.0(kappa * sin(theta) * sin(theta0)) * exp(kappa * cos(theta) * cos(theta0))
}
nls(dens ~ FisherAvgdPdf(theta, theta0, kappa),
    data = data.frame(theta = acos(h$mids), dens = h$density),
    start = c(theta0 = 0.5, kappa = 4.0),
    algorithm = "port",
    lower = c(0, 0), upper = c(acos(xlims[1]), 500),
    control = list(warnOnly = TRUE))
##--

On one machine, nls converges, and on the other it doesn't. Any ideas why, and which is right? I can't see anything in R News that could be relevant. The different R versions (and computers) are:

> R.version
platform       i686-pc-linux-gnu
arch           i686
os             linux-gnu
system         i686, linux-gnu
status
major          2
minor          11.1
year           2010
month          05
day            31
svn rev        52157
language       R
version.string R version 2.11.1
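Following the dump-the-trajectory advice above, nls() itself will print its iterations; a minimal sketch reusing the objects already defined (trace = TRUE is a standard nls argument):

fit <- nls(dens ~ FisherAvgdPdf(theta, theta0, kappa),
           data = data.frame(theta = acos(h$mids), dens = h$density),
           start = c(theta0 = 0.5, kappa = 4.0),
           algorithm = "port", lower = c(0, 0),
           upper = c(acos(xlims[1]), 500),
           control = list(warnOnly = TRUE),
           trace = TRUE)   # prints objective and parameters at each iteration
summary(fit)

Running this on both machines and comparing the printed trajectories should show whether they diverge from the first step (a data or starting-value issue) or only near the end (numerical precision).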
Re: [R] Multivariate HPD credible volume -- is it computable in R?
Date: Sun, 19 Jun 2011 19:06:20 +1200 From: agw1...@gmail.com To: r-help@r-project.org Subject: [R] Multivariate HPD credible volume -- is it computable in R? Hi all, I'm new to the list and am hoping to get some advice. I have a set of multivariate data and would like to find the densest part of the data cloud containing 95% of the data, like a 95% HPD credible volume. Is there any R code available to compute that? It looks like the LaplacesDemon pkg was just updated FWIW, http://www.google.com/search?hl=en&q=hpd+credible+volume+cran If you just want to find the density of your data in some n-dim space, that sounds like multi-dim binning or a histogram would work (you can check google, but IIRC there is no general n-dim binning package. I also mentioned a version based on a density field being associated with each point, which could be summed over all points to get the density at an arbitrary point, but you need to implement that in some intelligent way for it to be fast). I guess you could bin using aggregate; see if this example would work,

df <- data.frame(z = runif(100), x = runif(100) * 10, y = runif(100) * 5, c = rep(1, 100))
d <- aggregate(c ~ floor(x) + floor(y), df, FUN = sum)  # counts per unit cell
d
which(d$c == max(d$c))                                  # densest cell

and at this point you can imagine you may be able to find a surface that encloses a certain amount of your data. Thank you very much! Your help and patience are much appreciated. G.S.
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
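For the 2-D case there is a fairly standard recipe: kernel density estimate, then the density threshold below which 5% of the points fall; the corresponding contour bounds the ~95% highest-density region. A hedged sketch with MASS::kde2d (grid size and toy data are arbitrary choices):

library(MASS)
set.seed(1)
x <- rnorm(1000); y <- rnorm(1000)
dens <- kde2d(x, y, n = 100)          # density on a 100x100 grid
ix <- findInterval(x, dens$x)         # nearest grid cell per point
iy <- findInterval(y, dens$y)
dvals  <- dens$z[cbind(ix, iy)]       # approximate density at each data point
thresh <- quantile(dvals, 0.05)       # 5% of points lie below this density
contour(dens, levels = thresh)        # boundary of the ~95% HPD-like region
points(x, y, cex = 0.3, col = ifelse(dvals >= thresh, "grey", "red"))

Generalizing past 2-3 dimensions is where it gets hard, which is why the n-dim binning discussion above matters.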
Re: [R] Server question
From: oliver.jo...@digred.com To: r-help@r-project.org Date: Fri, 17 Jun 2011 15:35:38 +0100 CC: rebeccahard...@deltaeconomics.com Subject: [R] Server question Hi, A client of mine has asked me to investigate the installation of R software. Could anyone tell me whether the software works only on a client machine, or whether it sits on a server with clients attaching to it? Did anyone answer this yet? See Rserve and Rapache on google. Both could be useful to you or get you started. It is not immediately clear from the docs. Best Oliver -- Oliver Jones T: 01845 595911 M: 07977 122089 DIG*RED Web Production | http://www.digred.com/ www.digred.com 2 Vyner's Yard Rainton North Yorkshire YO7 3PH
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] About 'hazard ratio', urgent~~~
Date: Mon, 13 Jun 2011 19:44:15 -0700 From: dr.jz...@gmail.com To: r-help@r-project.org Subject: [R] About 'hazard ratio', urgent~~~ Hi, I am new to R. My question is: how to get the 'hazard ratio' using the 'coxph' function in the 'survival' package? You can probably search the docs for hazard terms, for example, http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-cox-regression.pdf and try running known test data through to verify. For example, it does seem that the exp(coef) column contains a decent estimate of the hazard ratio in simple cases (not quite right to 3 sig figs, but I haven't given this a lot of thought and it is still early here; hoping for input from someone who can explain better; a sketch for extracting it directly follows this message),

?rexp
library(survival)
ns <- 10
df <- data.frame(gr = c(rep(0, ns), rep(1, ns)), t = c(rexp(ns, 1), rexp(ns, 3)))
coxph(Surv(t) ~ gr, df)

Call: coxph(formula = Surv(t) ~ gr, data = df)
     coef exp(coef) se(coef)   z p
gr   1.09      2.98  0.00503 217 0
Likelihood ratio test=47382 on 1 df, p=0 n= 20, number of events= 2e+05

df <- data.frame(gr = c(rep(0, 100), rep(1, 100)), t = c(rexp(100, 1), rexp(100, 2)))
coxph(Surv(t) ~ gr, df)

Call: coxph(formula = Surv(t) ~ gr, data = df)
      coef exp(coef) se(coef)    z       p
gr   0.658      1.93    0.148 4.44 8.8e-06
Likelihood ratio test=19.6 on 1 df, p=9.5e-06 n= 200, number of events= 200

df <- data.frame(gr = c(rep(0, 100), rep(1, 100)), t = c(rexp(100, 1), rexp(100, 1)))
coxph(Surv(t) ~ gr, df)

Call: coxph(formula = Surv(t) ~ gr, data = df)
        coef exp(coef) se(coef)      z    p
gr   -0.0266     0.974    0.142 -0.187 0.85
Likelihood ratio test=0.03 on 1 df, p=0.852 n= 200, number of events= 200

thanks, karena -- View this message in context: http://r.789695.n4.nabble.com/About-hazard-ratio-urgent-tp3595527p3595527.html Sent from the R help mailing list archive at Nabble.com.
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
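To pull the hazard ratio out programmatically rather than reading it off the printout, a minimal sketch (coxph and summary.coxph are from the survival package; data simulated as above):

library(survival)
set.seed(1)
df <- data.frame(gr = c(rep(0, 100), rep(1, 100)),
                 t  = c(rexp(100, 1), rexp(100, 2)))
fit <- coxph(Surv(t) ~ gr, data = df)
exp(coef(fit))           # the hazard ratio itself
summary(fit)$conf.int    # hazard ratio with its 95% confidence interval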
Re: [R] Need script to create new waypoint
Now I need to change this data set into one with waypoints at regular intervals: for example 2 minutes. I guess your question is about algorithm ideas for making up data due to an unspecified perceived need. Anything you can specify completely should be easy to code in R, maybe a bash script, or c++ or java; I'm not sure Excel was intended for arbitrary code. Generally the choice of how you manufacture data would be dictated by the perceived need. If you can elaborate on the need, maybe someone can help you pick an approach to making things up. There is a sig list for geo problems that may know your requirement better, https://stat.ethz.ch/mailman/listinfo/r-sig-geo and of course if your need is to replicate prior work in the field, you can contact authors directly with questions about what they did. If your question is about how R can handle time series with pre-packaged things, there is quite a bit, notably zoo and ts, which I think should come up with ?zoo and ?ts http://www.google.com/search?hl=en&safe=off&q=R+ts+regular+time+intervals also try a search on interpolation (a sketch follows below the question).

library(zoo)
?zoo
?ts

From: anti...@hotmail.com To: r-help@r-project.org Date: Tue, 14 Jun 2011 11:55:57 +0000 Subject: [R] Need script to create new waypoint Dear help-list members, I am a student at Durham University (UK) conducting a PhD on spatial representation in baboons. Currently, I'm analysing the effect of sampling interval on home range calculations. I have followed the baboons for 234 days in the field; each day is represented by about 1000 waypoints (x,y coordinates) recorded at irregular time intervals. Consecutive waypoints in this raw data set have an average distance interval of 5 meters and an average time interval of 23 seconds (but when baboons were stationary the time interval could be much larger - e.g. waypoint 7 below). This raw data set needs to become a data set with waypoints at regular intervals, and thus 'new' waypoints have to be 'created'. Eventually, I want to use seven different time intervals: 2, 5, 10, 15, 30, 45 and 60 minute intervals. I have tried in Excel, but I am not managing it. I have some experience with R, and although I can 'read' quite complicated scripts, I am unable to write them, so I would very much appreciate any help anybody would be willing to offer me. My current data set has 9 columns (in csv / excel file): x coordinate, y coordinate, year, month, day, record, time interval (duration between this waypoint and the previous) (hh:mm:ss), summed time intervals, distance interval (m) EXAMPLE (24th of april 2007) (wp1) x1, y1, 2007, 7, 24, 1, 00:00:00, 00:00:00, 0 (wp2) x2, y2, 2007, 7, 24, 2, 00:00:23, 00:00:23, 2 (wp3) x3, y3, 2007, 7, 24, 3, 00:00:50, 00:00:73, 3 (wp4) x4, y4, 2007, 7, 24, 4, 00:01:20, 00:02:33, 5 (wp5) x5, y5, 2007, 7, 24, 5, 00:00:03, 00:02:36, 1 (wp6) x6, y6, 2007, 7, 24, 6, 00:00:12, 00:02:48, 2 (wp7) x7, y7, 2007, 7, 24, 7, 00:05:45, 00:08:33, 2 Now I need to change this data set into one with waypoints at regular intervals: for example 2 minutes = 120 seconds. 2 minutes after the first waypoint (x1,y1) the baboons would be somewhere between WP3 and WP4 (at WP3 the summed duration is 73 seconds and after WP4 the summed duration is 153 seconds), and so this is where I would like a new waypoint created. Note that there are time intervals which will be so large that multiple 'new' waypoints have to be made / copied (e.g. WP7 for a 2 minute interval).
Three ways of calculating the new coordinates for this new waypoint, from very precise to not so precise (in order of preference), are: 1) Basing the new waypoint coordinates on the relative 'time distance' to each of the two surrounding waypoints (therewith assuming constant movement during the time interval). The 'time distance' between WP3 and WPnew is (120-73) 47 seconds and the 'time distance' between WPnew and WP4 is (153-120) 33 seconds (whereas the total time interval between WP3 and WP4 is 80 seconds). WPnew (with coordinates Xnew, Ynew) should then be located at 47/80 = 58.75% of the way from WP3 to WP4: Xnew = X3 + (X4-X3)*58.75% and Ynew = Y3 + (Y4-Y3)*58.75% 2) Calculate the average location (average of x3,y3 and x4,y4), at which to create a new waypoint at 2 minutes. 3) A simpler alternative is that the location of the 'closer waypoint in time', in this example WP4 (WP4: 153-120=33 versus WP3: 120-73=47), could be copied as being the new location. I hope I explained my query so that it makes sense to you. I realize I am asking a 'big' question and apologize for this - the software programs I have thorough knowledge of (ArcGIS, excel, Mapsource) are all unable to solve this problem. I would be very grateful for any advice or suggestions. Best wishes, Louise
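Since the preferred method (1) is plain linear interpolation in time, base R's approx() does it directly; a minimal sketch (the coordinates are made-up toy values, and t is the summed time interval converted to seconds):

# toy version of the table above: cumulative seconds plus x,y coordinates
wp <- data.frame(t = c(0, 23, 73, 153, 156, 168, 513),
                 x = c(0, 2, 5, 9, 10, 11, 12),
                 y = c(0, 1, 2, 6, 6, 7, 7))
tt <- seq(0, max(wp$t), by = 120)          # regular 2-minute grid
newx <- approx(wp$t, wp$x, xout = tt)$y    # linear interpolation = method 1
newy <- approx(wp$t, wp$y, xout = tt)$y
data.frame(t = tt, x = newx, y = newy)     # 'new' waypoints at regular intervals

Long stationary gaps (like WP7) are handled automatically: the interpolation just produces several points along the same segment. method = "constant" in approx() gives a carry-forward variant close to option 3.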
Re: [R] About 'hazard ratio', urgent~~~
Date: Tue, 14 Jun 2011 09:11:25 -0700 From: dr.jz...@gmail.com To: r-help@r-project.org Subject: Re: [R] About 'hazard ratio', urgent~~~ Thanks a lot for the great help~ Well, if you are referring to my post, as I indicated, I made a lot of blunders to get that out the door and posted the first thing that made sense. Note also that by making the one hazard rate 1, it wasn't clear that the column I cited was divided by 1, etc. Caveat emptor. I usually just answer questions like this when I meant to look at the topic anyway and I have a chance of putting forth an answer which I and others can verify. In any case, if you are using some new code, it always helps to run known test cases through it, even if it is just new to you LOL. -- View this message in context: http://r.789695.n4.nabble.com/About-hazard-ratio-urgent-tp3595527p3597025.html Sent from the R help mailing list archive at Nabble.com.
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] smoothScatter function (color density question) and adding a legend
Date: Sat, 11 Jun 2011 21:52:21 -0400 From: ccoll...@purdue.edu To: r-help@r-project.org Subject: [R] smoothScatter function (color density question) and adding a legend Dear R experts, I am resending my questions below one more time just in case someone out there could help but missed my email. Thanks, I was curious about this, and so I installed the geneplotter package and that went fine, but I had another issue and need to take care of that. First, see if ?legend helps at all (no idea, I have not used this stuff much). Also, source code should be available for the package you care about. On a quick read, it sounds like this uses a binning system for colors. If you want something slightly different and much slower, I have some examples (IIRC) that calculate densities in R using something similar to an electric field calculation around points (again, I'm in a hurry and pulling this from an archive, so caveat emptor: this plot may not correspond exactly to the example script below etc.), http://98.129.232.232/coloumb1.pdf

mydensity <- function(x1, x2) {
  len <- length(x1)
  z <- 1:len
  for (y in 1:len) {
    nx <- c(1:(y - 1), (y + 1):len)   # all points except the current one
    if (y == 1) { nx <- 2:len }
    if (y == len) { nx <- 1:(len - 1) }
    # coulomb is a bit much with overlapping data points, so limit it a bit
    z[y] <- sum(1.0 / (.001 + (x1[nx] - x1[y])^2 + (x2[nx] - x2[y])^2))
  }
  print(max(z)); print(min(z))
  z <- z - min(z)
  z <- 100 * z / max(z)
  hist(z)
  color <- rainbow(100)
  # color <- heat.colors(10)
  tmap <- color[pmin(floor(z) + 1, 100)]  # clamp so the max does not index past 100
  scatterplot3d(x1, x2, z, color = tmap)  # needs library(scatterplot3d)
  plot(x2, x1, col = tmap, cex = .5)
  # library(VecStatGraphs2D)
  # DrawDensityMap(x1, x2, PaintPoint = TRUE)
  # color <- terrain.colors(26)
  # color <- heat.colors(26)
}

I don't think my questions are too hard. I am most concerned about the transformation function. See below. Thanks, Clayton Hello, I have a few questions regarding the smoothScatter function. I have a scatter plot with more than 500,000 data points for two samples. So, I am wanting to display the density in colors to convince people that my good correlation coefficient is not due to an influential point effect, and plus, I also want to make my scatter plot look pretty. Anyway ... I have been able to make the plot, but I have a couple of questions about it. I also want to make a legend for it, which I don't know how to do. I have only been playing around with R for a few weeks now, off and on, so I am definitely a beginner. 1. I have 10 colors for my plot: white representing zero density and dark red representing the maximum density (I presume). According to the R documentation, the transformation argument represents the function mapping the density scale to the color scale. Note that in my R code below, transformation = function(x) x^0.14. I was wondering how exactly or mathematically this function relates the density to color. I believe the colorRampPalette ramps colors between 0 and 1. I am not sure if x represents the color or the density. Since I have 10 colors, I suppose the colorRampPalette would assign values of 0, 0.11, 0.22, 0.33, 0.44, 0.55, 0.66, 0.77, 0.88, and 1 for white to dark red. I am not sure though. Does anyone know how this works? I am sure it's not too too complicated. 2. In a related issue, I also would like to make a legend for this plot. Then I would be able to see the transformation function's effects on the color-density relationship. Could someone help me in making a legend for my smoothScatter plot?
I would like to place it immediately to the right of my plot as a vertical bar, matching the vertical length of the plot, as is often convention. I really like the smoothScatter function. It is easy to use, and I believe it's relatively new. Thanks in advance. -Clayton

clayton <- c("white", "lightblue", "blue", "darkgreen", "green",
             "yellow", "orange", "darkorange", "red", "darkred")
x <- read.table("c:/users/ccolling/log_u1m1.txt", sep = "\t")
smoothScatter(x$V8 ~ x$V7, nbin = 1000,
              colramp = colorRampPalette(clayton),
              nrpoints = Inf, pch = "", cex = .7,
              transformation = function(x) x^.14,
              col = "black", main = "M1 vs. M3 (r = 0.92)",
              xlab = "Normalized M1", ylab = "Normalized M2",
              cex.axis = 1.75, cex.lab = 1.25, cex.main = 2)

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
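For question 2, there is no built-in smoothScatter legend, but a vertical color bar can be faked with image(); a hedged sketch (toy data; the same palette is used for the bar so it matches the plot):

pal <- colorRampPalette(c("white", "lightblue", "blue", "darkgreen", "green",
                          "yellow", "orange", "darkorange", "red", "darkred"))
set.seed(1)
x <- rnorm(10000); y <- x + rnorm(10000)
layout(matrix(1:2, ncol = 2), widths = c(4, 1))   # plot + narrow legend strip
smoothScatter(x, y, colramp = pal, transformation = function(x) x^0.14)
par(mar = c(5, 1, 4, 3))
z <- seq(0, 1, length = 100)
image(1, z, t(matrix(z)), col = pal(100), axes = FALSE, xlab = "", ylab = "")
axis(4)
mtext("relative density^0.14", side = 4, line = 2, cex = 0.8)

On question 1: smoothScatter applies the transformation to the kernel density estimate, and image() then maps the transformed values linearly onto the color ramp, so x^0.14 compresses the high-density end and stretches the low-density end of the color scale.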
Re: [R] RES: Linear multivariate regression with Robust error
Date: Fri, 10 Jun 2011 16:50:24 -0300 From: filipe.bote...@vpar.com.br To: frien...@yorku.ca; bkkoc...@gmail.com CC: r-help@r-project.org Subject: RES: [R] Linear multivariate regression with Robust error Hi Barjesh, I am not sure which data you are analyzing, but once I had a similar situation and it was a multicollinearity issue. I realized this after finding a huge correlation between the independent variables; then I dropped one of them and the signs of the slopes made sense. Beforehand, have a glance at the correlation matrix of your independent variables to see the relationship between them. I guess "look at the data" (LATFD) seems to be the recurring solution. Certainly with polynomial regression there's a tendency to think that the fit parameter is a pure property of the system being examined. With something like a Taylor series, where you have a slope at a specific point, maybe you could think about it that way, but here the coefficient is just whatever optimizes your (arbitrary) error function for the data you have. A linear approximation to a non-line could be made at one point, and maybe that property should remain constant, but you get people claiming "past authors sampled some curve near x=a and got a different slope than my work, which largely sampled f(x) around x=b; what is wrong with my result?" It may be interesting to try to write a Taylor series around some point and see how those coefficients vary with data sets, for example (you still need an arbitrary way to estimate slopes, and simply differencing two points may be a bit noisy LOL, but you could play with some wavelet families maybe). If you try to describe a fruit as a linear combination of vegetables, you may get confusing but possibly useful results, even if they don't correspond to properties of the fruits so much as to a specific need you have. For example, if you are compressing images of fruits and your decoder already has a dictionary of vegetables, it may make sense to do this. This is not much different from trying to compress non-vocal music with an ACELP codec that attempts to fit the sounds to models of the human vocal tract. Sometimes this may even be informative about how a given sound was produced, even if it sounds silly. Not sure how helpful my advice will be, but it did the trick for me back then ;) Cheers, Filipe Botelho -Original message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of Michael Friendly Sent: Friday, 10 June 2011 12:51 To: Barjesh Kochar Cc: r-help@r-project.org Subject: Re: [R] Linear multivariate regression with Robust error On 6/10/2011 12:23 AM, Barjesh Kochar wrote: Dear all, I am doing linear regression with robust errors to learn the effect of an (x) variable on another (y). If I execute the command I find a positive trend. But if I check the effect of a number of (x: x1, x2, x3) variables on the same (y) variable, then the positive effect shown by the x variable turns negative. So please help me in this situation. Barjesh Kochar Research scholar You don't give any data or provide any code (as the posting guide requests), so I have to guess that you have just rediscovered Simpson's paradox -- that the coefficient of a variable in a marginal regression can have an opposite sign to that in a joint model with other predictors.
I have no idea what you mean by 'robust error'. One remedy is an added-variable plot, which will show you the partial contributions of each predictor in the joint model, as well as whether there are any influential observations that are driving the estimated coefficients. -- Michael Friendly Email: friendly AT yorku DOT ca Professor, Psychology Dept. York University Voice: 416 736-5115 x66249 Fax: 416 736-5814 4700 Keele Street, Toronto, ONT M3J 1P3 CANADA Web: http://www.datavis.ca
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
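As a footnote to the added-variable plot suggestion, a minimal sketch (uses the car package's avPlots; the nearly collinear simulated predictors are made up for illustration):

library(car)                      # for avPlots
set.seed(1)
x1 <- rnorm(100)
x2 <- x1 + rnorm(100, sd = 0.1)   # nearly collinear with x1
y  <- x1 - x2 + rnorm(100)
fit <- lm(y ~ x1 + x2)
coef(summary(fit))                # joint-model coefficients
avPlots(fit)   # partial contribution of each predictor, given the others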
Re: [R] Amazon AWS, RGenoud, Parallel Computing
Date: Sat, 11 Jun 2011 13:03:10 +0200 From: lui.r.proj...@googlemail.com To: r-help@r-project.org Subject: [R] Amazon AWS, RGenoud, Parallel Computing Dear R group, [...] I am a little bit puzzled now about what I could do... It seems like there are only very limited options for me to increase the performance. Does anybody have experience with parallel computations with rGenoud or parallelized sorting algorithms? I think one major problem is that the sorting happens rather quickly (only a few hundred entries to sort), but needs to be done very frequently (population size 2000, iterations 500), so I guess the housekeeping of the parallel computation diminishes all the benefits. Is your sort part of the algorithm, or do you have to sort results after getting them back out of order from async processes? One of my favorite anecdotes is how I used a bash sort on a huge data file to make a program run faster (from impractical zero percent CPU to very fast with full CPU usage -- and you complain about exactly a lack of CPU saturation). I guess a couple of comments. First, if you have specialized apps you need optimized, you may want to write dedicated c++ code. However, this won't help if you don't find the bottleneck. Lack of CPU saturation could easily be due to waiting for stuff like disk IO or VM swap. You really ought to find the bottleneck first; it could be anything (except the CPU, maybe LOL). The sort that I used prevented VM thrashing with no change to the app code -- the app got sorted data, and so VM paging became infrequent. If you can specify the problem precisely, you may be able to find a simple solution.
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Amazon AWS, RGenoud, Parallel Computing
Date: Sat, 11 Jun 2011 19:57:47 +0200 Subject: Re: [R] Amazon AWS, RGenoud, Parallel Computing From: lui.r.proj...@googlemail.com To: marchy...@hotmail.com CC: r-help@r-project.org Hello Mike, [[elided Hotmail spam]] To the best of my knowledge, the sort algorithm implemented in R is already backed by C++ code and not natively written in R. Writing the code in C++ is not really an option either (I think rGenoud is also written in C++). I am not sure whether there really is a bottleneck with respect to the computer -- I/O is pretty low, plenty of RAM left etc. It really seems to me as if parallelizing is not easily possible, or only at such high cost that the benefits diminish through all the coordination and handling needed... Did anybody use rGenoud in cluster mode and experience something similar? Are quicksort packages available that use multiple processors efficiently (I didn't find any... :-( ). I'm no expert, but these don't seem to be terribly subtle problems in most cases. Sure, if the task is not suited to parallelism and you force it to be parallel, and it spends all its time syncing up, that can be a problem. Just making more tasks to fight over the bottleneck -- memory, CPU, locks -- can easily make things worse. I think I posted my link earlier on the IEEE blurb showing how easy it is for many cores to make things worse on non-contrived benchmarks. I am by no means an expert on parallel processing, but is it possible that the benefits from parallelizing a process greatly diminish if a large set of variables/functions needs to be made available and the actual function (in this case sorting a few hundred entries) is quite short, whereas the number of times the function is called is very high!? It was quite striking that the first run usually took several hours (instead of half an hour) and the subsequent runs were much, much faster... There is so much happening behind the scenes that it is a little hard for me to tell what might help -- and what will not... Help appreciated :-) Thank you Lui On Sat, Jun 11, 2011 at 4:42 PM, Mike Marchywka wrote: [...]
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
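Before parallelizing anything, it may be worth confirming where the time actually goes; base R's profiler can do that. A minimal sketch (the toy objective function is a made-up placeholder -- substitute the real genoud() call):

library(rgenoud)
Rprof("genoud.prof")                        # start profiling to a file
res <- genoud(fn = function(x) sum(x^2),    # placeholder objective
              nvars = 5, pop.size = 2000,
              max.generations = 50)
Rprof(NULL)                                 # stop profiling
summaryRprof("genoud.prof")$by.self         # where the time was really spent

If the sort barely shows up in by.self, parallelizing it cannot help much (Amdahl's law); if the top entries are I/O or copying, more cores will not help at all.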
Re: [R] error with geomap in googleVis
Subject: RE: [R] error with geomap in googleVis Date: Fri, 10 Jun 2011 11:06:47 +0100 From: markus.gesm...@lloyds.com To: marchy...@hotmail.com; r-h...@stat.math.ethz.ch Hi Mike, I believe you are trying to put two charts on the same page. The demo(googleVis) provides you with an example of this. So applying the same approach to your data sets would require: Thanks, I thought however that I tried two different examples, the first being my own, which failed, and then the Fruits example as a second test. I was lazy and used install package from a probably wrong mirror, as in the past on 'dohs I had problems with CMD INSTALL. I reloaded as per your suggestion and tried the first example. The output did change, but it is still lacking a chart. Do I need an API key or something? I've never used google visualization before and was just reacting to the OP, as I had wanted to try it. The process is below; the final result may be here, http://98.129.232.232/xxx.html , but I will be varying it as I get time. Thanks again.

--2011-06-10 05:17:47-- http://cran.r-project.org/src/contrib/googleVis_0.2.5.tar.gz
Resolving cran.r-project.org... 137.208.57.37
Connecting to cran.r-project.org|137.208.57.37|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
Date: Fri, 10 Jun 2011 10:17:47 GMT
Server: Apache/2.2.9 (Debian)
Last-Modified: Tue, 07 Jun 2011 06:35:56 GMT
ETag: a83451-b36e9-4a5196ea64b00
Accept-Ranges: bytes
Content-Length: 734953
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: application/x-gzip
Length: 734953 (718K) [application/x-gzip]
Saving to: `googleVis_0.2.5.tar.gz'
100%[==========>] 734,953 527K/s in 1.4s
2011-06-10 05:17:49 (527 KB/s) - `googleVis_0.2.5.tar.gz' saved [734953/734953]

[marchywka@351915-www1 downloads]$ R CMD INSTALL googleVis_0.2.5.tar.gz
* installing to library `/usr/local/lib64/R/library'
* installing *source* package `googleVis' ...
** R
** data
** moving datasets to lazyload DB
** demo
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices ...
** testing if installed package can be loaded
* DONE (googleVis)

[marchywka@351915-www1 downloads]$ R
R version 2.13.0 (2011-04-13) Copyright (C) 2011 The R Foundation for Statistical Computing ISBN 3-900051-07-0 Platform: x86_64-unknown-linux-gnu (64-bit) R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details. Natural language support but running in an English locale. R is a collaborative project with many contributors. Type 'contributors()' for more information and 'citation()' on how to cite R or R packages in publications. Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()' for an HTML browser interface to help. Type 'q()' to quit R.

> library(googleVis)
Loading required package: RJSONIO
To suppress the following message use the statement: suppressPackageStartupMessages(library(googleVis))
Welcome to googleVis version 0.2.5
Type ?googleVis to access the overall documentation and vignette('googleVis') for the package vignette. You can execute the demo of the package via: demo(googleVis) More information is available on the googleVis project web-site: http://code.google.com/p/google-motion-charts-with-r/ Please read also the Google Visualisation API Terms of Use: http://code.google.com/apis/visualization/terms.html Feel free to send us an email rvisualisat...@gmail.com if you would like to be kept informed of new versions, or if you have any feedback, ideas, suggestions or would like to collaborate.

> df <- data.frame(foo=c("Brazil","Canada"), bar=c(123,456))
> map1 <- gvisGeoMap(df, locationvar='foo', numvar='bar')
> cat(map1$html)
Error in cat(list(...), file, sep, fill, labels, append) : argument 1 (type 'list') cannot be handled by 'cat'
> cat(map1$html$header)
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>GeoMapID23b504e5</title>
<meta http-equiv="content-type" content="text/html;charset=utf-8" />
<style type="text/css">
body { color: #444444; font-family: Arial,Helvetica,sans-serif; font-size: 75%; }
a { color: #4D87C7; text-decoration: none; }
</style>
</head>
<body>
> cat(map1$html$header, file="xxx.html", appeand=F)
> cat(map1$html$chart, file="xxx.html", appeand=t)
Error in cat(list(...), file, sep, fill, labels, append) : argument 2 (type 'closure') cannot be handled by 'cat'

(Note the misspelled appeand=: it falls into cat()'s '...', so F and t are treated as things to print -- t is the transpose function, hence the 'closure' error -- and the intended append never happens.)

> str(map1$html)
List of 4
 $ header : chr "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\"\n \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">\n<ht"| __truncated__
 $ chart : Named chr [1:7] "<!-- GeoMap generated in R 2.13.0 by googleVis
Re: [R] Linear multivariate regression with Robust error
Date: Fri, 10 Jun 2011 09:53:20 +0530 From: bkkoc...@gmail.com To: r-help@r-project.org Subject: [R] Linear multivariate regression with Robust error Dear all, I am doing linear regression with robust errors to learn the effect of an (x) variable on another (y). If I execute the command I find a positive trend. But if I check the effect of a number of (x: x1, x2, x3) variables on the same (y) variable, then the positive effect shown by the x variable turns negative. So please help me in this situation. Take y as goodness, and x and x1 have something to do with a product. The first analysis is from company A, the second is from company B, and the underlying relationship is given with some noise LOL (I'm still on my first cup of coffee; this was the first example to come to mind, as these questions keep coming up here every day):

x <- 1:100
x1 <- x * x
y <- x - x1 + runif(100)
lm(y ~ x)

Call: lm(formula = y ~ x)
Coefficients:
(Intercept)     x
       1718  -100

lm(y ~ x + x1)

Call: lm(formula = y ~ x + x1)
Coefficients:
(Intercept)       x      x1
     0.5253  1.0024 -1.0000

Barjesh Kochar Research scholar
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Resources for utilizing multiple processors
From: rjeffr...@ucla.edu Date: Wed, 8 Jun 2011 20:54:45 -0700 To: r-help@r-project.org Subject: [R] Resources for utilizing multiple processors Hello, I know of some various methods out there to utilize multiple processors but am not sure what the best solution would be. First some things to note: I'm running dependent simulations, so direct parallel coding is out (multicore, doSnow, etc). the *nix languages. Well, for the situation below you seem to want a function server. You could consider Rapache and just write this like a big web application. A web server, like a DB, is not the first thing you think of for high performance computing, but if your computationally intensive tasks are in native code this could be a reasonable overhead that requires little learning. If you literally mean cores instead of machines, keep in mind that cores can end up fighting over resources, like memory (this cites an IEEE article with cores making things worse in a non-contrived case): http://lists.boost.org/boost-users/2008/11/42263.php I think people have mentioned some packages like bigmemory, I forget the names exactly, that let you handle larger things. Launching a bunch of threads and letting the VM thrash can easily make things slower quickly. I guess a better approach would be to get an implementation that is block oriented, and you can do the memory/file stuff in R until they get a data frame that uses disk transparently and with hints on expected access patterns (prefetch etc). My main concern deals with multiple analyses on large data sets. By large I mean that when I'm done running 2 simulations R is using ~3G of RAM; the remaining ~3G is chewed up when I try to create the Gelman-Rubin statistic to compare the two resulting samples, grinding the process to a halt. I'd like to have separate cores simultaneously run each analysis. That will save on time, and I'll have to ponder the BGR calculation problem another way. Can R temporarily use HD space to write calculations to instead of RAM? The second concern boils down to whether or not there is a way to split up dependent simulations. For example at iteration (t) I feed a(t-2) into FUN1 to generate a(t), then feed a(t), b(t-1) and c(t-1) into FUN2 to simulate b(t) and c(t). I'd love to have one core run FUN1 and another run FUN2, and [[elided Hotmail spam]] So if anyone has any suggestions as to a direction I can look into, it would be appreciated. Robin Jeffries MS, DrPH Candidate Department of Biostatistics UCLA 530-633-STAT(7828)
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
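For the 'separate cores run each analysis' part, a hedged sketch using the parallel package (fork-based, so not on Windows; around 2011 the same functions lived in the multicore package, and run_chain here is a made-up placeholder for one complete simulation):

library(parallel)
run_chain <- function(seed) {      # placeholder for one dependent simulation
  set.seed(seed)
  cumsum(rnorm(1e6))               # the steps inside stay strictly sequential
}
j1 <- mcparallel(run_chain(1))     # each analysis gets its own forked process
j2 <- mcparallel(run_chain(2))
res <- mccollect(list(j1, j2))     # block until both chains finish
str(res)

The dependence within a chain is untouched; only whole, mutually independent analyses are farmed out, which is why this sidesteps the 'dependent simulations' objection.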
Re: [R] error with geomap in googleVis
To: r-h...@stat.math.ethz.ch From: mjphi...@tpg.com.au Date: Wed, 8 Jun 2011 10:14:01 + Subject: Re: [R] error with geomap in googleVis

SNV Krishna <pri...@mps.com.sg> writes:

Hi All, I am unable to get the geomap plot in the googleVis package. The data is as follows:

head(index.ret)
    country    ytd
1 Argentina -10.18
2 Australia  -3.42
3   Austria  -2.70
4   Belgium   1.94
5    Brazil  -7.16
6    Canada   0.56

map1 = gvisGeoMap(index.ret, locationvar = 'country', numvar = 'ytd')
plot(map1)

But it just displays a blank page, showing an error symbol at the right bottom corner. I tried demo(googleVis); it also had a similar problem. The demo showed all the other plots/maps except for those geomaps. Could anyone please hint me what/where the problem could be? Many thanks for the idea and support. Regards, SNV Krishna

I had never used this until yesterday but it seems to generate html. I didn't manage to get a chart to display, but if you are familiar with this package and html, perhaps you could look at map1$html and see if anything is obvious. One great thing about html/js is that it is human readable and you can integrate it well with other page material without much in the way of special tools.

Hi All, I have also encountered this problem. I have tested the problem in Windows XP. I have the latest Java and Flash, and I have tried both Firefox and IE (both latest); everything else works just fine. I too would like to know how to solve this problem. Kind regards, Michael Phipps

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] any documents
Date: Thu, 9 Jun 2011 02:21:21 -0700 From: wa7@gmail.com To: r-help@r-project.org Subject: [R] any documents

Hi, I'm doing a textual analysis of several articles discussing the evolution of prices in order to give a forecast. If someone can give me a clear approach to this, knowing that I work with the package tm.

LOL, are you talking about the computer-generated analysis, such as the thin text platitudes around bandwagon stats like "is trading xx above 30 day moving average" etc etc.? This sounds funny but is actually an interesting test case, as the hidden structured nature of the documents should be easier to analyse than, say, poetry. The field is very much researchy AFAIK, and you will need to define an algorithm or do a literature search to get much in the way of helpful response beyond ?tm. Absent that, you are almost asking for someone to invent an algorithm. I've referred many posters to terms like "computational linguistics", but people who have used these kinds of things don't seem to post here much. If you can give us more details or a source article, maybe someone can point you in a useful direction.

Thank you very much -- View this message in context: http://r.789695.n4.nabble.com/any-documents-tp3584961p3584961.html Sent from the R help mailing list archive at Nabble.com.

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
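Since ?tm is where the reply above leaves off, a minimal sketch of its starting point, with two made-up documents standing in for the price articles:

library(tm)
docs <- c("prices rose sharply this quarter", "analysts forecast falling prices")
corp <- Corpus(VectorSource(docs))
tdm  <- TermDocumentMatrix(corp, control = list(tolower = TRUE))
inspect(tdm)   # term frequencies, the usual input to any downstream model

This only gets you a bag-of-words matrix; the forecasting step on top of it is exactly the "invent an algorithm" part the reply warns about.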
Re: [R] error with geomap in googleVis
I still got blanks with Firefox with the two examples below. I put the html up here if you want to look at it, http://98.129.232.232/xxx.html I just downloaded googleVis from mirror 68 and it claimed it was 0.2.5 ( I thought, but maybe I should check again).

install.packages("googleVis", dep=TRUE)
library(googleVis)
df <- data.frame(foo=c("Brazil","Canada"), bar=c(123,456))
map1 <- gvisGeoMap(df, locationvar='foo', numvar='bar')
cat(map1$html$header, file="xxx.html", append=FALSE)
cat(map1$html$chart, file="xxx.html", append=TRUE)
cat(map1$html$caption, file="xxx.html", append=TRUE)
cat(map1$html$footer, file="xxx.html", append=TRUE)

m <- gvisMotionChart(Fruits, idvar="Fruit", timevar="Year")
str(m)
cat(m$html$header, file="xxx.html", append=FALSE)
cat(m$html$chart, file="xxx.html", append=TRUE)
cat(m$html$caption, file="xxx.html", append=TRUE)
cat(m$html$footer, file="xxx.html", append=TRUE)

Subject: RE: [R] error with geomap in googleVis Date: Thu, 9 Jun 2011 14:06:22 +0100 From: markus.gesm...@lloyds.com To: marchy...@hotmail.com; mjphi...@tpg.com.au; r-h...@stat.math.ethz.ch

Hi all, This issue occurs with googleVis 0.2.4 and RJSONIO 0.7.1. Version 0.2.5 of the googleVis package was uploaded to CRAN two days ago and should have fixed this issue. Can you please try to update to that version, e.g. from http://cran.r-project.org/web/packages/googleVis/ Further, version 0.2.5 provides new interfaces to more interactive Google charts:
- gvisLineChart
- gvisBarChart
- gvisColumnChart
- gvisAreaChart
- gvisScatterChart
- gvisPieChart
- gvisGauge
- gvisOrgChart
- gvisIntensityMap
Additionally, a new demo 'AnimatedGeoMap' has been added which shows how a Geo Map can be animated with additional JavaScript. Thanks to Manoj Ananthapadmanabhan and Anand Ramalingam, who provided the idea and initial code. For more information and examples see: http://code.google.com/p/google-motion-charts-with-r/ I hope this helps Markus

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Mike Marchywka Sent: 09 June 2011 11:19 To: mjphi...@tpg.com.au; r-h...@stat.math.ethz.ch Subject: Re: [R] error with geomap in googleVis [snip]
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Can we prepare a questionnaire in R
Date: Wed, 8 Jun 2011 12:37:33 +0530 From: ammasamri...@gmail.com To: r-help@r-project.org Subject: [R] Can we prepare a questionnaire in R

Is there a way to prepare a questionnaire in R, like html forms whose data can be directly populated into R?

I've started to use Rapache, although Rserve would also be an option. When installed on a server, you can point your html form to an rhtml page and get the form variables in R just as with other web languages. For writing html output, I had been using R2HTML ( see the code excerpt below from an rhtml page). I had also found Cairo works ok if you don't need X11 for anything. For your specific situation, however, it may be easier to just use whatever you already have and just use R for the data analysis. When a request for a results report is made, send that to Rapache, for example. I would mention that I have gone to running two versions of Apache, one with R and one with PHP, to allow for security and easier development ( I can write the php to fail nicely if the R apache server is not up, and no new security issues are exposed).

library(Cairo)
library(R2HTML)
library(RColorBrewer)

You can of course also generate html from normal R commands, or for that matter bash scripts etc.

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
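A minimal sketch of the R2HTML side mentioned above, assuming the package is installed; the file name and the summarized object are placeholders:

library(R2HTML)
target <- HTMLInitFile(getwd(), filename = "report")   # creates report.html
HTML(as.title("Survey results"), file = target)
HTML(summary(cars), file = target)                     # any R object can be written
HTMLEndFile()

This only covers writing results back out as html; getting form variables into R is the Rapache/Rserve part of the answer above.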
Re: [R] RgoogleMaps Axes
Date: Tue, 7 Jun 2011 09:50:10 -0700 From: egregory2...@yahoo.com To: r-help@r-project.org Subject: [R] RgoogleMaps Axes

R Help, I posted a question on StackOverflow yesterday regarding an issue I've been having with the RgoogleMaps package's display of axes. Here is the text of that submission: http://stackoverflow.com/questions/6258408/rgooglemaps-axes I can't find any documentation of the following problem I'm having with the axis labels in RGoogleMaps:

library(RgoogleMaps)
datas <- structure(list(LAT = c(37.875, 37.925, 37.775, 37.875, 37.875),
                        LON = c(-122.225, -122.225, -122.075, -122.075, -122.025)),
                   .Names = c("LAT", "LON"), class = "data.frame",
                   row.names = c(1418L, 1419L, 1536L, 1538L, 1578L))
# Get bounding box.
boxt <- qbbox(lat = datas$LAT, lon = datas$LON)
MyMap <- GetMap.bbox(boxt$lonR, boxt$latR, destfile = "Arvin12Map.png", maptype = "mobile")
PlotOnStaticMap(MyMap, lat = datas$LAT, lon = datas$LON, axes = TRUE, mar = rep(4, 4))

I haven't gotten too far, as I had to download and install proj4 and rgdal, but I did get a transform-related warning. Did you get warnings?

MyMap <- GetMap.bbox(boxt$lonR, boxt$latR, destfile = "Arvin12Map.png", maptype = "mobile")
[1] "http://maps.google.com/maps/api/staticmap?center=37.85,-122.125&zoom=12&size=640x640&maptype=mobile&format=png32&sensor=true"
Loading required package: rgdal
Loading required package: sp
Geospatial Data Abstraction Library extensions to R successfully loaded
Loaded GDAL runtime: GDAL 1.8.0, released 2011/01/12
Path to GDAL shared files: /usr/local/share/gdal
Loaded PROJ.4 runtime: Rel. 4.7.1, 23 September 2009
Path to PROJ.4 shared files: (autodetected)
Warning message:
In readGDAL(destfile, silent = TRUE) : GeoTransform values not available
PlotOnStaticMap(MyMap, lat = datas$LAT, lon = datas$LON, axes = TRUE, mar = rep(4, 4))
List of 6
 $ lat.center: num 37.8
 $ lon.center: num -122
 $ zoom      : num 12
 $ myTile    : int [1:640, 1:640] 968 853 855 969 1033 888 855 884 888 995 ...
  ..- attr(*, "COL")= chr [1:1132] "#000000" "#020201" "#020202" "#030302" ...
  ..- attr(*, "type")= chr "rgb"
 $ BBOX      :List of 2
  ..$ ll: num [1, 1:2] 37.8 -122.2
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr "Y"
  .. .. ..$ : chr [1:2] "lat" "lon"
  ..$ ur: num [1, 1:2] 37.9 -122
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr "Y"
  .. .. ..$ : chr [1:2] "lat" "lon"
 $ url       : chr "google"
NULL
[1] -291.2711  291.2711
[1] -276.5158  276.7972

When I run this on my computer the horizontal axis ranges from 300W to 60E, but the ticks in between aren't linearly spaced (300W, 200W, 100W, 0, 100E, 160W, 60W). Also, the vertical axis moves linearly from 300S to 300N. It seems that no matter what data I supply for datas, the axes are always labeled this way. My questions are: 1. Does this problem occur on other machines using this code? 2. Does anyone have an explanation for it? and 3. Can anybody suggest a way to get the correct axis labels (assuming these are incorrect, but maybe I'm somehow misinterpreting the plot!)? Thank you for your time. There has been no answer other than that I ought to contact the package maintainer. Since it would be nice to have a publicly displayed solution, I opted to post here first before doing so. Does anyone have any insight to share? Thank you very much for your time, -Erik Gregory CSU Sacramento, Mathematics Student Assistant, California Environmental Protection Agency

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Logistic Regression
Date: Tue, 7 Jun 2011 01:38:32 -0700 From: farah.farid@student.aku.edu To: r-help@r-project.org Subject: [R] Logistic Regression

I am working on my thesis, in which I have a couple of independent variables that are categorical in nature and the dependent variable is dichotomous. Initially I ran univariate analyses and added the variables with significant p-values (p<0.25) to my full model. I have three confusions. Firstly, I am looking for confounding variables by

I'm not sure what your thesis is about, some system that you are studying via statistics or maybe the thesis is about statistics itself, but according to this disputed wikipedia entry, http://en.wikipedia.org/wiki/Confounding whether a variable is confounding or extraneous is determined by the reality of your system. It may help to consider factors related to that and use the statistics to avoid fooling yourself. Look at the pictures ( a non-pompous way of saying look at graphs and scatter plots for some ideas to test ) and then test various ideas. You see bad cause/effect inferences all the time in many fields, from econ to biotech ( although anecdotes suggest these mistakes usually favour the sponsors LOL). Consider some mundane known examples of what your data would look like and see if that relates to what you have. If you were naively measuring car velocity at a single point in front of a traffic light, together with the color of the light, what might you observe? ( much like with an earlier example on iron in patients, there are a number of more precisely defined measurements you could take on a given thing ). If your concern is "I ran test A and it said B, but test C said D, and D seems inconsistent with B", it generally helps to look at the assumptions and detailed equations for each model and explore what those mean with your data. With continuous variables anyway, non-monotonic relationships can easily destroy a correlation even with strong causality.

using the formula (crude beta-coefficient - adjusted beta-coefficient) / crude beta-coefficient x 100. As per the rule, if the percentage for any variable is >10% then I have considered it a confounder. I wanted to know: from the initial model I removed one variable with an insignificant p-value to form the adjusted model. Now how will I know whether the variable that I removed from the initial model was a confounder or not? Secondly, I wanted to know: if the percentage comes out negative, like -17.84%, will it be considered a confounder or not? I also wanted to know whether confounders should be removed from the model or kept in the model? Lastly, I am running a likelihood ratio test to identify whether the value falls in the critical region or not. So if the value does not fall in the critical region, what does that show? What should I do in this case? In my final reduced model all p-values are significant, but still the value identified via the likelihood ratio test does not fall in the critical region. So what does that show? -- View this message in context: http://r.789695.n4.nabble.com/Logistic-Regression-tp3578962p3578962.html Sent from the R help mailing list archive at Nabble.com.

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
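A minimal sketch of the two mechanical pieces of the question, the likelihood ratio test between nested logistic models and the 10% change-in-estimate check; all variable names here (outcome, a, b, c, dat) are made up:

full    <- glm(outcome ~ a + b + c, family = binomial, data = dat)
reduced <- glm(outcome ~ a + b,     family = binomial, data = dat)
anova(reduced, full, test = "Chisq")   # LRT p-value for dropping c
# change-in-estimate on a's coefficient when the candidate confounder c is removed:
100 * (coef(reduced)["a"] - coef(full)["a"]) / coef(reduced)["a"]

The sign of the percentage just records the direction of the shift; the usual rule compares its absolute value against the 10% threshold.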
Re: [R] Populating values from html
Date: Tue, 7 Jun 2011 03:35:46 -0700 From: ammasamri...@gmail.com To: r-help@r-project.org Subject: [R] Populating values from html

Can we populate values into an excel sheet from html forms, to be used in R for data analysis? Can we directly retrieve the values from html forms into R for analysis?

Judging from the way many websites offer data, you'd think that jpg is the best means for getting it LOL. html, pdf, and image formats are designed for human consumption of limited aspects of a data set. Normally you would prefer something closer to raw data, like csv. After having visited yet another public website that offers data in pdf ( YAPWTODIP ), I would suggest you first contact the site and ask them to present data in a form which allows it to be easily examined in an open source environment ( note this criterion does not make Excel a preferred choice either). If you have to scrape web pages, apparently there are some R facilities, but depending on the page you may need to do a lot of work, and it will not survive if the page is redone for artistic reasons etc. See any of these results for example, http://www.google.com/search?hl=en&q=cran+page+scraping -- View this message in context: http://r.789695.n4.nabble.com/Populating-values-from-html-tp3579215p3579215.html Sent from the R help mailing list archive at Nabble.com.

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
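A minimal sketch of the scraping facilities alluded to above, using the XML package, which was the usual tool at the time; the URL is a placeholder:

library(XML)
tabs <- readHTMLTable("http://example.com/data-page.html")
str(tabs)   # a list of data.frames, one per <table> element on the page
# when the site offers csv directly, that remains the better route:
# df <- read.csv("http://example.com/data.csv")

As the reply warns, any selector-based scraping breaks the moment the page layout changes, so treat this as a last resort after asking for raw data.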
Re: [R] Identifying sequences
Date: Wed, 1 Jun 2011 17:12:29 +0200 From: cjp...@gmail.com To: r-help@r-project.org Subject: Re: [R] Identifying sequences

Thanks to David, Thierry and Jonathan for your help. I have been able to put this function together:

a=1:10
b=20:30
c=40:50
x=c(a,b,c)
seq.matrix <- function(x){
  lower <- x[which(diff(x) != 1)]
  upper <- x[which(diff(x) != 1)+1]
  extremities <- c(1, lower, upper, x[length(x)])
  m <- data.frame(matrix(extremities[order(extremities)], ncol=2, byrow=TRUE,
         dimnames=list(rows=paste("group", 1:(length(lower)+1), sep=""),
                       cols=c("lower","upper"))))
  m$length=m$upper-m$lower+1
  m
}
s.m=seq.matrix(x)
s.m
       lower upper length
group1     1    10     10
group2    20    30     11
group3    40    50     11

One can then make a test to see if a certain value (say 9) falls within one of the groups, and use that to find the group name or the lower or upper border.

As I understand it, you are looking for large derivatives, or approximate discontinuities against a smooth signal. This seems like a natural application for wavelets; try the haar wavelet and use package wavelets,

library(wavelets)
f=wt.filter(c(-1,1), modwt=T)
z <- modwt(X=as.numeric(x), filter=f, n.levels=1)
z@W
$W1
      [,1]
 [1,]   49
 [2,]   -1
 [3,]   -1
 [4,]   -1
 [5,]   -1
 [6,]   -1
 [7,]   -1
 [8,]   -1
 [9,]   -1
[10,]   -1
[11,]  -10
[12,]   -1
[13,]   -1
[14,]   -1
[15,]   -1
[16,]   -1
[17,]   -1
[18,]   -1
[19,]   -1
[20,]   -1
[21,]   -1
[22,]  -10
[23,]   -1
[24,]   -1
[25,]   -1
[26,]   -1
[27,]   -1
[28,]   -1
[29,]   -1
[30,]   -1
[31,]   -1

s.m.test=function(s.m,i){ which(s.m[,1] <= i & s.m[,2] >= i) }
s.m.test(s.m, i=9)
[1] 1
e.g.
row.names(s.m)[s.m.test(s.m, i=9)]
[1] "group1"

Cheers Christiaan

On 1 June 2011 14:31, Jonathan Daily wrote: I am assuming in this case that you are looking for continuity along integers, so if you expect non-integer values this will not work. You can get the index of where breaks can be found in your example using which(diff(x) > 1). On Wed, Jun 1, 2011 at 6:27 AM, christiaan pauw wrote: Hallo Everybody. Consider the following vector:

a=1:10
b=20:30
c=40:50
x=c(a,b,c)

I need a function that can tell me that there are three sets of continuous sequences and that the first is from 1:10, the second from 20:30 and the third from 40:50. In other words: a, b, and c. regards Christiaan

-- === Jon Daily Technician === #!/usr/bin/env outside # It's great, trust me.

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
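For cross-checking seq.matrix, a shorter base-R take on the same grouping, built from the which(diff(x) > 1) idea in Jonathan's reply:

x <- c(1:10, 20:30, 40:50)
groups <- split(x, cumsum(c(1, diff(x) != 1)))   # start a new group wherever the step isn't 1
sapply(groups, range)                            # lower and upper border per group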
Re: [R] Compiling C-code in Windows
Date: Tue, 31 May 2011 11:37:56 -0400 From: murdoch.dun...@gmail.com To: tom.osb...@iinet.net.au CC: r-h...@stat.math.ethz.ch; pmi...@ff.uns.ac.rs

On 31/05/2011 7:50 AM, Tom Osborn wrote: You could use cygwin's cc/gcc, or the Watcom opensource compiler.

Neither of those is supported. Use what the R Admin manual suggests, or you're on your own.

I have had fairly good luck with cygwin or mingw, but you may need to have a few conditionals etc. Not sure what R suggests, but cygwin should work.

Duncan Murdoch

[Watcom used to be a commercial compiler which ceased and has become an open project]. But listen to people who've experienced the options. [Not I]. Cheers, Tom.

- Original Message - From: Petar Milin To: R-HELP Sent: Tuesday, May 31, 2011 9:43 PM

Hello ALL! I am a Linux user (Debian testing i386), with very dusty Win-experience. Nevertheless, my colleagues and I are making a package in R, and I built C-routines to speed things up. I followed instructions on how to compile C for R (very useful link: http://www.stat.lsa.umich.edu/~yizwang/software/maxLinear/noteonR.html, and links listed there). Everything works like a charm on Linux. I have *.so and the wrapper function from R is making the right call. However, I wanted to make a *.dll library for Win-users. Now, I used my colleague's computer with Win XP on it, and with the latest R. In the MS-DOS console, I positioned the prompt in 'C:\Program Files\R\R-2.13.0\bin\i386\', and then I ran: 'R CMD SHLIB C:\trial.c'. However, nothing happened, no trial.dll, nothing. Then I tried with: 'R CMD SHLIB --output=trial.dll C:\trial.c', but no luck, again. Please, can anyone help me with this? Can I use 'R CMD SHLIB --output=trial.dll C:\trial.c' under Linux and expect a working DLL? Best, PM

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
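A small sketch of the check-and-load step once compilation succeeds, on the assumption that Rtools (the toolchain the R Admin manual points to on Windows) is installed and "R CMD SHLIB trial.c" is run from the directory containing trial.c; trial_fn is a hypothetical symbol name from trial.c:

dyn.load(paste("trial", .Platform$dynlib.ext, sep = ""))   # trial.dll on Windows, trial.so on Linux
is.loaded("trial_fn")   # TRUE if the hypothetical routine is visible to R

Note a DLL built under Linux will not work on Windows; the shared library has to be built on (or cross-compiled for) the target platform.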
Re: [R] newbie: fourier series for time series data
( hotmail won't mark text so I'm top posting... ) Can you post the data? Personally I'd just plot abs(fft(x)) and see what you see, as well as looking at Im(fft(x))/Re(fft(x)), the phase spectrum. Now, presumably you are nominally looking for something with a period of 1 year; that part could be expressed in a harmonic spectrum as suggested below, but you'd also be looking for trends and noises of various types: additive gaussian, amplitude modulation, maybe even frequency modulation, etc. I guess you could remove a few power terms ( average, linear, etc) just to simplify ( often you get a spectrum with a huge uninformative DC, or zero frequency, component that just gets in the way). You can probably find a book online dealing with signal processing and RF ( this is where I'm used to seeing these things). It would of course be helpful to then examine known simple cases and see if you can tell them apart. Create fft(sin(t)*(1+a*sin(epsilon*t))+b*t), for example. I guess if you want to look at writing a model, you could look at the phase portrait ( plot derivative versus value ) to again get some idea of what you may have that makes sense as a model to fit.

Date: Tue, 31 May 2011 10:35:16 -0700 From: spencer.gra...@structuremonitoring.com To: eddie...@gmail.com CC: r-help@r-project.org Subject: Re: [R] newbie: fourier series for time series data

On 5/31/2011 5:12 AM, eddie smith wrote: Hi Guys, I had a monthly time series of land temperature from 1980 to 2008. After plotting a scatter diagram, it seems that annually there is a semi-sinusoidal cycle. How do I fit a Fourier series to the data so that I can fit a model to it?

There are several methods.

1. The simplest would be to select the number of terms you want, put the data into a data.frame, and use lm(y ~ sin(t/period) + cos(t/period) + sin(2*t/period) + cos(2*t/period) + ..., data), including as many terms as you want in the series. This is not recommended, because it ignores the time series effects and does not apply a smoothness penalty to the Fourier approximation.

2. A second is to use the 'fda' package. Examples are provided (even indexed) in Ramsay, Hooker and Graves (2009) Functional Data Analysis with R and Matlab (Springer). This is probably what Ramsay and Hooker would do, but I wouldn't, because it doesn't treat the time series as a time series. It also requires more work on your part.

3. A third general class of approaches uses Kalman filtering, also called dynamic linear models or state space models. This would allow you to estimate a differential equation model, whose solution could be a damped sinusoid. It would also allow you to estimate regression coefficients of a finite Fourier series, but without the smoothness penalty you would get with 'fda'. For this, I recommend the 'dlm' package with its vignette and companion book, Petris, Petrone and Campagnoli (2009) Dynamic Linear Models with R (Springer).

If you want something quick and dirty, you might want option 1. For that, I might use option 2, because I know and understand it moderately well (being third author on the book). However, if you really want to understand time series, I recommend option 3. That has the additional advantage that, I think, it would have the greatest chance of acceptance in a refereed academic journal of the three approaches.

I am really sorry if my question sounds stupid, but I just don't know where to start.

There also are specialized email lists that you might consider for a future post. Go to www.r-project.org -> Mailing Lists.
In particular, you might be most interested in R-sig-ecology. Hope this helps. Spencer Graves

I am desperately looking for help from you guys. Thanks in advance. Eddie

-- Spencer Graves, PE, PhD President and Chief Operating Officer Structure Inspection and Monitoring, Inc. 751 Emerson Ct. San José, CA 95126 ph: 408-655-4567

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
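A minimal sketch of the abs(fft(x)) inspection suggested at the top of this thread, on a stand-in for the monthly temperature series (the real data was never posted):

x <- sin(2*pi*(1:348)/12) + 0.1*rnorm(348)   # 29 years of fake monthly data with an annual cycle
spec <- abs(fft(x - mean(x)))                # subtracting the mean removes the uninformative DC term
plot(spec[1:(length(x)/2)], type = "h", xlab = "frequency index", ylab = "|FFT|")
# the annual cycle shows up as a spike near index 30 (29 cycles over 29 years)

Trends and modulation of the kind described above smear or split that spike, which is exactly why the advice is to try known simple cases first and see whether you can tell them apart.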
Re: [R] Value of 'pi'
Date: Sun, 29 May 2011 23:09:47 -0700 From: jwiley.ps...@gmail.com To: vincy_p...@yahoo.ca CC: r-help@r-project.org Subject: Re: [R] Value of 'pi'

Dear Vincy, I hope that in school you also learned that 22/7 is an approximation. Please consult your local mathematician for a proof that pi != 22/7. A quick search will provide you with volumes of information on what pi is, how it may be calculated, and calculations out to thousands of digits. Cheers, Josh

On Sun, May 29, 2011 at 11:01 PM, Vincy Pyne wrote: Dear R helpers, I have one basic doubt about the value of pi. In school, we have learned that pi = 22/7 (which is = 3.142857). However, if I type pi in R, I get pi = 3.141593. So which value of pi should be considered?

You could do this if you trust your trig functions, since that is presumably what you want a value of pi for:

atan(1)
[1] 0.7853982
atan(1)*4
[1] 3.141593
atan(1)*4*7-22
[1] -0.008851425
atan(1)*4-pi
[1] 0

Regards Vincy

-- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://www.joshuawiley.com/

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Nested design
Date: Sat, 28 May 2011 09:33:03 -0700 From: jwiley.ps...@gmail.com To: bjorn.robr...@gmail.com CC: r-help@r-project.org Subject: Re: [R] Nested design

Hi, If you are not asking for stats help, then do you understand the model and are just confused by how R labels it? We can help match R's labels to the ones you are used to, if you tell us what you are used to.

I would not suggest, as a rule, using a tool to validate itself, but you can use R to make sure your interpretation of other R output is right by giving contrived datasets to the analysis package and seeing what you get back. Comparison can be to examples from a text book or your own paper-and-pencil analysis. This is also a good way to learn things, from basic terms to things like sign or unit conventions in different fields etc. You can generate samples from a normal distro and feed them to the questionable package to see what comes back.

Cheers, Josh

On Sat, May 28, 2011 at 6:54 AM, unpeatable wrote: Dear Dennis, In my opinion I am not at all asking for any stats help, just a question on how to read this output. Thanks, Bjorn - Dr. Bjorn JM Robroek Ecology and Biodiversity Group Institute of Environmental Biology, Utrecht University Padualaan 8, 3584 CH Utrecht, The Netherlands Email address: b.j.m.robr...@uu.nl http://www.researcherid.com/rid/C-4379-2008 Tel: +31-(0)30-253 6091 -- View this message in context: http://r.789695.n4.nabble.com/Nested-design-tp3557404p3557472.html Sent from the R help mailing list archive at Nabble.com.

-- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://www.joshuawiley.com/

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
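A minimal sketch of the contrived-dataset check described above: simulate a nested layout with a known site-level effect and see how aov labels the terms (all names here are made up):

set.seed(1)
d <- data.frame(site = factor(rep(1:3, each = 20)),
                plot = factor(rep(1:6, each = 10)))   # plots nested in sites
d$y <- 2 + as.numeric(d$site) + rnorm(60)
summary(aov(y ~ site / plot, data = d))   # R prints the nested term as site:plot

Since the only real effect was built in at the site level, the labelled output can be matched against what you already know is there.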
Re: [R] predictive accuracy
Date: Thu, 26 May 2011 13:50:15 -0700 From: gunter.ber...@gene.com To: ahmed.el-taht...@pfizer.com CC: r-help@r-project.org Subject: Re: [R] predictive accuracy

1. This is not about R, and should be taken off list.

Well, depending on what the mods think, a little bit of generic "how do I REALLY use this tool" discussion may be of benefit for all here; a mailing list for a certain brand of hammer may discuss various uses and types of nails etc. Personally I have an interest in this: if the OP will post the data, it may be possible to explore some analysis options.

2. You are wading in an alligator-infested swamp. Get help from (other) statisticians at Pfizer (there are many good ones there).

I thought that is what statisticians do? LOL. We don't know the situation: intern, looking for outside ideas after exhausting internal ones, specific issues with internal peers, summer student not wishing to bother everyone there for details, etc.

Best, Bert

P.S. The answer to all your questions is no (imho).

On Thu, May 26, 2011 at 1:35 PM, El-Tahtawy, Ahmed wrote: The strong predictor is the country/region where the study was conducted. So it is not important/useful for a clinician to use it (as long as he/she is in the USA or Europe). Excluding that predictor will make another 2 insignificant predictors become significant!! Can the new model have reliable predictive accuracy? I thought of excluding all patients from other countries and developing the model accordingly - is the exclusion of a lot of patients and the compromise of power more acceptable??

LOL, quite the contrary, post hoc selection increases power to find whatever you or the sponsor desire... Presuming your general interest is in finding out attributes of a given drug under various conditions, you would probably want to combine the observations with tentative thoughts on causality and see what makes the best story. Statistical significance in isolation is a function of the data and analysis method; it doesn't really have anything specific to do with the underlying systems. In this case, if you have other continuous prognostic factors, say age, LDH, hemoglobin come to mind, you may be able to find that you have non-monotonic relations between prognostic factor and outcome. But further, say you have enough patients that you could in fact map dose-response curves. It may turn out that this curve is in fact non-monotonic, with parameters non-monotonic in the prognostic factor. Consider avg_survival = a + b*d - c*d^2, where d is the dose. For small d, the drug seems to help, but for larger doses it makes things worse. Now consider that c is a complicated function of hematocrit; it may not be hard to imagine that anemics and siderositics ( is that a word LOL?) have some underlying problems dealing with your drug. These may be distributed geographically etc etc etc. This is all stuff you can simulate in R or even on paper. It sounds like you are already trying to write a label, which may be a bit premature ( although I defer to the guy from DNA for that LOL): "indicated for use in patients in the Western Hemisphere with ..." You may have decent luck looking at FDA panel discussion transcripts; search for related general stats terms confined to site:fda.gov

Thanks for your help... Al

-Original Message- From: Marc Schwartz [mailto:marc_schwa...@me.com] Sent: Thursday, May 26, 2011 10:54 AM To: El-Tahtawy, Ahmed Cc: r-help@r-project.org Subject: Re: [R] predictive accuracy

On May 26, 2011, at 7:42 AM, El-Tahtawy, Ahmed wrote: I am trying to develop a prognostic model using logistic regression.
I built full and approximate models with the use of penalization (Design package). Also, I tried chi-square criteria and step-down techniques, and used the bootstrap for model validation. The main purpose is to develop a predictive model for a future patient population. One of the strong predictors pertains to the study design, would not mean much to a clinician/investigator in a real clinical situation, and I have been asked to remove it. Can I propose a model and nomogram without that strong, irrelevant predictor?? If yes, do I need to redo model calibration, discrimination, validation, etc...?? Or just have 5 predictors instead of 6 in the prognostic model?? Thanks for your help Al

Is it that the study design characteristic would not make sense to a clinician but is relevant to future samples, or that the study design characteristic is unique to the sample upon which the model was developed and is not relevant to future samples because they will not be in the same or a similar study? Is the study design characteristic a surrogate for other factors that would be relevant to future samples? If so, you might engage in a conversation with the clinicians to gain some insights into other variables to consider for inclusion in
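A quick sketch of the non-monotonic dose-response shape invoked above, with made-up coefficients (cc stands in for c to avoid masking R's c() function):

d <- seq(0, 10, by = 0.1)
a <- 50; b <- 8; cc <- 1                # hypothetical values
avg_survival <- a + b*d - cc*d^2        # helps at low dose, harms at high dose
plot(d, avg_survival, type = "l", xlab = "dose", ylab = "avg survival")
abline(v = b/(2*cc), lty = 2)           # turnover point at d = b/(2c)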
Re: [R] Processing large datasets/ non answer but Q on writing data frame derivative.
Date: Wed, 25 May 2011 09:49:00 -0400 From: ro...@bestroman.com To: biomathjda...@gmail.com CC: r-help@r-project.org Subject: Re: [R] Processing large datasets

Thanks Jonathan. I'm already using RMySQL to load data for a couple of days. I wanted to know what the relevant R capabilities are if I want to process much bigger tables. R always reads the whole set into memory, and this might be a limitation in the case of big tables, correct?

ok, now I ask: perhaps for my first R effort I will try to find the source code for data frame and make a paging or streaming derivative. That is, at least for fixed-size things, it can supply things like the total number of rows but has facilities for paging in and out of memory. Presumably all users of data frame have to work through a limited interface, which I guess could be expanded with various hints such as "prefetch this", for example. I haven't looked at this idea in a while, but the issue keeps coming up; dev list maybe? Anyway, for your immediate issues with a few statistics, you could probably write a simple c++ program that ultimately becomes part of an R package. It is a good idea to see what is available, but these questions come up here a lot and the normal suggestion is a DB, which is exactly the opposite of what you want if you have predictable access patterns ( although even here prefetch could probably be implemented).

Doesn't it use temporary files or something similar to deal with such amounts of data? As an example, I know that SAS handles sas7bdat files up to 1TB on a box with 76GB memory, without noticeable issues. --Roman

- Original Message - In cases where I have to parse through large datasets that will not fit into R's memory, I will grab relevant data using SQL and then analyze said data using R. There are several packages designed to do this, like [1] and [2] below, that allow you to query a database using SQL and end up with that data in an R data.frame. [1] http://cran.cnr.berkeley.edu/web/packages/RMySQL/index.html [2] http://cran.cnr.berkeley.edu/web/packages/RSQLite/index.html

On Wed, May 25, 2011 at 12:29 AM, Roman Naumenko wrote: Hi R list, I'm new to R software, so I'd like to ask about its capabilities. What I'm looking to do is to run some statistical tests on quite big tables which are aggregated quotes from a market feed. This is a typical set of data. Each day contains millions of records (up to 10 non filtered).

2011-05-24 750 Bid DELL 14130770 400 15.4800 BATS 35482391 Y 1 1 0 0
2011-05-24 904 Bid DELL 14130772 300 15.4800 BATS 35482391 Y 1 0 0 0
2011-05-24 904 Bid DELL 14130773 135 15.4800 BATS 35482391 Y 1 0 0 0

I'll need to filter it first based on some criteria. Since I keep it in a MySQL database, this can be done through a query. Not super efficient; checked it already. Then I need to aggregate the dataset into different time frames (time is represented in ms from midnight, like 35482391). Again, this can be done through a database query; not sure what's going to be faster. Aggregated tables are going to be much smaller, like thousands of rows per observation day. Then calculate basic statistics: mean, standard deviation, sums etc. After the stats are calculated, I need to perform some statistical hypothesis tests. So, my question is: which tool is faster for data aggregation and filtration on big datasets, mysql or R? Thanks, --Roman N.
-- === Jon Daily Technician === #!/usr/bin/env outside # It's great, trust me.

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
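A minimal sketch of the query-then-analyze pattern described above, pushing the aggregation into SQL so only the small result lands in an R data.frame (the file and table names are hypothetical):

library(RSQLite)
con <- dbConnect(dbDriver("SQLite"), "quotes.db")
agg <- dbGetQuery(con, "SELECT symbol, COUNT(*) AS n, AVG(price) AS mean_price
                        FROM quotes GROUP BY symbol")
dbDisconnect(con)
str(agg)   # thousands of rows at most, not millions

For standard deviations, note that plain SQLite has no STDEV() aggregate, so either compute it from SUM(price), SUM(price*price) and COUNT(*), or pull the filtered column into R and call sd().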
Re: [R] Processing large datasets
Date: Wed, 25 May 2011 10:18:48 -0400 From: ro...@bestroman.com To: mailinglist.honey...@gmail.com CC: r-help@r-project.org Subject: Re: [R] Processing large datasets

Hi, If your datasets are *really* huge, check out some packages listed under the "Large memory and out-of-memory data" section of the HighPerformanceComputing task view at CRAN: http://cran.r-project.org/web/views/HighPerformanceComputing.html

Does this have any specific limitations? It sounds offhand like it does paging and all the needed buffering for arbitrary-size data. Does it work with everything? I seem to recall bigmemory came up before in this context and there was some problem. Thanks.

Also, if you find yourself needing to do lots of grouping/summarizing types of calculations over large data frame-like objects, you might want to check out the data.table package: http://cran.r-project.org/web/packages/data.table/index.html -- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact

I don't think data.table is fundamentally different from the data.frame type, but thanks for the suggestion. http://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.pdf "Just like data.frames, data.tables must fit inside RAM" The ff package by Adler, listed in "Large memory and out-of-memory data", is probably most interesting. --Roman Naumenko

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
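A minimal sketch of what the ff package singled out above looks like in use; the vector lives in a disk file and only the slices you index come into RAM:

library(ff)
x <- ff(vmode = "double", length = 1e7)   # ~80 MB backing file on disk, not in RAM
x[1:5] <- rnorm(5)
mean(x[1:1000])                           # only the indexed chunk is materialized
filename(x)                               # where the backing file actually lives

Whole-object operations still need chunking by hand (or the chunk utilities that ship with ff), which is the "port more advanced methods" caveat raised in the next message.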
Re: [R] Processing large datasets
Date: Wed, 25 May 2011 12:32:37 -0400 Subject: Re: [R] Processing large datasets From: mailinglist.honey...@gmail.com To: marchy...@hotmail.com CC: ro...@bestroman.com; r-help@r-project.org

Hi, On Wed, May 25, 2011 at 11:00 AM, Mike Marchywka wrote: [snip]

If your datasets are *really* huge, check out some packages listed under the "Large memory and out-of-memory data" section of the HighPerformanceComputing task view at CRAN: http://cran.r-project.org/web/views/HighPerformanceComputing.html

Does this have any specific limitations? It sounds offhand like it does paging and all the needed buffering for arbitrary-size data. Does it work with everything?

I'm not sure what limitations ... I know the bigmemory (and ff) packages try hard to make using out-of-memory datasets as transparent as possible. That having been said, I guess you will have to port more advanced methods to use such packages, hence the existence of the biglm, biganalytics, and bigtabulate packages.

I seem to recall bigmemory came up before in this context and there was some problem.

Well -- I don't often see emails on this list complaining about their functionality. That doesn't mean they're flawless (I also don't scrutinize the list traffic too closely). It could be that not too many people use them, or that people give up before they come knocking when there is a problem. Has something specifically failed for you in the past, or?

No, I haven't tried. I may have it confused with something else. But this question does come up a bit, usually related to "I tried to read a huge file into a data frame and wanted to pass it to something with predictable memory access patterns, and it ran out of memory. What can I do?" I guess I also stopped reading anything after "use a DB", as a DB is generally not a replacement for a data structure. I'll take a look when I have a big dataset that I can't condense easily.

-steve -- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] so current status of interactive PDF creation is...? trying to explore econometric convergence thread
I think I just saw a thread go by on "how do I make interactive PDF from rgl output" but I ignored it at the time and can't seem to find a consensus result on google. Essentially I wanted to create pdf or other output to share a 3D plot and let viewers interact with it, but it still wasn't clear what the easiest way to do this is. Is this 2009 page the best reference? http://cran.r-project.org/web/views/Graphics.html Thanks.

The specific case of interest is below; I found rgl to be quite useful in regard to this, http://r.789695.n4.nabble.com/maximum-likelihood-convergence-reproducing-Anderson-Blundell-1982-Econometrica-R-vs-Stata-td3502516.html#a3512807

I generated some data points using this script ( which itself uses this data file http://98.129.232.234/temp/share/em.txt ) http://98.129.232.234/temp/share/em.R.txt After some editing with sed, the output of the above script made this data file showing some optimization's trajectory according to my variables of interest, http://98.129.232.234/temp/share/emx.dat.txt

df <- read.table("emx.dat", header=FALSE, sep=" ")

I subsequently found a few ways to plot the data, notably,

png("em.png")
plot(log(df$V1), df$V2)
dev.off()

http://98.129.232.234/temp/share/em.png

png("em2.png")
plot(log(df$V1), df$V2, cex=1+.2*(log(df$V3)-min(log(df$V3))))
dev.off()

http://98.129.232.234/temp/share/em2.png

But what I'd like to do is publish an interactive 3D plot of this data, similar to the rgl output of this:

df <- read.table("emx.dat", header=FALSE, sep=" ")
library(rgl)
rgl.points(log(df$V1), df$V2, log(df$V3))

A quick google search and ?rgl didn't seem to provide an immediate answer. Is there a way to publish interactive plots? Thanks.

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] RGL package installation problem on Centos
Date: Mon, 23 May 2011 14:43:59 +0100 From: arraystrugg...@gmail.com To: r-help@r-project.org Subject: [R] RGL package installation problem on Centos

Dear R users, I have installed the latest version of R from source on Centos (using configure and make install). This seemed to work fine, with no errors reported, and R at the command line starts R. However, if I try to install the package rgl using install.packages("rgl") I get the following error:

installing to /usr/local/lib64/R/library/rgl/libs
** R
** demo
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices ...
** testing if installed package can be loaded
*** caught segfault ***
address (nil), cause 'memory not mapped'

I just did an install of R from source, built with various options to support Rapache, and tried to load rgl. First it complained about no display, so I went back to bash, did export DISPLAY=:0, and it seemed to load ok. Do you have X running and a display set? Not sure what happens if you have R without X11 support, for example. I probably installed with dep=TRUE, and only on cygwin do I recall some issues with missing dependencies. Try setting dependencies to true and see if that helps.

aborting ...
sh: line 1: 23732 Segmentation fault '/usr/local/lib64/R/bin/R' --no-save --slave /tmp/RtmpkvIjOb/file6d97876
ERROR: loading failed
* removing '/usr/local/lib64/R/library/rgl'
The downloaded packages are in '/tmp/Rtmp5OaGuQ/downloaded_packages'
Updating HTML index of packages in '.Library'
Making packages.html ... done
Warning message:
In install.packages("rgl") : installation of package 'rgl' had non-zero exit status

I read that the OpenGL header files have to be present, and they are in /usr/include/GL. I also read about different graphics cards causing problems, but I don't know how to find this info out. Any help appreciated; the full error message is included below. Thanks,

sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base

full error ##

install.packages("rgl")
--- Please select a CRAN mirror for use in this session ---
Loading Tcl/Tk interface ... done
trying URL 'http://cran.ma.imperial.ac.uk/src/contrib/rgl_0.92.798.tar.gz'
Content type 'application/x-gzip' length 162 bytes (1.6 Mb)
opened URL
==
downloaded 1.6 Mb
* installing *source* package 'rgl' ...
checking for gcc... gcc -std=gnu99
checking for C compiler default output file name... a.out
checking whether the C compiler works... yes
checking whether we are cross compiling... no
checking for suffix of executables...
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc -std=gnu99 accepts -g... yes
checking for gcc -std=gnu99 option to accept ISO C89... none needed
checking how to run the C preprocessor... gcc -std=gnu99 -E
checking for gcc... (cached) gcc -std=gnu99
checking whether we are using the GNU C compiler... (cached) yes
checking whether gcc -std=gnu99 accepts -g... (cached) yes
checking for gcc -std=gnu99 option to accept ISO C89... (cached) none needed
checking for libpng-config... yes
configure: using libpng-config
configure: using libpng dynamic linkage
checking for X...
libraries , headers
checking GL/gl.h usability... yes
checking GL/gl.h presence... yes
checking for GL/gl.h... yes
checking GL/glu.h usability... yes
checking GL/glu.h presence... yes
checking for GL/glu.h... yes
checking for glEnd in -lGL... yes
checking for gluProject in -lGLU... yes
checking for freetype-config... yes
configure: using Freetype and FTGL
configure: creating ./config.status
config.status: creating src/Makevars
** libs
g++ -I/usr/local/lib64/R/include -DHAVE_PNG_H -I/usr/include/libpng12 -DHAVE_FREETYPE -Iext/ftgl -I/usr/include/freetype2 -Iext -I/usr/local/include -g -O2 -fpic -g -O2 -c BBoxDeco.cpp -o BBoxDeco.o
g++ -I/usr/local/lib64/R/include -DHAVE_PNG_H -I/usr/include/libpng12 -DHAVE_FREETYPE -Iext/ftgl -I/usr/include/freetype2 -Iext -I/usr/local/include -g -O2 -fpic -g -O2 -c Background.cpp -o Background.o
g++ -I/usr/local/lib64/R/include -DHAVE_PNG_H -I/usr/include/libpng12 -DHAVE_FREETYPE -Iext/ftgl -I/usr/include/freetype2 -Iext -I/usr/local/include -g -O2 -fpic -g -O2 -c Color.cpp -o Color.o
g++ -I/usr/local/lib64/R/include -DHAVE_PNG_H -I/usr/include/libpng12 -DHAVE_FREETYPE -Iext/ftgl -I/usr/include/freetype2 -Iext -I/usr/local/include -g -O2 -fpic -g -O2 -c
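A couple of quick checks worth running before reinstalling on a headless box, tying together the display advice above (a sketch; nothing here is rgl-specific except the last line):

capabilities("X11")       # FALSE means this R build has no X11 support at all
Sys.getenv("DISPLAY")     # rgl needs a reachable X display when its shared library loads
install.packages("rgl", dependencies = TRUE)

If the machine has no physical display, running R under a virtual framebuffer (e.g. xvfb-run) is a common workaround at install/test time.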
Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata
# Of course, derivative-free methods newuoa and Nelder-Mead are improved by scaling, but not by the availability of exact gradients. I don't know what is wrong with bobyqa in this example. In short, even with scaling and exact gradients, this optimization problem is recalcitrant. Best, Ravi.

From: Mike Marchywka [marchy...@hotmail.com] Sent: Thursday, May 12, 2011 8:30 AM To: Ravi Varadhan; pda...@gmail.com; alex.ols...@gmail.com Cc: r-help@r-project.org Subject: RE: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

So what was the final verdict on this discussion? I kind of lost track, if anyone has a minute to summarize and critique my summary below. Apparently there were two issues: the comparison between R and Stata was one issue, and the optimum solution another. As I understand it, there was some question about R's numerical gradient calculation. This would suggest some features of the function may be of interest to consider. The function to be optimized appears to be, as the OP stated, some function of the residuals of two ( unrelated ) fits. The residual vectors e1 and e2 are dotted in various combinations, creating a matrix whose determinant is (e1.e1)(e2.e2)-(e1.e2)^2, which is the result to be minimized by choice of theta. Theta, it seems, is an 8-component vector; 4 components determine e1 and the other 4 determine e2. Presumably a unique solution would require that e1 and e2, both n-component vectors, point in different directions, or else both could become arbitrarily large while keeping the error signal at zero. For fixed magnitudes, collinearity would reduce the error. The intent would appear to be to keep the residuals distributed similarly in the two ( unrelated ) fits. I guess my question is, did anyone determine that there is a unique solution? Or am I totally wrong here ( I haven't used these myself to any extent and just try to run some simple teaching examples; asking for my own clarification as much as anything). Thanks.

From: rvarad...@jhmi.edu To: pda...@gmail.com; alex.ols...@gmail.com Date: Sat, 7 May 2011 11:51:56 -0400 CC: r-help@r-project.org Subject: Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

There is something strange in this problem. I think the log-likelihood is incorrect. See the results below from optimx. You can get much larger log-likelihood values than for the exact solution that Peter provided.

## model 18
lnl <- function(theta, y1, y2, x1, x2, x3) {
  n <- length(y1)
  beta <- theta[1:8]
  e1 <- y1 - theta[1] - theta[2]*x1 - theta[3]*x2 - theta[4]*x3
  e2 <- y2 - theta[5] - theta[6]*x1 - theta[7]*x2 - theta[8]*x3
  e <- cbind(e1, e2)
  sigma <- t(e) %*% e
  logl <- -1*n/2*(2*(1+log(2*pi)) + log(det(sigma)))  # it looks like there is something wrong here
  return(-logl)
}
data <- read.table("e:/computing/optimx_example.dat", header=TRUE, sep=",")
attach(data)
require(optimx)
start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3)))
# the warnings can be safely ignored in the optimx calls
p1 <- optimx(start, lnl, hessian=TRUE, y1=y1, y2=y2,
             x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))
p2 <- optimx(rep(0,8), lnl, hessian=TRUE, y1=y1, y2=y2,
             x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))
p3 <- optimx(rep(0.5,8), lnl, hessian=TRUE, y1=y1, y2=y2,
             x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))

Ravi.
From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of peter dalgaard [pda...@gmail.com] Sent: Saturday, May 07, 2011 4:46 AM To: Alex Olssen Cc: r-help@r-project.org Subject: Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

On May 6, 2011, at 14:29 , Alex Olssen wrote:

Dear R-help, I am trying to reproduce some results presented in a paper by Anderson and Blundell in 1982 in Econometrica using R. The estimation I want to reproduce concerns maximum likelihood estimation of a singular equation system. I can estimate the static model successfully in Stata, but for the dynamic models I have difficulty getting convergence. My R program, which uses the same likelihood function as in Stata, has poor convergence properties even for the static case. I have copied my R program and the data below. I realise the code could be made more elegant - but it is short enough. Any ideas would be highly appreciated.

Better starting values would help. In this case, almost too good values are available:

start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3)))

which appears to be the _exact_ solution.
Re: [R] text mining analysis and word visualization of pdfs
Date: Wed, 18 May 2011 15:24:49 +0530 From: ashimkap...@gmail.com To: k...@huftis.org CC: r-h...@stat.math.ethz.ch Subject: Re: [R] text mining analysis and word visualization of pdfs

On Wed, May 18, 2011 at 1:44 PM, Karl Ove Hufthammer wrote: Ajay Ohri wrote: What is the appropriate software package for dumping say 20 PDFs in a folder, then creating data visualization with frequency counts of certain words, as well as measuring correlation within each file for certain key relationships or key words? pdftotext + Unix™ for Poets + R (ggplot2)

What about the tm package? I am a beginner and I don't know much about this, but I recall that it does have the ability to handle PDFs. A few words from the experts would be nice.

I don't know if I'm an expert - I can't even get a browser that echoes keystrokes in a reasonable time with a 4-core CPU on 'dohs - but PDF could mean just about anything in terms of how text is represented. Whatever R packages do, they will not be able to read the mind of the author. Even with pdftotext there are many options, and even simple things like US IRS instruction forms can be almost impossible to extract in a coherent manner. Many authors couldn't care less about the information as long as the thing looks like paper copy. If you are stuck with PDF, I'd be looking for more tools first, as you will probably want to know how the files are constructed. I would just reiterate that the best approach for many data analysts would be to contact the data source explaining the problems with improperly authored PDF or other specialized file formats that are only supported by limited proprietary tools or that obfuscate the information of interest. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
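A minimal sketch of the pdftotext-then-count workflow suggested above, assuming pdftotext has already been run outside R and its output sits in a ./txt/ directory (the directory name is hypothetical):

# count word frequencies across the converted files with base R only
files <- list.files("txt", pattern = "\\.txt$", full.names = TRUE)
words <- unlist(lapply(files, function(f) {
  txt <- tolower(readLines(f, warn = FALSE))
  unlist(strsplit(txt, "[^a-z]+"))   # crude tokenizer: letters only
}))
freq <- sort(table(words[nchar(words) > 0]), decreasing = TRUE)
head(freq, 20)   # the counts one would feed to ggplot2 for visualization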
Re: [R] Help, please
From: dwinsem...@comcast.net To: julio.flo...@spss.com.mx Date: Thu, 19 May 2011 10:40:08 -0400 CC: r-help@r-project.org Subject: Re: [R] Help, please

On May 18, 2011, at 6:29 PM, Julio César Flores Castro wrote: Hi, I am using R 2.10.1 and I have a question. Do you know how many cases R can handle?

I was able to handle (meaning do Cox proportional hazards work with the 'rms' package, which adds extra memory overhead with a datadist object) a 5.5 million row by 100 column dataframe without difficulty, using 24 GB on a Mac (BSD UNIX kernel). I was running into performance slowdowns related to paging out to virtual memory at 150 columns, but after expanding to 32 GB I can now handle 5.5 million records with 200 columns without paging.

I want to use the library npmc, but if I have more than 4,500 cases I get an error message. If I use fewer than 4,500 cases I don't have problems with this library. Is there any way to increase the number of cases in order to use this library?

64-bit OS, 64-bit R, and more memory. The longer term solution is an implementation and algorithm designed to increase coherence of memory accesses (firefox is doing this to me now, dropping every few chars and getting many behind as it thrashes with a memory leak, LOL). __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
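As a rough sanity check on the numbers above (a sketch; 8 bytes per double, ignoring the copies R makes during fitting):

rows <- 5.5e6; cols <- 100
rows * cols * 8 / 2^30   # ~4.1 GiB for the raw numeric data alone
# with 200 columns this doubles to ~8.2 GiB, so 24-32 GB of RAM leaves
# headroom for the temporary copies made while fitting models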
Re: [R] Power Spectrum from STFT (e1071)?
Date: Tue, 10 May 2011 11:55:34 -0700 From: ajf...@psu.edu To: r-help@r-project.org Subject: [R] Power Spectrum from STFT (e1071)?

Hello. Does anyone know how to generate a power spectrum from the STFT function in package e1071? The output for this function supplies the Fourier coefficients, but I do not know how to relate these back to the power spectrum. Alternatively, if anyone is aware of other packages/functions that perform short time Fourier transforms, that would also be helpful.

What exactly are you trying to do? From what I can recall - a quick look on wikipedia, and I've done this in the past for audio envelope/carrier separation - the short-time modifier just means you need to window the input signal. I did just do something like this in R, to deconvolve histograms with an impulse response, and it is not hard to create your own window function and multiply it with the signal. The power spectrum is just the magnitude squared, but you may want to also invert and check that the output is real (Re and Im) to verify nothing went wrong in processing. Probably this returns complex numbers; I think you can use abs() or Re() and Im() on them, and also str(), but post some code and try ?fft for example. Thanks. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
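A minimal base-R sketch of the window-then-transform idea above (no e1071; a Hann window and fft(), with the power spectrum as the squared magnitude - the test signal is invented):

x <- sin(2 * pi * 5 * seq(0, 1, length.out = 256))   # 5 Hz test tone
w <- 0.5 - 0.5 * cos(2 * pi * (seq_along(x) - 1) / (length(x) - 1))  # Hann window
X <- fft(x * w)                  # complex Fourier coefficients
power <- Mod(X)^2 / length(x)    # power spectrum = squared magnitude
plot(power[1:128], type = "h")   # first half = non-negative frequencies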
Re: [R] Powerful PC to run R
Date: Fri, 13 May 2011 12:38:51 +0200 From: haenl...@escpeurope.eu To: r-help@r-project.org Subject: [R] Powerful PC to run R

Dear all, I'm currently running R on my laptop -- a Lenovo Thinkpad X201 (Intel Core i7 CPU, M620, 2.67 GHz, 8 GB RAM). The problem is that some of my calculations run for several days, sometimes even weeks (mainly simulations over a large parameter space). Depending on the external conditions, my laptop sometimes shuts down due to overheating. I'm now thinking about buying a more powerful desktop PC or laptop. Can anybody advise me on the best configuration to run R as fast as possible? I will use this PC exclusively for R, so any other factors are of limited importance.

(I think my laptop is overheating with firefox trying to execute whatever stupid code hotmail is using; ssh to a remote server echoes keys faster LOL.) The point of the above is that it really depends what you are doing. Heat can come from the disk drive as well as the silicon. Generally you'd want to consider the algorithm and implementation and get profiling info before just buying a bigger hammer. If you are thrashing VM, sorting some data may help, for example. If the task is suited to parallelism, you could even try to distribute it over several cheaper computers - hard to know. Thanks, __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
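A sketch of the "profile before buying a bigger hammer" advice above (the replicate() call is only a stand-in for the real simulation):

Rprof("sim.prof")                          # start the sampling profiler
res <- replicate(100, mean(rnorm(1e5)))    # stand-in workload
Rprof(NULL)                                # stop profiling
head(summaryRprof("sim.prof")$by.self)     # where the time actually went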
Re: [R] How to extract information from the following dataset?
Date: Thu, 12 May 2011 10:43:59 +0200 From: jose-marcio.mart...@mines-paristech.fr To: xzhan...@ucr.edu CC: r-help@r-project.org Subject: Re: [R] How to extract information from the following dataset?

Xin Zhang wrote: Hi all, I have never worked with this kind of data before, so please help me out with it. I have the following data set, in a csv file, which looks like the following:

Jan 27, 2010 16:01:24,000 125 - - -
Jan 27, 2010 16:06:24,000 125 - - -
Jan 27, 2010 16:11:24,000 176 - - -
Jan 27, 2010 16:16:25,000 159 - - -
Jan 27, 2010 16:21:25,000 142 - - -
Jan 27, 2010 16:26:24,000 142 - - -
Jan 27, 2010 16:31:24,000 125 - - -
Jan 27, 2010 16:36:24,000 125 - - -
Jan 27, 2010 16:41:24,000 125 - - -
Jan 27, 2010 16:46:24,000 125 - - -
Jan 27, 2010 16:51:24,000 125 - - -
Jan 27, 2010 16:56:24,000 125 - - -
Jan 27, 2010 17:01:24,000 157 - - -
Jan 27, 2010 17:06:24,000 172 - - -
Jan 27, 2010 17:11:25,000 142 - - -
Jan 27, 2010 17:16:24,000 125 - - -
Jan 27, 2010 17:21:24,000 125 - - -
Jan 27, 2010 17:26:24,000 125 - - -
Jan 27, 2010 17:31:24,000 125 - - -
Jan 27, 2010 17:36:24,000 125 - - -
Jan 27, 2010 17:41:24,000 125 - - -
Jan 27, 2010 17:46:24,000 125 - - -
Jan 27, 2010 17:51:24,000 125 - - -
..

The first few columns are month, day, year, and time with millisecond accuracy. And the last number is the measurement I need to extract. I wonder if there is an easy way to just take out the measurements from a specific day and hour, i.e. if I want measurements from Jan 27 2010 16:--:-- then I get 125,125,176,159,142,142,125,125,125,125,125,125. Many thanks!!

The easiest is in the shell, if you're using some flavour of unix:

grep "Jan 27, 2010 16" filein.txt | awk '{print $5}' > fileout.txt

and use fileout.txt, which will contain only the column of data you want.

Normally that is what I do, but the R POSIXct features work pretty easily. I guess I'd use bash text processing commands to put the data into a form you like, perhaps y-mo-day time, and then read it in as a data frame. Usually I convert everything to time since the epoch began because I like integers, but there are some facilities here like round() that work well with date-times:

dx <- as.POSIXct("2011-04-03 13:14:15")
dx
[1] "2011-04-03 13:14:15 CDT"
round(dx, "hours")
[1] "2011-04-03 13:00:00 CDT"
as.integer(dx)
[1] 1301854455

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
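For a pure-R variant of the same extraction, a sketch (the file name is hypothetical, column positions taken from the sample, and %b assumes an English locale):

raw <- read.table("filein.txt", stringsAsFactors = FALSE)       # 8 whitespace fields per row
stamp <- paste(raw$V1, raw$V2, raw$V3, sub(",.*", "", raw$V4))  # drop the ,000 milliseconds
ts <- as.POSIXct(stamp, format = "%b %d, %Y %H:%M:%S")
raw$V5[format(ts, "%Y-%m-%d %H") == "2010-01-27 16"]            # the 16:--:-- measurements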
Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata
So what was the final verdict on this discussion? I kind of lost track; if anyone has a minute, please summarize and critique my summary below. Apparently there were two issues: the comparison between R and Stata was one issue, and the optimum solution another. As I understand it, there was some question about R's numerical gradient calculation. This would suggest some features of the function may be of interest to consider. The function to be optimized appears to be, as the OP stated, some function of the residuals of two (unrelated) fits. The residual vectors e1 and e2 are dotted in various combinations, creating a matrix whose determinant is (e1.e1)(e2.e2) - (e1.e2)^2, which is the quantity to be minimized by choice of theta. Theta, it seems, is an 8-component vector: 4 components determine e1 and the other 4 determine e2. Presumably a unique solution would require that e1 and e2, both n-component vectors, point in different directions, or else both could become arbitrarily large while keeping the error signal at zero. For fixed magnitudes, collinearity would reduce the error. The intent would appear to be to keep the residuals distributed similarly in the two (unrelated) fits. I guess my question is: did anyone determine that there is a unique solution? Or am I totally wrong here? (I haven't used these myself to any extent and just try to run some simple teaching examples; asking for my own clarification as much as anything.) Thanks.

From: rvarad...@jhmi.edu To: pda...@gmail.com; alex.ols...@gmail.com Date: Sat, 7 May 2011 11:51:56 -0400 CC: r-help@r-project.org Subject: Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

There is something strange in this problem. I think the log-likelihood is incorrect. See the results below from optimx. You can get much larger log-likelihood values than for the exact solution that Peter provided.

## model 18
lnl <- function(theta, y1, y2, x1, x2, x3) {
  n <- length(y1)
  beta <- theta[1:8]
  e1 <- y1 - theta[1] - theta[2]*x1 - theta[3]*x2 - theta[4]*x3
  e2 <- y2 - theta[5] - theta[6]*x1 - theta[7]*x2 - theta[8]*x3
  e <- cbind(e1, e2)
  sigma <- t(e) %*% e
  logl <- -1*n/2*(2*(1+log(2*pi)) + log(det(sigma)))  # it looks like there is something wrong here
  return(-logl)
}
data <- read.table("e:/computing/optimx_example.dat", header=TRUE, sep=",")
attach(data)
require(optimx)
start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3)))
# the warnings can be safely ignored in the optimx calls
p1 <- optimx(start, lnl, hessian=TRUE, y1=y1, y2=y2,
             x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))
p2 <- optimx(rep(0,8), lnl, hessian=TRUE, y1=y1, y2=y2,
             x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))
p3 <- optimx(rep(0.5,8), lnl, hessian=TRUE, y1=y1, y2=y2,
             x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))

Ravi.

From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of peter dalgaard [pda...@gmail.com] Sent: Saturday, May 07, 2011 4:46 AM To: Alex Olssen Cc: r-help@r-project.org Subject: Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

On May 6, 2011, at 14:29 , Alex Olssen wrote:

Dear R-help, I am trying to reproduce some results presented in a paper by Anderson and Blundell in 1982 in Econometrica using R. The estimation I want to reproduce concerns maximum likelihood estimation of a singular equation system. I can estimate the static model successfully in Stata, but for the dynamic models I have difficulty getting convergence.
My R program, which uses the same likelihood function as in Stata, has poor convergence properties even for the static case. I have copied my R program and the data below. I realise the code could be made more elegant - but it is short enough. Any ideas would be highly appreciated.

Better starting values would help. In this case, almost too good values are available:

start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3)))

which appears to be the _exact_ solution. Apart from that, it seems that the conjugate gradient methods have difficulties with this likelihood, for some less than obvious reason. Increasing maxit gets you closer but still not satisfactory. I would suggest trying out the experimental optimx package. Apparently, some of the algorithms in there are much better at handling this likelihood, notably nlm and nlminb.

## model 18
lnl <- function(theta, y1, y2, x1, x2, x3) {
  n <- length(y1)
  beta <- theta[1:8]
  e1 <- y1 - theta[1] - theta[2]*x1 - theta[3]*x2 - theta[4]*x3
  e2 <- y2 - theta[5] - theta[6]*x1 - theta[7]*x2 - theta[8]*x3
  e <- cbind(e1, e2)
  sigma <- t(e) %*% e
  logl <- -1*n/2*(2*(1+log(2*pi)) + log(det(sigma)))
  return(-logl)
}
p <- optim(0*c(1:8), lnl, method="BFGS", hessian=TRUE, y1=y1, y2=y2, x1=x1, x2=x2, x3=x3)
Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata
Date: Mon, 9 May 2011 22:06:38 +1200 From: alex.ols...@gmail.com To: pda...@gmail.com CC: r-help@r-project.org; da...@otter-rsch.com Subject: Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

Peter said "Ahem! You might get us interested in your problem, but not to the level that we are going to install Stata and Tsp and actually dig out and study the scientific paper you are talking about. Please cite the results and explain the differences."

Apologies Peter, will do. The results which I can emulate in Stata but not (yet) in R are reported below.

did you actually cut/paste code anywhere, and is your first coefficient -.19 or -.019? Presumably typos would be one possible problem.

They come from Econometrica Vol. 50, No. 6 (Nov., 1982), pp. 1569

is this it, page 1559? http://www.jstor.org/pss/1913396 Generally it helps if we could at least see the equations to check your code against typos (note page number?) in lnl; that may fix part of the mystery. Is full text available on the author's site? It doesn't come up on citeseer AFAICT: http://citeseerx.ist.psu.edu/search?q=blundell+1982sort=ascdate I guess one question would be what beta in lnl is supposed to be - it isn't used anywhere. But I will also mention I'm not that familiar with the R code (I'm trying to work through this to learn R and the optimizers). Maybe some words would help: is sigma supposed to be 2x2 or 8x8, and what are e1 and e2 supposed to be?

TABLE II - model 18
        coef    std err
p10    -0.19    0.078
p11     0.220   0.019
p12    -0.148   0.021
p13    -0.072
p20     0.893   0.072
p21    -0.148
p22     0.050   0.035
p23     0.098

The results which I produced in Stata are reported below. I spent the last hour rewriting the code to reproduce this - since I am now at home and not at work :( My results are identical to those published. The estimates are for a 3 equation symmetrical singular system. I have not bothered to report symmetrical results, and have backed out an extra estimate using adding up constraints. I have also backed out all standard errors using the delta method.

. ereturn display
------------------------------------------------------------------------
     |     Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
-----+------------------------------------------------------------------
a    |
  a1 | -.0188115   .0767759    -0.25   0.806   -.1692895    .1316664
  a2 |  .8926598   .0704068    12.68   0.000    .7546651    1.030655
  a3 |  .1261517   .0590193     2.14   0.033     .010476    .2418275
-----+------------------------------------------------------------------
g    |
 g11 |  .2199442   .0184075    11.95   0.000     .183866    .2560223
 g12 | -.1476856   .0211982    -6.97   0.000   -.1892334   -.1061378
 g13 | -.0722586   .0145154    -4.98   0.000   -.1007082   -.0438089
 g22 |  .0496865   .0348052     1.43   0.153   -.0185305    .1179034
 g23 |  .0979991   .0174397     5.62   0.000    .0638179    .1321803
 g33 | -.0257405   .0113869    -2.26   0.024   -.0480584   -.0034226
------------------------------------------------------------------------

In R I cannot get results like this - I think it is probably to do with my inability at using the optimisers well. Any pointers would be appreciated.

Peter said "Are we maximizing over the same parameter space? You say that the estimates from the paper give a log-likelihood of 54.04, but the exact solution clocked in at 76.74, which in my book is rather larger."

I meant +54.04 > -76.74. It is quite common to get positive log-likelihoods in these system estimations. Kind regards, Alex

On 9 May 2011 19:04, peter dalgaard wrote: On May 9, 2011, at 06:07 , Alex Olssen wrote: Thank you all for your input. Unfortunately my problem is not yet resolved.
Before I respond to individual comments, I make a clarification: in Stata, using the same likelihood function as above, I can reproduce EXACTLY (to 3 decimal places or more, which is exact considering I am using different software) the results from model 8 of the paper. I take this as an indication that I am using the same likelihood function as the authors, and that it does indeed work. The reason I am trying to estimate the model in R is because, while Stata reproduces model 8 perfectly, it has convergence difficulties for some of the other models.

Peter Dalgaard: "Better starting values would help. In this case, almost too good values are available: start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3))) which appears to be the _exact_ solution."

Thanks for the suggestion. Using these starting values produces the exact estimate that Dave Fournier emailed me. If these are the exact solution, then why did the authors publish different answers which are completely reproducible in Stata and Tsp?

Ahem! You might get us interested in your problem, but not to the level that we are going to install Stata and Tsp and actually dig out and study the scientific
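One hedged way to probe the uniqueness question raised in this thread (a sketch, not the thread's own code): with hessian=TRUE, optim() returns the Hessian at the reported optimum, and near-zero eigenvalues flag flat directions where the solution is not locally unique. This assumes lnl, start, and the data are set up as in the code earlier in the thread.

fit <- optim(start, lnl, method = "BFGS", hessian = TRUE,
             y1 = y1, y2 = y2, x1 = x1, x2 = x2, x3 = x3)
ev <- eigen(fit$hessian, symmetric = TRUE)$values
ev / max(abs(ev))   # ratios near zero indicate an ill-conditioned,
                    # possibly non-unique, optimum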
Re: [R] Confidence intervals and polynomial fits
From: pda...@gmail.com Date: Sun, 8 May 2011 09:33:23 +0200 To: rh...@sticksoftware.com CC: r-help@r-project.org Subject: Re: [R] Confidence intervals and polynomial fits

On May 7, 2011, at 16:15 , Ben Haller wrote: On May 6, 2011, at 4:27 PM, David Winsemius wrote: On May 6, 2011, at 4:16 PM, Ben Haller wrote: As for correlated coefficients: x, x^2, x^3, etc. would obviously be highly correlated, for values close to zero.

Not just for x close to zero:

cor( (10:20)^2, (10:20)^3 )
[1] 0.9961938
cor( (100:200)^2, (100:200)^3 )
[1] 0.9966219

Wow, that's very interesting. Quite unexpected, for me. Food for thought. Thanks!

Notice that because of the high correlations between the x^k, their parameter estimates will be correlated too. In practice, this means that the c.i. for the quartic term contains values for which you can compensate with the other coefficients and still have an acceptable fit to the data. (Nothing strange about that; already in simple linear regression, you allow the intercept to change while varying the slope.)

I was trying to compose a longer message, but at least for even/odd powers it isn't hard to find a set of values for which cor is zero, or to find a set of points that make sines of different frequencies have non-zero correlation - this highlights the fact that the computer isn't magic and it needs data to make basis functions different from each other. For background, you probably want to look up Taylor Series and Orthogonal Basis. I would also suggest using R to add noise to your input and see what that does to your predictions, or for that matter take simple known data and add noise, although I think in principle you can do this analytically. You can always project a signal onto some subspace and get an estimate of how good your estimate is, but that is different from asking how well you can reconstruct your signal from a bunch of projections. If you want to know "what can I infer about the slope of my thing at x=a", that is a specific question about one coefficient, but at this point statisticians can elaborate on various issues with the other things you ignore. Also, I think you said something about correlated at x=0, but you can change your origin, (x-a)^n, and expand this in a finite series in x^m to see what happens here. Also, if you are using hotmail, don't think that a dot product is not html LOL, since hotmail knows you must mean html when you use less-than even in text email...

-- Peter Dalgaard Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
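A quick illustration of the even/odd remark above (a sketch: centering a symmetric grid makes even and odd powers exactly uncorrelated, and poly() builds an orthogonal basis):

x <- 100:200
cor(x^2, x^3)                 # ~0.997 on the raw scale
xc <- x - mean(x)
cor(xc^2, xc^3)               # 0: even vs odd powers of a centered, symmetric grid
round(cor(poly(x, 3)), 10)    # poly() orthogonalizes: identity matrix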
Re: [R] nls problem with R
Date: Thu, 5 May 2011 01:20:33 -0700 From: sterles...@hotmail.com To: r-help@r-project.org Subject: Re: [R] nls problem with R

ID1 ID2        t         V(t)
  1   1   0        6.053078443
  2   1   0.3403   5.56937391
  3   1   0.4181   5.45484486
  4   1   0.4986   5.193124598
  5   1   0.7451   4.31386722
  6   1   1.0069   3.645422269
  7   1   1.5535   3.587710965
  8   1   1.8049   3.740362689
  9   1   2.4979   3.699837726
 10   1   6.4903   2.908485019
 11   1  13.5049   1.888179494
 12   1  27.5049   1.176091259
 13   1  41.5049   1.176091259

The model (1): V(t) = V0[1 - epi + epi*exp(-c(t-t0))]

With A = V0, B = V0*epi, C = exp(-c*t0), this becomes V(t) = A - B + B*C*exp(-ct), or further, with D = A-B and F = B*C, V(t) = D + F*exp(-ct).

this model only really has 3 attributes: initial value, final value, and decay constant, yet you ask for 4 parameters. There is no way to get a unique answer. For some reason this same form comes up a lot here; I think this is about the third time I've seen it in the last few weeks. I guess when fishing or shopping for forms to fit, it is tempting to throw a bunch of parameters into your model, but this can create intractable ambiguities. Indeed, if I just remove t0 and use your first 8 points I get this (random starting values, but it converged easily; you still need to plot etc.):

[1] 1 v= 8.77181162126362 epi= 0.672516376478598 cl= 1.90973175223917 t0= 0.643481321167201

summary(nls2)
Formula: V2 ~ v0 * (1 - epi + epi * exp(-cl * (T2)))
Parameters:
     Estimate Std. Error t value Pr(>|t|)
v0     6.2901     0.3384  18.585  8.3e-06 ***
epi    0.5430     0.1373   3.955   0.0108 *
cl     0.9684     0.5491   1.763   0.1381
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.3579 on 5 degrees of freedom
Number of iterations to convergence: 11
Achieved convergence tolerance: 4.057e-06

(2) V(t) = V0{A*exp[-lambda1(t-t0)] + (1-A)*exp[-lambda2(t-t0)]}

in formula (2):
lambda1 = 0.5*{(c+delta) + [(c-delta)^2 + 4*(1-epi)*c*delta]^0.5}
lambda2 = 0.5*{(c+delta) - [(c-delta)^2 + 4*(1-epi)*c*delta]^0.5}
A = (epi*c - lambda2)/(lambda1 - lambda2)

The regression rule: for formula (1) (t=2, that is) the first 8 rows are used for non-linear regression, and the epi, c, t0, V0 parameters are obtained. For formula (2) all 13 rows of results are used for non-linear regression of lambda1, lambda2, A (with these parameters, delta can be calculated from them). Thanks for help, Ster Lesser

-- View this message in context: http://r.789695.n4.nabble.com/nls-problem-with-R-tp3494454p3497825.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] nls problem with R
Date: Wed, 4 May 2011 07:07:44 -0700 From: sterles...@hotmail.com To: r-help@r-project.org Subject: Re: [R] nls problem with R

Thanks Andrew. I am sorry for some typos - I omitted some numbers of T2. Based on your suggestion, I think the problem is in the initial values. And I will read more theory about non-linear regression.

there is unlikely to be any magic involved, unlike getting hotmail to work. As a tool for understanding your data, you should have some idea of the qualitative properties of the model and data, and of the error function you use to reconcile the two. If you can post your full data set, I may post an R example of some things to try. I was looking for an excuse to play with nls - I'm not expert here - and curious to see what I can do with your example for critique by others. If you want to fully automate this for N continuous parameters, you can take a shotgun approach, but I'm not sure it helps other than to find gross problems in the model or data. I actually wrote a loop to keep picking random parameter values and calculating an SSE between predicted and real data. What you soon find is that this is like trying to decode a good crypto algorithm by guessing - you can do the math to see the problem LOL.

-- View this message in context: http://r.789695.n4.nabble.com/nls-problem-with-R-tp3494454p3495672.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
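A sketch of the random-restart loop described above, using the first 8 points of the data posted elsewhere in this thread (values rounded; the model form and parameter bounds are assumptions):

t <- c(0, 0.3403, 0.4181, 0.4986, 0.7451, 1.0069, 1.5535, 1.8049)
V <- c(6.053, 5.569, 5.455, 5.193, 4.314, 3.645, 3.588, 3.740)
# V(t) = v0 * (1 - epi + epi * exp(-cl * t)); score candidates by SSE
sse <- function(p) sum((V - p[1] * (1 - p[2] + p[2] * exp(-p[3] * t)))^2)
best <- Inf
for (i in 1:5000) {
  p <- c(runif(1, 1, 10), runif(1, 0, 1), runif(1, 0, 5))
  s <- sse(p)
  if (s < best) { best <- s; pbest <- p }
}
pbest   # rough starting values to hand to nls()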
Re: [R] Speed up plotting to MSWindows graphics window
Date: Wed, 27 Apr 2011 11:16:26 +0200 From: jonat...@k-m-p.nl To: r-help@r-project.org Subject: [R] Speed up plotting to MSWindows graphics window

Hello, I am working on a project analysing the performance of motor-vehicles through messages logged over a CAN bus. I am using R 2.12 on Windows XP and 7. I am currently plotting the data in R, overlaying 5 or more plots of data, logged at 1kHz (using plot.ts() and par(new = TRUE)). The aim is to be able to pan, zoom in and out, and get values from the plotted graph using a custom Qt interface that is used as a front end to R.exe (all this works). The plot is drawn by R directly to the windows graphics device. The data is imported from a .csv file (typically around 100MB) to a matrix (timestamp, message ID, byte0, byte1, ..., byte7). I then separate this matrix into several by message ID (dimensions are in the order of 8 cols, 10^6 rows). The panning is done by redrawing the plots, shifted by a small amount, so as to view a window of data from a second to a minute long that can travel the length of the logged data. My problem is that the redrawing of the plots whilst panning is too slow when dealing with this much data, i.e. I can see the last graphs being drawn to the screen in the half-second following the view change. I need a fluid change from one view to the next. My question is this: Are there ways to speed up the plotting on the MSWindows display? By reducing plotted point densities to *sensible* values?

Well, hard to know, but it would help to know where all the time is going. Usually people start complaining when VM thrashing is common, but if you are CPU limited you could try restricting the range of data you want to plot rather than relying on the plot to just clip the largely irrelevant points when you are zoomed in. It should not be too expensive to find the limits either incrementally or with binary search on an ordered time series. Presumably subsetting is fast using foo[a:b,]. One thing you may want to try for change of scale is wavelet or multi-resolution analysis. You can make a tree (increasing memory usage, but even VM here may not be a big penalty if coherence is high) and display the resolution appropriate for the current scale.

Using something other than plot.ts() - is the lattice package faster? I don't need publication quality plots, they can be rougher... I have tried: using matrices instead of dataframes (works for calculations but not enough for plots); increasing the max usable memory (max-mem-size) (no change); increasing the size of the pointer protection stack (max-ppsize) (no change); deleting the unnecessary leftover matrices (no change). I can't use lines() instead of plot() because of the very different scales (rpm -1, flags -1 to 3). I am going to do some resampling of the logged data to reduce the vector sizes (removal of *less* important data and use of window.ts()). But I am currently running out of ideas, so if somebody could point out something, I would be grateful. Thanks, Jonathan Gabris __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Speed up plotting to MSWindows graphics window
Date: Wed, 27 Apr 2011 14:40:23 +0200 From: jonat...@k-m-p.nl To: r-help@r-project.org Subject: Re: [R] Speed up plotting to MSWindows graphics window

On 27/04/2011 13:18, Mike Marchywka wrote: Date: Wed, 27 Apr 2011 11:16:26 +0200 From: jonat...@k-m-p.nl To: r-help@r-project.org Subject: [R] Speed up plotting to MSWindows graphics window

Hello, I am working on a project analysing the performance of motor-vehicles through messages logged over a CAN bus. I am currently plotting the data in R, overlaying 5 or more plots of data, logged at 1kHz (using plot.ts() and par(new = TRUE)). The aim is to be able to pan, zoom in and out, and get values from the plotted graph using a custom Qt interface that is used as a front end to R.exe (all this works). The plot is drawn by R directly to the windows graphics device. The data is imported from a .csv file (typically around 100MB) to a matrix (timestamp, message ID, byte0, byte1, ..., byte7). I then separate this matrix into several by message ID (dimensions are in the order of 8 cols, 10^6 rows). The panning is done by redrawing the plots, shifted by a small amount, so as to view a window of data from a second to a minute long that can travel the length of the logged data. My problem is that the redrawing of the plots whilst panning is too slow when dealing with this much data, i.e. I can see the last graphs being drawn to the screen in the half-second following the view change. I need a fluid change from one view to the next. My question is this: Are there ways to speed up the plotting on the MSWindows display? By reducing plotted point densities to *sensible* values?

Well, hard to know, but it would help to know where all the time is going. Usually people start complaining when VM thrashing is common, but if you are CPU limited you could try restricting the range of data you want to plot rather than relying on the plot to just clip the largely irrelevant points when you are zoomed in. It should not be too expensive to find the limits either incrementally or with binary search on an ordered time series. Presumably subsetting is fast using foo[a:b,]. One thing you may want to try for change of scale is wavelet or multi-resolution analysis. You can make a tree (increasing memory usage, but even VM here may not be a big penalty if coherence is high) and display the resolution appropriate for the current scale.

I forgot to add: for plotting I use a command similar to

plot.ts(timestampVector, dataVector, xlim=c(a,b))

where a and b are timestamps from timestampVector. Is the xlim parameter sufficient for limiting the scope of the plots? Or should I subset the time series each time I do a plot?

well, maybe the time series knows the data to be ordered - I never use that - but in general it has to go check each point and clip the out-of-range ones. It could, I suppose, binary search for start/end points, but I don't know. Based on what you said, it sounds like it does not.

The multi-resolution analysis looks interesting. I shall spend some time finding out how to use the wavelets package. Cheers!

Using something other than plot.ts() - is the lattice package faster? I don't need publication quality plots, they can be rougher...
I have tried: using matrices instead of dataframes (works for calculations but not enough for plots); increasing the max usable memory (max-mem-size) (no change); increasing the size of the pointer protection stack (max-ppsize) (no change); deleting the unnecessary leftover matrices (no change). I can't use lines() instead of plot() because of the very different scales (rpm -1, flags -1 to 3). I am going to do some resampling of the logged data to reduce the vector sizes (removal of *less* important data and use of window.ts()). But I am currently running out of ideas, so if somebody could point out something, I would be grateful. Thanks, Jonathan Gabris __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
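A sketch of the subsetting idea discussed in this thread (all names made up): on an ordered timestamp vector, findInterval() does the binary search for the window edges, so only the visible points are handed to plot().

n <- 1e6
tstamp <- seq(0, 1000, length.out = n)   # ordered timestamps, seconds
sig <- cumsum(rnorm(n))                  # stand-in for one logged channel
a <- 500; b <- 560                       # currently visible window
i <- findInterval(c(a, b), tstamp)       # binary search for the edges
plot(tstamp[i[1]:i[2]], sig[i[1]:i[2]], type = "l",
     xlab = "time (s)", ylab = "signal")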
Re: [R] Survival analysis: same subject with multiple treatments and experience multiple events
Date: Fri, 22 Apr 2011 10:00:31 -0400 From: littleduc...@gmail.com To: r-help@r-project.org Subject: [R] Survival analysis: same subject with multiple treatments and experience multiple events

Hi there, I need some help to figure out what is the proper model in survival analysis for my data. Subjects were randomized to 3 treatments in trial 1; some of them experienced the event during the trial. After a period of time those subjects were randomized to 3 treatments again in trial 2, but different from what they got in the 1st trial; some of them experienced the event during the 2nd trial. (I think the carryover effect can be ignored since the time between the two trials is long enough.) What I am interested in is whether the survival functions differ among treatments. How should I deal with the correlation between the observations, since the same subject was treated with two different drugs in the two trials? Should I add TRIAL, whether the event happened before, or the number of times the event happened before, as covariate(s)? Any input will be appreciated. Thank you. Qian

No one else replied, so I would just suggest a web search using the term "crossover design" http://www.google.com/#q=cran+crossover+design+survival and refer you to any FDA panel discussions regarding drugs that have been debated with similar trial designs as part of the debate. http://www.google.com/#sclient=psyhl=ensite=source=hpq=site:fda.gov+briefing+crossover The point of the above is to get some idea what can happen, as no battle plan survives first contact with data. Usually the objective in these designs is to infer something about causality in some system, and you just use the statistics to avoid fooling yourself. Personally it seems to me that understanding of disease and host dynamics is improving to the point where you can do more with the carryover effect that you mention, as well as with parametric putative prognostic factors, but you can also see opinions vary. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Using Java methods in R
Date: Sat, 23 Apr 2011 05:32:59 -0700 From: hill0...@umn.edu To: r-help@r-project.org Subject: Re: [R] Using Java methods in R

No answer to my post, so let's try a simpler question. Am I doing this correctly? I have the RGui with R Console on the screen. On the top pull-downs: Packages > Install Packages > USA(IA) > rJava

library(rJava)
.jinit()
qsLin <- .jnew("C:/ad/j/CalqsLin")
Error in .jnew("C:/ad/j/CalqsLin") : java.lang.NoClassDefFoundError: C:/ad/j/CalqsLin

I haven't used rJava yet - I think I installed it on linux for testing but no real usage - but this message appears to come from the JVM, probably because you specified an absolute path. You can try this from the command line:

$ ls ../geo/dayte*
../geo/dayte.class
mmarchywka@phlulap01 /cygdrive/c/d/phluant/duphus
$ java ../geo/dayte
Exception in thread "main" java.lang.NoClassDefFoundError: ///geo/dayte
Caused by: java.lang.ClassNotFoundException: ...geo.dayte
  at java.net.URLClassLoader$1.run(Unknown Source)
  at java.security.AccessController.doPrivileged(Native Method)
  at java.net.URLClassLoader.findClass(Unknown Source)
  at java.lang.ClassLoader.loadClass(Unknown Source)
  at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
  at java.lang.ClassLoader.loadClass(Unknown Source)
Could not find the main class: ../geo/dayte. Program will exit.

you want to search for the classpath and set that up, then reference the class name, not the path (the class loader will follow the path looking for it). Not sure how you do that in rJava under 'dohs, but if you search for that term it should be apparent. And of course 'dohs and pdf aren't always friendly for automated work, so get something like cygwin and reduce pdf's to text etc.

So I got this error, which means I don't understand very much. I go to C:/ad/j and get:

C:\ad\j>dir CalqsLin.class
Volume in drive C has no label.
Volume Serial Number is 9A35-67A2
Directory of C:\ad\j
04/23/2011 07:11 AM 14,651 CalqsLin.class
1 File(s) 14,651 bytes
0 Dir(s) 104,257,716,224 bytes free

Just to show my intentions, I had next wanted to call this java method, linTimOfCalqsStgIsLev(20110405235959,-4), using:

dblTim <- .jcall(qsLin, "D", "linTimOfCalqsStgIsLev", 20110405235959, -4)

but that will probably also be wrong? Obviously I don't understand.

-- View this message in context: http://r.789695.n4.nabble.com/Using-Java-methods-in-R-tp3469299p3469848.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Using Java methods in R
From: marchy...@hotmail.com To: hill0...@umn.edu; r-help@r-project.org Date: Sat, 23 Apr 2011 15:12:30 -0400 Subject: Re: [R] Using Java methods in R

Date: Sat, 23 Apr 2011 05:32:59 -0700 From: hill0...@umn.edu To: r-help@r-project.org Subject: Re: [R] Using Java methods in R

No answer to my post, so let's try a simpler question. Am I doing this correctly? I have the RGui with R Console on the screen. On the top pull-downs: Packages > Install Packages > USA(IA) > rJava

library(rJava)
.jinit()
qsLin <- .jnew("C:/ad/j/CalqsLin")
Error in .jnew("C:/ad/j/CalqsLin") : java.lang.NoClassDefFoundError: C:/ad/j/CalqsLin

I haven't used rJava yet, but this message appears to come from the JVM, probably because you specified an absolute path. You want to search for the classpath and set that up, then reference the class name, not the path (the class loader will follow the path looking for it).

It appears that by default the current directory is on the classpath. I must have installed this on 'dohs before, and if I copy the class file into the R directory it can find it:

library(rJava)
.jinit()
.jnew("foo")
Error in .jnew("foo") : java.lang.NoClassDefFoundError: foo
.jnew("dayte")
[1] "Java-Object{dayte@1de3f2d}"

that's unlikely to fix all your problems; you want to set the classpath, but if you know the term and can load cygwin you should be able to find the way to set that up.

So I got this error, which means I don't understand very much. I go to C:/ad/j and get:

C:\ad\j>dir CalqsLin.class
Volume in drive C has no label.
Volume Serial Number is 9A35-67A2
Directory of C:\ad\j
04/23/2011 07:11 AM 14,651 CalqsLin.class
1 File(s) 14,651 bytes
0 Dir(s) 104,257,716,224 bytes free

Just to show my intentions, I had next wanted to call this java method, linTimOfCalqsStgIsLev(20110405235959,-4), using:

dblTim <- .jcall(qsLin, "D", "linTimOfCalqsStgIsLev", 20110405235959, -4)

but that will probably also be wrong? Obviously I don't understand.

-- View this message in context: http://r.789695.n4.nabble.com/Using-Java-methods-in-R-tp3469299p3469848.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
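Pulling the thread's advice together, a hedged sketch (class and method names taken from the post; the method's argument types are a guess, and it assumes CalqsLin is in the default package):

library(rJava)
.jinit()
.jaddClassPath("C:/ad/j")     # put the directory on the classpath...
qsLin <- .jnew("CalqsLin")    # ...then load the class by name, not by path
# "D" = returns a Java double; -4L forces an integer argument
dblTim <- .jcall(qsLin, "D", "linTimOfCalqsStgIsLev", 20110405235959, -4L)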
Re: [R] How to answer the question about transitive correlation?
Date: Fri, 22 Apr 2011 11:42:59 +0800 From: mailzhu...@gmail.com To: r-help@r-project.org Subject: [R] How to answer the question about transitive correlation?

Hi, everyone. I know it may be a basic statistical question, but I can't find a good answer. I have a question raised by one of the reviewers: "Factor A expression was strongly correlated with B expression (chi-square) in this series. Prior reports by the same authors showed that B expression strongly correlated with survival (Log-rank). Please provide an explanation why then were the results not transitive."

The only explanation that would have any value would require you to post the data and let everyone look at it; any R output would be a benefit too. So you compared A and B with one test, B vs C with another, cut off the result at some arbitrary criterion, and then wonder why some unspecified test and criterion applied to A vs C doesn't vote in the majority with the same answer? Changing acceptance levels or applying a correction after careful shopping can always fix that logical inconsistency LOL. It's not hard to plot 3 normal curves on the same graph and see one example of how their overlaps can relate. I'm not sure what to think, but perhaps you could look at stuff like this: http://en.wikipedia.org/wiki/Path_analysis_%28statistics%29 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
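A minimal version of the "three normal curves" picture suggested above (means and SD are arbitrary):

curve(dnorm(x, mean = 0, sd = 1), from = -4, to = 8, ylab = "density")
curve(dnorm(x, mean = 2, sd = 1), add = TRUE, lty = 2)
curve(dnorm(x, mean = 4, sd = 1), add = TRUE, lty = 3)
# adjacent pairs overlap heavily while the outer pair barely does -
# one way association can fail to carry over from A~B and B~C to A~C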
Re: [R] Extrapolating data points for individuals who lack them
From: de...@exeter.ac.uk To: r-help@r-project.org Date: Wed, 20 Apr 2011 12:41:29 +0100 Subject: [R] Extrapolating data points for individuals who lack them

Hi, We have an experiment where individuals' responses were measured over 5 days. Some responses were not obtained because we only allowed individuals to respond within a limited time-frame. These individuals are given the maximum response time as they did not respond, yet we feel they may have done so if given time (and looking at the rest of their responses over time, the non-response days stand out). We therefore want to extrapolate data points for individuals, on days when they didn't respond, using a regression of days when they did. Does anyone know how we could do this quickly and easily in R?

You are probably talking about right censoring. See things like this (you may have good luck just with R rather than CRAN): http://www.google.com/#sclient=psyhl=ensource=hpq=CRAN+informative+%22right+censoring%22 If you post data, maybe someone can try a few things. It isn't hard to take data subsets, fit models, and replace data with model predictions, but it is easier and more interesting to illustrate with your data. Personally I would avoid making up data, and of course extrapolation tends to be the most error prone way of doing that. Thanks very much Dave __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
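A hedged sketch of the "fit on observed days, predict the censored day" idea (toy data invented here; a real analysis should prefer proper censoring methods such as survival::survreg):

# one individual: response time over 5 days, day 4 capped at the
# maximum allowed time (10) because no response was given
d <- data.frame(day = 1:5,
                rt = c(3.1, 3.9, 5.2, 10, 6.8),
                censored = c(FALSE, FALSE, FALSE, TRUE, FALSE))
fit <- lm(rt ~ day, data = d, subset = !censored)
predict(fit, newdata = d[d$censored, ])   # model-based value for day 4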
Re: [R] regression and lmer
Date: Mon, 18 Apr 2011 03:27:40 -0700 From: lampria...@yahoo.com To: r-help@r-project.org Subject: Re: [R] regression and lmer

Dear all, I hope this is the right place to ask this question.

( hotmail not marking your text, sorry I can't find the option to change that ) opinions vary, but the mods seem to let this by, and personally it seems appropriate to discuss general questions about what R can do. ( end my text )

I am reviewing a research project where the analyst(s) are using a linear regression model. The dependent variable (DV) is a continuous measure. The independent variables (IVs) are a mixture of linear and categorical variables.

( my text ) No one wants to do homework or do your job for free, but open free peer review should not be a problem philosophically. ( /my text, afraid to use less-than for inciting hotmail )

The author investigates whether performance (DV - continuous linear) is a function of age (continuous IV1 - measured in years), previous performance (continuous IV2), country (categorical IV3 - six countries), the percentage of PhD graduates in each country (continuous IV4 - country level data - apparently only six different percentages since we have only six countries) and population of country (continuous IV5 - country level data - again only six numbers here, one for each country population). My own opinion is that the lm function cannot be used with country level data as IVs (for example IV4 and IV5 cannot be entered into the model because they are country level data). If IV4 and IV5 are included in the model, it is possible that the model will not be able to be defined, because we only have six countries and it is very likely that the levels of countries (IV3) may be confounded with IV4 and IV5. This also raises multicollinearity issues, right? I would like to suggest to the analyst to use lmer with IV3 as a random variable and IV4 and IV5 as IVs at the second level of the two-level model. The questions are: (a) Is it true that IV4 and IV5 cannot be entered in a one-level regression if we also have IV3? (b) Can I use an lm function to check for multicollinearity between IV3, IV4 and IV5? and (c) If we use a two-level regression model, does lmer cope well with only six countries as a random effect?

( my txt ) So you have presumably a large number of test subjects per country and a small number (n ~ 6) of countries. You could ask a number of questions, such as: do the mean performances change from country to country by more than expected given the observed distributions of performances within country? You could also ask a question like: if I try to describe performance as a function of country attributes, what fitting parameters minimize an error between fit and observation? Apparently the author tried to write an expression like

average_performance = a[country_index] + m1*some_attribute_of_country + m2*some_other_attribute_of_country + b

and then expected the fitting algorithm to pick a[i], b, m1, and m2 in such a way as to minimize the resulting error. The reported fits hopefully minimize the error function, but then you need to examine the second derivative in various directions, so you have to ask how the error varies as you change a[i], b, m1, and m2. (Ignore b right now and assume it is included in a[i].) I guess if you can find a direction in which the error cannot change due to these constraints, then it would seem to be impossible for the fit to come up with unique values.
If you change each a[i] and m1 by some amounts, for example, can you pick those amounts so that nothing changes? ( /my text ) Thank you for your help Jason __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
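The confounding described above is easy to demonstrate: with only six countries, a country factor already spans every country-level quantity, so lm() marks the country-level covariates as aliased. A sketch with made-up numbers:

set.seed(1)
country <- factor(rep(1:6, each = 50))       # 50 subjects per country
iv4 <- c(5, 9, 12, 20, 31, 44)[country]      # % PhD graduates (invented)
iv5 <- c(1, 3, 8, 16, 60, 82)[country]       # population (invented)
y <- rnorm(300)                              # placeholder response
coef(lm(y ~ country + iv4 + iv5))            # iv4 and iv5 come back NA:
                                             # aliased with the factor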
Re: [R] Rsquared for anova
( did this msg make it through the lists as rich text? hotmail didn't seem to think it was plain text?) Anyway, having come in in the middle of this, it isn't clear if your issues are with R or stats or both. Usually the hard core stats people punt the stats questions to other places, but both can be addressed somewhat. In any case, exploratory work is a good way to learn both, and I always like looking at new data. If you have one or a few dependent variables and many independent variables, it would probably help if you could visualize a surface with the response as a function of the input variables; then, maybe with the input of prior information or anecdotes, you have some idea what tests or analyses would make sense. Just some thoughts, for illustration only:

df <- read.table("results_processedCP.txt", header=TRUE)

first it helps to make sure everything went ok and do quick checks, for example,

str(df)
unique(df$nh1)
unique(df$nh2)
unique(df$nh3)
unique(df$randsize)
unique(df$aweights)

now personally lots of binary variables confuse me, and I can munge them all together since I expect I can later identify issues in the following plots. So, with this data you can create a composite variable like this (now, I have not checked any of this for accuracy, and typos and other problems may render the results useless):

x <- df$nh1 + 2*df$nh2 + 4*df$nh3 + 2*df$randsize + 32*df$aweights
df2 <- cbind(df, x)
str(df2)

not sure if time was an input or output, but you could see if there is any obvious trend or periodicity of time with your new made-up variable,

plot(df2$time, df2$x)

Apparently x is a num rather than an int; it can be changed for illustration but is probably of no consequence,

xi <- as.integer(x)
str(xi)

and then you can add color based on this variable,

min(xi)
cols <- rainbow(56)
cx <- cols[xi+1]
str(cx)

and make color coded scatter plots. Now, if you got lucky and guessed right, you may see some patterns that you want to test,

plot(df2$tos, df2$tws, col=cx)

in this case, I get a cool red-yellow-green line along the bottom (a very compelling linear fit question) and scattered magenta (pink red? LOL) and blue points everywhere, with a cluster near the origin and nothing in the top right quadrant. Also note a few blue lines above the red-green-yellow line, but much shorter. And in fact, presumably you already knew this, as it looks like it was designed in; if you just plot the red and green points, the fit looks perfect for linear,

good <- which(df2$x < 20)
plot(df2$tos[good], df2$tws[good], col=cx[good])

now if you look at the results of a fit of the good points vs all points, it isn't clear that anything like this would emerge from just looking at summaries of a linear fit,

td <- df2$tos[good]
ti <- df2$tws[good]
lm(td~ti)
lm(df2$tos ~ df2$tws)
summary(lm(td~ti))
summary(lm(df2$tos ~ df2$tws))

Now of course tests need to be considered ahead of time, or else it is easy to go shopping for the answer you want. Anything post hoc needs to be very complete, and you should at least try to rationalize test results you don't happen to like (assuming you are trying to understand the system from which the data was measured, rather than justify some particular outcome).

Date: Sun, 17 Apr 2011 11:34:14 +0200 From: dorien.herrem...@ua.ac.be To: dieter.me...@menne-biomed.de CC: r-help@r-project.org Subject: Re: [R] Rsquared for anova

Thanks for your remarks. I've been reading about R for the last two days, but I don't really get when I should use lm or aov. I have attached the dataset, feel free to take a look at it.
So far, running it with all the combinations did not take too long, and there seem to be some effects between the parameters. However, 2x2 combinations might suffice. Thanks for any help, or a pointer to some good documentation, Dorien

On 16 April 2011 10:13, Dieter Menne dieter.me...@menne-biomed.de wrote:

dorien wrote:

fit <- lm((tos~nh1*nh2*nh3*randsize*aweights*tt1*tt2*tt3*iters*length, data=expdata))
Error: unexpected ',' in fit <- lm((tos~nh1*nh2*nh3*randsize*aweights*tt1*tt2*tt3*iters*length,

Peter's point is the important one: too many interactions, and even with + instead of * you might be running into problems. But anyway: if you don't let us access /home/dorien/UA/meta-music/optimuse/optimuse1-build-desktop/results/results_processedCP you cannot expect a better answer, which will depend on the structure of the data set. Dieter

--
Dorien Herremans
Department of Environment, Technology and Technology Management
Faculty of Applied
Re: [R] Identify period length of time series automatically?
Date: Thu, 14 Apr 2011 11:29:23 +0200
From: r.m.k...@gmail.com
To: r-help@r-project.org
Subject: [R] Identify period length of time series automatically?

Hi, I have 10,000 simulations for a sensitivity analysis. I have done a few sensitivity analyses for different response variables already, but now, as most of the simulations (if not all) show some cyclic behaviour, I want to see how the independent input parameters influence the frequency of the cyclic changes and how cyclic they actually are. So effectively, I have 39 values, and I want to identify automatically the frequency / period length of the series and some kind of measure of how cyclic the series is.

Probably google Digital Signal Processing or Fourier transform. From this, you resolve your time series into sinusoids of various components, and you can separate peaks in line spectra from background noise. Depending on what you consider to be cyclic, the analysis details will vary. If you look at things like amplitude and frequency modulation of one sine wave with another and various relationships between carrier and modulation frequency, you can get some ideas of what to look for in spectra. Alternatively, you can try to define exactly what you mean by cyclic and maybe make a better transform that discriminates that from acyclic, but offhand I would suggest FFT and various tests on the spectra. Just offhand I'm not sure that 39 points is a lot to go on, but you can simulate some examples in R quite easily if you know what the data looks like in the various cases you think may exist.

How can I do that automatically without individual checking? I do not want to do an eyeball assessment of 10,000 time series. Thanks, Rainer

--
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, UCT), Dipl. Phys. (Germany)
Centre of Excellence for Invasion Biology
Stellenbosch University
South Africa
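To make the FFT suggestion concrete, here is a minimal sketch, not from the original thread: find_period is a made-up helper that returns the dominant period (in sampling intervals) and the share of spectral power in that peak as a rough "how cyclic" score.

# Untested sketch; assumes x is a numeric series sampled at regular intervals.
find_period <- function(x) {
  x <- x - mean(x)                  # drop the DC component
  n <- length(x)
  pwr <- Mod(fft(x))^2              # power spectrum
  half <- pwr[2:floor(n/2)]         # positive frequencies only
  k <- which.max(half)              # bin of the dominant peak
  c(period = n / k,                 # period in sampling intervals
    strength = half[k] / sum(half)) # fraction of power in the peak
}

# Quick check on a noisy sine of period 13:
x <- sin(2*pi*(1:39)/13) + rnorm(39, sd = 0.2)
find_period(x)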
Re: [R] Incremental ReadLines
Date: Wed, 13 Apr 2011 10:57:58 -0700
From: frederikl...@gmail.com
To: r-help@r-project.org
Subject: Re: [R] Incremental ReadLines

Hi there, I am having a similar problem with reading in a large text file with around 550,000 observations, each with 10 to 100 lines of description. I am trying to parse it in R, but I have trouble with the size of the file. It seems like it is slowing down dramatically at some point. I would be happy for any suggestions.

This probably occurs when you run out of physical memory, but you can probably verify that by looking at the task manager. A line-at-a-time readLines() loop doesn't fit real well with R, where you want to hand over blocks of data so that inner loops, implemented largely in native code, can operate efficiently. The thing you want is a data structure that can use disk more effectively and hide these details from you and your algorithm. This works best if the algorithm works with the data structure to avoid lots of disk thrashing. You could imagine that your read would do nothing until each item is needed, but often people want the whole file validated before processing; lots of details come up with exception handling as you get fancy here. Note of course that your parse output could be stored in a hash or something representing a DOM, and this could get arbitrarily large. Since it is designed for random access, this may cause lots of thrashing if partially on disk. Anything you can do to make access patterns more regular, for example sorting your data, would help.

Here is my code, which works fine when I am doing a subsample of my dataset.

# Defining datasource
file <- "filename.txt"
# Creating placeholder for data and assigning column names
data <- data.frame(Id = NA)
# Starting by case = 0
case <- 0
# Opening a connection to data
input <- file(file, "rt")
# Going through cases
repeat {
  line <- readLines(input, n = 1)
  if (length(line) == 0) break
  if (length(grep("Id:", line)) != 0) {
    case <- case + 1
    data[case, ] <- NA
    split_line <- strsplit(line, "Id:")
    data[case, 1] <- as.numeric(split_line[[1]][2])
  }
}
# Closing connection
close(input)
# Saving dataframe
write.csv(data, "data.csv")

Kind regards, Frederik
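As a concrete version of the blocks-of-data point above, an untested sketch (the file name is hypothetical): read large chunks and let vectorized grep/sub do the per-line work instead of R-level code.

# Untested sketch: chunked reading with vectorized matching.
input <- file("filename.txt", "rt")
ids <- numeric(0)
repeat {
  lines <- readLines(input, n = 10000)        # one big block per pass
  if (length(lines) == 0) break
  hits <- grep("Id:", lines, value = TRUE)    # vectorized match
  ids <- c(ids, as.numeric(sub(".*Id:", "", hits)))
}
close(input)
write.csv(data.frame(Id = ids), "data.csv", row.names = FALSE)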
Re: [R] Identify period length of time series automatically?
Date: Thu, 14 Apr 2011 12:42:28 +0200
From: r.m.k...@gmail.com
To: marchy...@hotmail.com
CC: r-help@r-project.org
Subject: Re: [R] Identify period length of time series automatically?

On 14/04/11 11:57, Mike Marchywka wrote: [...]

Hi Mike, thanks for your answer - it confirms my fears ...

Well, if it is just that you wanted an alternative to FFT, I have not provided one, so it shouldn't confirm that fear, as others could exist. Indeed, if you have an idea of what you want, you may be able to create your own. It wasn't clear in the OP that you had looked at FFT.

Probably google Digital Signal Processing or Fourier transform. [...] you can get some ideas of what to look for in spectra.

That is what I thought as well. As I have no idea about Fourier analysis, could you give me a small example in R which gives me the frequencies of the resulting sine waves after a Fourier transformation? I only see large matrices as return values when using e.g. fft().

Alternatively, you can try to define exactly what you mean by cyclic [...]

The shape of the fluctuations can be quite different - so no common pattern there.

Just offhand I'm not sure that 39 points is a lot to go on [...]

Well - the data is summed over a year from daily data points, so I could easily go to daily data, which would be 365*39. But that would probably make the analysis more difficult, as I have seasonal fluctuations, and fluctuations over several years (1, 2, 3, 4, ...?; depending on the parameters used for the simulation).

plot(abs(fft(x))), as presumably your confusion is due to the complex result, and the spectrum is probably all you care about at first. If that works, there are various R facilities for peak detection that may be of use. There may be programs to fit an FFT assuming discrete lines + continuum noise, but I have not looked; there are apparently things like that for fitting mass spectra.

Any ideas on how to do this in R? I have the feeling that the question is more difficult than I thought... Rainer

--
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, UCT), Dipl. Phys. (Germany)
Centre of Excellence for Invasion Biology
Stellenbosch University
South Africa
Tel : +33 - (0)9 53 10 27 44
Cell: +33 - (0)6 85 62 59 98
email: rai...@krugs.de
Skype: RMkrug
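For the small example requested above, a minimal sketch, not from the original thread: base R's spectrum() handles the detrending and scaling that raw fft() leaves to you, and its output makes the dominant period easy to read off.

# Untested sketch on a made-up daily series with a ~91-day cycle:
x <- sin(2*pi*(1:365)/91) + rnorm(365, sd = 0.3)
sp <- spectrum(x, plot = FALSE)   # periodogram
1 / sp$freq[which.max(sp$spec)]   # dominant period, in sampling intervals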
Re: [R] Incremental ReadLines
Date: Thu, 14 Apr 2011 11:57:40 -0400
Subject: Re: [R] Incremental ReadLines
From: frederikl...@gmail.com
To: marchy...@hotmail.com
CC: r-help@r-project.org

Hi Mike, thanks for your comment. I must admit that I am very new to R, and although what you write sounds interesting, I have no idea where to start. Can you give some functions or examples where I can see how it can be done?

I'm not sure I have a good R answer; I was simply pointing out the likely issue, and maybe the rest belongs on the r-devel list or something. If you can determine you are running out of physical memory, then you either need to partition something or make accesses more regular. My favorite example from personal experience is sorting a data set prior to piping it into a c++ program, which changed the execution time substantially by avoiding VM thrashing. R either needs a swapping buffer or has an equivalent that someone else could mention.

I was under the impression that I had to do a loop since my blocks of observations are of varying length. Thanks again, Frederik

On Thu, Apr 14, 2011 at 6:19 AM, Mike Marchywka wrote: [...]
Re: [R] Converting edgelist to symmetric matrix/ plotting sparse network with lots of nodes
Date: Sat, 9 Apr 2011 14:34:28 -0700
From: kmshafi...@yahoo.com
To: r-help@r-project.org
Subject: [R] Converting edgelist to symmetric matrix

Hi, I have network data in the form of a couple of edgelists containing weights in the format x,y,weight, whereby x represents the row header and y represents the column header. All edgelists are based on links among 634 nodes, and I need to convert them into a 634*634 weighted matrix. I searched for online help using possible keywords I know, but I could not find a clue how to do this in R. Any help will be appreciated.

I'd replied earlier suggesting the ncol format; I'd like to follow up on that, as I have tried it with some success, but maybe someone else can comment on alternatives and suggest ideas for plotting. I have a set of nodes or states specified by two parameters ( these are isotopes, specified by proton and mass number, connected by decay paths, with the probability of each path being its weight ). This seems to almost work for your needs ( note that I have taken out a lot of extraneous stuff and may have dropped something important LOL; also, setup for Rgraphviz is not simple on 'dohs, as I had to manually edit env variables etc. ):

library(Rgraphviz)
library(QuACN)
nxg <- read.graph("ncol.txt", format = "ncol")
nn <- igraph.to.graphNEL(nxg)
aasd <- adjacencyMatrix(nn)
str(aasd)
 num [1:2561, 1:2561] 0 100 95.8 2.7 0 0 0 0 0 0 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:2561] "17_10" "17_9" "16_9" "15_7" ...
  ..$ : chr [1:2561] "17_10" "17_9" "16_9" "15_7" ...

$ head ncol.txt
17_10 17_9 100
17_10 16_9 95.8
17_10 15_7 2.7
18_10 18_9 100
19_10 19_9 100
23_10 23_11 100
243_100 239_98 100
245_100 241_98 100
246_100 242_98 92
246_100 246_99 1

However, for my needs plotting has been a big problem. I apparently have 2561 isotopes ( none of this has been validated yet LOL ) that are sparsely connected by a few decay modes ( presumably an acyclic directed graph, but DAG in searches didn't help much ). Any thoughts on which R classes to try to visualize this, or even what I should be thinking about artistically? This is largely just a way to learn R for some other things I want to do for analyzing data on wireless devices, but I am curious about this result too. Some of the things I did try are below:

library(Rgraphviz)
library(QuACN)
nxg <- read.graph("ncol.txt", format = "ncol")
foo <- adjacencyMatrix(nxg)
?graphNEL
?NELgraph
df <- data.frame(nxg)
plot.igraph(nxg, layout = layout.svd)
rglplot.igraph(nxg, layout = layout.svd)
rglplot.igraph(nxg)
tkplot.igraph(nxg)
library(tcltk)
tkplot.igraph(nxg)
tkplot(nxg)
dx <- decompose.graph(nxg)
nn <- igraph.to.graphNEL(nxg)
igraph.plotting(nxg)
library(sna)
gplot(nxg)
dx <- get.adjacency(nxg)
gplot(dx)
gplot3d(dx)
plot(nxg)
library(ElectroGraph)
eg <- electrograph(nxg)
eg <- electrograph(aasd)
plot(eg)

Thanks. Best regards, Shafique
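On the plotting question, a hedged sketch with igraph's own plotting (function names as in the igraph of that era; untested on this data): for a couple of thousand sparsely connected nodes, shrinking vertices and dropping labels usually makes a force-directed layout legible.

# Untested sketch for a large sparse igraph object such as nxg:
library(igraph)
l <- layout.fruchterman.reingold(nxg)
plot(nxg, layout = l, vertex.size = 2, vertex.label = NA,
     edge.arrow.size = 0.3)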
Re: [R] Converting edgelist to symmetric matrix
Date: Sat, 9 Apr 2011 14:34:28 -0700
From: kmshafi...@yahoo.com
To: r-help@r-project.org
Subject: [R] Converting edgelist to symmetric matrix

Hi, I have network data in the form of a couple of edgelists containing weights in the format x,y,weight, whereby x represents the row header and y represents the column header. All edgelists are based on links among 634 nodes, and I need to convert them into a 634*634 weighted matrix. I could not find a clue how to do this in R. Any help will be appreciated.

I'm trying to do something related and found ?read.graph - will format="ncol" do what you need? This apparently creates a graph object that likely has the capabilities you need. Again, I haven't actually used any of this; I just found it while trying to solve a different problem. 'It is a simple text file with one edge per line. An edge is defined by two symbolic vertex names separated by whitespace. (The symbolic vertex names themselves cannot contain whitespace.) They might be followed by an optional number; this will be the weight of the edge; the number can be negative and can be in scientific notation. If there is no weight specified to an edge it is assumed to be zero.'

Best regards, Shafique
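For the conversion itself, a minimal base-R sketch, assuming the edgelist is already a three-column data frame el with integer node ids 1..634 in columns x and y (the file name is hypothetical):

# Untested sketch: fill a 634 x 634 matrix by matrix indexing.
el <- read.csv("edges.csv")          # columns x, y, weight
m  <- matrix(0, 634, 634)
m[cbind(el$x, el$y)] <- el$weight    # listed direction
m[cbind(el$y, el$x)] <- el$weight    # mirror to keep it symmetric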
Re: [R] Scrap java scripts and styles from an html document
Date: Thu, 7 Apr 2011 04:15:50 -0700
From: antuj...@gmail.com
To: r-help@r-project.org
Subject: Re: [R] Scrap java scripts and styles from an html document

Hi, I am working on developing a web crawler.

Comments like this come up on the list every few weeks or so, and I keep suggesting that someone ( other than me of course LOL ) investigate an R interface to webkit for any efforts that require mimicking large parts of browser function. Perhaps just make a debug build or custom build of webkit to dump whatever it is you want into a structured text file ( I've actually done this for what would amount to a crawler; I modified maybe one or two classes to output the links being fetched to stdout, but I think there are ways to dump a DOM or other stuff in a format usable by R ). For valid pages, you can just parse html as xml and get what you want in this case, but usually people are looking for information only apparent after large pieces of js are executed. If you want comments only, these may be easy to isolate yourself. If you google CRAN HTML parser some hits do come up, for example http://cran.r-project.org/web/packages/scrapeR/scrapeR.pdf http://r.789695.n4.nabble.com/How-to-import-HTML-and-SQL-files-td879480.html

Removing javascripts and styles is part of the cleaning of the html document. What I want is a cleaned html document with only the html tags and textual information, so that I can figure out the pattern of the web page. This is being done to extract relevant information from the webpage, like comments for a particular product. E.g. amazon.com has all such comments within the and tags, with regular occurring for breaks. So the tags which appear the most help us in locating the required information. Different websites have different patterns, but it's more likely that the tags that occur the most will have the relevant information enclosed in them. So, once the html page is cleaned, it would be easy to roll up the tags and, knowing their frequency of occurrence, target the information. Should there be any suggestions to help, please let me know. I would be more than pleased. Regards, Antuj
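For the parse-html-as-xml route mentioned above, a hedged sketch with the XML package (file name hypothetical; untested): drop script/style nodes, then tally the remaining tag names to find the dominant markup pattern.

# Untested sketch with the XML package:
library(XML)
doc <- htmlTreeParse("page.html", useInternalNodes = TRUE)
removeNodes(getNodeSet(doc, "//script | //style"))  # strip js and css
tags <- xpathSApply(doc, "//*", xmlName)            # remaining tag names
sort(table(tags), decreasing = TRUE)                # most frequent first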
Re: [R] system() command in R
Date: Tue, 5 Apr 2011 13:37:12 +0530
From: nandan.a...@gmail.com
To: rasanpreet.k...@gmail.com
CC: r-help@r-project.org
Subject: Re: [R] system() command in R

On 4 April 2011 16:54, rasanpreet kaur suri wrote:

Hi all, I have a local server installed on my system and have to start it from within my R function. Here is how I start it:

cmd <- "sh start-server.sh"
system(cmd, wait = FALSE)

My function has to start the server and proceed with further steps. The server starts, but the further steps of the program are not executed. The cursor keeps waiting after the server is started.

How are you executing the further steps after starting the server from R?

I tried removing the wait=FALSE, but it still keeps waiting. I also tried putting the start-server in a separate function and my further script in a separate function and then running them together, but it still waits. The transition from the start of the server to the next step is not happening. Please help. I have been stuck on this for quite some time now.

I hadn't done this in R but expect to do so soon. I just got done with some java code to do something similar, and you can expect that in any implementation these things will be system dependent. It often helps to have simple test cases to isolate the problem. Here I made a test script called foo that takes a minute or so to execute and generates some output. If I type

system("./foo", wait = FALSE)

the prompt comes back right away, but stdout seems to still go to my console, and maybe stdin is not redirected either, so it could eat your input ( no idea, but this is probably not what you want ). I did try this, which could fix your problem; on debian anyway it seems to work:

system("nohup ./foo &")

You can man nohup for details.

Rasanpreet Kaur

--
Amar Kumar Nandan
Karnataka, India, 560100
http://aknandan.co.nr
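Putting the two suggestions together, an untested sketch for a Unix-like system: fully detach the server so R neither waits nor receives its output.

# Untested sketch: start the server detached, logging to a file.
system("nohup sh start-server.sh > server.log 2>&1 &", wait = FALSE)
Sys.sleep(5)   # crude: give the server a moment to come up
# ... further steps of the function ...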
Re: [R] Time series example in Koop
Date: Tue, 5 Apr 2011 07:35:04 -0500
From: ravi.k...@gmail.com
To: r-help@r-project.org
Subject: [R] Time series example in Koop

I am trying to reproduce the output of a time series example in Koop's book Analysis of Financial Data. Koop does the example in Excel, and I used the ts function followed by the lm function. I am unable to get the exact coefficients that Koop gives - my coefficients are slightly different. After loading the data file and attaching the frame, my code reads:

y = ts(m.cap)
x = ts(oil.price)
d = ts.union(y, x, x1 = lag(x,-1), x2 = lag(x,-2), x3 = lag(x,-3), x4 = lag(x,-4))
mod1 = lm(y ~ x + x1 + x2 + x3 + x4, data = d)
summary(mod1)

Koop gives an intercept of 92001.51, while the code above gives 91173.32. The other coefficients are also slightly off.

The differences here seem to be of order 1 percent. You could suspect a number of things, including the data file being published to less precision than that used for the book's numbers ( also look at the number of points and see if any were added or dropped etc. ). However, you may want to judge these based on what they do to your error, which both fits are presumably supposed to minimize, but the calculation of which could be subject to various roundoff errors etc. Unless minimization is done analytically, it is of course subject to limitations of convergence or iteration count. Plotting both fits over the data and looking at residuals may help too. Depending on what you are really trying to do, you may want to change your error calculation etc. Details of numerical results often depend on details of implementation. This is why stats packages that are not open source have limitations in applicability. With real models of course things get even more confusing ( take a look at the credit rating agencies' results for example LOL ).

This is the example in Table 8.3 of Koop. I also attach a plain text version of the tab separated file badnews.txt. http://r.789695.n4.nabble.com/file/n3427897/badnews.txt Any light on why I do not get Koop's coefficients is most welcome... Ravi
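One quick check along the lines suggested above (a sketch using the objects from the question): the lagged ts.union pads the ends with NAs, which lm() silently drops, so the two fits may simply not use the same rows.

# Untested sketch: confirm how many observations each fit actually used.
sum(complete.cases(d))              # rows with all lags available
length(residuals(mod1))             # rows lm() actually used
plot(residuals(mod1), type = "h")   # look for structure in the errors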
Re: [R] help
Date: Sun, 3 Apr 2011 01:35:16 +0530
From: nandan.a...@gmail.com
To: padmanabhan.vija...@gmail.com
CC: r-help@r-project.org
Subject: Re: [R] help

One way that you might have thought of is to create the plot as a PDF in R and then use pdftools. Additionally, one can also think of running the R script using R CMD and then using pdftools in a .sh script file, if you are on Linux. I am not aware of a pdftools capability in R.

On 2 April 2011 23:01, Vijayan Padmanabhan wrote:

Dear R Help group, I need to run a command line script from within an R session. I am not clear how I can achieve this. I tried the shell and system functions, but I am missing something critical. Can someone provide help? My intention is to create a pdf file of a plot in R and then attach existing files from my system as attachments to the newly created pdf file. Any help would be greatly appreciated. Here is the command line script I want to execute from within R:

pdftools -S attachfiles=C:\test1.pdf -i C:\test2.pdf -o C:\test4.pdf

Regards, Vijayan Padmanabhan

I just tried system("pdftk --help") and it appeared to work, as I have pdftk from cygwin. I routinely do this the other way, however, and invoke R from a bash script and then use external tools like this from the bash script after R is done. If I'm generating various pieces, it seems to make sense to get them all first and release any resources R has accumulated, as pdf manipulation itself can often require lots of memory etc.
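For the original question, an untested sketch: run the poster's own command line via system(). The pdftools flags are copied verbatim from the question; note the doubled backslashes R requires in Windows paths.

cmd <- paste("pdftools -S attachfiles=C:\\test1.pdf",
             "-i C:\\test2.pdf -o C:\\test4.pdf")
system(cmd)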
Re: [R] Asking Favor For the Script of Median Filter
( sorry if this is a duplicate, I am not sure if hotmail is dropping some of my posts. Thanks )

You obviously want to delegate inner loops to R packages that execute as native, hopefully optimized, code. Generally a google search that starts with R CRAN will help. In this case it looks like there are a few packages available:

http://www.google.com/search?sclient=psy&hl=en&q=R+cran+median+filter

Date: Sun, 27 Mar 2011 07:56:11 -0700
From: chuan...@hotmail.com
To: r-help@r-project.org
Subject: [R] Asking Favor For the Script of Median Filter

Hello, everybody. My name is Chuan Zun Liang. I come from Malaysia. I am just a beginner in R. I would kindly like to ask a favor about a median filter. The problem I am facing is as below:

x <- matrix(sample(1:30, 25), 5, 5)
x
     [,1] [,2] [,3] [,4] [,5]
[1,]    7    8   30   29   13
[2,]    4    6   12    5    9
[3,]   25    3   22   14   24
[4,]    2   15   26   23   19
[5,]   28   18   10   11   20

This is an example original matrix of an image. I want to apply a median filter with window size 3x3 to remove salt and pepper noise from my matrix. Here is the script I attempted to write; the script and output are shown below:

MedFilter <- function(mat, sz) {
  out <- matrix(0, nrow(mat), ncol(mat))
  for (p in 1:(nrow(mat) - (sz - 1))) {
    for (q in 1:(ncol(mat) - (sz - 1))) {
      outrow <- median(as.vector(mat[p:(p + (sz - 1)), q:(q + (sz - 1))]))
      out[(p + p + (sz - 1))/2, (q + q + (sz - 1))/2] <- outrow
    }
  }
  out
}
MedFilter(x, 3)
     [,1] [,2] [,3] [,4] [,5]
[1,]    0    0    0    0    0
[2,]    0    8   12   14    0
[3,]    0   12   14   19    0
[4,]    0   18   15   20    0
[5,]    0    0    0    0    0

Example of getting the values 8 and 12:

 7  8 30                 8 30 29
 4  6 12 (median=8)      6 12  5 (median=12)
25  3 22                 3 22 14

Even though the script gives output, it is too slow. My image size is 364*364, and it is time consuming. Are there other ways of improving it?

Best Wishes, Chuan
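One package option such a search turns up, as a hedged, untested sketch: the raster package's focal() applies a function over a moving window in compiled code, which should beat the double R-level loop.

# Untested sketch with the raster package; x is the image matrix.
library(raster)
r   <- raster(x)
med <- focal(r, w = matrix(1, 3, 3), fun = median)   # 3x3 median window
as.matrix(med)                                       # back to a plain matrix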
Re: [R] Asking Favor For the Script of Median Filter
CC: chuan...@hotmail.com; r-help@r-project.org
From: dwinsem...@comcast.net
To: marchy...@hotmail.com
Subject: Re: [R] Asking Favor For the Script of Median Filter
Date: Sun, 27 Mar 2011 18:30:48 -0400

On Mar 27, 2011, at 1:07 PM, Mike Marchywka wrote:

You obviously want to delegate inner loops to R packages that execute as native, hopefully optimized, code. Generally a google search that starts with R CRAN will help. In this case it looks like there are a few packages available: http://www.google.com/search?sclient=psy&hl=en&q=R+cran+median+filter

Did you find any that include a 2D median filter? All the ones I looked at were for univariate data.

I put almost zero thought or effort into that, but an interested party could modify the words a bit and add, for example, "image", and one of the first interesting hits is this:

http://cran.r-project.org/web/packages/biOps/biOps.pdf

x <- readJpeg(system.file("samples", "violet.jpg", package = "biOps"))
y <- imgBlockMedianFilter(x, 5)

-- David.
Re: [R] Popularity of R, SAS, SPSS, Stata, Statistica, S-PLUS updated
Date: Fri, 25 Mar 2011 09:40:39 +
From: all...@cybaea.com
To: muenchen@gmail.com
CC: frien...@yorku.ca; had...@rice.edu; r-h...@stat.math.ethz.ch
Subject: Re: [R] Popularity of R, SAS, SPSS, Stata, Statistica, S-PLUS updated

Not R, but just to get the data (format is month year,week,count) to compare with your students' output: [...] Hope this helps a little. Allan (Who thinks it is very sad that he can remember that $c=()=$a=~$b construct...)

On 22/03/11 23:26, Bob Muenchen wrote:

On 3/22/2011 5:15 PM, Hadley Wickham wrote:

I don't doubt that R may be the most popular in terms of discussion group traffic, but you should be aware that the traffic for SAS comprises two separate lists that used to be mirrored, but are no longer linked.

I think this discussion highlights the need for more structured document formats on the internet, so you can separate out mirrored or copied text ( see for example all the ad-supported sites that simply copy wikipedia content ). I've sometimes done things like this with pubmed citations, but they provide something called the eutils api, http://eutils.ncbi.nlm.nih.gov/, so you don't need to scrape html or other human-readable content. It is then easy to plot paper or author count as a function of year for some keyword criteria, and it can be interesting to see how fads come and go. You see a lot of questions here on how do I use R to scrape html, and it is a big problem in doing many kinds of analysis. Yahoo is doing a nice service by making downloads of historical data available, but this is not common; even places like census often only offer Excel format downloads ( while this is fine for R users, csv files would be just as good and reach a wider audience ), and some places do require you to make complicated POST or other request types.

Usenet -- news://comp.soft-sys.sas (what you counted)
listserve -- SAS-L http://www.listserv.uga.edu/archives/sas-l.html

R programming challenge: create a script that parses those html pages to compute the total number of messages per week! (Maybe I'll use this in class) Hadley

That would be nice! I'd love to have all the sources, which includes various company forums. Sounds like students could be kept busy for [[elided Hotmail spam]] Cheers, Bob
Re: [R] Problem with Snowball RWeka
Date: Thu, 24 Mar 2011 03:35:31 -0700
From: kont...@alexanderbachmann.de
To: r-help@r-project.org
Subject: [R] Problem with Snowball RWeka

Dear Forum, when I try to use SnowballStemmer() I get the following error message: Could not initialize the GenericPropertiesCreator. This exception was produced: java.lang.NullPointerException. It seems to have something to do with either Snowball or RWeka; however, I can't figure out what to do myself. If you could spend 5 minutes of your valuable time to help me or give me a hint where to look, it would be very much appreciated. Thank you very much.

If you only want answers from people who have encountered this exact problem before, then that's great, but you are more likely to get a useful response if you include reproducible code and some data to produce the error you have seen. Sometimes I investigate these things because they involve a package or objective I wanted to look at anyway. It could be that the only problem is that the OP missed something in the documentation or had a typo etc. In this case, to pursue it from the perspective of debugging the code, you probably want to find some way to get a stack trace, then find out which java variable was null and relate it back to how you invoked it. This likely points to a missing object in your call, or maybe the installation lacked a dependency, as this occurred during init - but it is hard to speculate with what you have provided. You could try reinstalling and check for errors.
Re: [R] Rapache ( was Developing a web crawler )
Subject: Re: [R] Rapache ( was Developing a web crawler )
From: m...@biostatmatt.com
To: marchy...@hotmail.com
CC: r-help@r-project.org
Date: Sun, 6 Mar 2011 13:51:53 -0500

On Sun, 2011-03-06 at 08:06 -0500, Mike Marchywka wrote:

Date: Thu, 3 Mar 2011 13:04:11 -0600
From: matt.shotw...@vanderbilt.edu
To: r-help@r-project.org
Subject: Re: [R] Developing a web crawler / R webkit or something similar? [off topic]

On 03/03/2011 08:07 AM, Mike Marchywka wrote:

Date: Thu, 3 Mar 2011 01:22:44 -0800
From: antuj...@gmail.com
To: r-help@r-project.org

Hi Mike, if you've built and configured RApache, then the difficult plowing is over :). RApache operates at the top (HTTP) layer of the OSI stack, whereas Rserve works at the lower transport/network layer. Hence, the scope of Rserve applications is far more general. Extending Rserve to operate at the HTTP layer (via PHP) will mean more work.

I finally got back to this and started from scratch on a clean machine. It took most of the day, on and off, but downloading and building R, apache, and rapache was relatively easy, and the info page worked, though I had to go get Cairo and various packages to get the graphics demo pages to work. I'll probably have to play with it a bit to see if I can use it for anything useful, but getting it to run was not too difficult ( I think before I didn't bother to build Apache from source and the failure mode wasn't real clear ).

RApache offers high-level functionality, for example, to replace PHP with R in web pages. No interface code is necessary. Here's a simple "What's The Time?" webpage using RApache and yarr [1] to handle the code:

setContentType("text/html\n\n") [...] Message body [...]

Here's a live version: [2]. Interfacing PHP with Rserve in this context would be useful if installation of R and/or RApache on the web host were prohibited. A PHP/Rserve framework might also be useful in other contexts, for example, to extend PHP applications (e.g. WordPress, MediaWiki).

Best, Matt
[1] http://biostatmatt.com/archives/1000
[2] http://biostatmatt.com/yarr/time.yarr

-Matt
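Since the example above was mangled in transit, here is a guess at a minimal plain-RApache handler, not Matt's original yarr version: RApache provides setContentType(), and anything written with cat() goes to the client.

# A hedged sketch of a "What's The Time?" handler under RApache:
setContentType("text/html\n\n")
cat("<html><body><p>The time is ",
    format(Sys.time(), "%Y-%m-%d %H:%M:%S"),
    "</p></body></html>", sep = "")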
Re: [R] lm ~ v1 + log(v1) + ... improve adj Rsq ¿any sense?
Date: Tue, 22 Mar 2011 09:31:01 -0700
From: crossp...@hotmail.com
To: r-help@r-project.org
Subject: [R] lm ~ v1 + log(v1) + ... improve adj Rsq ¿any sense?

Dear all, I want to improve my adjusted R-sq. I've checked some established models, and they introduce the same variable twice, once transformed and once not. It also improves my adjusted R-sq. But isn't this bad for collinearity? Do I interpret the coefficients as usual?

I'm not sure how many replies you got or if your question was answered, but just offhand let me see if I understand your concern. If your data is only over a limited range of v1, where you can Taylor expand to the linear term only, then sure, it can be hard to tell a linear from a log dependence or quantify a mixture of the two. If you try to find a and b to fit y = a*f(x) + b*g(x) minimizing some error, you should be able to see the issues on paper. Presumably log is not linear over a larger range, and any error function, like SSE, would have a reasonably peaked minimum for some values of the two coefficients, but you could do a sensitivity analysis to check - find the second derivatives of your error function, or just perturb the coefficients a bit. I guess if there is some direction where the error does not change as a and b vary, then you have the case you are worried about. I'm not sure what you consider to be usual, but when I'm doing something like this, I usually have some physical interpretation in mind. Most uninformatively, you could interpret these coefficients as those which minimize your error given the data you have :) What you do from there depends on a lot of specifics. To tell if a given function seems to be appropriate for the data, it is always good to look at a plot of residuals. Note that the ability to find a unique set of coefficients that minimizes a given error has nothing to do with independence of the two terms attached to the coefficients - indeed, polynomial fits are a common example ( log having a Taylor series just constrains a lot of coefficient relationships LOL ). P-values and confidence intervals are another matter with post hoc exploratory work, but I'll let a statistician comment on that, as well as the meaning of the R output. Usually the final decision on a putative model improvement comes from your ability to infer something about the underlying system, although you may just want a simple empirical approximation and be more worried about meeting a given error with a limited number of computations etc. Apparently you found on a retrospective literature search that everyone else is using the log term. Sometimes you see people ask questions like: given that 4 of 10 papers on the subject used the log term, and those authors have historically been right 50 percent of the time while the other 6 are right 40 percent of the time, what are the chances that the log term should be included? I will avoid commenting on that question too, except to say it illustrates a number of ways people do approach these problems; consider what is relevant to your situation.

             Estimate Std. Error t value Pr(>|t|)
(Intercept)   1.73140    7.22477   0.240  0.81086
v1           -0.33886    0.20321  -1.668  0.09705 .
log(v1)       2.63194    3.74556   0.703  0.48311
v2           -0.01517    0.01089  -1.394  0.16507
log(v3)      -0.45719    0.27656  -1.653  0.09995 .
factor1      -1.81517    0.62155  -2.920  0.00392 **
factor2      -1.87330    0.84375  -2.220  0.02759 *

Analysis of Variance Table
Response: height rise
           Df Sum Sq Mean Sq F value    Pr(>F)
v1          1  51.25  51.246 21.4128 6.842e-06 ***
log(v1)     1  13.62  13.617  5.6897  0.018048 *
v2          1   2.84   2.836  1.1850  0.277713
log(v3)     1   3.02   3.024  1.2638  0.262357
factor1     1  17.62  17.616  7.3608  0.007279 **
factor2     1  11.80  11.797  4.9294  0.027586 *
Residuals 190 454.71   2.393

Thanks, u...@host.com
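The limited-range concern above is easy to see in a small simulation (a sketch with made-up data, not from the thread): over a narrow range of v1, the v1 and log(v1) columns are nearly collinear and the design matrix is ill-conditioned.

# Untested sketch: condition number of the design matrix as a collinearity check.
set.seed(1)
v1_narrow <- runif(100, 10, 11)    # narrow range: log is nearly linear here
v1_wide   <- runif(100, 0.1, 100)  # wide range: the two terms separate
kappa(model.matrix(~ v1_narrow + log(v1_narrow)))  # very large
kappa(model.matrix(~ v1_wide + log(v1_wide)))      # much smaller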
Re: [R] How to draw a map of Europe?
Date: Sat, 19 Mar 2011 19:32:43 -0500
From: shali...@gmail.com
To: r-help@r-project.org
Subject: [R] How to draw a map of Europe?

Hi R users, I need to draw a map of selected European countries with the country names shown on the map. Does anyone know how to do this in R? Also, is it possible to draw a historical map of European countries using R?

This came up on this list a while ago, and I was just using that example for some work I'm doing, but I have only copied what was posted and added minor things to it. I can't give you an actual answer, but you are probably just a few key words away from a google search. The term you are probably looking for is shapefile. Try that on google. For example,

http://www.google.com/#sclient=psy&hl=en&q=shapefile+cran+europe

turns up things that may help.

Thanks a lot for your help. Maomao
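As a quick alternative to the shapefile route, a hedged sketch with the maps package (untested; the country list is just an example):

# Untested sketch: selected countries with name labels via the maps package.
library(maps)
cc <- c("France", "Germany", "Italy", "Spain", "Poland")
map("world", regions = cc, fill = TRUE, col = "grey90")
map.text("world", regions = cc, add = TRUE)   # country names as labels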
Re: [R] How to make sure R's g++ compiler uses certain C++ flags when making a package
Date: Wed, 16 Mar 2011 13:50:37 -0700
From: solomon.mess...@gmail.com
To: r-help@r-project.org
Subject: Re: [R] How to make sure R's g++ compiler uses certain C++ flags when making a package

Looks like the problem may be that R is automatically passing this flag to the g++ compiler: -arch x86_64, which appears to be causing trouble for opencv. Does anyone know how to suppress this flag?

Are you building a 32 or 64 bit app? I haven't looked, but offhand that may be what this flag is for ( you can man g++ and check, I suppose ). I was going to reply after doing some checking, as I have never actually made a complete R package, but I did make a test one and verified I could call the c++ code from R. I seem to remember that it was easy to just invoke g++ explicitly, and R would use whatever binaries you happened to have left there. That is, if you just compile the code for your R modules yourself, there is no real reason to worry about how R does it. Mixing some flags will prevent linking or, worse, create mysterious run time errors. Also, IIRC, your parameters were the output of foreign scripts ( opencv apparently ) and not too helpful here ( and in any case you should echo these to see what they are, if they are an issue ).
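For the record, the usual hooks for overriding compiler flags are make variables, shown here as a sketch (flag values illustrative, not verified against this opencv setup; `R CMD config CXXFLAGS` shows what your R would pass by default):

# per-user overrides, in ~/.R/Makevars (replaces R's default C++ flags):
CXXFLAGS = -g -O2
# per-package flags, in the package's src/Makevars:
PKG_CXXFLAGS = -I/usr/local/include/opencv
PKG_LIBS = -lopencv_core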
Re: [R] Spatial cluster analysis of continous outcome variable
Did you post your data or hypothetical data? Usually that helps make your problem more clear and more interesting ( and more likely to get a useful response to your post ).

From: tintin...@hotmail.com
To: r-help@r-project.org
Date: Thu, 17 Mar 2011 17:38:14 +0100
Subject: [R] Spatial cluster analysis of continuous outcome variable

Dear R Users, R Core Team, I have a two dimensional space where I measure a numerical value in two situations at different points. I have measured the change, and I would like to test if there are areas in this 2D space where there is a different amount of change (no change, increase, decrease). I don't know if it's better to analyse the data just with the coordinates, or if it's better to group them into pixels (and obtain the mean value for each pixel) and then run the cluster analysis. I would like to know if there is a package/function that allows me to do these calculations. I would also like to know if it could be done in a 3D space (I have collapsed the data to 2D because I don't have many points). Thanks in advance, J Toledo
Re: [R] minimum distance between line segments
Date: Wed, 9 Mar 2011 10:55:46 +1300
From: darcy.web...@gmail.com
To: r-help@r-project.org
Subject: [R] minimum distance between line segments

Dear R helpers, I think that this may be a bit of a math question, as the more I consider it, the harder it seems. I am trying to come up with a way to work out the minimum distance between line segments. For instance, consider 20 random line segments:

x1 <- runif(20)
y1 <- runif(20)
x2 <- runif(20)
y2 <- runif(20)
plot(x1, y1, type = "n")
segments(x1, y1, x2, y2)

Initially I thought the solution to this problem was to work out the distance between midpoints (it quickly became apparent that this is totally wrong when looking at the plot). So, I thought that perhaps finding the minimum distance between each of the lines' endpoints AND their midpoints would be a good proxy for this, so I set up a loop that uses Pythagoras to work out these 9 distances and find the minimum. But this solution is obviously flawed as well (sometimes lines actually intersect, sometimes the minimum distances are less, etc). Any help/direction on this one would be much appreciated.

I wasn't too happy before, since I thought there was a good and simple way to approach this and no one came up with it, including me. On further thought, I seem to recall that a vector approach should be quite general without special cases. For example, describe the segments as R = a*n^ + B, where n^ is a unit vector in the direction of the line, B an initial point, a a scalar coordinate along the line, and R the point on the line corresponding to a. This should avoid issues with infinite slopes and other problems. You should be able to derive, for each segment, a set of n^, B, amin, and amax. Calculating d = |R1 - R2| as a vector expression should be trivial, and then you can minimize d while constraining each a. Presumably d as a function of a1 and a2 will let you verify that you can limit to amin and amax, and let you see how to generalize to 3D etc. Since the segment is described originally by 4 numbers and this representation uses 6, you have some freedom in how to do this. The first thought is taking the initial point as B; then amin is zero, and amax could be 1 if you don't actually bother to normalize n^ and just take it as (dx,dy). And again, in 2D non-parallel lines intersect, so the distance between them is likely to be zero until you limit their extents. I haven't done this in so long I forgot about vectors, but I hate special cases when there is a more concise and general way to approach a problem :) As always with free advice on the internet, caveat emptor, but I'd be curious to see if anyone knows better. I think someone else suggested checking if they intersect first, but again this is motivated by an attempt to avoid special cases and get a better understanding of what is going on.

Thanks in advance, Darcy.
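A sketch of the parameterization just described (not from the thread, untested): with P(s) = A1 + s*(B1-A1) and Q(t) = A2 + t*(B2-A2), s,t in [0,1], the squared distance is a convex quadratic in (s,t), so box-constrained optimization finds the true minimum.

# Untested sketch; endpoints are length-2 numeric vectors.
seg_dist <- function(A1, B1, A2, B2) {
  f <- function(p) sum((A1 + p[1]*(B1 - A1) - A2 - p[2]*(B2 - A2))^2)
  o <- optim(c(0.5, 0.5), f, method = "L-BFGS-B",
             lower = c(0, 0), upper = c(1, 1))
  sqrt(o$value)
}
seg_dist(c(0, 0), c(1, 0), c(0, 1), c(1, 2))  # 1: closest points (0,0) and (0,1)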
Re: [R] minimum distance between line segments
To: r-h...@stat.math.ethz.ch
From: hwborch...@googlemail.com
Date: Wed, 9 Mar 2011 17:45:53 +
Subject: Re: [R] minimum distance between line segments

Darcy Webber gmail.com writes: [...]

A correct approach could proceed as follows:
(1) Find out whether two segments intersect (e.g., compute the intersection point of the extended lines and compare its x-coords with those of the segment endpoints); if they do, the distance is 0, otherwise set it to Inf.
(2) For each endpoint, compute the intersection point of the perpendicular line through this point with the other segment's line; if this point lies on the other segment, take the distance, otherwise compute the distance to the other two endpoints.
(3) The minimum of all those distances is what you seek.

I have done a fast implementation, but the code is so crude that I would not like to post it here. If you are really in need, I could send it to you.

LOL, I sent a private reply suggesting essentially the opposite approach, since I discovered pmax and pmin. That is, parameterize the location along lines of infinite length and minimize the distance WRT the two locations ( one for each line ). You can do this by hand, finding them to be perpendicular, or probably find an R routine to minimize the distance. Then, with your array of positions along the lines, limit them with pmin or pmax. Infinite lines always cross unless parallel, so you will probably do a lot of clipping, but stuff like that would probably become apparent as you work through it. If things fail or you want to extend to 3D, you have some starting point for improvement. This is probably a common issue in some fields, like graphics, though there may be something packaged - no idea.

--Hans Werner (I am not aware of a pure plane geometry package in R --- is there any?)
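For completeness, a hedged 2D implementation along the lines Hans Werner describes (untested, and not his code): if the segments properly cross the distance is 0; otherwise the minimum is attained at one of the four endpoints, so take the minimum of the point-to-segment distances.

pt_seg <- function(P, A, B) {        # distance from point P to segment AB
  v <- B - A
  t <- if (all(v == 0)) 0 else sum((P - A) * v) / sum(v * v)
  t <- max(0, min(1, t))             # clamp the projection onto the segment
  sqrt(sum((A + t * v - P)^2))
}
cross2 <- function(u, v) u[1]*v[2] - u[2]*v[1]
crosses <- function(A, B, C, D) {    # strict crossing test via orientations
  d1 <- cross2(B - A, C - A); d2 <- cross2(B - A, D - A)
  d3 <- cross2(D - C, A - C); d4 <- cross2(D - C, B - C)
  (d1 * d2 < 0) && (d3 * d4 < 0)
}
seg_seg <- function(A, B, C, D) {
  if (crosses(A, B, C, D)) return(0)
  min(pt_seg(A, C, D), pt_seg(B, C, D), pt_seg(C, A, B), pt_seg(D, A, B))
}
seg_seg(c(0, 0), c(1, 0), c(0, 1), c(1, 2))  # 1, agreeing with the sketch above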