[R] Followup [Can't install or upgrade the "PKI" package on a Debian testing system]
Dear list,

Since I posted the message below, Simon Urbanek has fixed the problem (in less than 5 hours, phew...). The version available on Rforge *is* installable with OpenSSL 1.1. Note that you'll have to install it with

    install.packages('PKI',,'https://www.rforge.net/')

NOT

    install.packages('PKI',,'http://www.rforge.net/')

which, for some inscrutable reason (runaway proxying?), fails the same way as before.

A big kudos and thank you to Simon Urbanek, on behalf of users of not-so-stable distributions!

Emmanuel Charpentier

Original message follows:

Dear list,

It seems that recent changes in OpenSSL somehow broke the installation or update of the PKI package. This is probably specific to my setup(s) (Debian testing, updated frequently). Copy of a mail to Simon Urbanek (PKI maintainer), sent 5 days ago without reply nor acknowledgement so far:

It seems that recent changes in OpenSSL rendered the PKI package uninstallable:

--
> install.packages("PKI")
trying URL 'http://cran.univ-paris1.fr/src/contrib/PKI_0.1-3.tar.gz'
Content type 'application/x-gzip' length 31058 bytes (30 KB)
==
downloaded 30 KB

* installing *source* package ‘PKI’ ...
** package ‘PKI’ successfully unpacked and MD5 sums checked
** libs
gcc -I/usr/local/sage-7/local/lib/R//include -DNDEBUG -fpic -g -O2 -c asn1.c -o asn1.o
gcc -I/usr/local/sage-7/local/lib/R//include -DNDEBUG -fpic -g -O2 -c init.c -o init.o
gcc -I/usr/local/sage-7/local/lib/R//include -DNDEBUG -fpic -g -O2 -c pki-x509.c -o pki-x509.o
pki-x509.c: In function ‘PKI_extract_key’:
pki-x509.c:136:26: error: dereferencing pointer to incomplete type ‘EVP_PKEY {aka struct evp_pkey_st}’
     if (EVP_PKEY_type(key->type) != EVP_PKEY_RSA)
pki-x509.c: In function ‘get_cipher’:
pki-x509.c:244:40: error: dereferencing pointer to incomplete type ‘EVP_CIPHER_CTX {aka struct evp_cipher_ctx_st}’
     ctx = (EVP_CIPHER_CTX*) malloc(sizeof(*ctx));
pki-x509.c: In function ‘PKI_RSAkeygen’:
pki-x509.c:550:5: warning: ‘RSA_generate_key’ is deprecated [-Wdeprecated-declarations]
     rsa = RSA_generate_key(bits, 65537, 0, 0);
In file included from /usr/include/openssl/rsa.h:13:0, from pki.h:13, from pki-x509.c:1:
/usr/include/openssl/rsa.h:193:1: note: declared here
 DEPRECATEDIN_0_9_8(RSA *RSA_generate_key(int bits, unsigned long e, void
/usr/local/sage-7/local/lib/R//etc/Makeconf:132: recipe for target 'pki-x509.o' failed
make: *** [pki-x509.o] Error 1
ERROR: compilation failed for package ‘PKI’
* removing ‘/usr/local/sage-7/local/lib/R/library/PKI’
The downloaded source packages are in ‘/tmp/Rtmpnmt97E/downloaded_packages’
Warning message:
In install.packages("PKI") :
  installation of package ‘PKI’ had non-zero exit status
--

This problem blocks the installation of rstanarm, brms, rsconnect and shinystan, among others. Not exactly trivial.

As far as I know, my libssl libraries are all at 1.1.0c, as is openssl.

New data point: it turns out that this problem also impedes the update of PKI: my running R installation still has PKI 1.3, which seems enough for rstanarm, brms, rsconnect and shinystan to run and install/upgrade. However, on a new installation of R (installed in Sage), PKI can't be installed. I tried to install PKI 1.3 from Rforge, with no success.

- Has someone already had this problem?
- Has he/she been able to work around it? If so, how?
- Is my mail to Simon Urbanek sufficient as a bug report? If not, what should I do?

Sincerely yours,

Emmanuel Charpentier

PS: If possible, I'd appreciate being CC'd on your answers: I'm not on the list and have had trouble subscribing "reasonably". I'm following it through the mail archives.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
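As an aside, for readers puzzled by the double comma in the working command above: the empty second argument simply skips install.packages()'s `lib` parameter, so that the URL lands in `repos`. The equivalent named-argument form makes this explicit:

```r
## Equivalent to install.packages('PKI',,'https://www.rforge.net/'):
## the bare extra comma in the positional form merely skips `lib`,
## letting the repository URL reach `repos`.
install.packages("PKI", repos = "https://www.rforge.net/")
```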
[R] Can't install or upgrade the "PKI" package on a Debian testing system
Dear list,

It seems that recent changes in OpenSSL somehow broke the installation or update of the PKI package. This is probably specific to my setup(s) (Debian testing, updated frequently). Copy of a mail to Simon Urbanek (PKI maintainer), sent 5 days ago without reply nor acknowledgement so far:

It seems that recent changes in OpenSSL rendered the PKI package uninstallable:

--
> install.packages("PKI")
trying URL 'http://cran.univ-paris1.fr/src/contrib/PKI_0.1-3.tar.gz'
Content type 'application/x-gzip' length 31058 bytes (30 KB)
==
downloaded 30 KB

* installing *source* package ‘PKI’ ...
** package ‘PKI’ successfully unpacked and MD5 sums checked
** libs
gcc -I/usr/local/sage-7/local/lib/R//include -DNDEBUG -fpic -g -O2 -c asn1.c -o asn1.o
gcc -I/usr/local/sage-7/local/lib/R//include -DNDEBUG -fpic -g -O2 -c init.c -o init.o
gcc -I/usr/local/sage-7/local/lib/R//include -DNDEBUG -fpic -g -O2 -c pki-x509.c -o pki-x509.o
pki-x509.c: In function ‘PKI_extract_key’:
pki-x509.c:136:26: error: dereferencing pointer to incomplete type ‘EVP_PKEY {aka struct evp_pkey_st}’
     if (EVP_PKEY_type(key->type) != EVP_PKEY_RSA)
pki-x509.c: In function ‘get_cipher’:
pki-x509.c:244:40: error: dereferencing pointer to incomplete type ‘EVP_CIPHER_CTX {aka struct evp_cipher_ctx_st}’
     ctx = (EVP_CIPHER_CTX*) malloc(sizeof(*ctx));
pki-x509.c: In function ‘PKI_RSAkeygen’:
pki-x509.c:550:5: warning: ‘RSA_generate_key’ is deprecated [-Wdeprecated-declarations]
     rsa = RSA_generate_key(bits, 65537, 0, 0);
In file included from /usr/include/openssl/rsa.h:13:0, from pki.h:13, from pki-x509.c:1:
/usr/include/openssl/rsa.h:193:1: note: declared here
 DEPRECATEDIN_0_9_8(RSA *RSA_generate_key(int bits, unsigned long e, void
/usr/local/sage-7/local/lib/R//etc/Makeconf:132: recipe for target 'pki-x509.o' failed
make: *** [pki-x509.o] Error 1
ERROR: compilation failed for package ‘PKI’
* removing ‘/usr/local/sage-7/local/lib/R/library/PKI’
The downloaded source packages are in ‘/tmp/Rtmpnmt97E/downloaded_packages’
Warning message:
In install.packages("PKI") :
  installation of package ‘PKI’ had non-zero exit status
--

This problem blocks the installation of rstanarm, brms, rsconnect and shinystan, among others. Not exactly trivial.

As far as I know, my libssl libraries are all at 1.1.0c, as is openssl.

New data point: it turns out that this problem also impedes the update of PKI: my running R installation still has PKI 1.3, which seems enough for rstanarm, brms, rsconnect and shinystan to run and install/upgrade. However, on a new installation of R (installed in Sage), PKI can't be installed. I tried to install PKI 1.3 from Rforge, with no success.

- Has someone already had this problem?
- Has he/she been able to work around it? If so, how?
- Is my mail to Simon Urbanek sufficient as a bug report? If not, what should I do?

Sincerely yours,

Emmanuel Charpentier

PS: If possible, I'd appreciate being CC'd on your answers: I'm not on the list and have had trouble subscribing "reasonably". I'm following it through the mail archives.
[R] FYI : XML 3.4.2 still breaks odfWeave 0.7.17
Dear list,

Perusing the GMane archive shows that the issue with XML 3.4.x still bugs odfWeave users. I just checked that the newer XML 3.4.2 version still gives the same problem: using it to weave a bit of documentation written with LibreOffice 3.3.3 (current in Debian testing) leads me to a 192 Kb, 590-page document, whereas reinstalling XML 3.2.0 gives me a 24 Kb, 4-page file (not entirely correct, though, but I haven't yet debugged it).

I can send the relevant files (.RData, source, and the good and bad compilations) if this can help.

Sincerely yours,

Emmanuel Charpentier
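A quick way for readers to check which side of the breakage a given setup is on is to query the installed version of the XML package:

```r
## Prints the installed version of the XML R package; per this thread,
## 3.2-0 works with odfWeave 0.7.17 while 3.4-x misbehaves.
packageVersion("XML")
```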
[R] odfWeave 0.7.17 stutters on Debian testing 64-bit amd64 systems.
Dear list,

This is a copy of a mail sent to Max Kuhn, original author and maintainer of the odfWeave package, who seems not to have received it. It reports a problem that seems to be very implementation-specific (reproducible on three Debian testing amd64 machines; does *not* happen on two i686 Debian testing systems; does *not* happen on an Ubuntu 11.06 amd64 machine) and therefore not attributable to odfWeave itself (which is pure R) but to a software component it uses (XML and the underlying libxml2, R itself, etc.), but I need to know how to track this problem.

Apologies for cross-posting to r-help and r-debian-sig, but I think that the issue is... complicated, and might not be as Debian-specific as it seems at first view.

Sincerely,

Emmanuel Charpentier

Dear Max,

A few days ago, I started to have problems with odfWeave 0.7.17 on a couple of amd64 systems: the compiled files contained numerous copies of the source files, more or less interlaced, and a few copies of the target productions. Then I noticed that using an older 32-bit system resulted in correct files. An attempt with yet another machine (a recent i686 netbook) confirmed that 32-bit systems gave okay results.

Setup: on all machines, I use Debian testing with updates. My packages are self-compiled (i.e. installed via install.packages()). I enclose a very minimalist source and the resulting targets. Logs of execution on 32- and 64-bit systems are affixed after this message.

Since odfWeave is pure R, I doubt that it could be the source of the problem. This leaves us with two obvious targets: R itself (the Debian package is current), or the XML library. Do you have any idea about how to proceed to find the source of the problem (and how to fix it)?
Sincerely,

Emmanuel Charpentier

Execution on a 32-bit system:

> library(odfWeave)
Loading required package: lattice
Loading required package: XML
> sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: i486-pc-linux-gnu (32-bit)

locale:
 [1] LC_CTYPE=fr_FR.utf8       LC_NUMERIC=C
 [3] LC_TIME=fr_FR.utf8        LC_COLLATE=fr_FR.utf8
 [5] LC_MONETARY=C             LC_MESSAGES=fr_FR.utf8
 [7] LC_PAPER=fr_FR.utf8       LC_NAME=C
 [9] LC_ADDRESS=C              LC_TELEPHONE=C
[11] LC_MEASUREMENT=fr_FR.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] odfWeave_0.7.17 XML_3.2-0 lattice_0.19-26

loaded via a namespace (and not attached):
[1] grid_2.13.0

> system.time(odfWeave("In1.odt", "Out1-32.odt"))
  Copying In1.odt
  Setting wd to /tmp/RtmpS8lBt8/odfWeave11161739126
  Unzipping ODF file using unzip -o In1.odt
Archive: In1.odt
 extracting: mimetype
   creating: Configurations2/statusbar/
  inflating: Configurations2/accelerator/current.xml
   creating: Configurations2/floater/
   creating: Configurations2/popupmenu/
   creating: Configurations2/progressbar/
   creating: Configurations2/toolpanel/
   creating: Configurations2/menubar/
   creating: Configurations2/toolbar/
   creating: Configurations2/images/Bitmaps/
  inflating: content.xml
  inflating: manifest.rdf
  inflating: styles.xml
 extracting: meta.xml
 extracting: Thumbnails/thumbnail.png
  inflating: settings.xml
  inflating: META-INF/manifest.xml
  Removing In1.odt
  Creating a Pictures directory
  Pre-processing the contents
  Sweaving content.Rnw
  Writing to file content_1.xml
  Processing code chunks ...
  'content_1.xml' has been Sweaved
  Removing content.xml
  Post-processing the contents
  Removing content.Rnw
  Removing styles.xml
  Renaming styles_2.xml to styles.xml
  Removing manifest.xml
  Renaming manifest_2.xml to manifest.xml
  Removing extra files
  Packaging file using zip -r In1.odt .
  adding: manifest.rdf (deflated 54%)
  adding: mimetype (stored 0%)
  adding: Pictures/ (stored 0%)
  adding: Configurations2/ (stored 0%)
  adding: Configurations2/images/ (stored 0%)
  adding: Configurations2/images/Bitmaps/ (stored 0%)
  adding: Configurations2/menubar/ (stored 0%)
  adding: Configurations2/progressbar/ (stored 0%)
  adding: Configurations2/toolbar/ (stored 0%)
  adding: Configurations2/floater/ (stored 0%)
  adding: Configurations2/accelerator/ (stored 0%)
  adding: Configurations2/accelerator/current.xml (stored 0%)
  adding: Configurations2/popupmenu/ (stored 0%)
  adding: Configurations2/toolpanel/ (stored 0%)
  adding: Configurations2/statusbar/ (stored 0%)
  adding: content.xml (deflated 75%)
  adding: META-INF/ (stored 0%)
  adding: META-INF/manifest.xml (deflated 83%)
  adding: Thumbnails/ (stored 0%)
  adding: Thumbnails/thumbnail.png (deflated 60%)
  adding: meta.xml (deflated 56%)
  adding: styles.xml (deflated 83
Re: [R] odfWeave 0.7.17 stutters on Debian testing 64-bit amd64 systems.
Dear Brian,

Thank you very much for this answer. However...

On Saturday 14 May 2011 at 11:52 +0100, Prof Brian Ripley wrote:

> Note the difference in XML versions. odfWeave does not work correctly
> with XML 3.4-x: this has been reported to the maintainer (and can be
> seen on the CRAN package checks at
> http://cran.r-project.org/web/checks/check_results_odfWeave.html).
> I suggest you try downgrading to an earlier version of XML.

Do you mean the XML R package? Or the libxml2 Debian package? I am worried by the difference in behaviour of the i386 and amd64 versions (and, on amd64, the discrepancies between Ubuntu 11.04 and Debian testing, which are easier to explain: Debian has a slightly more recent version).

I'll try to downgrade the R XML package (easier) and report results.

Again, thank you very much!

Emmanuel Charpentier

On Sat, 14 May 2011, Emmanuel Charpentier wrote:

Dear list,

This is a copy of a mail sent to Max Kuhn, original author and maintainer of the odfWeave package, who seems not to have received it. It reports a problem that seems to be very implementation-specific (reproducible on three Debian testing amd64 machines; does *not* happen on two i686 Debian testing systems; does *not* happen on an Ubuntu 11.06 amd64 machine) and therefore not attributable to odfWeave itself (which is pure R) but to a software component it uses (XML and the underlying libxml2, R itself, etc.), but I need to know how to track this problem. Apologies for cross-posting to r-help and r-debian-sig, but I think that the issue is... complicated, and might not be as Debian-specific as it seems at first view.

Sincerely,

Emmanuel Charpentier

Dear Max,

A few days ago, I started to have problems with odfWeave 0.7.17 on a couple of amd64 systems: the compiled files contained numerous copies of the source files, more or less interlaced, and a few copies of the target productions. Then I noticed that using an older 32-bit system resulted in correct files.
An attempt with yet another machine (a recent i686 netbook) confirmed that 32-bit systems gave okay results. Setup: on all machines, I use Debian testing with updates. My packages are self-compiled (i.e. installed via install.packages()). I enclose a very minimalist source and the resulting targets. Logs of execution on 32- and 64-bit systems are affixed after this message. Since odfWeave is pure R, I doubt that it could be the source of the problem. This leaves us with two obvious targets: R itself (the Debian package is current), or the XML library. Do you have any idea about how to proceed to find the source of the problem (and how to fix it)? Sincerely, Emmanuel Charpentier Execution on a 32-bit system : library(odfWeave) Loading required package: lattice Loading required package: XML sessionInfo() R version 2.13.0 (2011-04-13) Platform: i486-pc-linux-gnu (32-bit) locale: [1] LC_CTYPE=fr_FR.utf8 LC_NUMERIC=C [3] LC_TIME=fr_FR.utf8 LC_COLLATE=fr_FR.utf8 [5] LC_MONETARY=C LC_MESSAGES=fr_FR.utf8 [7] LC_PAPER=fr_FR.utf8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=fr_FR.utf8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] odfWeave_0.7.17 XML_3.2-0 lattice_0.19-26 loaded via a namespace (and not attached): [1] grid_2.13.0 system.time(odfWeave("In1.odt", "Out1-32.odt")) Copying In1.odt Setting wd to /tmp/RtmpS8lBt8/odfWeave11161739126 Unzipping ODF file using unzip -o In1.odt Archive: In1.odt extracting: mimetype creating: Configurations2/statusbar/ inflating: Configurations2/accelerator/current.xml creating: Configurations2/floater/ creating: Configurations2/popupmenu/ creating: Configurations2/progressbar/ creating: Configurations2/toolpanel/ creating: Configurations2/menubar/ creating: Configurations2/toolbar/ creating: Configurations2/images/Bitmaps/ inflating: content.xml inflating: manifest.rdf inflating: styles.xml extracting: meta.xml
extracting: Thumbnails/thumbnail.png inflating: settings.xml inflating: META-INF/manifest.xml Removing In1.odt Creating a Pictures directory Pre-processing the contents Sweaving content.Rnw Writing to file content_1.xml Processing code chunks ... 'content_1.xml' has been Sweaved Removing content.xml Post-processing the contents Removing content.Rnw Removing styles.xml Renaming styles_2.xml to styles.xml Removing manifest.xml Renaming manifest_2.xml to manifest.xml Removing extra files Packaging file using zip -r In1.odt . adding
Re: [R] odfWeave 0.7.17 stutters on Debian testing 64-bit amd64 systems.
Dear Brian,

You are right (as usual...): downgrading XML to 3.2.0 (the next-to-most-recent version) enables me to compile my minimalistic example correctly. And yes, there is no XML other than the XML package. Silly me...

Now, I have to understand why the i386 and Ubuntu machines were *not* upgraded to 3.4.0 by the update.packages(ask=FALSE) I'm used to doing...

Thank you again! Now I can again use a multiprocessor efficiently to run the same analyses on 18 datasets (thanks to mclapply (multicore)). A serious win in my case...

Emmanuel Charpentier

On Saturday 14 May 2011 at 12:59 +0100, Prof Brian Ripley wrote (interleaving his answers with my questions):

> On Sat, 14 May 2011, Emmanuel Charpentier wrote:
>> Dear Brian, Thank you very much for this answer. However...
>> On Saturday 14 May 2011 at 11:52 +0100, Prof Brian Ripley wrote:
>>> Note the difference in XML versions. odfWeave does not work correctly
>>> with XML 3.4-x: this has been reported to the maintainer (and can be
>>> seen on the CRAN package checks at
>>> http://cran.r-project.org/web/checks/check_results_odfWeave.html).
>>> I suggest you try downgrading to an earlier version of XML.
>> Do you mean the XML R package? Or the libxml2 Debian package?
> Only the R package is called XML. Why would I write XML if I meant libxml2?
>> I am worried by the difference in behaviour of the i386 and amd64
>> versions (and, on amd64, the discrepancies between Ubuntu 11.04 and
>> Debian testing, which are easier to explain: Debian has a slightly
>> more recent version). I'll try to downgrade the R XML package
>> (easier) and report results. Again, thank you very much!
>> Emmanuel Charpentier

On Sat, 14 May 2011, Emmanuel Charpentier wrote:

Dear list,

This is a copy of a mail sent to Max Kuhn, original author and maintainer of the odfWeave package, who seems not to have received it.
It reports a problem that seems to be very implementation-specific (reproducible on three Debian testing amd64 machines; does *not* happen on two i686 Debian testing systems; does *not* happen on an Ubuntu 11.06 amd64 machine) and therefore not attributable to odfWeave itself (which is pure R) but to a software component it uses (XML and the underlying libxml2, R itself, etc.), but I need to know how to track this problem. Apologies for cross-posting to r-help and r-debian-sig, but I think that the issue is... complicated, and might not be as Debian-specific as it seems at first view. Sincerely, Emmanuel Charpentier Dear Max, A few days ago, I started to have problems with odfWeave 0.7.17 on a couple of amd64 systems: the compiled files contained numerous copies of the source files, more or less interlaced, and a few copies of the target productions. Then I noticed that using an older 32-bit system resulted in correct files. An attempt with yet another machine (a recent i686 netbook) confirmed that 32-bit systems gave okay results. Setup: on all machines, I use Debian testing with updates. My packages are self-compiled (i.e. installed via install.packages()). I enclose a very minimalist source and the resulting targets. Logs of execution on 32- and 64-bit systems are affixed after this message. Since odfWeave is pure R, I doubt that it could be the source of the problem. This leaves us with two obvious targets: R itself (the Debian package is current), or the XML library. Do you have any idea about how to proceed to find the source of the problem (and how to fix it)?
Sincerely, Emmanuel Charpentier Execution on a 32-bit system : library(odfWeave) Loading required package: lattice Loading required package: XML sessionInfo() R version 2.13.0 (2011-04-13) Platform: i486-pc-linux-gnu (32-bit) locale: [1] LC_CTYPE=fr_FR.utf8 LC_NUMERIC=C [3] LC_TIME=fr_FR.utf8 LC_COLLATE=fr_FR.utf8 [5] LC_MONETARY=C LC_MESSAGES=fr_FR.utf8 [7] LC_PAPER=fr_FR.utf8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=fr_FR.utf8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] odfWeave_0.7.17 XML_3.2-0 lattice_0.19-26 loaded via a namespace (and not attached): [1] grid_2.13.0 system.time(odfWeave("In1.odt", "Out1-32.odt")) Copying In1.odt Setting wd to /tmp/RtmpS8lBt8/odfWeave11161739126 Unzipping ODF file using unzip -o In1.odt Archive: In1.odt extracting: mimetype creating: Configurations2/statusbar/ inflating: Configurations2/accelerator/current.xml creating: Configurations2/floater/ creating: Configurations2/popupmenu/ creating: Configurations2/progressbar/ creating: Configurations2/toolpanel/ creating
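For readers wanting to perform the same downgrade: one way, sketched here under the assumption of a CRAN mirror following the usual Archive/ layout, is to install the archived XML 3.2-0 source tarball by hand:

```r
## Fetch the archived XML 3.2-0 source tarball from the CRAN Archive
## (the URL follows CRAN's standard Archive/<package>/ layout; adjust
## the mirror as needed) and install it as a local source package,
## replacing the problematic 3.4-x version.
url <- "http://cran.r-project.org/src/contrib/Archive/XML/XML_3.2-0.tar.gz"
download.file(url, destfile = "XML_3.2-0.tar.gz")
install.packages("XML_3.2-0.tar.gz", repos = NULL, type = "source")
```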
Re: [R] Multivariate binary response analysis
Dear Derek, dear list,

A couple of suggestions (and more):

0) Check that your problem can't be expressed in terms of what glmer() (lme4(a|b)? package(s)) can solve. I use this often, and am quite impressed with its abilities.

1) Why not model that directly in BUGS? That might not be easy, will almost certainly not be fast to converge and will require lots of post-convergence analysis, but it offers a direct solution. Current BUGS implementations (OpenBUGS 3.1.x, JAGS 2.x) are well-interfaced to R. My current favorite is JAGS + rjags, but this preference might come from years of frustration with (Win|Open)BUGS on Wine on Linux, and the current offering of OpenBUGS + BRugs, currently in beta on the Community page of the OpenBUGS website, seems quite decent. Memory use might have been a problem in the (not so) good old days, but in an era of multigigabyte netbooks, I doubt this is still relevant. Computation time is more of a concern: although Lunn et al. (2009)(1) allude to a parallelized version of WinBUGS(2), I am not aware of any publicly-accessible parallelized BUGS interpreter, which is a shame in times when even said netbooks get multicore CPUs...

2) The nice MCMCglmm package is able to run a large subclass of DAG Bayesian models expressible as generalizations of (possibly multidependent) multivariate mixed-model GLMs, *way* faster than current BUGS implementations (the author claims an acceleration factor of about 30 on his test problem, and my (limited) experience does not disprove him). It seems (to me) that your problem *might* be expressible in terms palatable to MCMCglmm(). Be aware, however, that using this package will *require* some serious understanding of the author's notation used in his (excellent) documentation. Furthermore, if you're interested in the random effects values, you're SOL: I have not (yet) been able to retrieve traces of their distributions in the resultant object. But again, my understanding of this package is limited.
My personal bias goes to BUGS, as you might have guessed, but the public health warning that the Spiegelhalter gang put in front of the WinBUGS manual still applies: MCMC sampling can be dangerous. To which I add: model checking, convergence assessment and validation might eat more time than the modeling itself.

HTH,

Emmanuel Charpentier

--
1. David Lunn et al., "Rejoinder to commentaries on 'The BUGS project: Evolution, critique and future directions'," Statistics in Medicine 28, no. 25 (11, 2009): 3081-3082.
2. http://www.page-meeting.org/pdf_assets/6232-PARALLEL%20Girgis_POSTER_final.pdf. Accessed on Dec 14, 2010.

On Mon, 13 Dec 2010 15:31:53 -0800, Janszen, Derek B wrote:

Greetings ~

I need some assistance determining an appropriate approach to analyzing multivariate binary response data, and how to do it in R.

The setting: data from an assay, wherein 6-hours-post-fertilization zebrafish embryos (n=20, say) are exposed in a vial to a chemical (replicated 3 times, say), and 5 days later observed for the presence/absence (1/0) of defects in several organ systems (yolk sac, jaw, snout, body axis, pericardial edema, etc.) for each fish. The assay can be used as a screen (in which case a single response 'any' is first analyzed) as well as for comparing the responses of different classes of chemicals (a MANOVA-type analysis). Interest is focused on where response(s) are occurring, any associations among responses, and ultimately on possible biological mechanisms (the fish are tested for behavioral activity prior to this assay, and then ground up to provide RNA for microarray assays. A-lotta-data!).

What I *wish* I could do is something like

    glm(response.matrix ~ treat/vial, family=binomial(logit), data=zf.dat)

but I know this can't be done. I came across the baymvb (Bayesian analysis of multivariate binary data) package in the R contributed packages archives, but it is no longer supported and the author is incommunicado.
In the baymvb function the model is specified as single.stacked.response ~ structure.factor + linear.model.terms. Huh? This looks suspiciously similar to analyzing repeated-measures data in SAS as a univariate response with a split-plot structure (which forces the response covariance matrix to have a compound-symmetric structure). If this is what's happening with this function, it is definitely not appropriate.

How about a GEE approach (I'm not familiar with the literature, or with any of the R packages that implement it)? Any other recommendations? (NB: just discovered the bayesm package. Don't know if this will work for me or not.)

Any help would be greatly appreciated.

Derek Janszen, PhD
Sr Research Biostatistician
Computational Biology Bioinformatics
Pacific Northwest National Laboratory
Richland, WA 99352 USA
derek.jans...@pnl.gov
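As a concrete sketch of suggestion 0) above (variable names are hypothetical, made up to match Derek's description of the assay): analyzing one binary defect indicator at a time, with vial as a random effect nested within treatment, a glmer() call could look like this.

```r
library(lme4)

## Hypothetical data layout: one row per fish, with a 0/1 `defect`
## indicator, a `treat` factor (chemical) and a `vial` replicate factor.
## Vial-within-treatment enters as a random intercept.
fit <- glmer(defect ~ treat + (1 | treat:vial),
             family = binomial, data = zf.dat)
summary(fit)
```

This handles each response marginally; the associations *among* defects, which Derek also asks about, are exactly what pushes toward the BUGS/MCMCglmm route discussed above.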
[R] Second summary [( S|odf)weave : how to intersperse (\LaTeX{}|odf) comments in source code ? Delayed R evaluation ?]
Dear list,

The starting solution proposed by Duncan Murdoch is an *excellent* start for Sweave'ing. One can trivially redefine Sweave to start using it quasi-transparently (to use it fully transparently would require patching Sweave, which would be worth some failsafes and possibly some options tinkering).

Integrating it into odfWeave is not as simple, because the driver has to spit out correct XML. I suspect that the odfCat() function of odfWeave will be useful, but my (very) cursory glance at the odfWeave source has not (yet) assured me that one could use the shortcuts that the simple structure of a \TeX{} file allows. XML *needs* some caution, and the Open Document specification is *not* a simple beast. However, I have good hopes. I'll keep the list posted about further enhancements, as my (somewhat busy) schedule will allow.

Again, kudos and a big thank you to Duncan Murdoch, who was able to give a *correct* solution before I had been able to *correctly* state the problem.

Emmanuel Charpentier

On Sat, 11 Dec 2010 22:58:10, Emmanuel Charpentier wrote:

[ Snip... ]
[R] Summary (Re: (S|odf)weave : how to intersperse (\LaTeX{}|odf) comments in source code ? Delayed R evaluation ?)
Dear list,

See comment at end.

On Sat, 11 Dec 2010 22:58:10, Emmanuel Charpentier wrote:

Dear list,

Inspired by the original Knuth tools, and for pedagogical reasons, I wish to produce a document presenting some source code with interspersed comments in the source (see Knuth's books rendering the TeX and Metafont sources to see what I mean). I seemed to remember that a code chunk could be defined piecewise, like in:

Comments...
<<Chunk1, eval=FALSE, echo=TRUE>>=
SomeCode
@
Some other comments...
<<Chunk2, eval=FALSE, echo=TRUE>>=
MoreCode
@
And finally,
<<Chunk3, eval=TRUE, echo=TRUE>>=
<<Chunk1>>
<<Chunk2>>
EndOfTheCode
@

That works... as long as SomeCode, MoreCode and EndOfTheCode are self-standing pieces of R code, but *not* code fragments. You can *not* intersperse comments in, say, a function body or a local() environment this way: when Sweaving, *R* complains of an incomplete source (makes noise about an unexpected end of input at the end of Chunk1, IIRC, and never sees Chunk2). I hoped that Sweave's alternative syntax could offer a way out: no such luck. There seems to be no way to delay R evaluation of a chunk passed by Sweave; at least, the eval=FALSE option of the chunk declaration is not sufficient for that.

Am I missing something in the Sweave and odfWeave documentations (which I read till I grew green and moldy)? Or does this require a fundamental change in the relevant Sweave drivers? Can you suggest alternative ways of doing what I mean to do? The only workaround I found is to paste a second copy of my code in a \verbatim environment (or, in the case of odfWeave, in the text part), and spice it with \end{verbatim} comments... \begin{verbatim} chunks. This way, I lose any guarantee of consistency between the commented text and the effective code.

Any other idea?
Emmanuel Charpentier

To summarize the answers I got so far, there seems to be no satisfactory solution to my current problem with either Sweave or odfWeave:

- every (Sw|odfW)eave code chunk has to be parseable in itself. One cannot break it into unparseable pieces in the hope of pasting them together later;
- the (hypothetical) parse=FALSE option, suggested by Duncan Murdoch, is, well, hypothetical;
- the brew package is not integrated in either Sweave or odfWeave;
- R-style comments will get you only so far (where is math markup when you need it?);
- subdividing the work into small units is not a practical solution when your big piece of software is a collection of small parts and local variable assignments, embedded in a (perforce large) local environment to keep things clean...

Therefore, I leave this problem to sleep. However, I Cc this answer (with the original question below) to Max Kuhn and Friedrich Leisch, in the (faint) hope that this feature, which does not seem to have been missed by anybody in 8 years, might be considered sufficiently useful to grant addition someday...

Sincerely yours,

Emmanuel Charpentier
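For completeness: chunk reuse does let one intersperse commentary inside a function body when, and only when, the cuts fall at complete-statement boundaries, since <<...>> references are expanded textually before the outer chunk is parsed. A minimal Sweave sketch (chunk and object names hypothetical), which does not help with the arbitrary-fragment case summarized above:

```latex
Commentary about the first step of the computation...
<<body-1, eval=FALSE, echo=TRUE>>=
y <- x^2
@
...more commentary, then the assembled function, reusing the chunk above:
<<whole, eval=TRUE, echo=TRUE>>=
f <- function(x) {
<<body-1>>
  y + 1
}
@
```

This works only because `y <- x^2` parses on its own; a cut in the middle of an expression still fails exactly as described.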
[R] (S|odf)weave : how to intersperse (\LaTeX{}|odf) comments in source code ? Delayed R evaluation ?
Dear list,

Inspired by the original Knuth tools, and for pedagogical reasons, I wish to produce a document presenting some source code with interspersed comments in the source (see Knuth's books rendering the TeX and Metafont sources to see what I mean). I seemed to remember that a code chunk could be defined piecewise, like in:

Comments...
<<Chunk1, eval=FALSE, echo=TRUE>>=
SomeCode
@
Some other comments...
<<Chunk2, eval=FALSE, echo=TRUE>>=
MoreCode
@
And finally,
<<Chunk3, eval=TRUE, echo=TRUE>>=
<<Chunk1>>
<<Chunk2>>
EndOfTheCode
@

That works... as long as SomeCode, MoreCode and EndOfTheCode are self-standing pieces of R code, but *not* code fragments. You can *not* intersperse comments in, say, a function body or a local() environment this way: when Sweaving, *R* complains of an incomplete source (makes noise about an unexpected end of input at the end of Chunk1, IIRC, and never sees Chunk2). I hoped that Sweave's alternative syntax could offer a way out: no such luck. There seems to be no way to delay R evaluation of a chunk passed by Sweave; at least, the eval=FALSE option of the chunk declaration is not sufficient for that.

Am I missing something in the Sweave and odfWeave documentations (which I read till I grew green and moldy)? Or does this require a fundamental change in the relevant Sweave drivers? Can you suggest alternative ways of doing what I mean to do? The only workaround I found is to paste a second copy of my code in a \verbatim environment (or, in the case of odfWeave, in the text part), and spice it with \end{verbatim} comments... \begin{verbatim} chunks. This way, I lose any guarantee of consistency between the commented text and the effective code.

Any other idea?

Emmanuel Charpentier
[R] Of possible interest for (Win|Open)BUGS users on Linux platforms
Dear list, It might be of interest to (Win|Open)BUGS users trying to use it from R on a Linux platform, and to Linux R users trying to call (Win|Open)BUGS, to note that two new packages have appeared on the OpenBUGS User Contributed Code page (http://openbugs.info/w/UserContributedCode). R2OpenBUGS seems to be an adaptation of the R2WinBUGS package, able to run native OpenBUGS without Wine contortions (of late, I have been unable to run WinBUGS from R2WinBUGS correctly without the debug option, WinBUGS not opening the script file. Probably yet another Wine regression...). A version of BRugs able to use the Linux version of OpenBUGS is also available. Their author (apparently Chris Jackson from MRC Cambridge, whom I thank heartily) states that "[p]ending additional user testing, they will be posted to CRAN". I add that such testing (and reporting !) might hasten this posting, and therefore the general availability of these packages. These packages might also be useful to stress-test the other BUGS implementation (JAGS), which gave me pause recently (in the deviance/pD/DIC department...). First data point : I have been able to install these two packages on Ubuntu Lucid (64 bits, using the g++-multilib package) and Debian Squeeze (32 bits) and to test BRugs quickly without apparent problems. HTH, Emmanuel Charpentier
Re: [R] Including a text file (JAGS model) in an R package
Sorry for chiming in a tad late (I was incommunicado for two weeks). Le lundi 02 août 2010 à 18:50 +, Ben Bolker a écrit : Prof Brian Ripley ripley at stats.ox.ac.uk writes: Here is how I do it:

write("
model {
  for (year in 1:N) {
    D[year] ~ dpois(mu[year])
    log(mu[year]) <- b[1] + step(year - changeyear) * b[2]
  }
  for (j in 1:2) { b[j] ~ dnorm(0.0, 1.0E-6) }
  changeyear ~ dunif(1, N)
}
", "coalmining.jags")

which has the advantage that the model description is right there in the .Rd file. If you want a file to be part of the package, put it in e.g. pkg source/inst/JAGSmodels/model.txt and in the example use system.file("JAGSmodels", "model.txt", package="whatever"). What Prof Ripley said, but also: for JAGS and WinBUGS models you can actually specify

coalmining <- function() {
  for (year in 1:N) {
    D[year] ~ dpois(mu[year])
    log(mu[year]) <- b[1] + step(year - changeyear) * b[2]
  }
  for (j in 1:2) { b[j] ~ dnorm(0.0, 1.0E-6) }
  changeyear ~ dunif(1, N)
}

which is even more nicely integrated into the R code ... ... but will fail with BUGS truncation/censoring notation, which cannot pass for R syntax. The R2WinBUGS package has a write.model() function, which I shoplifted, simplified (JAGS doesn't seem to be as picky as WinBUGS in the numerical notation department) and extended (to take advantage of JAGS' data transform feature) for use with JAGS (which is much more convenient to use nowadays with R on a Unix-like platform than either WinBUGS or OpenBUGS). I planned to submit it to Martyn Plummer, but got interrupted... HTH, Emmanuel Charpentier
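The write.model() idea mentioned above can be sketched in a few lines of base R. This is a simplified illustration, not Emmanuel's actual adaptation nor R2WinBUGS's code; `write_model` is a hypothetical helper name:

```r
# Sketch of a write.model()-style helper: deparse an R function body
# and write it out as a BUGS/JAGS model file.
write_model <- function(model_fun, con = stdout()) {
  txt <- deparse(body(model_fun), width.cutoff = 500)
  txt[1] <- "model {"   # body() deparses as "{ ... }"; relabel the opening brace
  writeLines(txt, con)
}

coalmining <- function() {
  for (year in 1:N) {
    D[year] ~ dpois(mu[year])
    log(mu[year]) <- b[1] + step(year - changeyear) * b[2]
  }
  for (j in 1:2) b[j] ~ dnorm(0.0, 1.0E-6)
  changeyear ~ dunif(1, N)
}

write_model(coalmining, "coalmining.jags")
```

This only works for models that happen to parse as R; as noted in the post, BUGS truncation/censoring notation (e.g. the I(,) suffix after a distribution) is not valid R syntax and defeats the trick.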
Re: [R] Dataframe horizontal scrolling
under Unix/X11 (and IIRC under Windows), edit(some.data.frame) will invoke a smallish spreadsheet-like data editor, which will allow you to move around your data with no fuss. I probably wouldn't use it for any serious data entry, but I found it damn useful in a lot of data-debugging situations (you know, the sort of ... thing ... that hits the fan when you've been coaxed to accept Excel spreadsheets as data sets). HTH, Emmanuel Charpentier Le lundi 10 mai 2010 à 16:06 -0400, Michael H a écrit : R experts, I am working with large multivariable data frames (> 50 variables) and I would like to scroll horizontally across my output to view each data frame rather than having to scroll down vertically-wrapped data frames. I have been using R Commander as a programming interface. If I assign a long character string to a vector I can scroll across its output easily, but not a data frame of equivalent width. I just want my data frame variables to fully output horizontally rather than partially with vertical wrapping, if that is possible. Any help would be appreciated. Thank you, Mike
Re: [R] glmer with non integer weights
Sorry for this late answer (I've had a seriously non-maskable interrupt). Since I have technical questions not related to R, I take the liberty to follow this up by e-mail. I might post a followup summary if another R problem arises... Emmanuel Charpentier Le lundi 19 avril 2010 à 23:40 -0800, Kay Cichini a écrit : hello, it's the Morisita-Horn index, which is an ecological index of similarity between two multivariate objects (vegetation samples with species and their abundance), where a value of one indicates completely identical relative importance of species in both samples and 0 denotes total absence of any shared species. it can be expressed as a probability: (prob. that an individual drawn from sample j and one from sample k belong to the same species) / (prob. that two ind. drawn from either sample will belong to the same species) = MH-Index [ Technical rambling ] Hmmm ... that's the *ratio* of two probabilities, not a probability. According to http://www.tnstate.edu/ganter/B412%20Ch%201516%20CommMetric.html, which I found in the first page of answers to a simple Google query, it can also be thought of as the ratio of two distances between sites : (maximal distance - actual distance) / maximal distance (with a massive (over-?) simplification). There is no reason to think a priori that the logit transformation (or the asin(sqrt()) transformation) has better properties for this index than any other mapping from [0 1] to R. (BTW, a better mapping might be from [0 1] to [0 Inf] or conversely, but negative distances have no (obvious) meaning. Here asin(sqrt()) might make more sense than qlogis().) [ End of technical rambling ] But I have trouble understanding how a similarity or distance index can characterize *one* site... Your data clearly associate an MH.Index to each site : what distance or similarity do you measure at this site ?
Re: [R] glmer with non integer weights
Le lundi 19 avril 2010 à 03:00 -0800, Kay Cichini a écrit : hi emmanuel, thanks a lot for your extensive answer. do you think using the asin(sqrt()) transf. can be justified for publishing purposes or do i have to expect criticism. Hmmm ... depends on your reviewers. But if a half-asleep dental surgeon caught that after an insomnia, you might expect that a fully caffeinated reviewer will. Add Murphy's law to the mix and ... boom ! naively i excluded that possibility, because of violated anova-assumptions, but if i did get you right the finite range rather poses a problem here. No. Your problem is that you model a probability as a smooth (linear) finite function of finite variables. Under those assumptions, you can't get a *certitude* (probability 0 or 1). Your model is *intrinsically* inconsistent with your data. In other words, I'm unable to believe both your model (linear what-you-may-call-it regression) and your data (which include certainties) *simultaneously*. I'd reconsider your 0s and 1s as meaning *censored* quantities (i. e. no farther than some epsilon from 0 or 1), with *hard* data (i. e. not a cooked-up estimate such as the ones I used) to estimate epsilon. There are *lots* of ways to fit models with censored dependent variables. why is it in this special case an advantage? It's bloody hell *not* a specific advantage : if you want to fit a linear model to a probability, you *need* some function mapping R to the open ]0 1[ (i. e. all reals strictly greater than 0 and strictly less than 1 ; I think that's denoted (0, 1) in English/American usage). asin(sqrt()) does that. However, (asin(sqrt()))^-1 has a big problem (mapping back [0 1], i. e. *including* 0 and 1, *not* (0 1), to R) which *hides* the (IMHO bigger) problem of the inadequacy of your model to your data ! In other words, it lets you shoot yourself in the foot after a nice sciatic nerve articaïne block making the operation painless (but still harmful).
On the other hand, logit (or, as pointed out by Martin Maechler, qlogis) is kind enough to choke on this (i. e. returning Inf values, which will make the regression program choke). So please quench my thirst : what exactly is MH.Index supposed to be ? How is it measured, estimated, guessed or divined ? HTH, Emmanuel Charpentier
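The boundary behaviour under discussion is easy to check at the R prompt (a quick illustration, not from the original thread):

```r
# qlogis() (the logit) maps the *open* interval (0, 1) to R:
# at the boundaries it returns infinities, so a regression on
# qlogis(p) fails loudly when the data contain exact 0s or 1s.
qlogis(c(0, 0.5, 1))        # -Inf  0  Inf

# asin(sqrt()) silently accepts the *closed* interval [0, 1]:
asin(sqrt(c(0, 0.5, 1)))    # 0  pi/4  pi/2  -- all finite, so the problem is hidden
```

This is exactly the point of the post: the finite values at 0 and 1 are a shortcoming dressed up as a convenience.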
Re: [R] glmer with non integer weights
and standard deviation for each variable, plus standard error of the mean:

                  Mean      SD  Naive SE  Time-series SE
deviance      534.2639 4.42164 0.0807278        0.122396
pi[1]           0.6396 0.10877 0.0019859        0.001892
pi[2]           0.2964 0.06592 0.0012036        0.001163
pi[3]           0.5083 0.04120 0.0007522        0.000805
pi[4]           0.5702 0.07170 0.0013091        0.001221
sigma.lpi[1]    3.0387 0.36356 0.0066376        0.008803
sigma.lpi[2]    2.0519 0.22665 0.0041381        0.005358
sigma.lpi[3]    1.0928 0.12531 0.0022879        0.003305
sigma.lpi[4]    0.8699 0.26650 0.0048656        0.010146

2. Quantiles for each variable:

                  2.5%      25%      50%      75%    97.5%
deviance      527.9142 530.9711 533.4347 536.6295 544.7738
pi[1]           0.4130   0.5683   0.6429   0.7195   0.8251
pi[2]           0.1826   0.2511   0.2909   0.3373   0.4390
pi[3]           0.4266   0.4809   0.5093   0.5359   0.5892
pi[4]           0.4183   0.5254   0.5701   0.6150   0.7106
sigma.lpi[1]    2.4362   2.7772   2.9965   3.2600   3.8395
sigma.lpi[2]    1.6606   1.8935   2.0320   2.1811   2.5598
sigma.lpi[3]    0.8803   1.0035   1.0754   1.1711   1.3670
sigma.lpi[4]    0.5022   0.6901   0.8198   0.9906   1.5235

One should also note that the deviance is *extremely* sensitive to the choice of epsilon, which tends to indicate that, at small epsilon values, the sharp "impossible" values dominate the evaluation of the coefficients they are involved in. Beta models (not shown) give similar results, to a lesser extent. In short, your sharp data are essentially incompatible with your model, which can't be fitted unless you get at least a rough idea of their precision. HTH, Emmanuel Charpentier, DDS, MSc best regards, kay
Re: [R] glmer with non integer weights
Addendum to my previous answer : in that special case, the limited range of the asin(sqrt()) transformation, which is a shortcoming, turns out to be useful. The fixed-effect coefficients seem semi-reasonable (except for stageB) :

(sin(coef(lm(asin(sqrt(MH.Index)) ~ 0 + stage, data = similarity))))^2
   stageA    stageB    stageC    stageD
0.6164870 0.3389430 0.5083574 0.5672021

The random effects being nested in the fixed effect, one can't afford to be lazy in the parametrization of the corresponding random effect :

summary(lmer(asin(sqrt(MH.Index)) ~ stage + (stage | site), data = similarity))
Linear mixed model fit by REML
Formula: asin(sqrt(MH.Index)) ~ stage + (stage | site)
   Data: similarity
   AIC BIC logLik deviance REMLdev
 155.3 199 -62.65    111.8   125.3
Random effects:
 Groups   Name        Variance Std.Dev. Corr
 site     (Intercept) 0.043579 0.20876
          stageB      0.033423 0.18282  -0.999
          stageC      0.043580 0.20876  -1.000  0.999
          stageD      0.043575 0.20875  -1.000  0.999  1.000
 Residual             0.128403 0.35833
Number of obs: 136, groups: site, 39
Fixed effects:
            Estimate Std. Error t value
(Intercept)  0.93036    0.08431  11.035
stageB      -0.30879    0.10079  -3.064
stageC      -0.13660    0.09981  -1.369
stageD      -0.07755    0.14620  -0.530
Correlation of Fixed Effects:
       (Intr) stageB stageC
stageB -0.836
stageC -0.845  0.707
stageD -0.577  0.482  0.487

v <- fixef(lmer(asin(sqrt(MH.Index)) ~ stage + (stage | site), data = similarity))
v[2:4] <- v[1] + v[2:4]
names(v)[1] <- "stageA"
(sin(v))^2
   stageA    stageB    stageC    stageD
0.6429384 0.3390903 0.5083574 0.5672021

But again, we're exploiting a shortcoming of the asin(sqrt()) transformation. HTH, Emmanuel Charpentier Le vendredi 16 avril 2010 à 00:15 -0800, Kay Cichini a écrit : thanks thierry, i considered these transformations already, but variance is not stabilized and/or normality is not achieved either. i guess i'll have to look out for non-parametrics?
best regards, kay
Re: [R] [R-pkgs] SOAR - Stored object caches for R
Le mercredi 07 avril 2010 à 18:03 +1000, William Venables a écrit : [ Snip ... ] [ ... ] The effect is somewhat like that of the use of the .Data directory in S-PLUS, (a program not unlike R), though somewhat more manually driven. I'm not sure that such old-timers' puns qualify for fortune()ification, but this is a good one... William : did you consider possible interactions with other caching efforts such as cacheSweave ? It seems that the idea of alleviating memory/CPU time requirements is a concern to many people (of course, the intrinsically interactive development process that S/R allow(s) makes this a big concern). Unifying those efforts, and maybe incorporating them in the base interpreter, might be worthwhile...
Re: [R] logistic regression in an incomplete dataset
Dear Desmond, a somewhat analogous question has been posed recently (about 2 weeks ago) on the sig-mixed-models list, and I tried (in two posts) to give some elements of information (and some bibliographic pointers). To summarize tersely :
- a model of information missingness (i. e. *why* are some data missing ?) is necessary to choose the right measures to take. Two special cases (Missing At Random and Missing Completely At Random) allow for (semi-)automated compensation. See the literature for further details.
- complete-case analysis may give seriously weakened and *biased* results. Pairwise-complete-case analysis is usually *worse*.
- simple imputation leads to underestimated variances and might also give biased results.
- multiple imputation is currently thought of as a good way to alleviate missing data if you have a missingness model (or can honestly bet on MCAR or MAR), and if you properly combine the results of your imputations.
- a few missing-data packages exist in R to handle this case. My personal selection at this point would be mice, mi, Amelia, and possibly mitools, but none of them is fully satisfying (in particular, accounting for a random effect needs special handling all the way in all packages...).
- an interesting alternative is to write a full probability model (in BUGS for example) and use Bayesian estimation ; in this framework, missing data are naturally modeled in the model used for analysis. However, this might entail *large* work, be difficult and not always succeed (numerical difficulties). Furthermore, the results of a Bayesian analysis might not be what you seek...
HTH, Emmanuel Charpentier
Le lundi 05 avril 2010 à 11:34 +0100, Desmond Campbell a écrit : Dear all, I want to do a logistic regression. So far I've only found out how to do that in R in a dataset of complete cases. I'd like to do logistic regression via max likelihood, using all the study cases (complete and incomplete). Can you help? I'm using glm() with family=binomial(logit).
If any covariate in a study case is missing then the study case is dropped, i.e. it is doing a complete-cases analysis. As a lot of study cases are being dropped, I'd rather it did maximum likelihood using all the study cases. I tried setting glm()'s na.action to NULL, but then it complained about NAs present in the study cases. I've about 1000 unmatched study cases and less than 10 covariates, so I could use unconditional ML estimation (as opposed to conditional ML estimation). regards Desmond
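The multiple-imputation route recommended in the answer above looks roughly like this with the mice package (an illustrative sketch, assuming mice is installed; the data frame and variables are made up):

```r
library(mice)

# Made-up data with missing covariates, for illustration only
set.seed(42)
d <- data.frame(y  = rbinom(200, 1, 0.4),
                x1 = rnorm(200),
                x2 = rnorm(200))
d$x1[sample(200, 30)] <- NA          # knock out some covariate values

imp  <- mice(d, m = 5, printFlag = FALSE)            # 5 imputed datasets
fits <- with(imp, glm(y ~ x1 + x2, family = binomial))
summary(pool(fits))                  # Rubin's rules combine the 5 analyses
```

Unlike the complete-case analysis glm() does by default, all 200 cases contribute to the pooled estimates, and the pooled standard errors reflect the between-imputation uncertainty.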
Re: [R] Multilevel modeling with count variables
Your request might find better answers on the R-SIG-mixed-models list ... Anyway, some quick thoughts : Le vendredi 26 mars 2010 à 15:20 -0800, dadrivr a écrit : By the way, my concern with lmer and glmer is that they don't produce p-values, The argumentation of D. Bates is convincing ... A large thread about these issues exists in this (R-help) list archive. I won't get into the (real) debate about the (dubious) value of hypothesis testing, and I'm aware that there are domains where journal editors *require* p-values. Sigh... This insistence seems to be weakening, however. and the techniques used to approximate the p-values with those functions (pvals.fnc, HPDinterval, mcmcsamp, etc.) only apply to Gaussian distributions. (Probably ?) true for pvals.fnc, false for HPDinterval and mcmcsamp. You can always generate a sufficient number of estimates and assess a confidence interval and a p-value from such samples. See any good text on the use of simulation in statistics... The fly in the ointment is that current versions of lmer seem to have problems with mcmcsamp(). Bootstrap comes to mind, but might be non-trivial to do efficiently, especially in a hierarchical/multilevel context. Given that I would likely be working with quasi-poisson distributions, is there a good alternative mixed effects modeling approach glmer, or Bayesian modeling through BUGS (whose probabilities' interpretation is (quite) different from p-values). that would output significance values? No. You'd have to compute them yourself ... if you're able to derive the (possibly asymptotic) distribution of your test statistic under the hypothesis you aim to refute. This nut might be harder to crack than it appears...
Now, to come back to your original question, you might try 1) to use glm/glmer to model your dependent variable as a (quasi-)Poisson variable, and 2) to use log transformations of the independent count variables ; a more general solution is given by the Box-Cox transformation family : see the relevant function in MASS, which also offers the logtrans family. ISTR that John Fox's car package offers a function aiming at finding the optimal simultaneous transformations of a dependent variable and its predictors. Other packages I'm not aware of might offer other solutions ; in particular, I'd recommend peeking at the documentation of Frank Harrell's Hmisc and rms packages, whose wealth I have not yet seriously assessed... Bayesian modeling with BUGS would also allow you to fit any model you might wish to test (provided that it can be written as a directed acyclic graph and that the distributions of your variables are either from a standard family available in BUGS or that you are able to express the (log-)density of the non-standard distribution you wish to use). But, again, no p-values in sight. Would you settle for Bayes factors between two models ? Or DIC comparisons ? HTH, Emmanuel Charpentier
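Points 1) and 2) above can be sketched as follows (a toy illustration using the recommended MASS package; the data and variable names are invented):

```r
library(MASS)

# Toy data: a count response driven by a count predictor
set.seed(1)
d <- data.frame(n_events = rpois(100, 6))
d$y <- rpois(100, d$n_events)

# 1) model the response as (quasi-)Poisson,
# 2) log-transform the count predictor
fit <- glm(y ~ log(n_events + 1), data = d, family = quasipoisson)

# More generally, let MASS::boxcox() profile a power transformation
# for a Gaussian model of the (shifted) response
bc <- boxcox(y + 1 ~ n_events, data = d, plotit = FALSE)
lambda <- bc$x[which.max(bc$y)]   # lambda maximizing the profile log-likelihood
```

The `+ 1` shifts are a common dodge for zero counts, not a universal recommendation; whether a shift or a different link is appropriate depends on the data.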
Re: [R] recommendations on use of -> operator
An old (Lisp ? C ?) quote, whose author escapes me now, was : "syntactic sugar causes cancer of the semicolon" .. but nowadays semicolons are rarely used in real-life R. Emmanuel Charpentier PS and, BTW, neither C nor Lisp has much use for semicolons... Was that Pascal (or one of its spawns) ? Le jeudi 18 mars 2010 à 12:00 +, Prof Brian Ripley a écrit : -> is usually only used when you suddenly remember you wanted to keep the result of a computation, especially in the days before command-line editors were universal. fn(a,b) ... oh I'd better keep that ... -> z On Thu, 18 Mar 2010, Dan Kelley wrote: I have never used -> but I noticed at http://github.com/jiho/r-utils/blob/master/beamer_colors.R that some people do. In fact, the above-named code has a sort of elegance about it (except for the use of = for assignment...). To my eye, -> calls to mind a type of assignment that is meant to stand out. For example, perhaps it would make sense to use -> to assign to things that are not expected to vary through the rest of the code block. Q1: is there a convention, informal or otherwise, on when to use the -> operator? Q2: if there is no convention, but if people think a convention might help, what would that convention be? Dan Kelley, PhD Professor and Graduate Coordinator Dept. Oceanography, Dalhousie University, Halifax NS B3H 4J1 kelley@gmail.com (1-minute path on US server) or dan.kel...@dal.ca (2-hour path on Cdn server) Phone 902 494 1694; Fax 902 494 3877
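For readers who have never met it, right assignment is ordinary R (a trivial illustration):

```r
# Left and right assignment are equivalent; -> just reads left-to-right,
# which is handy when you decide to keep a result after typing the call.
sum(1:10) -> total
total2 <- sum(1:10)
identical(total, total2)   # TRUE
```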
Re: [R] recommendations on use of -> operator
Le jeudi 18 mars 2010 à 15:49 -0400, Duncan Murdoch a écrit : On 18/03/2010 3:10 PM, Emmanuel Charpentier wrote: An old (Lisp ? C ?) quote, whose author escapes me now, was : "syntactic sugar causes cancer of the semicolon" .. but nowadays semicolons are rarely used in real-life R. Emmanuel Charpentier PS and, BTW, neither C nor Lisp has much use for semicolons... Was that Pascal (or one of its spawns) ? Not sure what you mean about C: it's a statement terminator there. Pascal also uses it, but as a statement separator. Quite right ! And I know I'm falling into that trap every time I don't code in C for more than a few weeks (it has been three months, now...). That's probably because this end-of-statement role doesn't strike my psyche as important enough... (And maybe because Lisp and, to a lesser extent, Pascal were my computer mother tongues, APL a serious crush and Fortran something of an (almost)-read-only acquaintance). Nice thinko... Emmanuel Charpentier Still recovering... Duncan Murdoch
Re: [R] R on Linux - a primer
Le dimanche 14 mars 2010 à 18:04 -0400, Axel Urbiz a écrit : Hi, I'm looking to move from Windows into a 64-bit Linux environment. Which is the best Linux flavor to use with R? To install R in this environment, do I need to do any compiling? I'd like to add two cents of folly to the (wise) advice you've received already. Indeed, Ubuntu is one very good distribution, whose management system has made the care and feeding of a Linux system a *lot* easier for not-so-system-oriented people like yours truly. Whereas my first contacts with a Unix-like system were about 30 years ago (Oh my, how time flies, and how far away are Xenix and our 68000 systems ...), I'm *still* not fond of system maintenance for its own sake. Ubuntu added an (almost) fool-proof maintenance system to an excellent distribution called Debian, thus lowering the Linux entry bar as low as it can be humanely made. Some pretended that Ubuntu was a code word for "I'm too stupid to configure Debian" ; quite untrue ! It only means "I'm too busy|lazy to configure Debian", which is a Good Thing (TM). But Debian has its strong points, and one of them is *extremely* strong for an R user : Dirk Eddelbuettel (whose name I'm almost surely misspelling (sorry, Dirk !)) has created a marvelous system called cran2deb which routinely creates binary Debian packages from (almost) all of the 2000+ R packages available nowadays. That might look like small change : the basic tools used for developing/compiling most R packages are small beer (at least by today's standards). But some of them might depend on fiendishly difficult-to-maintain foreign libraries. Dirk's cran2deb takes care of that and creates any information that Debian's dpkg maintenance system needs to automate *your* R maintenance chores by integrating them in Debian's maintenance scheme, which is as automatic as you can get without becoming an incomprehensible beast.
In fact, cran2deb is so good that I'm seriously tempted to go back to Debian (after almost 8 years of Debian use, Ubuntu's ease-of-use, easy access to not-so-exotic hardware drivers (and the then-incessant politically correct yack-yacking on some Debian mailing lists...) made me switch to an early Ubuntu distribution). I did not yet switch back (mostly for not-so-superficial hardware support reasons), but I maintain a backup Debian installation for the hell of it and to test the waters. So far, they have been a lot less rough than they used to be, but there are still occasional rows (e. g. a recent gotcha with openoffice.org, which would have rendered me unable to work with those d*mn Word files for about a month, or forced me to do a manual repair (which I hate...)). So consider Debian as a (desirable) alternative to Ubuntu. HTH, Emmanuel Charpentier, DDS, MSc affiliation withdrawn, notwithstanding Frank Harrell's whims...
Re: [R] Use of R in clinical trials
Le samedi 20 février 2010 à 09:44 -0800, Dieter Menne a écrit : If you check http://n4.nabble.com/R-help-f789696.html you will note that this thread has the largest number of reads in years. Looks like an encouragement to Mark to keep the mentioned CRAN document updated. To add a more serious note to my sarcastic comments earlier: I don't think the FDA or our national agencies in Europe are to blame. I know that they have eminent statisticians there who know what they are talking of and are much more flexible than the culprits. The people to blame for the "Nobody got fired for using SAS" attitude are reviewers and bosses with mainly medical background who make decisions. It is the difference in the use of the term "validated" which leads to confusion. A method is considered validated in medicine when it has been compared with a gold standard and is non-inferior within reasonable limits. For example, in the diagnosis of a gastric ulcer, gastroscopy is the gold standard, and cheaper or less invasive tests are measured against that. However, sometimes gold standards are the easy way out to avoid litigation, and statistical evidence against these is brushed aside. Think of the years it took to accept the Australian bush doctor's evidence that the bacterium Helicobacter pylori is a main determinant of gastric ulcers; everyone "knew" that ulcers were caused by stress alone. Or gastric emptying: the gold standard is (was?) the use of a radioactive marker that was recorded after a meal. Since radioactivity cannot rise out of nothing, it was a well-known "fact" that stomach content always goes down after a meal. After people started measuring the real volume of the liquid in the stomach with MRI imaging, it came out that the content INCREASED after the meal due to strong gastric secretion. Despite visible evidence from the images, the increase was considered "not validated", because it was in contradiction with the gold standard.
"Validated" in medicine means: some well-known person has made a publication on the subject. He or she may be right, but not always. Mention the word three times in a discussion, and my blood pressure is at 200 mmHg. Furthermore, the gold standard might have to be replaced by some platinum. The FDA itself seems open to some radical changes. See : http://www.fda.gov/MedicalDevices/DeviceRegulationandGuidance/GuidanceDocuments/ucm071072.htm The only software products explicitly mentioned in this document are ... WinBUGS, BRugs and OpenBUGS. Not exactly a common component of your average "validated" toolchain... By the way, a validated acceptance of Bayesian methods for regulatory purposes (for devices, admittedly, but the FDA's arguments are *also* applicable to drug assessment) might well be a window of opportunity for R, which offers :
- interfaces to WinBUGS
- an interface to JAGS, an alternative implementation of the BUGS language and model
- various packages for alternative MCMC and/or conjugate estimation of posterior distributions
- a good system for the production of formally validatable documents (Sweave|odfWeave, and possibly RRPM (Bioconductor)).
Looks like R might have a head start over SAS for this new race... However, Bert Gunter's remarks are *still* valid : a lot of money has already been invested in SAS production systems, esp. in the pharmaceutical industry. This industry will consider switching systems if and only if this switch allows them to save *serious* money or to *seriously* streamline their current process. We all know it's possible, but the economic|financial case still has to be made. An implementation of such a system in a new sector (namely medical devices), using the Bayesian head start and the FDA incentive to consider Bayesian methods, might help making this case. My message: If you hear "not validated" or "validated", question it.
My message : timeo apothicarios et dona ferentes Emmanuel Charpentier __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] (quite possibly OT) Re: how to make R running on a Linux server display a plot on a Windows machine
be smart to at least consider this seriously. In short, you should discuss your options (memory, X server, VNC/RDP, Linux) with your *local* friendly help. Again, the R help list is *definitely* *NOT* the right place for learning about the care and feeding of computers. HTH, Emmanuel Charpentier, preparing to be shot at dawn by the list police...
[R] Data views (Re: (Another) Bates fortune?)
aimed at various data structure problems (including hierarchical data), but not *the* solution to the data-representation problem in R. Any thoughts ? Emmanuel Charpentier
Re: [R] What are Type II or III contrast? (contrast() in contrast package)
On Wednesday 03 February 2010 at 00:01 -0500, David Winsemius wrote:

On Feb 2, 2010, at 11:38 PM, Peng Yu wrote: ?contrast in the contrast package gives me the following description. However, I have no idea what Type II and III contrasts are. Could somebody explain it to me? And what does 'type' mean here? *'type'*: set 'type="average"' to average the individual contrasts (e.g., to obtain a Type II or III contrast)

In no particular order:
http://courses.washington.edu/b570/handouts/type3ss.pdf
http://core.ecu.edu/psyc/wuenschk/SAS/SS1234.doc
http://n4.nabble.com/a-kinder-view-of-Type-III-SS-td847282.html

Don't expect any follow-up questions to be answered or further citations offered. This is really more in the realm of statistics education than an R-specific question.

Notwithstanding David Winsemius' closing remark, I'd like to add something that should be required reading (and maybe hinted at in lm()'s help page): http://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf (BTW, despite its age, MASS *is* required reading, and Bill Venables' exegeses should be part of it).

HTH,

-- Emmanuel Charpentier, DDS, MsC :-)
Re: [R] What are Type II or III contrast? (contrast() in contrast package)
On Wednesday 03 February 2010 at 09:23 -0600, Peng Yu wrote:

On Wed, Feb 3, 2010 at 2:12 AM, Emmanuel Charpentier charp...@bacbuc.dyndns.org wrote:

[ Snip : same exchange as the previous message — references on Type I/II/III sums of squares, and a pointer to Bill Venables' "Exegeses on linear models" at http://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf ]

Do you by any means indicate that MASS describes Type I, II, III, IV contrasts? Although MASS describes contrasts, I don't think it describes Type I, II, III, IV contrasts.

It does not. But it explains (tersely) *WHY* one has *serious* logical difficulties when trying to interpret contrasts not abiding by the marginality principle.

HTH,

Emmanuel Charpentier
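The marginality point above can be seen in base R alone: drop1() abides by the marginality principle and refuses to test a main effect while a higher-order interaction containing it is still in the model. (The data frame and variable names below are invented for illustration; nothing here comes from the original posts.)

```r
# Marginality in action with base R's drop1().
set.seed(1)
d <- data.frame(a = gl(2, 10), b = gl(2, 5, 20))
d$y <- rnorm(20) + as.numeric(d$a)

m <- lm(y ~ a * b, data = d)
drop1(m, test = "F")   # only a:b is offered for dropping: a and b are
                       # marginal to the interaction, so not testable here
m2 <- lm(y ~ a + b, data = d)
drop1(m2, test = "F")  # without the interaction, a and b are both testable
```

Forcing tests of `a` or `b` in the presence of `a:b` is exactly the "Type III" move that the Exegeses argue against.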
Re: [R] system vs shell wait command
On Sunday 24 January 2010 at 04:25 -0800, robbert blonk wrote:

Dear All, I try to invoke a second program (called gencont) within an R script. However, this second program needs some time to finish. After finishing, the output should be read into R and used for further calculation. Under Windows, it works perfectly:

result <- shell("gencont < yn.txt", intern=TRUE, wait=TRUE) # R waits until the program finishes (approx. 3 min) and uses the result as an object. yn.txt contains the answer to a question prompted by the second program before running
resultf <- result[length(result)] # reads the last line, which contains the required result

However, at our Linux server, shell() doesn't work. Here, I'll need system()...

result <- system("./gencont.exe < yn.txt", intern=TRUE, wait=TRUE) # slightly different

You're using a Windows executable through wine (maybe transparently via binfmt_misc)? And this executable prints its output on the console? Right?

resultf <- result[length(result)] # same

The problem now is that R under Linux does not wait for the second program to finish, even if wait = TRUE... how come?

Might be wine; might be, more probably, gencont.exe: I have encountered some Windows programs whose main purpose was to set things up and launch a second process which did the real work. Cases in point: the Office 2000 and Office 2003 installers [yuck!], for example.

FWIW, system("wineconsole cmd") opens a new window and waits for its user to type "exit". It returns 0 as a result. ISTR that Windows 98's COMMAND.COM had an (internal?) command called "start", launching another process with an option to wait or not for it to terminate before returning; this could be true of Windows 2000 and later, but I can't check right now. Wine's own cmd.exe has a "start" program:

wineconsole --backend=curses cmd
CMD version 1.1.31
Z:\home\charpent> start
Starts a program, or opens a document in the program normally used for files with that extension.
Usage: start [options] program_file [...]
       start [options] document_file
Options:
/M[inimized]  Start the program minimized.
/MAX[imized]  Start the program maximized.
/R[estored]   Start the program normally (neither minimized nor maximized).
/W[ait]       Wait for the started program to finish, then exit with its exit code.
/ProgIDOpen   Open a document using the following progID.
/L            Show the end-user license.

start.exe version 0.2 Copyright (C) 2003, Dan Kegel
Start comes with ABSOLUTELY NO WARRANTY; for details run with the /L option.
This is free software, and you are welcome to redistribute it under certain conditions; run 'start /L' for details.
Z:\home\charpent>

This might help you...

What should I do?

I seriously doubt that system() can read back what is spit on the console. Better redirect that to another text file. Some options come to mind, but beware: I didn't try any of them, not having your gencont.exe program.

- system("wineconsole cmd gencont.exe < yn.txt > result.txt"), then result <- readLines("result.txt") ?
- ftp your input to a Windows box, ssh -f yourwindowsbox "gencont.exe < yn.txt > results.txt", ftp your results back (or, equivalently, working on a Windows share) ?
- port gencont.exe to Linux ? :-) (this might seem silly, but might be the easiest solution if this program is, as it seems according to your partial explanations, a simple filter (read data, munch them, spit output) not using Windows-specific functions or system calls).

HTH,

Emmanuel Charpentier
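The "redirect to a file, then read it back" advice above can be sketched in a few lines of base R. Since gencont is the original poster's program and is not available here, a plain shell `echo` stands in for it (the command and file are stand-ins, not part of the original thread):

```r
# Sketch: run an external command, capture its output via a file,
# then read the last line back into R -- mirroring the OP's workflow.
f <- tempfile()
status <- system(paste("echo 42 >", f), wait = TRUE)  # stand-in for gencont
stopifnot(status == 0)                                # command completed
result  <- readLines(f)
resultf <- result[length(result)]  # last line holds the required result
```

The file-based route sidesteps the question of whether the wrapped Windows binary writes to a console that `intern = TRUE` can even see.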
Re: [R] How to use SQL code in R
On Monday 16 November 2009 at 13:14 +0800, sdlywjl666 wrote:

Dear All, how to use SQL code in R?

Depends on what you mean by "using SQL code in R"...

If you mean "query, update or otherwise mess with an existing database", look at the RODBC package and/or the various DBI-related packages (hint: look at BDR's nice manual on importing/exporting data in R, which comes with the standard R documentation).

If you mean "use standard SQL code to manipulate R data frames", look at the sqldf package (quite nice, but be aware that it is work in progress).

If you mean "use Oracle's command language or pgsql or ... in order to transfer into R code written for a specific database", I'm afraid you're SOL...

In that last case, however, note that you may be able to use R *inside* your external database instead, depending on its ability to use external code (I'm thinking of PL/R, which allows running R code in a function in PostgreSQL).

HTH,

Emmanuel Charpentier

BTW, there exists an R database Special Interest Group, with a mailing list. Look up their archive, and maybe ask a (slightly less general, if possible) version of your question there...
Re: [R] What parts of 'Statistical Models in S' are not applicable to R?
On Wednesday 11 November 2009 at 10:22 +0100, Uwe Ligges wrote:

Peng Yu wrote: According to an Amazon review, 'Statistical Models in S' is a key reference for understanding the methods implemented in several of S-PLUS' high-end statistical functions, including 'lm()', 'predict()', 'design()', 'aov()', 'glm()', 'gam()', 'loess()', 'tree()', 'burl.tree()', 'nls()' and 'ms()'. But since it is for S, some parts of the book may not be applicable to R. Some examples (e.g. interaction.plot()) discussed in this book are not available in R. Without working examples, it is sometimes difficult for me to understand the material in the book. Besides the functions mentioned in the Amazon review, could somebody give me a hint on what chapters (or sections) in this book are not appropriate to R?

They are all appropriate, but nuances differ these days, as some nuances differ for recent S-PLUS versions, 17 years later. It should still be fine for learning the relevant concepts.

You could also note that, at least in the 4th (last) edition of the book, the authors have marked passages with differences between R and S+ with a marginal "R". Now this book has grown a bit out of date since its last edition (2002?): newer R packages implement various things previously not implemented in R (e.g. multiple comparisons after ANOVA, previously available in S+ with the multicomp function, and implemented (with a lot more generality) in the multcomp package).

A 5th edition might be in order, but that would be a *daunting* task: the R language has grown (e.g. namespaces), the nature and extent of available tasks has grown *enormously*, and I don't think that producing a book that would be to 2009 R what VR4 was to 2002 R is doable by a two-person team, as talented, dedicated and productive as these two persons might be (modulo Oxford sarcasm :-).
Furthermore, these two persons already give an enormous amount of time and effort to other R development (search for the R-help activity of BV and BDR, or meditate on the recently published stats on R source activity...). Such a document would probably have to be something other than a book to stay up to date and accurate, and even coordinating such a task would need serious time... Even if it excluded anything present in the various packages' help files and limited itself to tutorial introductions, examples and discussions, the sheer volume (1700+ packages last time I looked) and the difficulty of coordination (how do you discuss 5 different packages implementing various means to solve the same problem?) would involve serious organizational difficulties. So I doubt such a document will get produced in the foreseeable future. Frequent R-help reading and note-taking is the second-best option...

To come back to the R-vs-S+ topic: unless I'm mistaken, R seems to be currently the dominant implementation of the S language, and most published S material will nowadays (implicitly) be aimed at R. This should reduce the importance of the problem.

Sincerely,

Emmanuel Charpentier
Re: [R] Models for Discrete Choice in R
On Sunday 08 November 2009 at 19:05 -0600, Frank E Harrell Jr wrote:

Emmanuel Charpentier wrote: On Sunday 08 November 2009 at 17:07 -0200, Iuri Gavronski wrote: Hi, I would like to fit logit models for ordered data, such as those suggested by Greene (2003), p. 736. Does anyone suggest any package in R for that?

Look up the polr function in package MASS (and read the relevant pages in VR4 and some quoted references...) or the slightly more sophisticated (larger range of models) lrm function in F. Harrell's Design (now rms) package (but be aware that Design is a huge beast which carries its own computing universe, based on Harrell's (strong) views of what a regression analysis should be: reading his book is, IMHO, necessary to understand his choices and agree (or disagree) with them). If you have a multilevel model (a.k.a. one random grouping effect), the repolr package aims at that, but I've been unable to use it recently (numerical exceptions).

By the way, my dependent variable is ordinal and my independent variables are ratio/interval.

Numeric? Then maybe some recoding/transformation is in order... in which case Design/rms might or might not be useful.

I'm not clear on what recoding or transformation is needed for an ordinal dependent variable and ratio/interval independent variables, nor why rms/Design would not be useful.

I was thinking about transformations/recoding of the *independent* variables...

Emmanuel Charpentier
Re: [R] lme4 and incomplete block design
, 0, 0, 0, 1, 0))

   AIC   BIC logLik deviance
 87.33 98.72 -38.67    77.33
Random effects:
 Groups Name        Variance Std.Dev.
 block  (Intercept) 0.86138  0.9281
Number of obs: 72, groups: block, 24

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -1.9246     0.7117  -2.704  0.00684 **
treatment2    1.3910     0.8568   1.624  0.10446
treatment3    0.4527     0.9163   0.494  0.62124
treatment4    0.4526     0.9163   0.494  0.62131
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
           (Intr) trtmn2 trtmn3
treatment2 -0.775
treatment3 -0.721  0.598
treatment4 -0.721  0.598  0.558

In the first case (your original data), treatment is interpreted as a numeric (quantitative) variable, and what lmer estimates is a logistic regression coefficient of the outcome on this numeric variable. Probably nonsensical, unless you have reason to think that your factor is ordered and should be treated as numeric. In the second case, treatment is a factor, so you get an estimate for each treatment level except the first, to be interpreted as a difference of means with the first level.

I fell into that trap myself a few times, and took up the habit of giving levels to my factors that cannot be interpreted as numbers (such as f <- paste("F", as.character(v))).

Any help is appreciated!

HTH,

Emmanuel Charpentier
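The numeric-vs-factor trap described above can be reproduced in miniature with a plain lm() fit (no lme4 needed); the data frame below is invented for illustration:

```r
# Same column, two readings: as numeric (one slope) vs. as factor
# (one contrast per level against the first level).
set.seed(7)
d <- data.frame(treatment = rep(1:4, each = 5), y = rnorm(20))

m.num <- lm(y ~ treatment, data = d)          # treated as quantitative
m.fac <- lm(y ~ factor(treatment), data = d)  # treated as categorical

length(coef(m.num))  # intercept + one slope       = 2 coefficients
length(coef(m.fac))  # intercept + 3 level effects = 4 coefficients

# The defensive habit suggested above: levels that cannot read as numbers.
f <- factor(paste("F", d$treatment, sep = ""))
levels(f)  # "F1" "F2" "F3" "F4"
```

With non-numeric level names, a forgotten factor() conversion fails loudly instead of silently fitting a nonsensical slope.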
Re: [R] Models for Discrete Choice in R
On Sunday 08 November 2009 at 17:07 -0200, Iuri Gavronski wrote:

Hi, I would like to fit logit models for ordered data, such as those suggested by Greene (2003), p. 736. Does anyone suggest any package in R for that?

Look up the polr function in package MASS (and read the relevant pages in VR4 and some quoted references...) or the slightly more sophisticated (larger range of models) lrm function in F. Harrell's Design (now rms) package (but be aware that Design is a huge beast which carries its own computing universe, based on Harrell's (strong) views of what a regression analysis should be: reading his book is, IMHO, necessary to understand his choices and agree (or disagree) with them). If you have a multilevel model (a.k.a. one random grouping effect), the repolr package aims at that, but I've been unable to use it recently (numerical exceptions).

By the way, my dependent variable is ordinal and my independent variables are ratio/interval.

Numeric? Then maybe some recoding/transformation is in order... in which case Design/rms might or might not be useful.

HTH,

Emmanuel Charpentier
Re: [R] Bootstrap inference for the sample median?
On Sunday 30 August 2009 at 18:43 +0530, Ajay Shah wrote:

Folks, I have this code fragment:

set.seed(1001)
x <- c(0.79211363702017, 0.940536712079832, 0.859757602692931,
       0.82529998629531, 0.973451006822, 0.92378802164835,
       0.996679563355802, 0.943347739494445, 0.992873542980045,
       0.870624707845108, 0.935917364493788)
range(x) # from 0.79 to 0.996
e <- function(x, d) { median(x[d]) }
b <- boot(x, e, R=1000)
boot.ci(b)

The 95% confidence interval, as seen with 'Normal' and 'Basic' calculations, has an upper bound of 1.0028 and 1.0121. How is this possible? If I sample with replacement from values which are all lower than 1, then any sample median of these bootstrap samples should be lower than 1. The upper cutoff of the 95% confidence interval should also be below 1.

Nope. Normal and Basic try to fit (some form of) normal distribution to the sample of your statistic. But the median of such a small sample is quite far from normal (hint: it is a discrete distribution with only very few possible values, at most as many values as the sample. Exercise: derive the distribution of median(x)...).

To convince yourself, look at the histogram of the bootstrap distribution of median(x). Contrast with the bootstrap distribution of mean(x). Meditate. Conclude...

HTH,

Emmanuel Charpentier
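The "look at the histogram" suggestion can be checked with base R alone (no boot package), using the very data from the post: every bootstrap median stays below 1, and the median's bootstrap distribution is discrete with only a handful of values, while the mean's is far smoother.

```r
# Base-R bootstrap of median(x) vs. mean(x) on the poster's data.
set.seed(1001)
x <- c(0.79211363702017, 0.940536712079832, 0.859757602692931,
       0.82529998629531, 0.973451006822, 0.92378802164835,
       0.996679563355802, 0.943347739494445, 0.992873542980045,
       0.870624707845108, 0.935917364493788)

meds  <- replicate(1000, median(sample(x, replace = TRUE)))
means <- replicate(1000, mean(sample(x, replace = TRUE)))

all(meds < 1)         # TRUE: a resampled median can never exceed max(x)
length(unique(meds))  # at most 11: with odd n, the median is always one
                      # of the original sample values
hist(meds); hist(means)  # discrete spikes vs. a smooth bell
```

The normal approximation behind the "Normal" and "Basic" intervals ignores this discreteness, which is how the CI's upper bound escapes past 1.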
Re: [R] Testing year effect in lm() ***failed first time, sending again
On Wednesday 05 August 2009 at 09:52 -0700, Mark Difford wrote:

Emmanuel, "somewhat incomplete help pages: what in h*ll are valid arguments to mcp() beyond Tukey??? Currently, you'll have to dig in the source to learn that..."

Not so: they are clearly stated in ?contrMat.

Oops... I overlooked that. With my apologies,

Emmanuel Charpentier
Re: [R] Testing year effect in lm() ***failed first time, sending again
On Thursday 30 July 2009 at 16:41 -0600, Mark Na wrote:

Dear R-helpers, I have a linear model with a year effect (year is coded as a factor), i.e. the parameter estimates for each level of my year variable have significant P values (see some output below) and I am interested in testing: a) the overall effect of year; b) the significance of each year vis-à-vis every other year (the model output only tests each year against the baseline year).

install.packages("multcomp")

help.start() and use the vignettes! They're good (and are a good complement to somewhat incomplete help pages: what in h*ll are valid arguments to mcp() beyond "Tukey"??? Currently, you'll have to dig in the source to learn that...).

HTH

Emmanuel Charpentier
[R] odfWeave : sudden and unexplained error
Dear list, dear Max,

I am currently working on a report. I'm writing it with OpenOffice.org and odfWeave. I'm working incrementally: I write a bit, test (interactively) some ideas, cutting-and-pasting code to the OOo report when satisfied with it. In the process, I tend to recompile the .odt source a *lot*. Suddenly, odfWeave started to give me an incomprehensible error even before starting the compilation itself (InFile is my input .odt file, OutFile is the resultant .odt file):

odfWeave(InFile, OutFile)
Copying SrcAnalyse1.odt
Setting wd to /tmp/RtmphCUkSf/odfWeave01225949667
Unzipping ODF file using unzip -o SrcAnalyse1.odt
Archive: SrcAnalyse1.odt
 extracting: mimetype
 inflating: content.xml
 inflating: layout-cache
 inflating: styles.xml
 extracting: meta.xml
 inflating: Thumbnails/thumbnail.png
 inflating: Configurations2/accelerator/current.xml
 creating: Configurations2/progressbar/
 creating: Configurations2/floater/
 creating: Configurations2/popupmenu/
 creating: Configurations2/menubar/
 creating: Configurations2/toolbar/
 creating: Configurations2/images/Bitmaps/
 creating: Configurations2/statusbar/
 inflating: settings.xml
 inflating: META-INF/manifest.xml
Removing SrcAnalyse1.odt
Creating a Pictures directory
Pre-processing the contents
Error: cc$parentId == parentId is not TRUE

Perusing the documentation and the R-help list archives didn't turn up anything relevant. This error survived restarting OOo, restarting R, restarting its enclosing Emacs session and even rebooting the damn hardware...

Any idea?

Emmanuel Charpentier
Re: [R] How to perform a Likelihood-ratio test?
On Monday 13 July 2009 at 13:04 +0200, Lars Bergemann wrote:

Hi. I would like to use a likelihood-ratio test to compare a linear model (lm()) to another linear model containing the first one, to see if the extra factors are needed - but I was unable to find any help on how to do that. Could you help me and tell me what to do? Or tell me where to find help on this topic?

?anova ? Check the "test" argument... :-)

Emmanuel Charpentier
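The terse "?anova" hint expands to a few lines of base R: fit the two nested models and hand them both to anova(). (The data and model formulas below are invented for illustration.)

```r
# Comparing two nested lm() fits with anova(), as hinted above.
set.seed(42)
d <- data.frame(x1 = rnorm(30), x2 = rnorm(30))
d$y <- 1 + 2 * d$x1 + rnorm(30)

small <- lm(y ~ x1, data = d)
big   <- lm(y ~ x1 + x2, data = d)

anova(small, big)                  # F test for the extra term(s)
anova(small, big, test = "Chisq")  # chi-squared flavour, closer in spirit
                                   # to a likelihood-ratio test
```

For glm() fits, anova(..., test = "Chisq") gives the likelihood-ratio (analysis of deviance) test directly.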
Re: [R] Unix commands on R
On Wednesday 08 July 2009 at 16:36 -0400, suman Duvvuru wrote:

I am using R on Unix. While in R, how do I execute Unix shell commands? Is there a way to do it? I am executing a function in R, and the matrix resulting from the function has to be passed on as an input to a Unix command. Any directions will be helpful.

?system
?capture.output
?textConnection

These three functions (I'm not sure about the real name of the second) should allow you to get the standard output of a Unix command line. Consider also capturing your command output in a file (reusable...).

HTH,

Emmanuel Charpentier
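A short sketch of the suggestions above, including getting an R matrix across to a shell command via a file. The downstream Unix tool is hypothetical, so its invocation is left commented out:

```r
# Capture a shell command's stdout directly as an R character vector:
out <- system("echo hello", intern = TRUE)

# To feed an R matrix to a Unix command, write it to a file first:
m <- matrix(1:6, nrow = 2)
f <- tempfile()
write.table(m, f, row.names = FALSE, col.names = FALSE)
# system(paste("some_unix_tool <", f))  # hypothetical downstream command
```

system(..., intern = TRUE) returns one element per output line; capture.output() is the analogous tool for capturing R's own printed output.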
Re: [R] odfWeave : problems with odfInsertPlot() and odfFigureCaption()
Dear Max,

On Tuesday 30 June 2009 at 21:36 -0400, Max Kuhn wrote:

Well, on this one, I might bring my record down to commercial-software bug-fix timelines...

Having actually looked at the code before wailing, I had such a hunch...

[ Snip ... ]

We had to do some interesting things to get figure captions working with odfWeave. You would think that the figure caption function just writes out some xml and is done (like the insert-plot code and the table-caption code). Unfortunately, it doesn't. It writes the caption to a global variable (I know, I know) and other code actually writes the caption. We had good reasons to do this, mostly related to the complexity associated with the context of the code chunk (e.g. was it inside a table, a paragraph, etc.). I've been thinking about this one for a while, since the current implementation doesn't allow you to pass the figure caption to other functions (it writes a global variable and returns NULL). I haven't come up with a good solution yet.

ISTR from my Lisp/Scheme days that a good heuristic in this kind of case is to try to build a closure and return that to a wrapper. Unfortunately, R doesn't have anything as simple to use as let/let*, and you have to play fast and loose with environments...

<aside> I've not been as prompt with odfWeave changes in comparison to other packages (many of you have figured this out). This is mostly due to the necessary complexity of the code. In a lot of ways, I think that we let the feature set be too rich and should have put some constraints on the availability of code chunks (e.g. only in paragraphs).

I didn't even think of this. A priori, that's hard to do (whereas (La)TeX needs \par or *two* newlines to end a paragraph and start another, oowriter will take one newline as an end of paragraph and will emit the corresponding XML). Come to think of it, you might try to insert a chunk in a table cell, or in a footnote... And I don't know about other ODF tools...
I want to add more features (like the one that you are requesting), but seemingly small changes end up being a considerable amount of work (and I don't have time). I've been thinking about simplifying the code (or writing rtfWeave) to achieve the same goals without such a complex system.

What about something along the lines of:

foo <- odfTable(something)
bar <- odfInsertPlot(somethingelse)
odfCaptionize(foo, "Something")
odfCaptionize(bar, "Something else")

?

The ODF format is a double-edged sword. It allows you to do many more things than something like RTF, but makes it a lot more difficult to program for. Take tables as an example. The last time I looked at the RTF specification, this was a simple markup structure. In ODF, the table structure is somewhat simple, but the approach to formatting the table is complex and not necessarily well documented.

Indeed... For example, some table style elements go into styles.xml whereas others go into content.xml (this might be implementation dependent).

So, I could more easily expand the functionality of odfWeave by imposing some constraints about where the code chunks reside. The earlier versions of odfWeave did not even parse the xml; we got a lot done via regular expressions and gsub. However, we went to an xml-parsing approach when we found out that the context of the code chunk matters. Feedback is welcome: do people actually use code chunks in places other than paragraphs? </aside>

I didn't even think of it. But now that you mention it, you gave me a nice idea for a workaround... :-)

So, back to your question, the relevant code for captions is in odfWeave:::withCaptionXML (odfInsertPlot uses this to write the xml). You can try that, and I might be able to look at it in the next few days.

I did look at that. The problem is that your code has dire warnings not to call it with a caption more than once in a figure=TRUE (sic...) chunk.
Thanks,

*I* thank *you*, Max,

Emmanuel Charpentier
[R] odfWeave : problems with odfInsertPlot() and odfFigureCaption()
Dear Max, dear list,

I have trouble using odfInsertPlot to insert plots outside fig=TRUE chunks. My goal is to dump a bunch of (relatively uninteresting, but required) graphs from within a loop, possibly interspersed with some tables. This is not possible within a normal fig=TRUE chunk.

odfInsertPlot() is able to insert a previously saved plot. When used in a fig=TRUE, results=hide chunk, it produces just a blank frame (with a caption if caption=odfFigureCaption() is used). Used in a fig=TRUE, results=xml chunk, it produces an error at file re-assembly (see end of post). However, when wrapped in a cat(), the same code produces 1) the figure (without caption), then a blank frame with the caption, or just the figure (without caption) if the caption=() argument is absent. The same cat(odfInsertPlot()) construct can be used in a results=xml chunk (no fig=TRUE); however, I found no way to insert the caption.

Therefore: I can programmatically create graphs and insert them via results=xml chunks, but I can't caption them. Does someone have a workaround (apart from reprogramming odfFigureCaption() with a new argument taking the output of odfInsertPlot())?

Sincerely,

Emmanuel Charpentier

Annex: log of compilation with fig=TRUE, results=xml.
odfWeave("Test1Src.odt", "Test1.odt")
Copying Test1Src.odt
Setting wd to /tmp/RtmpenMTgz/odfWeave3023572231
Unzipping ODF file using unzip -o Test1Src.odt
[ unzip log identical to the one in my previous odfWeave post: mimetype, content.xml, styles.xml, Configurations2/..., settings.xml, META-INF/manifest.xml ]
Removing Test1Src.odt
Creating a Pictures directory
Pre-processing the contents
Sweaving content.Rnw
Writing to file content_1.xml
Processing code chunks ...
 1 : term hide (label=ProlégomènesStandard)
 2 : term xml (label=EssaiGraphe1)
 3 : echo term verbatim (label=TestChunk)
 4 : term xml (label=SecondChunk)
'content_1.xml' has been Sweaved
Removing content.xml
Post-processing the contents
error parsing attribute name
attributes construct error
xmlParseStartTag: problem parsing attributes
Couldn't find end of Start Tag draw:frame line 6
error parsing attribute name
attributes construct error
xmlParseStartTag: problem parsing attributes
Couldn't find end of Start Tag draw:image line 6
Opening and ending tag mismatch: text:p line 6 and draw:frame
Opening and ending tag mismatch: office:text line 2 and text:p
Opening and ending tag mismatch: office:body line 2 and office:text
Opening and ending tag mismatch: office:document-content line 2 and office:body
Extra content at the end of the document
Error: 1: error parsing attribute name
2: attributes construct error
3: xmlParseStartTag: problem parsing attributes
4: Couldn't find end of Start Tag draw:frame line 6
5: error parsing attribute name
6: attributes construct error
7: xmlParseStartTag: problem parsing attributes
8: Couldn't find end of
Start Tag draw:image line 6
9: Opening and ending tag mismatch: text:p line 6 and draw:frame
10: Opening and ending tag mismatch: office:text line 2 and text:p
11: Opening and ending tag mismatch: office:body line 2 and office:text
12: Opening and ending tag mismatch: office:document-content line 2 and office:body
13: Extra content at the end of the document
Re: [R] NaiveBayes fails with one input variable (caret and klaR packages)
On Tuesday 30 June 2009 at 14:51 -0400, Max wrote: I just put a new version on CRAN... Now, *that's* support !!! Bug report at 17:31, acknowledgement at 20:12, (probable) bug fix at 20:51. Bravo, bravissimo, Max ! Try that, SAS support ! Emmanuel Charpentier And I insist that this is not ironic in the least... It's genuinely admiring.
Re: [R] questions about meta-analysis
On Saturday 27 June 2009 at 13:02 +0800, sdzhangping wrote: Dear R users: In the meta-analysis example (cochrane dataset, package rmeta), I cannot find the p-value of the test for overall effect, nor some other indices (Z, I², weights, et al.). How can I get these indices listed?

library(rmeta)
data(cochrane)
cochrane
          name ev.trt n.trt ev.ctrl n.ctrl
1     Auckland     36   532      60    538
2        Block      1    69       5     61
3        Doran      4    81      11     63
4        Gamsu     14   131      20    137
5     Morrison      3    67       7     59
6 Papageorgiou      1    71       7     75
7      Tauesch      8    56      10     71
a <- meta.MH(n.trt, n.ctrl, ev.trt, ev.ctrl, names=name, data=cochrane)
summary(a)
Fixed effects ( Mantel-Haenszel ) meta-analysis
Call: meta.MH(ntrt = n.trt, nctrl = n.ctrl, ptrt = ev.trt, pctrl = ev.ctrl, names = name, data = cochrane)
               OR (lower  95% upper)
Auckland     0.58   0.38       0.89
Block        0.16   0.02       1.45
Doran        0.25   0.07       0.81
Gamsu        0.70   0.34       1.45
Morrison     0.35   0.09       1.41
Papageorgiou 0.14   0.02       1.16
Tauesch      1.02   0.37       2.77
Mantel-Haenszel OR = 0.53  95% CI ( 0.39, 0.73 )   (where are Z and the p-value ?)
Test for heterogeneity: X^2( 6 ) = 6.9 ( p-value 0.3303 )

You might easily recompute them : use the confidence interval to guesstimate the standard deviation of the OR, then its value and the damn Z and p you long for... Remember that it's log(OR) that is (asymptotically) normally distributed. The weights are proportional to the inverse of the variance and sum to 1. You might peek at the source code for enlightenment...

Alternatively, you may grow lazy and use the meta package, whose function metabin() will give you all that. The recent metafor package also seems quite interesting.

Another (smarter ?) alternative is to ask yourself *why* you need Z and p. Z is just a computing device allowing you to use a known density. And the real meaning of p and its usefulness has been discussed, disputed and fought over at exceedingly great length for the past 80 years. Of course, if you have an instructor to please ...
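For the record, the guesstimation described above fits in a few lines of R (the numbers are the Mantel-Haenszel summary printed above; everything else is standard normal theory for log(OR)):

```r
# Recover Z and p from the pooled OR and its 95% CI (values from the output above)
or <- 0.53; lo <- 0.39; hi <- 0.73
se <- (log(hi) - log(lo)) / (2 * qnorm(0.975))  # SE of log(OR) from the CI width
z  <- log(or) / se                              # Wald Z for the overall effect
p  <- 2 * pnorm(-abs(z))                        # two-sided p-value
round(c(z = z, p = p), 4)
```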
HTH, Emmanuel Charpentier
Re: [R] ANOVA with means and SDs as input
On Thursday 25 June 2009 at 12:15 +0200, Sebastian Stegmann wrote: Dear R-community, I'm struggling with a paper that reports only fragmented results of a 2-by-2-by-3 experimental design. However, means and SDs for all cells are given. Does anyone know a package/function that helps compute an ANOVA with only means and SDs as input (resulting in some sort of effect size and significance test)? The reason why I'm interested is simple: I'm conducting a meta-analysis. If I included only what the authors would like to show the world, the results would be somewhat biased...

What you aim to do is not, as far as I can tell, possible using already-programmed solutions in R packages. Some packages offer very close solutions (e.g. lme()) but lack some necessary options (e.g. the ability to fix intra-level variance in lme() : varFixed() works as documented, not as its name suggests...).

Why not use an established meta-analysis package such as meta or rmeta ? You may have to recompute some variances or effect sizes, but it can usually be done. If you aim to go a bit farther (i.e. more than two-group comparisons), the newly released metafor package seems quite interesting. Wolfgang Viechtbauer aimed to offer a tool for meta-regression, and the current stage of the package (0.5.something, IIRC) seems quite usable, if somewhat counter-intuitive in places. I'd recommend having a look at it. WV didn't think it useful to include the vignettes and documentation for mima(), the predecessor of metafor, which explain quite nicely the motivations and algorithms he used. Too bad... However, this documentation is still available on his Web site. Be aware that the current main interface to the rma() function (the workhorse of the package) does not (yet ?) follow the usual format for regression-like functions common to many R packages (the (formula, data, subset, etc...) format). For example, you may do meta-regression, but you'll have to code the model matrix yourself.
Similarly, the necessary bricks for (e.g.) indirect comparisons are here, but you'll have to build them yourself (bring your wheelbarrow and some mortar...).

An alternative to be seriously considered : Bayesian modeling. Meta-analysis is one of the rare medical domains where journal reviewers won't immediately reject a paper at the sight of the B***sian word... As far as I know, psychological journals are not as asinine. Here again, some manual work will be in order. Some tutorial material is readily available (google "Bayesian meta analysis tutorial", which should lead you to the various Statistics in Medicine tutorials, among other things).

I've combed through the web and my various books on R, but it's not that easy to find. Or is it? Variance analysis through manual decomposition of sums of squares ? Ugh ... I did that when I first learned ANOVA (so long ago that I am uncomfortable thinking of it. In those times, even a pocket calculator was too expensive for a random student (and therefore forbidden in exams), and I learned to cope with a slide rule...). Tiring and messy. Furthermore, manual algorithms (i.e. simple, non-iterative algorithms) exist only for the simplest (e.g. one-way ANOVA) or ideal (e.g. balanced two-way ANOVA with equal group sizes) cases ; furthermore, modern critiques of the bases of such works cast some doubt on their foundations (e.g. random- and mixed-effect ANOVA)... Therefore, the art is fast forgotten, if not totally lost. If you insist, I might try to unearth my copy of Winer's handbook (ca. 1959) and look up specific questions.

HTH, Emmanuel Charpentier
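For the simplest case evoked above (a balanced or unbalanced one-way layout), the sums of squares can in fact be rebuilt from cell means, SDs and sizes alone; a minimal sketch, with made-up numbers:

```r
# One-way ANOVA from summary statistics only (all numbers illustrative)
means <- c(10.2, 11.5, 9.8)   # cell means
sds   <- c(2.1, 2.4, 1.9)     # cell standard deviations
n     <- c(12, 12, 12)        # cell sizes
grand <- sum(n * means) / sum(n)           # grand mean
ss.b  <- sum(n * (means - grand)^2)        # between-groups sum of squares
ss.w  <- sum((n - 1) * sds^2)              # within-groups sum of squares
df1   <- length(means) - 1
df2   <- sum(n) - length(means)
Fval  <- (ss.b / df1) / (ss.w / df2)
pval  <- pf(Fval, df1, df2, lower.tail = FALSE)
```

Extending this to the full 2x2x3 factorial decomposition is possible in principle but, as noted above, quickly gets messy by hand.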
Re: [R] User defined GLM?
On Monday 22 June 2009 at 09:35 -0700, francogrex wrote: Hello, I have this generalized linear formula: log(x[i]/n[i]) = log(sum(x)/sum(n)) + beta[i]

I do not understand the problem as stated : if x[i] and n[i] are known, and unless sum(n) = 0, your dataset reduces to a set of nrow(dataset) independent linear equations in nrow(dataset) unknowns (the beta[i]), whose solution is trivially beta[i] = log(x[i]/n[i]) - log(sum(x)/sum(n)), except when n[i] = 0, in which case your equation has no solution. Could you try to re-express your problem ? Emmanuel Charpentier
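The trivial closed-form solution above can be checked numerically in two lines (the counts x and n are made up for illustration):

```r
x <- c(5, 12, 7); n <- c(50, 80, 60)           # illustrative counts
beta <- log(x / n) - log(sum(x) / sum(n))      # the closed-form solution
# each beta[i] satisfies the original equation exactly:
all.equal(log(x / n), log(sum(x) / sum(n)) + beta)   # TRUE
```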
Re: [R] large numbers of observations using ME() of spdep
On Sunday 07 June 2009 at 02:50 +0900, lucero mariani wrote: Dear All, We aim to remove the spatial structure of our data using Moran eigenvectors and the spdep package. Our data has 3694 samples and 13 variables. The computer stopped working after almost 4 days of processing (we found it emitting a sharp sound and with all colors on the screen. No worries, it was restarted without problem!).

Cool ! Farm it out to a night club (and get a real computer with the proceeds). Sorry. Couldn't resist ! And twice sorry to be unable to help with your *real* problem... Emmanuel Charpentier

And we are left with nothing: no result file was produced since the calculations were interrupted! Consequently, I am looking for a way to accelerate calculations with ME() of spdep and to save the results step by step (for each eigenvalue). I came across several promising ideas: 1) make a compiled program in C and call it from R with the command system(); 2) use muliR. However, I do not know how to use these tools. I do not understand exactly why the calculations are so slow and I do not know how to write a program in C... I am using Ubuntu 9.04 and R version 2.8.1. The R code is as follows:

taimin.xy <- taimin[c("dy", "dx")]
coordinates(taimin.xy) <- ~dy+dx   # dy, dx : names of the respective coordinate columns
coords <- coordinates(taimin.xy)
library(spdep)
TaiminGabrielGraph <- gabrielneigh(coords, nnmult = 12)
tai.gab.nb <- graph2nb(TaiminGabrielGraph, sym = TRUE)
nbtaim_distsg <- nbdists(tai.gab.nb, coords)
nbtaim_simsg <- lapply(nbtaim_distsg, function(x) (1 - ((x/(4*50))^2)))
MEtaig.listw <- nb2listw(tai.gab.nb, glist = nbtaim_simsg, style = "B")
sevmtaig <- ME(Presabs ~ Age+Curv+Zon2005+ZoneMeiji+Wi+Sol+Slope+Seadist+Elev+Basin,
               data = taimin, family = binomial, listw = MEtaig.listw)

Any help is welcome! Thanks Lucero Mariani, Yokohama National University, Japan.
Re: [R] New Meta-Analysis Package (metafor)
On Friday 05 June 2009 at 14:03 +0200, Viechtbauer Wolfgang (STAT) wrote: A new package, called metafor, is now available via CRAN. The metafor package consists of a collection of functions for conducting meta-analyses in R. Fixed- and random-effects models (with and without moderators) can be fitted via the general linear (mixed-effects) model. For 2x2 table data, the Mantel-Haenszel and Peto methods are also implemented. The package also provides various plot functions (for example, for forest, funnel, and radial plots) and functions for assessing the model fit and obtaining case diagnostics.

Thank you very much ! I think that this package was expected ... Emmanuel Charpentier
[R] Small mystery : passing a subset= argument to lme|lm through ...
Dear list, I have a problem involving passing a subset= argument through "...". I'm trying to augment the set of defined analyses for mice (homonymous package) with a call to lme(). This package creates multiple imputations of missing data in a mids object; each completed data set may be obtained through the complete(data, set) function.

sessionInfo()
R version 2.9.0 (2009-04-17)
x86_64-pc-linux-gnu
locale: LC_CTYPE=fr_FR.UTF-8;LC_NUMERIC=C;LC_TIME=fr_FR.UTF-8;LC_COLLATE=fr_FR.UTF-8;LC_MONETARY=C;LC_MESSAGES=fr_FR.UTF-8;LC_PAPER=fr_FR.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=fr_FR.UTF-8;LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] odfWeave_0.7.10 XML_2.3-0 lattice_0.17-25
loaded via a namespace (and not attached):
[1] grid_2.9.0 MASS_7.2-47 nlme_3.1-92 rpart_3.1-44

# First attempt :
foo <- function(formula, data, ...) {
  library(nlme)
  call <- match.call()
  if (!is.mids(data)) stop("The data must have class mids")
  analyses <- lapply(1:data$m,
                     function(i) lme(formula, data = complete(data, i), ...))
  object <- list(call = call, call1 = data$call, nmis = data$nmis,
                 analyses = analyses)
  oldClass(object) <- c("mira", oldClass(object))
  return(object)
}
bar1 <- foo(ttotal ~ nbpradio, data = Data1.Imp, random = ~1 | ctr)
class(bar1$analyses[[1]])
[1] "lme"
# Fine : the random= argument *HAS BEEN* passed to lme()
# through ... ; therefore it's (usually) possible.
# Further tests (not shown) show that results are reasonable (i. e.
# compliant with expectations...).
bar2 <- foo(log(ttotal) ~ log(nbpradio), data = Data1.Imp,
            random = ~1 | ctr, subset = nbpradio > 0)
Erreur dans eval(expr, envir, enclos) : ..2 utilisé dans un mauvais contexte, aucun ... où chercher
(Error in eval(expr, envir, enclos) : ..2 used in an incorrect context, no ... to look in)

Further exploration shows that a subset= argument *DOES NOT GET PASSED* to lme(), which furthermore complains that it cannot find a ... argument. Since lme is a method private to the nlme package, I can't trace the source of the error.
[Later : the same observation holds with lm() instead of lme()...] Now, ISTR that an analogous problem *has already been discussed* on this list, but even a prolonged use of RSiteSearch() did not retrieve it. http://finzi.psych.upenn.edu/R/Rhelp02/archive/10139.html hints that "Modelling and graphics functions with formulas all have slightly different nonstandard evaluation rules", and it turns out that lme() doesn't evaluate extra arguments like subset in the parent environment, but this does not tell me why a ... argument is *not* found. A different frame of evaluation ? May some kind soul point me in the right direction ? (He may even further my shame by telling me how he retrieved the relevant information...). Sincerely, Emmanuel Charpentier
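One workaround for the nonstandard evaluation of subset= is to stop passing it through "..." altogether and apply it to each completed data set before calling lme(). This is only a sketch, untested against mice's internals; foo2 and subset_expr are made-up names, not part of mice:

```r
library(nlme)
library(mice)
# Hypothetical variant of foo() : subset each completed data set directly,
# so lme() never has to evaluate a subset= argument passed through "..."
foo2 <- function(formula, data, subset_expr = NULL, ...) {
  if (!is.mids(data)) stop("The data must have class mids")
  analyses <- lapply(1:data$m, function(i) {
    d <- complete(data, i)
    if (!is.null(subset_expr))
      d <- d[eval(subset_expr, d, parent.frame()), ]  # subset before the call
    lme(formula, data = d, ...)
  })
  analyses
}
# usage (quote() keeps the expression unevaluated until the data exist) :
# foo2(log(ttotal) ~ log(nbpradio), Data1.Imp,
#      subset_expr = quote(nbpradio > 0), random = ~1 | ctr)
```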
Re: [R] Can I Compile R with MS Access 2003 Developer Extensions?
On Sunday 31 May 2009 at 12:08 -0700, Felipe Carrillo wrote: Hi: I have an application that uses MS Access as a front end interacting with Excel, Word, Sigmaplot and R in the background. I use the programs to do different tasks, but I mainly use R to create graphics on my Access forms without having to manually open R. To make this short, I was wondering if R can be compiled from Access using the Developer Extensions. I know that the MS Office programs can be compiled with it but don't know if it can be done including R.

Somehow, I strongly doubt it : R compilation needs tools totally alien to Microsoft's world view. However, ISTR that there exists some sort of R-Windows bridge. Have a look here : http://cran.miroir-francais.fr/contrib/extra/dcom/ Note that I can't vouch for it myself : I haven't touched a Windows computer in years... HTH, Emmanuel Charpentier
Re: [R] Linear Regression with Constraints
On Tuesday 26 May 2009 at 14:11 -0400, Stu @ AGS wrote: Hi! I am a bit new to R. I am looking for the right function to use for a multiple regression problem of the form: y = c1 + x1 + (c2 * x2) - (c3 * x3), where c1, c2, and c3 are the desired regression coefficients, subject to the following constraints: 0.0 < c2 < 1.0 and 0.0 < c3 < 1.0.

Sounds rather like an in-the-closet Bayesian problem (with a very strange prior...). Did you consider submitting it to WinBUGS (or JAGS) ? If you still want a direct optimization, you could have started with : RSiteSearch("optimization constraint") which would have quickly led you to ask : ?constrOptim

y, x1, x2, and x3 are observed data. I have a total of 6 rows of data in a data set.

??? If that's real-life data, I wonder what kind of situation forces you to estimate 3+1 parameters (c1, c2, c3 and the residual, which is not really a parameter) from 6 data points ? Your problem can be written as a system of 6 linear equations in 3 unknowns (c1, c2, c3), leaving you room to search in (a small piece of) R^3 (the residual is another way to express your objective function, not an independent parameter). Of course, if it's homework, get lost ! Emmanuel Charpentier

Is optim in the stats package the right function to use? Also, I can't quite figure out how to specify the constraints. Thank you! -Stu
Re: [R] Linear Regression with Constraints
$counts
function gradient
      12       12
$convergence
[1] 0
$message
[1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"

system.time(D1.bound <- optim(par = c(c1 = 0.5, c2 = 0.5, c3 = 0.5),
                              fn = objfun,
                              gr = objgrad,
                              data = Data1,
                              method = "L-BFGS-B",
                              lower = c(-Inf, 0, 0),
                              upper = c(Inf, 1, 1)))
utilisateur     système      écoulé      (user, system, elapsed)
      0.004       0.000       0.004
D1.bound
$par
        c1         c2         c3
-0.5052117  0.2478706  0.7626612
$value
[1] 12.87678
# About the same values as for the unbounded case. Okay.
$counts
function gradient
       9        9
$convergence
[1] 0
$message
[1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"

# The real test : what will be found when the real optimum is out of
# bounds ?
system.time(D2.bound <- optim(par = c(c1 = 0.5, c2 = 0.5, c3 = 0.5),
                              fn = objfun,
                              gr = objgrad,
                              data = Data2,
                              method = "L-BFGS-B",
                              lower = c(-Inf, 0, 0),
                              upper = c(Inf, 1, 1)))
utilisateur     système      écoulé
      0.004       0.000       0.004
D2.bound
$par
       c1        c2        c3
2.0995442 0.2192566 0.0000000
# The optimizer bangs its pretty little head on the c3 wall. c1 is out of
# the picture, but c2 happens to be not so bad...
$value
[1] 14.4411
# The residual is (as expected) quite larger than in the unbounded case...
$counts
function gradient
       8        8
$convergence
[1] 0
$message
[1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"
# ... and that's not a computing anomaly.

This also illustrates the hubris necessary to fit 3 coefficients on 6 data points... HTH, Emmanuel Charpentier
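The objfun and objgrad used above are not shown in this excerpt. A plausible least-squares reconstruction for the model y = c1 + x1 + (c2*x2) - (c3*x3) might read as follows (the column names y, x1, x2, x3 are assumptions, not taken from the post):

```r
# Hypothetical reconstruction of the objective and its analytic gradient
objfun <- function(par, data) {
  r <- data$y - (par[1] + data$x1 + par[2] * data$x2 - par[3] * data$x3)
  sum(r^2)                                   # residual sum of squares
}
objgrad <- function(par, data) {
  r <- data$y - (par[1] + data$x1 + par[2] * data$x2 - par[3] * data$x3)
  # d(RSS)/d(c1), d(RSS)/d(c2), d(RSS)/d(c3)
  c(-2 * sum(r), -2 * sum(r * data$x2), 2 * sum(r * data$x3))
}
```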
Re: [R] How to google for R stuff?
On Wednesday 20 May 2009 at 09:02 -0400, Kynn Jones wrote: Hi! I'm new to R programming, though I've been programming in other languages for years. One thing I find most frustrating about R is how difficult it is to use Google (or any other search tool) to look for answers to my R-related questions. With languages with even slightly more distinctive names like Perl, Java, Python, Matlab, OCaml, etc., usually including the name of the language in the query is enough to ensure that the top hits are relevant. But this trick does not work for R, because the letter R appears by itself in so many pages that the chaff overwhelms the wheat, so to speak. So I'm curious to learn what strategies R users have found to get around this annoyance.

ISTR having seen this question, or very close ones, at least thrice in the last two months. Time for a FAQ entry ? (It does not seem to exist : I checked...) Emmanuel Charpentier
Re: [R] How to save R clean sessions in BATCH mode?
On Saturday 16 May 2009 at 17:21 +0200, mcnda...@mncn.csic.es wrote: Thanks a lot to all of you who have replied to me about opening and ending R workspaces in BATCH mode. However, the replies were a bit general and I'm afraid I could not take the entire message from them. Therefore I chose to expose here a representative fraction of my work. I have 50 RData files (F1, F2, F3, F4, ..., F50) with objects inside. I need to: open F1: - perform some simple operations with the objects - export the solution with write.table - end the F1 session; open F2, repeat the F1 procedures; ... open F50, repeat the F1 procedures. My difficulty here is to end a workspace and open the next one from scratch, to avoid mixing files from consecutive work sessions and thus using R memory unnecessarily. I could use rm() to delete objects from the previous sessions, but it seems not an efficient task.

And re-loading R, rebuilding a whole process context, re-allocating memory is an efficient one ? Hah !

Any suggestions on how to perform this in batch mode? An exemplified help would be nice!

Why not encapsulate your procedures in a function taking the filename as its argument and loop on the filenames list ? Anything created in the function, being local to the function, will be (efficiently) cleaned up at the function's exit. Magic... Example :

ls()
character(0)
Foo <- runif(10, 0, 1)
ls()
[1] "Foo"
?save.image
save.image("Foo1.RData")
ls()
[1] "Foo"
rm(list=ls())
Foo <- letters[round(runif(10, min=1, max=26))]
Foo
 [1] "v" "m" "b" "y" "g" "u" "r" "f" "y" "q"
save.image("Foo2.RData")
rm(list=ls())
bar <- edit()
Waiting for Emacs...
bar
function(filename) {
  load(file=filename)
  print(ls())
  print(Foo)
  invisible(NULL)
}
ls()
[1] "bar"
bar("Foo1.RData")
[1] "filename" "Foo"
# Note : by default, ls() lists the function's
# environment, not the global one... ** no bar here...
 [1] 0.8030422 0.6326055 0.8188481 0.6161665 0.5917206 0.6631358 0.7290200
 [8] 0.2970315 0.2016259 0.4473244
ls()
[1] "bar"
# bar is still in the global environment...
bar("Foo2.RData")
[1] "filename" "Foo"
 [1] "v" "m" "b" "y" "g" "u" "r" "f" "y" "q"
ls()
[1] "bar"

Good enough for you ? HTH, Emmanuel Charpentier
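A minimal sketch of that pattern applied to the original question (the F*.RData file names and the body of the work are assumptions about the poster's setup):

```r
# Process each of F1.RData ... F50.RData in its own throwaway environment
process_one <- function(filename) {
  e <- new.env()
  load(filename, envir = e)          # objects land in e, not in .GlobalEnv
  # ... perform the simple operations on the objects in e here,
  # then export with something like :
  # write.table(e$result, file = sub("\\.RData$", ".txt", filename))
  invisible(NULL)                    # e becomes garbage on return
}
for (f in paste0("F", 1:50, ".RData"))
  if (file.exists(f)) process_one(f)
```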
Re: [R] Do you use R for data manipulation?
On Monday 11 May 2009 at 23:20 +0800, ronggui wrote: [ Snip... ] But, at least in my trade, the ability to handle Excel files is a must (this is considered a standard for data entry. Sigh ...). [ Re-snip... ] I don't think Excel is a standard tool for data entry. Epidata Entry is much more professional.

Irony squared ? This *must* go in the fortunes file ! Emmanuel Charpentier
Re: [R] Do you use R for data manipulation?
On Wednesday 06 May 2009 at 00:22 -0400, Farrel Buchinsky wrote: Is R an appropriate tool for data manipulation, data reshaping and data organizing? [ Large Snip ! ... ]

Depends on what you have to do. I've done what can be more or less termed data management with an almost uncountable number of tools (from Excel (sigh...) to R, with SQL, APL, Pascal, C, Basic (in 1982 !), Fortran and even Lisp in passing...). SQL has strong points : a join is, to my taste, more easily expressed in SQL than in most languages, and projection and aggregation are natural. However, in SQL there is no natural ordering of table rows, which makes expressing algorithms that use this order difficult. Try, for example, to express the differences of a time series... (it can be done, but it is *not* a pretty sight). On the other hand, R has some unique expressive possibilities (reshape() comes to mind).

So I tend to use a combination of tools : except for very small samples, I tend to manage my data in SQL and with associated tools (think of data editing, for example ; a simple form in OpenOffice's Base is quite easy to create, can handle anything for which an ODBC driver exists, and won't crap out at more than a few hundred lines...). Finer manipulation is usually done in R with native tools and sqldf. But, at least in my trade, the ability to handle Excel files is a must (this is considered a standard for data entry. Sigh ...). So the first task is usually a) import the data into an SQL database, and b) prepare some routines to dump SQL tables / R data frames to Excel for returning to the original data author... HTH, Emmanuel Charpentier
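As an illustration of the point above, the time-series differences that are awkward to express in SQL are a one-liner in R:

```r
x <- c(3, 5, 9, 4)   # a toy time series
diff(x)              # successive differences: 2  4 -5
```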
Re: [R] Help with lme4 model specification
On Wednesday 06 May 2009 at 15:21 -0700, boeinguy2 wrote: I am new to R and am trying to specify a model for a mixed model analysis. When I run the following model I get an error:

AAT <- lmer(Y ~ S + A + (1|S:A/H), data=AT, REML=TRUE)

The error looks like this:
Error in Spar_loc:`:` : NA/NaN argument
In addition: Warning messages:
1: In model.matrix.default(mt, mf, contrasts) : variable 'Spar_loc' converted to a factor

This is suspicious : Spar_loc doesn't appear in your statement...

2: In Spar_loc:`:` : numerical expression has 720 elements: only the first used

Ditto...

I am having trouble specifying the random component. It should reflect the random term as H nested within the interaction of S and A. What am I doing wrong?

The precedence of modelling operators is not very clear (and does not seem to be formally defined in the standard documentation...). When in doubt, use parentheses. So try ...+(1|(S:A)/H), just in case... You might also try to define it outside the model by adding variables :

AT <- within(AT, { SA <- S:A ; SAH <- SA:H[, drop=TRUE] })
AAT <- lmer(Y ~ S + A + (1|SAH), data=AT, REML=TRUE)

Enlightened ? Emmanuel Charpentier
Re: [R] Multiple imputations : wicked dataset. Need advice for follow-up to a possible solution.
Answering myself (for future archive users' sake), more to come (soon) :

On Thursday 23 April 2009 at 00:31 +0200, Emmanuel Charpentier wrote: Dear list, I'd like to use multiple imputation to try and save a somewhat badly mangled dataset (lousy data collection, worse than lousy monitoring, you know the drill... especially when I am consulted for the first time about one year *after* data collection). My dataset has 231 observations of 53 variables, of which only very few have no missing data. Most variables have 5-10% missing data, but the whole dataset has only 15% complete cases (40% when dropping the 3 worst cases, which might be regained by other means). [ Big snip ... ]

It turns out that my problems were caused by ... the dataset. Two very important variables (i.e. of strong influence on the outcomes and proxies) are ill-distributed :
- one is a modus operandi (two classes)
- the second is centre (23 classes, alas...)
My data are quite ill-distributed : some centres have contributed a large number of observations, some others very few. Furthermore, while few variables are quite badly known, the missingness pattern is such that :
- some centres have no directly usable information (= complete cases) under one of the modi operandi
- some others have no complete case at all...
Therefore, any model-based prediction method using the whole dataset (recommended for multiple imputation, since one should not use for inference a richer set of data than what was imputed (I've seen this statement in a lot of references)) fails miserably.

Remembering some fascinating readings (incl. V&R) and an early (20 years ago) excursion into AI (yes, did that, didn't even get the T-shirt...), I have attempted (with some success) to use recursive partitioning for prediction. This (non-)model has some very interesting advantages in my case :
- model-free
- distribution-free (quite important here : you should see my density curves... and I won't speak about the outliers !)
- handles missing data gracefully (almost automagically)
- automatic selection and ranking of the pertinent variables
- the current implementation in R has some very nice features, such as surrogate splits if a value is missing, auto-pruning by cross-validation, etc.

It also has some drawbacks :
- no (easy) method for inference
- not easy to abstract (you can't just publish an ANOVA table and a couple of p-values...)
- not well established (i.e. acknowledged by journal reviewers) = difficult to publish
These latter points do not bother me in my case.

So I attempted to use this for imputation. Since mice is based on a chained-equations approach and allows the end user to write their own imputation functions, I wrote a set of such imputers to be called within the framework of the Gibbs sampler. They proceed as follows :
- build a regression or classification tree of the relevant variable using the rest of the dataset
- predict the relevant variable for *all* of the dataset
- compute residuals from the known values of the relevant variable and their predictions
- impute values to missing data as prediction + a random residual.

It works. It's a tad slower than prediction using normal/logistic/multinomial modelling (about a factor of 3, but for my first trial I attempted to err on the side of excessive precision == deeper trees). It does not exhibit any obvious statistical misfeatures. But I have questions :

1) What is known of such imputation by regression/classification trees (aka recursive partitioning) ? A quick search didn't turn up very much : the idea has been evoked here and there, but I am not aware of any published solution. In particular, I have no knowledge of any theoretical (i.e. probability) work on their properties.

2) Where could I find published datasets that have been used to validate other imputation methods ?

3) Do you think that these functions should be published ?

Sincerely yours, Emmanuel Charpentier

PS : Could someone offer an explanation ?
Or must I resort to sacrificing a black goat at midnight next new moon ? Any hint will be appreciated (and by the way, the next new moon is in about 3 days...). The goat has endured the fear of her life, but is still alive... she will probably start worshipping the Moon, however.
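For the archive's sake, the four-step imputer described above might be sketched as follows for a numeric variable. This is an illustration with rpart, not the actual code evoked in the post; the function and argument names are made up, and the classification case would need separate handling:

```r
library(rpart)
# Tree-based imputation sketch : fit on observed rows, predict everywhere,
# then add a resampled observed residual to each prediction
impute.tree <- function(y, x) {
  obs  <- !is.na(y)
  d    <- cbind.data.frame(y = y, x)
  fit  <- rpart(y ~ ., data = d[obs, ], method = "anova")  # regression tree
  pred <- predict(fit, newdata = d)                        # predict *all* rows
  res  <- y[obs] - pred[obs]                               # observed residuals
  y[!obs] <- pred[!obs] + sample(res, sum(!obs), replace = TRUE)
  y
}
```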
Re: [R] rounding problem
Seconded ! Emmanuel Charpentier

On Monday 02 March 2009 at 21:06 -0700, Greg Snow wrote: -----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Prof Brian Ripley Sent: Monday, March 02, 2009 12:38 AM To: tedzzx Cc: r-help@r-project.org Subject: Re: [R] rounding problem

I nominate the following for the fortunes package: "R is Open Source and so you can modify it to emulate the bugs in other software: that is not one of the aims of its developers so please don't expect us to do so for you."
Re: [R] Loess over split data
Dear Luc, I stumbled on your unanswered question only this morning. Sorry for the lateness...

On Thursday 23 April 2009 at 15:56 -0400, Luc Villandre wrote: Dear R users, I am having trouble devising an efficient way to run a loess() function on all columns of a data.frame (with the x factor remaining the same for all columns) and I was hoping that someone here could help me fix my code so that I won't have to resort to using a for loop. (You'll find the upcoming lines of code in a single block near the end of the message.) Here's a small sample of my data :

my.data = data.frame(birth.vec=c(274, 259, 288, 284, 276, 274, 284, 288, 273, 278),
                     final.weight=c(2609.328, 2955.701, 4159.837, 4591.258, 5144.559,
                                    2877.159, 4384.498, 3348.454, 3323.391, 2918.055))

birth.vec is in days and final.weight is in grams. Now, what I'm trying to do is to output smoothed curves for the 10th, 50th and 90th percentiles over completed weeks. In other words, I transform birth.vec into a factor vector of completed weeks with

completed.weeks = as.factor(floor(my.data$birth.vec/7))

I use these factors to get weight quantiles by completed weeks with

quan.list = tapply(my.data$final.weight, INDEX=completed.weeks,
                   FUN=quantile, probs=c(0.1, 0.5, 0.9))

I then collapse the list into a data.frame with

quan.frame = data.frame(do.call(rbind, quan.list))

Now comes the time to apply the loess() function to each percentile curve (i.e. to each column in quan.frame). Note that the x values are the same for all percentile columns (i.e. X10., X50., X90.) :

x.values = as.numeric(rownames(quan.frame))

Once again, using tapply() (and transposing beforehand):

t.quan.frame = t(quan.frame)
## The following command doesn't work
smooth.result = tapply(t.quan.frame,
                       INDEX=as.factor(rep(1:nrow(t.quan.frame), each=ncol(t.quan.frame))),
                       FUN=loess)

The formula argument of loess() is of course missing, since I have no idea how I could specify it in a way that the function could understand.
Why not tapply(yadda, yadda, function(x) loess(whatever you mean, x is data...))? But the whole idea seems... bizarre: 1) converting time to a *class* variable destroys any ordinal relation between different times; a curve has no meaning. At the minimum, keep this variable numeric. 2) You lose any notion of individual data (e.g. weight of each week). 3) Do you have reason to expect empirical quantiles very much different from asymptotic quantiles (modulo a possible variable transformation)? That forces you to coarsen your data, which is rarely a good idea... If not, the asymptotic IC80 curves can be obtained much more cheaply. Modulo some possible typos:

my.loess <- loess(final.weight ~ birth.vec, data = my.data)
my.pred.times <- with(my.data, seq(min(birth.vec), max(birth.vec)))
my.pred <- predict(my.loess, newdata = data.frame(birth.vec = my.pred.times), se = TRUE)
my.curve <- with(my.pred, data.frame(time = my.pred.times, center = fit,
                                     lb = fit + qnorm(0.1)*se.fit,
                                     ub = fit + qnorm(0.9)*se.fit))
# Graphics *will* help:
with(my.curve, {
  plot(x = range(time), y = range(center, lb, ub), type = "n",
       xlab = "Achieved time (days)", ylab = "Final weight (g)",
       main = "A suspicious loess() fit")
  lines(final.weight ~ birth.vec, type = "p", data = my.data)
  lines(center ~ time, lty = 1)
  lines(lb ~ time, lty = 3)
  lines(ub ~ time, lty = 3)
})

Adding your empirical centiles to this graph might help... NB: on your example data set, there is a big gap between the earliest point and the rest, hence suspicious predictions in the gap (the lower bound is *negative*!); maybe a change of variable (log?) might be in order? This can be assessed only on the full data. HTH, Emmanuel Charpentier
Re: [R] Multiple Imputation in mice/norm
On Friday, April 24, 2009 at 14:11 -0700, ToddPW wrote: I'm trying to use either mice or norm to perform multiple imputation to fill in some missing values in my data. The data has some missing values because of a chemical detection limit (so they are left censored). I'd like to use MI because I have several variables that are highly correlated. In SAS's proc MI, there is an option with which you can limit the imputed values that are returned to some range of specified values. Is there a way to limit the values in mice?

You may do that by writing your own imputation function and assigning it to the imputation of particular variables (see the argument imputationMethod and details in the man page for mice).

If not, is there another MI tool in R that will allow me to specify a range of acceptable values for my imputed data?

In the function amelia (package Amelia), you might specify a bounds argument, which allows for such a limitation. However, be aware that this might destroy the basic assumption of Amelia, which is that your data are multivariate normal. Maybe a change of variable is in order (e.g. log(concentration) usually has much better statistical properties than concentration). Frank Harrell's aregImpute (package Hmisc) has the curtail argument (TRUE by default), which limits imputations to the range of observed values. But if your left-censored variables are your dependent variables (not covariates), may I suggest analyzing these data as censored data, as allowed by Terry Therneau's coxph function (package survival)? Code your missing data as such a variable (use

coxph(Surv(ifelse(is.na(x), yourlimit, x), !is.na(x), type="left") ~ Yourmodel)

to do this on the fly). Another possible idea is to split your (supposedly x) variable in two: observed (logical), and value (observed value if observed, detection limit if not), and include these two data in your model. You probably will run into numerical difficulties due to the (built-in) total separation...
HTH, Emmanuel Charpentier

Thanks for the help, Todd
Re: [R] Multiple Imputation in mice/norm
Danke sehr, Herr Professor! This one escaped me (notably because it's a trifle far from my current interests...). Emmanuel Charpentier

On Saturday, April 25, 2009 at 08:25 -0500, Frank E Harrell Jr wrote: All see

@Article{zha09non,
  author  = {Zhang, Donghui and Fan, Chunpeng and Zhang, Juan and Zhang, {Cun-Hui}},
  title   = {Nonparametric methods for measurements below detection limit},
  journal = {Statistics in Medicine},
  year    = {2009},
  volume  = {28},
  pages   = {700--715},
  annote  = {lower limit of detection; left censoring; Tobit model; Gehan test; Peto-Peto test; log-rank test; Wilcoxon test; location shift model; superiority of nonparametric methods}
}
[R] Multiple imputations : wicked dataset ? Wicked computers ? Am I cursed ? (or stupid ?)
The textbook example (example for mice) runs almost perfectly (warnings in the second example). Package mix (Joseph Schafer, maintained by Brian Ripley):

# mix needs dataset columns sorted and counted
p <- ncol(DTM <- DataTest[sapply(DataTest, is.factor)])
DTM <- cbind(DTM, DataTest[!sapply(DataTest, is.factor)])
# Preliminary grouping and sorting
system.time(s <- prelim.mix(DTM, p))
   user  system elapsed
  0.012   0.000   0.009
Warning messages:
1: In storage.mode(w) <- "integer" : NAs introduced by coercion
2: In max(w[!is.na(w[, j]), j]) : no non-missing arguments to max; returning -Inf
3: NAs introduced by coercion
4: In max(w[!is.na(w[, j]), j]) : no non-missing arguments to max; returning -Inf
5: NAs introduced by coercion
6: In max(w[!is.na(w[, j]), j]) : no non-missing arguments to max; returning -Inf
7: NAs introduced by coercion
# Parameter estimation
system.time(thetahat <- em.mix(s, showits=TRUE))
Error in rep(prior, s$ncells) : invalid 'times' argument
Timing stopped at: 0 0 0
### indeed: s$ncells is NA!
### Therefore, the rest crashes:
# Data augmentation proper
system.time(newtheta <- da.mix(s, thetahat, steps=100, showits=TRUE))
Error in rep(prior, s$ncells) : invalid 'times' argument
Timing stopped at: 0 0 0
# Single imputation...
system.time(ximp1 <- imp.mix(s, newtheta))
Error in imp.mix(s, newtheta) : object 'newtheta' not found
Timing stopped at: 0.004 0 0.001

The textbook example (example for da.mix) runs perfectly (and fast!).

I might be particularly stupid, misunderstanding the manual and the textbook examples of one or two packages, but five!? Visual/manual/graphical examination of my dataset does not suggest an explanation. I am not aware of anything fishy in my setup (one of the machines is a bit aging, but the one used for the reported tests is almost new). Could someone offer an explanation? Or must I resort to sacrificing a black goat at midnight next new moon?
Any hint will be appreciated (and by the way, next new moon is in about 3 days...). Emmanuel Charpentier
Re: [R] graph with 15 combinations
On Monday, April 20, 2009 at 21:04 +0200, Penner, Johannes wrote: Dear R helpers, I have a data set with 4 types (W, C, E, S). Now I have values for all types plus all possible combinations (the order is unimportant): W, C, WC, E, WE, CE, WCE, S, WS, CS, WCS, ES, WES, CES, WCES. Ideally I would like to represent everything in one graph, as concisely as possible. Drawing 4 circles and depicting it as overlap just gives me 13 out of the 15 possibilities needed (as e.g. depicted here http://www.psy.ritsumei.ac.jp/~akitaoka/classic9e.html in the graph "Four circles surrounding illusion"). Does anybody have a nice solution, ideally with a possible solution in R?

Plot on a torus. Should be trivial in R once you've found the torus feeder for your printer... :-) Emmanuel Charpentier

Thanks in advance! Johannes -- Project Coordinator BIOTA West Amphibians Museum of Natural History Dep. of Research (Herpetology) Invalidenstrasse 43 D-10115 Berlin Tel: +49 (0)30 2093 8708 Fax: +49 (0)30 2093 8565 http://www.biota-africa.org http://community-ecology.biozentrum.uni-wuerzburg.de
Re: [R] Modelling an incomplete Poisson distribution ?
Dear Ben, Le samedi 18 avril 2009 à 23:37 +, Ben Bolker a écrit : Emmanuel Charpentier charpent at bacbuc.dyndns.org writes: I forgot to add that yes, I've done my homework, and that it seems to me that answers pointing to zero-inflated Poisson (and negative binomial) are irrelevant ; I do not have a mixture of distributions but only part of one distribution, or, if you'll have it, a zero-deflated Poisson. An answer by Brian Ripley (http://finzi.psych.upenn.edu/R/Rhelp02/archive/11029.html) to a similar question leaves me a bit dismayed : if it is easy to compute the probability function of this zero-deflated RV (off the top of my head, Pr(X=x)=e^-lambda.lambda^x/(x!.(1-e^-lambda))), and if I think that I'm (still) able to use optim to ML-estimate lambda, using this to (correctly) model my problem set and test it amounts to re-writing some (large) part of glm. Furthermore, I'd be a bit embarrassed to test it cleanly (i. e. justifiably) : out of the top of my head, only the likelihood ration test seems readily applicable to my problem. Testing contrasts in my covariates ... hum ! So if someone has written something to that effect, I'd be awfully glad to use it. A not-so-cursory look at the existing packages did not ring any bells to my (admittedly untrained) ears... Of course, I could also bootstrap the damn thing and study the distribution of my contrasts. I'd still been hard pressed to formally test hypotheses on factors... I would call this a truncated Poisson distribution, related to hurdle models. You could probably use the hurdle function in the pscl package to do this, by ignoring the fitting to the zero part of the model. On the other hand, it might break if there are no zeros at all (adding some zeros would be a pretty awful hack/workaround). Indeed ... If you defined a dtpoisson() for the distribution of the truncated Poisson model, you could probably also use bbmle with the formula interface and the parameters argument. This I was not aware of... 
Thank you for the tip! By the way, I never delved into stats4, relying (erroneously) on its one-liner description on CRAN, which is (more or less) "an implementation of stats with S4 classes". Therefore mle escaped me also... May I suggest a better one-liner? (This goes also for bbmle...) "The likelihood ratio test seems absolutely appropriate for this case." Why not? Few data, therefore a bit far from the asymptotic conditions... Anyway, a better look at my data after discovering a bag o' bugs in the original files led me to reconsider my model: I wanted to assess, among other things, the effect of a boolean (really a two-class variable). After the *right* graph, I now tend to think that my distribution is indeed zero-deflated Poisson in one group and... zero-deflated negative binomial in the other (it still might be zero-deflated Poisson with a very small mean, hard to say graphically...). Which gives me some insights on the mechanisms at work (yippeee!!) but will require some nonparametric beast for assessing central value differences (yes, this still has a meaning...), such as the bootstrap.
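The "dtpoisson + optim" route discussed in this thread can be made concrete in a few lines. This is a sketch, not the pscl/bbmle machinery; dtpois is a hypothetical name and the counts are invented. The density is exactly the one given in the message: Pr(X=x) = e^-lambda lambda^x / (x! (1 - e^-lambda)) for x >= 1.

```r
## Zero-truncated ("zero-deflated") Poisson density, x >= 1 (hypothetical name):
dtpois <- function(x, lambda) dpois(x, lambda) / (1 - exp(-lambda))

## ML estimation of lambda by direct optimisation of the log-likelihood
## (for a single parameter, optimize() is enough; optim() works too):
x <- c(1, 1, 2, 3, 1, 2, 4, 1, 2, 2)    # made-up counts, all >= 1 by construction
nll <- function(lambda) -sum(log(dtpois(x, lambda)))
fit <- optimize(nll, interval = c(1e-3, 10))
fit$minimum    # ML estimate of lambda
```

Covariates would indeed require rewriting a chunk of glm, as the message says; this only handles the intercept-only case.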
[R] Modelling an incomplete Poisson distribution ?
Dear list, I have the following problem: I want to model a series of observations of a given hospital activity on various days under various conditions. Among my outcomes (dependent variables) is the number of patients for which a certain procedure is done. The problem is that, when no relevant patient is hospitalized on said day, there is no observation (for which the number of patients item would be 0). My goal is to model this number of patients as a function of the various conditions described by my independent variables, most of them observed but uncontrolled, some of them unobservable (random effects). I am tempted to model them along the lines of:

glm(NoP ~ X + Y + ..., data=MyData, family=poisson(link=log))

or (accounting for some random effects):

lmer(NoP ~ X + Y + (X|Center), data=MyData, family=poisson(link=log))

While the preliminary analysis suggests that (the right part of) a Poisson distribution might be reasonable for all real observations, the lack of observations with count==0 bothers me. Is there a way to cajole glm (and lmer, by the way) into modelling these data with an incomplete Poisson model, i.e. with unobserved 0 values? Sincerely, Emmanuel Charpentier
Re: [R] Modelling an incomplete Poisson distribution ?
I forgot to add that yes, I've done my homework, and that it seems to me that answers pointing to zero-inflated Poisson (and negative binomial) are irrelevant; I do not have a mixture of distributions but only part of one distribution, or, if you'll have it, a zero-deflated Poisson. An answer by Brian Ripley (http://finzi.psych.upenn.edu/R/Rhelp02/archive/11029.html) to a similar question leaves me a bit dismayed: if it is easy to compute the probability function of this zero-deflated RV (off the top of my head, Pr(X=x) = e^-lambda.lambda^x/(x!.(1-e^-lambda))), and if I think that I'm (still) able to use optim to ML-estimate lambda, using this to (correctly) model my problem set and test it amounts to re-writing some (large) part of glm. Furthermore, I'd be a bit embarrassed to test it cleanly (i.e. justifiably): off the top of my head, only the likelihood ratio test seems readily applicable to my problem. Testing contrasts in my covariates... hum! So if someone has written something to that effect, I'd be awfully glad to use it. A not-so-cursory look at the existing packages did not ring any bells to my (admittedly untrained) ears... Of course, I could also bootstrap the damn thing and study the distribution of my contrasts. I'd still be hard pressed to formally test hypotheses on factors... Any ideas? Emmanuel Charpentier

On Saturday, April 18, 2009 at 19:28 +0200, Emmanuel Charpentier wrote: Dear list, I have the following problem: I want to model a series of observations of a given hospital activity on various days under various conditions. Among my outcomes (dependent variables) is the number of patients for which a certain procedure is done. The problem is that, when no relevant patient is hospitalized on said day, there is no observation (for which the number of patients item would be 0). My goal is to model this number of patients as a function of the various conditions described by my independent variables, most of them observed but uncontrolled, some of them unobservable (random effects). I am tempted to model them along the lines of:

glm(NoP ~ X + Y + ..., data=MyData, family=poisson(link=log))

or (accounting for some random effects):

lmer(NoP ~ X + Y + (X|Center), data=MyData, family=poisson(link=log))

While the preliminary analysis suggests that (the right part of) a Poisson distribution might be reasonable for all real observations, the lack of observations with count==0 bothers me. Is there a way to cajole glm (and lmer, by the way) into modelling these data with an incomplete Poisson model, i.e. with unobserved 0 values? Sincerely, Emmanuel Charpentier
Re: [R] F test
On Thursday, April 16, 2009 at 14:08 -0300, Mike Lawrence wrote: summary(my_lm) will give you t-values, anova(my_lm) will give you (equivalent) F-values.

Ahem. Equivalent, my tired foot... In simple terms (the real story may be more intricate): the F values stated by anova are something entirely different from the t values in summary. The latter allow you to assess properties of *one* coefficient in your model (namely, do I have enough support to state that it is nonzero?). The former allow you to assess whether you have support for stating that *ALL* the coefficients related to the same factor cannot be *SIMULTANEOUSLY* null. Which is a horse of quite another color... By the way: if your summary indeed does give you the mean^K^K an unbiased estimate of your coefficient and a (hopefully) unbiased estimate of its standard error, the F ratio is the ratio of estimates of remaining variabilities with and without the H0 assumption it tests, that is, that *ALL* coefficients of your factor of interest are *SIMULTANEOUSLY* null. F and t numbers will be equivalent if and only if your factor of interest needs only one coefficient to get expressed, i.e. is a continuous covariable or a two-class discrete variable (such as a boolean). In this case, you can test your factor either by the t value which, under H0, fluctuates as a Student's t with n_res dof (n_res being the residual degrees of freedom of the model) or by the F value, which fluctuates as a Fisher F statistic with 1 and n_res dof, which happens (but that's not happenstance...) to be the *square* of a t with n_res dof. May I suggest consulting a textbook *before* flunking ANOVA 101? Emmanuel Charpentier

summary() might be preferred because it also provides the estimates' SE.

a = data.frame(dv=rnorm(10), iv1=rnorm(10), iv2=rnorm(10))
my_lm = lm(dv ~ iv1*iv2, a)
summary(my_lm)

Call:
lm(formula = dv ~ iv1 * iv2, data = a)

Residuals:
    Min      1Q  Median      3Q     Max
-1.8484 -0.2059  0.1627  0.4623  1.0401

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.4864     0.4007  -1.214    0.270
iv1           0.8233     0.5538   1.487    0.188
iv2           0.2314     0.3863   0.599    0.571
iv1:iv2      -0.4110     0.5713  -0.719    0.499

Residual standard error: 1.017 on 6 degrees of freedom
Multiple R-squared: 0.3161, Adjusted R-squared: -0.02592
F-statistic: 0.9242 on 3 and 6 DF, p-value: 0.4842

anova(my_lm)

Analysis of Variance Table

Response: dv
          Df Sum Sq Mean Sq F value Pr(>F)
iv1        1 1.9149  1.9149  1.8530 0.2223
iv2        1 0.4156  0.4156  0.4021 0.5494
iv1:iv2    1 0.5348  0.5348  0.5175 0.4990
Residuals  6 6.2004  1.0334

On Thu, Apr 16, 2009 at 10:35 AM, kayj kjaj...@yahoo.com wrote: Hi, How can I find the p-value for the F test for the interaction terms in a regression linear model lm? I appreciate your help -- View this message in context: http://www.nabble.com/F-test-tp23078122p23078122.html Sent from the R help mailing list archive at Nabble.com.
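The "F equals t squared for a single-df term" point above is easy to check numerically; a minimal sketch with toy data in the style of Mike's example (column names "t value" and "F value" are the actual names in summary.lm and anova.lm output):

```r
## For a one-coefficient term, the anova() F is exactly the squared summary() t:
set.seed(1)
a <- data.frame(dv = rnorm(10), iv1 = rnorm(10))
m <- lm(dv ~ iv1, data = a)
t_val <- coef(summary(m))["iv1", "t value"]
F_val <- anova(m)["iv1", "F value"]
all.equal(t_val^2, F_val)   # TRUE
```

With more than one coefficient per factor (e.g. a multi-level factor), this identity no longer holds and only the F test answers the "all coefficients simultaneously null" question.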
Re: [R] F test
On Thursday, April 16, 2009 at 13:00 -0500, Jun Shen wrote: Mike, I kind of have the same question. What if for a mixed effect model, say using lme(), how to specify the interaction effect (between a fixed effect and a random effect)?

With lme, you have to specify a *list* of random effects as the random= argument, which, off the top of my (somewhat tired) limbic system, is specified by an lmList, iirc. But I'm pretty sure that ?lme will give you much better information... lmer is simpler. Say your basic model is Z~Y, with another variable X acting as a possible source of random variation. You express random effects by adding (1|X) terms (meaning X is a simple (intercept) random effect) to the model, or (Y|X), which specifies both an intercept random effect of X and an effect on the slope of the (supposed) line binding Z to the independent variable Y. If you want to specify a possible variation of slopes with no possible intercept effect, say Z~Y+(0+Y|X) (unless I'm mistaken: I never used that, because I never had to work on such a model in a context where it would make sense...).

and where to find the result of the interaction?

Ah. That's the (multi-)million-dollar question. You're dealing with a fixed*random interaction, which has a totally different meaning from a fixed*fixed interaction. The test of the latter means "does the (possible) effect of the first factor vary with levels of the second factor?", whereas that of the former reads as "does the random factor increase variability about what I know of the effect of the fixed factor?", with totally different consequences. Pinheiro and Bates' book (2000), which, among other things, describes the care, feeding and drinks of lme, gives explanations about 1) why the problem is computationally hard, 2) how to get an approximate answer, and 3) in what proportion the previous advice might be misleading. This book is, IMHO, required reading for anybody with more than a passing interest in the subject, and I won't paraphrase it...

Since then, Pr Bates started to develop lme4, which has different inner algorithms. In the course of this work, he started to have doubts about the conventional wisdom on testing effects in mixed models, and did not (yet) provide these conventional means of testing. Instead, he seems to work along the lines of MCMC sampling to assess various aspects of mixed models. But there I suggest walking along the R-SIG-mixed-models archives, which are *very* interesting. Of course, if you want an authoritative answer (i.e. an answer that will pass a medical journal's reviewer unquestioned), you can always use SAS's proc mixed. But I wouldn't swear this answer is exact, or even sensible, as far as I can judge... Pr Bates seems to answer readily any (sensible) question on the ME mailing list, where you will also find folks much more authorized than yours truly to answer this and that question... HTH, Emmanuel Charpentier

Thanks. Jun
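The three random-effect formula shapes described above can be shown on lme4's built-in sleepstudy data; Reaction ~ Days + (...|Subject) plays the role of Z ~ Y + (...|X). A sketch assuming a current lme4 (the syntax is the same as described in the reply):

```r
library(lme4)

## (Days | Subject): correlated random intercept and slope, i.e. "(Y|X)"
m1 <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

## (1 | Subject): intercept-only random effect, i.e. "(1|X)"
m2 <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

## (0 + Days | Subject): random slope with no random intercept, i.e. "(0+Y|X)"
m3 <- lmer(Reaction ~ Days + (0 + Days | Subject), data = sleepstudy)

VarCorr(m1)   # where the random-effect variance components are reported
```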
Re: [R] Cross-platforms solution to export R graphs
On Thursday, April 9, 2009 at 15:04 +0200, Philippe Grosjean wrote: Hello R users, I have worked on an R Wiki page for solutions in exporting R graphs, especially the often-asked questions: - How can I export R graphs in vectorized format (EMF) for inclusion in MS Word or OpenOffice outside of Windows? - What is the best solution(s) for post-editing/annotating R graphs? The page is at: http://wiki.r-project.org/rwiki/doku.php?id=tips:graphics-misc:export. I would be happy to receive your comments and suggestions to improve this document.

Well, if you insist... The PDF import plugin in OpenOffice is still beta, and some reports deem it difficult to install correctly and/or flaky. Having checked that both MS Word (>= 2000) and OpenOffice (>= 2.4) import and display EPS files correctly (i.e. vectorially, including fonts), I switched to this format, most notably for use with Max Kuhn's marvellous odfWeave package, which is a *must* for those of us working in state/administrative/corporate salt mines, where \LaTeX is deemed obscene and plain \TeX causes seizures... The point is that this format doesn't need any intermediary step, thus allowing for automation. Be aware, however, that the embedded EPS images are not editable in place by OpenOffice nor, as far as I know, by MS Word. But my point was to *avoid* post-production as much as humanly possible (I tend to be inhumanly lazy...). HTH, Emmanuel Charpentier All the best, PhG
Re: [R] [OT ?] rant (was : Re: Conversions From standard to metric units)
On Friday, April 3, 2009 at 20:01 -0400, Murray Cooper wrote: For science yes. For pleasure I'll still take a pint instead of 570ml! Snip

Yes, but do you realize that you'll have to pee in fl. oz? Aie... Emmanuel Charpentier
Re: [R] Basic doubts on the use of several script files in R batch mode
On Friday, April 3, 2009 at 22:30 +0200, mcnda...@mncn.csic.es wrote: I already searched for information regarding the batch file operations within R. But I could not locate the information I need. Basically I have a doubt regarding the procedures on the batch use of several script files (*.R). How can I do this? How can I define the order of files? My intention is to programme openings, math operations and savings of *.Rdata sessions, but in a sequential manner.

Off the top of my limbic system:

$ cat file1 file2 file3 | R --vanilla --no-save
# Of course, file3 should call save.image() at some convenient place

Another, slightly more correct solution would be a batch job doing the relevant R CMD BATCH invocations. See also the littler package, which allows one to use R as a shell scripting language. I don't know zilch about it... HTH, Emmanuel Charpentier
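The cat trick above relies only on the shell concatenating its file arguments in order, which is what fixes the execution order of the scripts. A tiny demonstration with stand-in scripts (file names hypothetical; R itself is not needed for the demo):

```shell
# Three stand-in scripts, to be run in this order:
printf 'x <- 1\n'       > file1.R
printf 'x <- x + 1\n'   > file2.R
printf 'save.image()\n' > file3.R

# This is exactly what R would receive on stdin, in execution order:
cat file1.R file2.R file3.R

# Real use (requires R on the PATH):
#   cat file1.R file2.R file3.R | R --vanilla --no-save
```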
Re: [R] Discriminant Analysis - Obtaining Classification Functions
require(MASS); ?predict.lda should enlighten you. Glancing at VR4 might be a bit more illuminating... HTH, Emmanuel Charpentier

On Friday, April 3, 2009 at 22:29 +0200, Pavel Kúr wrote: Hello! I need some help with the linear discriminant analysis in R. I have some plant samples (divided into several groups) on which I measured a few quantitative characteristics. Now, I need to infer some classification rules usable for identifying new samples. I have used the function lda from the MASS library in the usual fashion:

lda.1 <- lda(groups ~ char1 + char2 + char3, data=xxx)

I'd like to obtain the classification functions for the particular groups, with the aid of which I could classify unknown samples. I know I can use predict.lda to classify such samples, but I need to obtain some equations into which I could simply put the measured values of an unknown sample manually and the result would predict which group the sample most probably belongs to (like in e.g. STATISTICA). I haven't found out how to extract these functions from the lda output. Could somebody give me some advice? Thank you in advance, Pavel Kur
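A minimal predict.lda round trip, with iris standing in for the poster's data (Species plays the role of "groups", the sepal measurements the role of char1/char2). The $scaling component holds the coefficients of the linear discriminants, the nearest thing lda() exposes to manually applicable equations; note these are not numerically identical to STATISTICA's "classification functions", though they serve the same scoring purpose:

```r
library(MASS)

## Fit on iris (a stand-in for the poster's "xxx" data frame):
lda.1 <- lda(Species ~ Sepal.Length + Sepal.Width, data = iris)

## Coefficients of the linear discriminants, for manual scoring:
lda.1$scaling

## Classifying a new, manually entered sample still goes through predict.lda:
predict(lda.1, newdata = data.frame(Sepal.Length = 5.0, Sepal.Width = 3.4))$class
```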
[R] [OT ?] rant (was : Re: Conversions From standard to metric units)
On Friday 03 April 2009 at 14:17 -0400, stephen sefick wrote: I am starting to use R for almost any sort of calculation that I need. I am a biologist that works in the States, and there is often a need to convert from standard units to metric units. <rant> US/Imperial units are *not* standard units. The former "metric system" is now called Système International (International System) for a reason, which is *not* the gallocentrism of a few 6e7 frogs, but rather the laziness of about 5.6e9 losers who refuse to load their memories with meaningless conversion factors... </rant> Emmanuel Charpentier who has served his time with pounds per cubic foot, furlongs per fortnight, BTU and other figments of British/American sadistic imagination, thank you very much... </rant> # Again, didn't work the first time... Is there a package in R for this already? If not I believe that I am going to write some of the most often used in function form. My question is should I include this in my StreamMetabolism package. It is not along the same theme lines, but could loosely fit. The reason that I ask is that I don't want to clutter CRAN with a small package containing some conversion functions because I am too lazy to source them into R every time that I use them, but I also don't want the StreamMetabolism package to turn into StephenMiscFunctions. Thoughts, comments, or suggestions would be appreciated.
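That said, such conversion helpers are one-liners; a minimal sketch (the factors below are the usual exact definitions, but double-check before trusting them):

```r
## Imperial/US -> metric helpers (factors are the exact legal definitions)
in2cm <- function(x) x * 2.54          # inches -> centimetres
lb2kg <- function(x) x * 0.45359237    # pounds -> kilograms
f2c   <- function(x) (x - 32) * 5 / 9  # Fahrenheit -> Celsius
f2c(212)   # boiling point of water: 100
```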
Re: [R] LME as part of meta-analysis
On Friday 27 March 2009 at 15:30 -0700, mrint wrote: Hi, I'm having a problem using lme and hopefully I can get some help. Here is what I'm doing: I have a model like this: yi = alpha + theta * xi + beta * zi + error, errors are normally distributed with mean 0, var sigma^2. xi and zi are generated from normal distributions within a specified range. True values of alpha, theta, beta and sigma^2 are chosen with a specific mean and variance tau^2. I have a function which generates the yi from the other variables, then a function that does a linear regression using lm to create the estimates. I then want to use these data to do a meta-analysis, with the above repeated 10-20 times. Within this, I want to use lme to create estimates for the average true value, sample mean and average standard error for alpha, theta, beta and the respective tau^2 values for each of these. For the lme part, I'm using this: a <- summary(lme(alp ~ 1, random = ~1|alp, weights = varFixed(~staalp^2))) This is the one for alpha. This isn't producing the type of results I would expect; can anyone see where I'm going wrong? (I suppose that your simulation aims to assess a specific model.) This, and closely related subjects, have already been discussed on this very list. To make a long story short : lme doesn't (currently) accept means and variances of groups as input, it needs individual data. Someone (that should be Wolfgang Vischbauer, but I'm almost surely mutilating his name's spelling ; apologies, Wolfgang !) has written, specifically for meta-regression purposes, a mima function that does what you're requesting. Wolfgang has stated his intention to turn this function into a full-fledged R package (with calling conventions closer to what other regression functions use), but the mima function available on his site is still his two-year-old 0.4 version. For further details, look for mima or for meta-regression in the list archives. RSiteSearch() is your friend...
However, if what you're interested in is strictly speaking a meta-analysis of 2-sample comparisons (i. e. your theta is scalar and your x_i are logicals), (at least) two R packages available on CRAN are built for this purpose : rmeta and meta. Both offer separate analyses for boolean or continuous dependent variables (i. e. y_i logical or continuous). If your theta is scalar but your x_i is continuous (i. e. you're meta-analysing a single regression coefficient), both packages offer a variant for meta-analysis of effects that might be relevant for you. A more general solution would be to enhance the forthcoming lme4 package to accept an alternative specification of random-effects variances-covariances, which would allow general meta-regression. But I understand that Douglas Bates already has way too much work and not too much time on his hands, and I doubt he might be coaxed to work in this direction right now... A suggestion : you might forward your question to the r-sig-mixed-models mailing list with some profit... Emmanuel Charpentier
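For the single-coefficient case, a sketch of what the generic inverse-variance machinery looks like with the meta package (the estimates and standard errors below are made up for illustration):

```r
library(meta)
## One estimate of theta and its standard error per study (made-up numbers)
b  <- c(0.12, 0.25, 0.08, 0.31)
se <- c(0.05, 0.10, 0.04, 0.12)
m  <- metagen(TE = b, seTE = se)   # generic inverse-variance meta-analysis
summary(m)    # fixed- and random-effects pooled estimates, heterogeneity
```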
Re: [R] Calculate directions between points
On Saturday 28 March 2009 at 16:06, Wanja Mathar wrote: Hi everyone, I have a matrix of points and want to calculate the direction and length of each vector between the points. I can easily calculate the length with dist(testdata), but is there a function that returns a matrix of the directions of the vectors between points (angles 0-360°)? testdata <- matrix(1:8, nrow=4, ncol=2, dimnames=list(c("A","B","C","D"), c("x","y"))) ?atan2 ; ?outer Emmanuel Charpentier
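Spelled out, the hint amounts to this (a sketch; note that atan2() gives the mathematical angle from the x axis, counter-clockwise, not a compass bearing):

```r
testdata <- matrix(1:8, nrow = 4, ncol = 2,
                   dimnames = list(c("A", "B", "C", "D"), c("x", "y")))
## Pairwise coordinate differences, then angles mapped into [0, 360)
dx  <- outer(testdata[, "x"], testdata[, "x"], "-")
dy  <- outer(testdata[, "y"], testdata[, "y"], "-")
ang <- (atan2(dy, dx) * 180 / pi) %% 360
ang["B", "A"]   # direction of the A -> B vector, in degrees: 45
```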
[R] Mean of difftime vectors : code infelicity or intended behaviour ?
Dear list, + (and -) being defined for the difftime class, I expected mean() to return something sensible. This is only half-true : mean(c(1:5, 5:1), na.rm=TRUE) [1] 3 mean(as.difftime(c(1:5, 5:1), units="mins"), na.rm=TRUE) Time difference of 3 mins Fine so far. However : mean(c(1:5, NA, 5:1), na.rm=TRUE) [1] 3 mean(as.difftime(c(1:5, NA, 5:1), units="mins"), na.rm=TRUE) Time difference of NA mins Ouch ! Curiously, var(), max() and min() behave as expected. What's so special with mean() ? RSiteSearch("mean difftime") [ ... doesn't return anything relevant ] NB : this isn't done for the hell of it. I intended to replace some missing dates with something computed from other dates and mean time intervals. Any thoughts ? Emmanuel Charpentier
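A possible workaround (untested on the R version above; the idea is simply to drop the class, average the underlying numbers, and restore the units):

```r
x <- as.difftime(c(1:5, NA, 5:1), units = "mins")
## mean(x, na.rm = TRUE) is reported above to return NA on R 2.8.1.
## Workaround: average the raw numbers, then rebuild the difftime:
as.difftime(mean(as.numeric(x), na.rm = TRUE), units = "mins")
## Time difference of 3 mins
```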
Re: [R] Mean of difftime vectors : code infelicity or intended behaviour ?
Oops. Clicked Send too fast (don't shoot, Brian !). I forgot : sessionInfo() R version 2.8.1 (2008-12-22) i486-pc-linux-gnu locale: LC_CTYPE=fr_FR.UTF-8;LC_NUMERIC=C;LC_TIME=fr_FR.UTF-8;LC_COLLATE=fr_FR.UTF-8; LC_MONETARY=C;LC_MESSAGES=fr_FR.UTF-8;LC_PAPER=fr_FR.UTF-8;LC_NAME=C; LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=fr_FR.UTF-8;LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices datasets utils methods base loaded via a namespace (and not attached): [1] grid_2.8.1 lattice_0.17-20 lme4_0.999375-28 Matrix_0.999375-21 [5] nlme_3.1-90 On Tue, 17 Mar 2009 06:53:20, Emmanuel Charpentier wrote : Dear list, + (and -) being defined for the difftime class, I expected mean() to return something sensible. This is only half-true : mean(c(1:5, 5:1), na.rm=TRUE) [1] 3 mean(as.difftime(c(1:5, 5:1), units="mins"), na.rm=TRUE) Time difference of 3 mins Fine so far. However : mean(c(1:5, NA, 5:1), na.rm=TRUE) [1] 3 mean(as.difftime(c(1:5, NA, 5:1), units="mins"), na.rm=TRUE) Time difference of NA mins Ouch ! Curiously, var(), max() and min() behave as expected. What's so special with mean() ? RSiteSearch("mean difftime") [ ... doesn't return anything relevant ] NB : this isn't done for the hell of it. I intended to replace some missing dates with something computed from other dates and mean time intervals. Any thoughts ? Emmanuel Charpentier
Re: [R] popular R packages
On Sat, 07 Mar 2009 18:04:24 -0500, David Winsemius wrote : [ Snip ... ] Nonetheless, I do think the relative numbers of package downloads might be interpretable, or at the very least, the basis for discussions over beer. *Anything* might be the basis for discussions over beer (obvious corollary to Thermogoddamics' second principle). More seriously : I don't think relative numbers of package downloads can be interpreted in any reasonable way, because reasons for package download have a very wide range, from curiosity (what's this ?) and fun (think fortunes...) to vital need (think lme4 if/when a consensus on denominator DFs can be reached :-)...). What can you infer in good faith from such a mess ? Emmanuel Charpentier
Re: [R] popular R packages
Dear Barry, As far as I understand, you're telling us that a bit of data mining does no harm, whatever the data. Your example of pop music charts might support your point (although my ears disagree ...) but I think it is bad policy to indulge in white-noise analysis without a well-reasoned motive to do so. It might give bad ideas to potential statistics patrons (think a bit about the sorry state of financial markets :-(). More generally, I tend to be extremely wary about over-interpretation of belly grumbles as the Voice of the Spirit ... which is a very powerful urge of many statisticians and statisticians' clients. Data mining can be fine for exploratory musings, but a serious study needs a model, i. e. a set of ideas and a way to reality-stress them. As far as I can see (but I might be nearsighted), I see no model linking package downloads to package use(s). Data may or may not become available with more or less effort, but I can't see the point. Emmanuel Charpentier On Sunday 08 March 2009 at 16:08, Barry Rowlingson wrote : I think the situation is worse than messy. If a client comes in with data that doesn't address the question they're interested in, I think they are better served to be told that, than to be given an answer that is not actually valid. They should also be told how to design a study that actually does address their question. You (and others) have mentioned Google Analytics as a possible way to address the quality of data; that's helpful. But analyzing bad data will just give bad conclusions. As long as we say 'package Foo is the most downloaded package on CRAN', and not 'package Foo is the most used package for R', we can leave it to the user to decide if the latter conclusion follows from the former. In the absence of actual usage data I would think it a good approximation. Not that I would risk my life on it.
Pop music charts are now based on download counts, but I wouldn't believe they represent the songs that are listened to the most times. Nor would I go so far as to believe they represent the quality of the songs... Should R have a 'Would you like to tell CRAN every time you do library(foo) so we can do usage counts (no personal data is transmitted, blah blah)?' option? I don't think so. Barry
Re: [R] popular R packages
On Sunday 08 March 2009 at 13:22 -0500, Dirk Eddelbuettel wrote : On 8 March 2009 at 13:27, Duncan Murdoch wrote: | But we don't even have that data, since CRAN is distributed across lots | of mirrors. On 8 March 2009 at 19:01, Emmanuel Charpentier wrote: | As far as I can see (but I might be nearsighted), I see no model linking | package download to package use(s). Data may or may not become available Which is why Debian (and Ubuntu) use the _opt-in package_ popularity-contest that collects data on packages used and submits that to a host collecting the data. This drives the so-called 'popcon' statistics. Yes, and there are many ways in which one can criticise this data collection process. But I fail to see how __not having any data__ leads to more informed decisions. Once you have data, you have the option of using or discarding it. But if you have no data, you have no option. How is that better? I question 1) the usefulness of the effort necessary to get the data ; and 2) the very concept of data mining, which seems to be the rationale for this proposed effort. Furthermore (but this is seriously off-topic), I seriously despise the very idea of "popularity" in scientific debates... "Everybody does it" is *not* a valid argument. Nor is "Everyone knows"... Emmanuel Charpentier
Re: [R] OT: A test with dependent samples.
a control. That's because this trial has a *pragmatic* goal : checking the acceptability of the administration of a drug on an a priori set of criteria. It does not allow inferences on the effect of the drug, and *postulates* that the non-administration of the drug will result in nothing of interest. This allows us to pull a realistic, interesting null hypothesis out of our hats. On the other hand, the controlled plans, by virtue of having a control, allow us to be analytic, and to separate the effect of the administration from the effect of the drug itself : this latter one might indeed be zero, the associated null hypothesis isn't nonsensical and the test of this null isn't worthless. End of the loosely related methodological rant. In any case, my point is : hypothesis testing is *not* the alpha and omega of biostatistics, and other methods of describing and analysing experimental results are often much more interesting, notwithstanding the fetishes of journal referees. Furthermore, testing of impossible or worthless hypotheses leads to worthless conclusions. Corollary : do not test for the sake of testing, because "everybody does it" or because a referee started a tantrum ; test realistic hypotheses, whose rejection has at least some relation to your subject matter. The two cents of someone tired of reading utter nonsense in prestigious journals... Emmanuel Charpentier
Re: [R] Request for advice on character set conversions (those damn Excel files, again ...)
On Mon, 08 Sep 2008 01:45:51 +0200, Peter Dalgaard wrote : Emmanuel Charpentier wrote: Dear list, [ Snip ... ] This looks reasonably sane, I think. The last loop could be d[] <- lapply(d, conv1, from, to), but I think that is cosmetic. You can't really do much better because there is no simple way of distinguishing between the various 8-bit character sets. Thank you Peter ! Could you point me to some not-so-simple (or even doubleplusunsimple) ways ? I get the problem not so rarely, and I'd like to pull this shard outta my poor tired foot once and for all... and I suppose that I am not alone in this predicament. You could presumably set up some heuristics, like the fact that the occurrence of 0x82 or 0x8a probably indicates cp437, but it gets tricky. (At least, in French, you don't have the Danish/Norwegian peculiarity that upper/lowercase o-slash were missing in cp437, and therefore often replaced yen and cent symbols in matrix printer ROMs. We still get the occasional parcel addressed to ¥ster Farimagsgade.) Peter, you're gravely underestimating the ingenuity of some Excel l^Husers... (and your story is a possible candidate for a fortune() entry...). Emmanuel Charpentier
[R] Request for advice on character set conversions (those damn Excel files, again ...)
Dear list, I have to read a not-so-small bunch of not-so-small Excel files, which seem to have traversed the Windows 3.1, Windows 95 and Windows NT versions of the thing (with maybe a Mac or two thrown in for good measure...). The problem is that 1) I need to read strings, and 2) those strings may have various encodings. In the same sheet of the same file, some cells may be latin1, some UTF-8 and some CP437 (!). read.xls() allows me to read those things in sets of dataframes. My problem is to convert the encodings to UTF-8 without clobbering those which are already (looking like) UTF-8. I came to the following solution :

foo <- function(d, from="latin1", to="UTF-8") {
  # Semi-smart conversion of a dataframe between charsets.
  # Needed to ease use of those [EMAIL PROTECTED] Excel files
  # that have survived the Win3.1 -- Win95 -- NT transition,
  # usually in poor shape...
  conv1 <- function(v, from, to) {
    condconv <- function(v, from, to) {
      cnv <- is.na(iconv(v, to, to))
      v[cnv] <- iconv(v[cnv], from, to)
      return(v)
    }
    if (is.factor(v)) {
      l <- condconv(levels(v), from, to)
      levels(v) <- l
      return(v)
    } else if (is.character(v)) return(condconv(v, from, to))
    else return(v)
  }
  for (i in names(d)) d[,i] <- conv1(d[,i], from, to)
  return(d)
}

Any advice for enhancement is welcome... Sincerely yours, Emmanuel Charpentier
Re: [R] Sending ... to a C external
On Friday 22 August 2008 at 15:16 -0400, John Tillinghast wrote : I'm trying to figure this out with Writing R Extensions but there's not a lot of detail on this issue. I want to write a (very simple really) C external that will be able to take ... as an argument. (It's for optimizing a function that may have several parameters besides the ones being optimized.) !!! That's a hard one. I have never undertaken this kind of job, but I expect that your ... argument, if you can reach it from C (which I don't know), will be bound to a Lisp-like structure, notoriously hard to decode in C. Basically, you'll have to create very low-level code (and duplicate a good chunk of the R parser-interpreter...). I'd rather treat the ... argument in a wrapper that could call the relevant C function with all arguments interpreted and bound... This wrapper would probably be an order of magnitude slower than C code, but two orders of magnitude easier to write (and maintain !). Since ... argument parsing would be done *once* before the grunt work is accomplished by the C code, the slowdown would (probably) be negligible... I got the showArgs code (from R-exts) to compile and install, but it only gives me error messages when I run it. I think I'm supposed to pass it different arguments from what I'm doing, but I have no idea which ones. What exactly are CAR, CDR, and CADR anyway? Why did the R development team choose this very un-C-like set of commands? 'Cause they are fundamental to the Lisp-like language that is S/R. Read on... They are not explained much in R-exts. At the risk of incurring the R-help deities' wrath : In the beginning, John McCarthy created Lisp 1.5, which begat MACLISP, which begat ..., which begat Scheme ... . Notwithstanding a long inheritance story, full of sound, fury, holy wars and bastardry (sometimes evoking European royalty lines...), all Lisp-like languages bear one mark (dominant allele) of their common ancestor, which is the list structure.
This structure is implemented as a pair of pointers, the first one pointing to the first element of the list, the second pointing to ... the list of the rest of the members (or nothing, aka NIL). In McCarthy's implementation, which was machine code on an IBM 704 (we were in 1957, mind you...), such a pair was split across different parts of a CPU word, known in the technical documentation of this dinosaur as the address register and the decrement register respectively. Thus the mnemonic names of Content of the Address Register and Content of the Decrement Register (aka CAR and CDR) for these two pointers, names which were used for the (Lisp) functions allowing access to them. Those names have stuck, mainly for historical (sentimental ? hysterical ?) reasons. Even when reasonable synonyms were introduced (e. g. first and rest in Common Lisp), all the old hands (and young dummies who wanted to emulate them) kept car and cdr close to their hearts. So, fifty-one years after McCarthy's stroke of genius, this piece of CS snobbery is still with us (and probably will be 'til 64-bit Linux time counters roll over...). Despite its C-like syntax, its huge collection of array and array-like structures, and its loops, S (and R) are fundamentally Lisp-like languages. Most notably, the representation of executable code is accessible by the code itself : one can create a function which computes another function. Lisp was the first language explicitly created for this purpose, and it is no happenstance that many (most ?) such languages use Lisp (or one of its almost uncountable dialects) as their foundation. Many of R/S's common ways of doing things have a Lisp smell : for example, it is no accident that {s|t|l}apply(), outer() and suchlike are *way* more efficient than for(), and they are strongly reminiscent of Lisp's mapcar...
BTW, this ability to compute on the language is probably the most fundamental point distinguishing S/R from all the rest of the statistical packages (SPSS, Stata, SAS and the rest of the crowd...). Now I have to say that I'm not old enough to have first-hand knowledge of this history. I was born, but not weaned, when McCarthy unleashed Lisp on an unsuspecting world ... I learned all that ca. 1978-9, while discovering VLisp. While I can't really help you (I still think that processing ... at C level is either hubris of the worst sort, pure folly or a desperate case), I hope to have entertained you. Sincerely, Emmanuel Charpentier
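The wrapper approach suggested above — digest ... once at R level, leaving only fully-bound arguments for the compiled code — might look like this (a sketch; the optimised function and its parameters are made up):

```r
## Bind the ... arguments once, in R, and return a closure of x alone;
## the closure (or its already-evaluated extra arguments) can then be
## handed to compiled code via optim(), .C() or .Call().
make_closure <- function(f, ...) {
  extra <- list(...)                 # ... is evaluated and bound here, once
  function(x) do.call(f, c(list(x), extra))
}
g <- make_closure(function(x, a, b) (x - a)^2 + b, a = 2, b = 1)
g(5)                                 # (5 - 2)^2 + 1 = 10
optim(0, g, method = "BFGS")$par     # minimum close to a = 2
```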
Re: [R] Lattice: problem using panel.superpose and panel.groups
On Sunday 17 August 2008 at 09:36, Dieter Menne wrote : [ Snip ... ] Trellis graphics are a bit like hash functions: you can be close to the target, but get a far-off result. Nice candidate for a fortune() entry ... Emmanuel Charpentier
Re: [R] FYI: APL in R
On Tuesday 19 August 2008 at 23:52 -0700, Jan de Leeuw wrote : http://idisk.mac.com/jdeleeuw-Public/utilities/apl/apl.R Thank you ! I was beginning to think I was the last of the dinosaurs... ISTR that someone else on the list is also an (un)repentant APLer... Although this is all prefix and no infix, we could easily recreate some of the infamous APL one-liners that nobody can possibly understand or reproduce. :-) !!! Having implemented a (primitive) stats package in APL, I know what you mean... A liberal use of "lamp" (the APL comment symbol) was ... well, enlightening... Emmanuel Charpentier
[R] Possible buglet (wart ?) of odfWeave 0.7.6 (with workaround)
Dear List, I have had problems inserting some (not all !) figures via odfWeave (using print(someLatticeFunction)...). The figure was correctly displayed in an R device window, but the resulting ODF document displayed the correct space for the figure and an empty frame with a broken-image icon and a "read error" mention. Exploration of the odf (.odt, in my case) file showed that the relevant files (.eps, which is a vector format much more apt than bitmaps for cutting/pasting on various media) existed in the Pictures directory of the file and displayed fine. Then lightning struck : the picture files were named as follows : "content_", serial number, chunk name, ".eps" ; it occurred to me that OpenOffice 2.4.1 (as packaged in Debian) displayed a "Read error" *when the chunk name had accented characters in it* (my system works in UTF-8). Plain ASCII chunk names work OK. I do not know the ODF specification well enough to be sure that this is an odfWeave problem (some chunk name mangling might be necessary) rather than an OpenOffice infelicity (non-implementation of a specification-requested feature) : checking with another ODF-compliant word processor might be useful, but I don't have any on hand. In any case, this problem is not obvious after re-reading the odfWeave documentation, and does not seem to have been already reported to r-help (yes, Pr Ripley, I did my homework...). This report might spare some poor non-English-writing sod some time ...
For what it's worth : sessionInfo() R version 2.7.1 (2008-06-23) x86_64-pc-linux-gnu locale: LC_CTYPE=fr_FR.UTF-8;LC_NUMERIC=C;LC_TIME=fr_FR.UTF-8;LC_COLLATE=fr_FR.UTF-8;LC_MONETARY=C;LC_MESSAGES=fr_FR.UTF-8;LC_PAPER=fr_FR.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=fr_FR.UTF-8;LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] odfWeave_0.7.6 XML_1.96-0 lattice_0.17-13 loaded via a namespace (and not attached): [1] grid_2.7.1 tools_2.7.1 Hoping this might help, Emmanuel Charpentier PS : don't try to answer to my subscribed address (charpent (at) bacbuc (dot) dyndns (dot) org), which is the apparent source of this message : bacbuc (dot) dyndns (dot) org is down and will quite likely remain so for at least two to three more weeks. I'm playing fast and loose with Gmane by putting my registered address as a Reply-To address and using my actual present address (emm (dot) charpentier (at) free (dot) fr) as source. I do not know what will survive the Gmane, Mailman and Martin's-scripts mauling...
Re: [R] How to read HUGE data sets?
Jorge Iván Vélez wrote : Dear R-list, Does somebody know how I can read a HUGE data set using R? It is a hapmap data set (txt format) which is around 4GB. After reading it, I need to delete some specific rows and columns. I'm running R 2.6.2 patched over XP SP2 using a 2.4 GHz Core 2 Duo processor and 4GB RAM. Any suggestion would be appreciated. Hmmm... Unless you're running a 64-bit version of XP, you might be SOL (notwithstanding the astounding feats of the R Core Team, which managed to be able to use about 3.5 GB of memory under 32-bit Windows) : your *raw* data will eat more than the available memory. You might be lucky if some of them can be abstracted (e. g. long character strings that can be reduced to vectors), or get unlucky (large R storage overhead of nonreducible data). You might consider changing machines : get a 64-bit machine with gobs of memory and cross your fingers. Note that, since R pointers would then be 64 bits wide instead of 32 bits, data storage needs will inflate... Depending on the real meaning of your data and the processing they need, you might also consider storing your raw data in a SQL DBMS, reducing them in SQL and reading into R only the relevant part(s). There are also some contributed packages that might help in special situations : biglm, birch. HTH, Emmanuel Charpentier
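If the rows/columns to delete can be decided chunk by chunk, something along these lines avoids ever holding the whole file in memory (a sketch; file name, separator and kept columns are made up):

```r
## Read a huge tab-delimited file 100k lines at a time, keeping only
## some columns, and bind the reduced chunks at the end.
con <- file("hapmap.txt", open = "r")
header <- strsplit(readLines(con, n = 1), "\t")[[1]]
keep <- c(1, 2, 5)                       # hypothetical columns of interest
pieces <- list()
repeat {
  lines <- readLines(con, n = 100000)    # next chunk of raw lines
  if (length(lines) == 0) break          # end of file
  chunk <- read.table(textConnection(lines), sep = "\t",
                      col.names = header, stringsAsFactors = FALSE)
  pieces[[length(pieces) + 1]] <- chunk[, keep]   # drop unwanted columns
}
close(con)
reduced <- do.call(rbind, pieces)
```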
Re: [R] odfWeave and xtable
宋时歌 wrote : David, The value of odfWeave is not limited to newbie users. It is vastly useful for researchers in fields that do not accept LaTeX for journal paper submission (for example, sociology, demography). ... and for those of us living in the salt mines of administration (in my case), industry or economics/finance, where the damn Microsoft toolchain is mandatory and anything else anathema : we have to disguise our outputs into something palatable to these tools, and odfWeave is a boon to us ! Given one of Max's former mail addresses (pointing to a big Pharma firm's mail server), I suspect that this was one of his motivations. Thanks again, Max ! That is not to say that this tool cannot be perfected. I started to work on cross-reference abilities but got interrupted by some Real Life nonmaskable interrupts, including some unplanned emergency work, the loss of a USB key with my current work on this, and some not-insignificant sickness... Emmanuel Charpentier
Re: [R] problem when extracting columns of a data frame into a new data frame
Delphine Fontaine wrote:

Dear R-users, I would like to create a new data frame composed of 2 columns of another data frame. But it does not give me what I want...

> casesCNST[1:10,]
   case X1 X2 X3 X4 expected
1    A1  0  0  0  0        E
2    A2  0  0  0  1        C
3    A3  0  0  0  2        C
4    A4  0  0  0  3        C
5    A5  0  0  0  4        C
6    A6  0  0  1  0        C
7    A7  0  0  1  1        C
8    A8  0  0  1  2        C
9    A9  0  0  1  3        C
10  A10  0  0  1  4        C

> expectedCNST <- data.frame(cbind(casesCNST$case, casesCNST$expected))
> expectedCNST[1:10,]
X1 X2 11 4 2 112 3 3 223 3 4 334 3 5 445 3 6 556 3 7 593 3 8 604 3 9 615 3 10 2 3

Why do the variables change?!?

'Cause you build your new data frame from vectors with no names (casesCNST$case is a vector with no name, and, being a factor, it is reduced to its internal integer codes by cbind()). To keep the original names, you should subset the original data frame instead, with casesCNST[, c(1,6)] or casesCNST[, c("case","expected")].

HTH,

Emmanuel Charpentier
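The mechanism behind the mangled output can be shown in a few lines: cbind() on factor columns returns a matrix of the factors' internal integer codes, and data.frame() on that matrix invents the names X1 and X2. A minimal illustration on made-up data (the column names mimic the poster's):

```r
# Made-up two-row version of the poster's situation:
d <- data.frame(case     = factor(c("A1", "A2")),
                expected = factor(c("E", "C")))

# cbind() strips the factor attributes, leaving integer codes:
bad <- data.frame(cbind(d$case, d$expected))
names(bad)        # "X1" "X2"  (original names lost)
bad$X1            # 1 2        (codes, not "A1" "A2")

# Subsetting the data frame keeps both names and values:
good <- d[, c("case", "expected")]
names(good)       # "case" "expected"
```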
Re: [R] Installing R on ubuntu dapper
hadley wickham wrote:

I followed the instructions at http://cran.r-project.org/bin/linux/ubuntu/README.html, but I'm getting the following error:

~: sudo apt-get install r-base
Reading package lists... Done
Building dependency tree... Done
Some packages could not be installed. This may mean that you have requested an impossible situation or if you are using the unstable distribution that some required packages have not yet been created or been moved out of Incoming. Since you only requested a single operation it is extremely likely that the package is simply not installable and a bug report against that package should be filed. The following information may help to resolve the situation:
The following packages have unmet dependencies:
  r-base: Depends: r-base-core (= 2.6.1-1dapper0) but it is not installable
          Depends: r-recommended (= 2.6.1-1dapper0) but it is not going to be installed
E: Broken packages

Huh?

Ubuntu i386 packages install with no problem. However, x86_64 is more problematic: those packages are not built in the CRAN repositories. IIRC, r-base is an all-architectures package (mostly a meta-package), while r-base-core and r-recommended are architecture-specific binaries. Therefore, r-base is indeed available, while r-base-core and r-recommended are not. Hence the jam...

I have been able to rebuild x86_64 packages on Ubuntu Gutsy starting from the Debian sources; a cursory check lets me think that these builds are correct. Let me know if you want them (no guarantees: caveat emptor). I plan to check them extensively and propose them on the R-SIG-Debian mailing list, but this will have to wait...

HTH,

Emmanuel Charpentier
Re: [R] odfWeave cross-reference
Chris H. Snow wrote:

How can I insert cross-references to odfWeave-generated figures in my source odf before the graphic has been created with odfWeave?

In the current (0.6.0) version, you can't. I wrote some modifications to odfWeave in this direction, planning to send them back to Max Kuhn after some testing, but am currently unable to work on them.

Basically, you have to separate the creation of a reference to a table|figure from the creation of the table|figure itself. The simplest way is to be able 1) to create some form of R object containing the necessary reference information, 2) to use such an R object in order to insert the XML code for the reference, and|or 3) to pass such an object as an optional argument to table|figure caption creation. My current set of functions allows all of the above, but is not correctly integrated into the package, and has not been tested extensively.

HTH,

Emmanuel Charpentier
Re: [R] simple problems
Gabor Grothendieck wrote:

On Dec 6, 2007 3:26 PM, marciarr [EMAIL PROTECTED] wrote:

[Snip]

2- How do I solve a simple equation? Considering the equation y = exp(-x)^12, I would like to find the values of x for, for example, y = 0.01, so exp(-x)^12 = 0.01. How do I do that using R? I know those are probably very, very simple questions, but I do not seem to find the answer.

Search between 0 and 1 for the root of the indicated function:

uniroot(function(x) 0.01 - exp(-x)^12, 0:1)

I beg your pardon? I'd rather use high-school algebra/analysis:

log(exp(-x)^12) = log(0.01)
12 * log(exp(-x)) = log(0.01)
-12 * x = log(0.01)
x = -log(0.01)/12 = log(100)/12

Rushing for a sophisticated numerical tool without looking for an explicit solution is easy, fast and lazy (I do that every day...), but deprives you of the process of understanding the problem. Which was probably the point of this (probable) homework...

Emmanuel Charpentier
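For completeness, a quick check in R that the closed-form answer and the numerical one agree (nothing here beyond the two approaches already quoted):

```r
# Closed-form solution from the algebra above:
x_exact <- log(100) / 12
exp(-x_exact)^12                  # gives back 0.01

# Numerical solution, as in the uniroot() suggestion:
x_num <- uniroot(function(x) 0.01 - exp(-x)^12, c(0, 1))$root
abs(x_num - x_exact)              # agreement within uniroot's tolerance
```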
Re: [R] using eval(parse(text)) , gsub(pattern, replacement, x) , to process code within a loop/custom function
Thomas Pujol wrote:

R-help users, thanks in advance for any assistance... I truly appreciate your expertise. I searched help and could not figure this out, and think you can probably offer some helpful tips. I apologize if I missed something, which I'm sure I probably did.

I have data for many samples (e.g. 1950, 1951, 1952, etc.). For each sample, I have many data frames (e.g. temp.1952, births.1952, gdp.1952, etc.). (Because the data is rather large (and for other reasons), I have chosen to store the data as individual files, as opposed to a list of data frames.) I wish to write a function that enables me to run any of many custom functions/processes on each sample of data. I currently accomplish this with a custom function that uses eval(parse(t=text.i2)) and gsub(pat, rep, x) (the latter changes the sample number in each line of text I submit to eval(parse(t=text.i2))). Is there a better/preferred/more flexible way to do this?

Beware: what follows is the advice of someone accustomed to using an RDBMS and SQL to work with data; as anyone should know, everything is a nail to a man with a hammer. Caveat emptor...

Unless I misunderstand you, you are trying to treat piecewise a large dataset made of a large number of reasonably-sized independent chunks. What you're trying to do seems to me a bit like reinventing the SAS macro language. What's the point? IMNSHO, large datasets that are used only piecewise are much better handled in a real database (RDBMS), queried at runtime via, for example, Brian Ripley's RODBC. In your example, I'd create a table Births with all your data plus the relevant year.
Out of the top of my mind:

# Do that ONCE in the lifetime of your data: an RDBMS is probably more
# apt than R data frames for this kind of management
library(RODBC)
channel <- odbcConnect("WhateverYouHaveToUseForYourFavoriteDBMS")
sqlSave(channel, tablename = "Births",
        rbind(cbind(data.frame(Year = rep(1952, nrow(births.1952))), births.1952),
              cbind(data.frame(Year = rep(1953, nrow(births.1953))), births.1953)
              # ... ad nauseam ...
              ))
rm(births.1951, births.1952, ...)   # get back breathing space

Beware: certain data types may be tricky to save! I got bitten by Dates recently... See the RODBC documentation, your DBMS documentation and the R Data Import/Export guide...

At analysis time, you may use the result of the relevant query exactly as one of your data frames. Instead of:

foo(... data = births.1952, ...)

type:

foo(... data = sqlQuery(channel,
        "select * from \"Births\" where \"Year\" = 1952;"), ...)
# syntax illustrating talking to a picky DBMS...

Furthermore, the variable Year carries your year information. Problem (dis)solved. You may loop (or even sapply()...) at will on it:

for (year in 1952:1978) {
    query <- sprintf("select * from \"Births\" where \"Year\" = %d;", year)
    foo(... data = sqlQuery(channel, query), ...)
    ...
}

If you already use a DBMS with some connection to R (via RODBC or otherwise), use that. If not, SQLite is a very lightweight library that lets you use a (very considerable) subset of SQL92 to manipulate your data. I understand that some people on this list have undertaken the creation of an SQLite-based package dedicated to this kind of large-data management.

HTH,

Emmanuel Charpentier
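If a database is overkill, the eval(parse())/gsub() machinery can also be dropped while keeping the poster's flat layout: fetch the per-year data frames by name with get() and loop over a named list. A hedged sketch on toy stand-ins (births.1952 etc. follow the poster's naming scheme; foo() stands for any analysis function):

```r
# Toy stand-ins for the poster's per-year objects and analysis function:
births.1952 <- data.frame(n = 1:3)
births.1953 <- data.frame(n = 1:4)
foo <- nrow                       # stand-in for the real analysis

# Collect the objects into a named list, then apply foo() to each:
years  <- 1952:1953
births <- lapply(setNames(paste("births.", years, sep = ""), years),
                 function(nm) get(nm, envir = globalenv()))
results <- lapply(births, foo)    # one result per year, names kept
results[["1953"]]                 # 4
```

Once the data frames live in a list, the whole gsub-on-code-text step disappears.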
Re: [R] odfWeave error
Cleber N. Borges wrote:

Hello all, I am trying to use the package 'odfWeave' and I get the following error:

### error message
...
Removing content.xml
Post-processing the contents
Error in .Call(RS_XML_Parse, file, handlers, as.logical(addContext), :
  Error in the XML event driven parser for content_1.xml: error parsing attribute name

The piece of code is:

### code
...
<<makeGraph, echo=TRUE, results=xml>>=
fileplot <- 'C:\\DADOS\\tmp\\plotx.emf'
win.metafile(fileplot)
plot(rnorm(300), col='red', t='l', lwd=2); grid()
dev.off()
@

This chunk is pure R code and shouldn't output anything directly into the output file. Since you say echo=TRUE, whatever is output by your code is sent to your output file, and since you assert results=xml, this output is interpreted as XML; since it isn't XML, you're getting guff from the XML parser. I'd rather do:

<<makeGraph, echo=FALSE>>=
invisible({
  # Whatever you did ...
})
@

instead. Your second chunk:

<<insertGraph, echo=TRUE, results=xml>>=
odfInsertPlot(file = fileplot, height = 5, width = 5)
@

should insert your plot into the output file.

[Snip...]

BTW: Windows (enhanced) metafiles are Windows-specific. As far as I can tell, recent versions of Office and OpenOffice.org correctly render Encapsulated PostScript files, thus freeing you from another Windows dependency. Unless you *have* to have EMF output (it happens, I know...), you'd better use that format directly.

HTH,

Emmanuel Charpentier
Re: [R] meta-analysis on diagnostic tests (bivariate approach)
Pedro Emmanuel Alvarenga Americano do Brasil wrote:

Hello friends of the R list, I'm a physician and I'm not that good at statistics. I have posted a similar email on the epi-sig list but no one has answered so far. I'm conducting a systematic review of two diagnostic tests for a particular tropical disease. I found a free program that does most of the analysis, called MetaDiSc. However, there is a particular analysis that I would like to try which is not implemented in this software. I looked in R for meta-analysis functions, but the packages available don't work with diagnostic tests. I have been trying for a while to adapt to R a function developed in SAS and published in J.B. Reitsma et al., Journal of Clinical Epidemiology 58 (2005) 982-990, which is a bivariate approach to meta-analysis of diagnostic tests.

lmer (in package lme4, IIRC), which *does* treat logistic regression (among other GLMs) directly, should do the trick. Probably with more ease...

Since I do not have any experience with SAS, I'm having a hard time trying to do so. There is an appendix to the original text with some step-by-step instructions (in SAS syntax) on how to do it, but I'm stuck on more advanced stuff for which I can't find a solution. One step mentions a Van Houwelingen (Statist. Med. 2002; 21:589-624) approach to incorporate between- and within-study variances, using restricted maximum likelihood estimation.

This is a large tutorial paper, which goes well beyond your goals.

This is the script (or function) so far:

bivar <- function(TP, FP, FN, TN) {
    # find a way to add 0.5 to each null cell, if any
    Se <- TP/(TP+FN)
    Sp <- TN/(TN+FP)
    logSe <- log(Se/(1-Se))
    logSp <- log(Sp/(1-Sp))
    vlogSe <- 1/(Se*(1-Se)*(TP+FN))
    vlogSp <- 1/(Sp*(1-Sp)*(TN+FP))
    bsvlogSe <- 0
    cblog <- 0
    bsvlogSp <- 0
    sdata <- rbind(bsvlogSe, cblog, bsvlogSp, vlogSe, vlogSp)

Does anyone know if there is any package or function with this bivariate random-effects model that I could try to use?
Neither could I find any package or function to apply restricted maximum likelihood estimation. Any tip that helps me go on, or a hand in the development of this script (function), would be most welcome.

I'll try to have a look at this problem one of these days (I already have a serious backlog in daylight life...). Out of the top of (what I use as) my mind, assuming that technique and study are factors correctly identifying the studies and compared techniques (and forgetting the +0.5 in empty cells, which can be done along the lines of TP <- TP + 0.5*(TP==0), etc.):

data <- rbind(data.frame(suc = TP, fai = FN,
                         situation = rep("Diseased", length(TP)),
                         study = study, technique = technique),
              data.frame(suc = TN, fai = FP,
                         situation = rep("NonDiseased", length(TN)),
                         study = study, technique = technique))
data$situation <- as.factor(as.character(data$situation))

Now,

model <- lmer(cbind(suc, fai) ~ situation:technique - 1 + (1 | study:situation),
              data = data, family = binomial(link = "logit"))

should be close to what you're aiming at. But asking for Douglas Bates' advice (on the R-SIG-mixed-models list) would be much better...

However, I am not as sanguine as Reitsma et al. about the whole ROC methodology. It has sound foundations for one specific problem: modeling (a simplified type of) signal perception. Its use in medical diagnostic evaluation is perfectly justified for problems of this kind (e.g. choosing a cutoff point to interpret a biological measurement in clinical terms, or comparing two numerical diagnostic indices). However, this approach ignores the costs and utilities of a false positive and a false negative, which may lead to very different choices and judgement criteria (I think Frank Harrell has much better credentials than mine to explain this point). Furthermore, I have reservations about the (ab?)use of this model in situations where there is no numerical measure for which a cutoff point is to be chosen (e.g.
ROC methodology in radiology): while this is current practice, I am not sure what real-world meaning such a ROC curve has. As far as I can tell, a physician does *not* choose (reliably) his/her sensitivity level, and therefore cannot move his/her operating point along this hypothetical ROC curve. If one accepts the hypothesis that sensitivities and specificities are the only relevant data, the DOR (diagnostic odds ratio) estimation makes more sense (fewer untestable hypotheses...).

Emmanuel Charpentier
Re: [R] Unweighted meta-analysis
Roy Sanderson wrote:

Hello, I'm very much a beginner at meta-analysis, so apologies if this is a trivial posting. I've been sent a set of data from separate experimental studies, Treatment and Control, but no measure of the variance of effect sizes, numbers of replicates etc. Instead, for each study, all I have is the mean value for the treatment and control (but not the SD).

With possibly three very special kinds of exceptions, what you've been sent is insufficient for any kind of analysis (meta- or otherwise): there is no way to assess variability, hence no way to assess the relative importance of noise to data, or the relative importance of the different sets of data.

One possible exception is when the very nature of the experiment implies that your data come from a truly one-parameter distribution. I'm thinking, for example, of count data of rare events, which might, under not-so-unreasonable(-in-special-circumstances) conditions, come from a Poisson distribution.

A second possible set of exceptions is when the second (and following) parameter(s) can be deduced from obvious general knowledge. For example (set in a semi-imaginary setup), one may give you the number of people using a given service at least once during a specified period, *provided* that, in order to use this service, people have to register with the service provider first. The data might be a simple number (no valid indication of variability if service use is too frequent to be modeled by a Poisson distribution), but looking up the number of registered users in some data bank might provide you with a valid proportion and population size, which is enough to meta-analyze.

But the third possibility is of course that your means are indeed the result of experimenting on *ONE* experimental unit of each group.
This is highly dependent on what is measured and how (an example of this might be industrial production per unit time with two different sets of tools/machines in various industrial setups: here, the experimental unit is the industrial setup, and your mean is *one* measure of speed). Then you have *individual* data, which you should analyze accordingly (e.g. t-test or Wilcoxon test if there is no relationship between treated and control experimental units, paired t-test or paired Wilcoxon test if you are told that the means may be related, etc.). This is not a meta-analysis, but an analysis.

Outside these possibilities, I see no point in meta-analysing anything that isn't analysable by itself.

As far as I can tell, this forces me into an unweighted meta-analysis, with all the caveats and dangers associated with it.

As far as I can tell, you're forced to tell your sponsor/tutor/whatever either that he doesn't know what he is asking for, or that he's trying to fool you (and you saw it!); which might lead you to ask him to rethink his question, give you more information about the measurements and experimental setup, provide you (or help you find) the missing data, stop playing silly games, or go fly a kite...

Two possible approaches might be: a) take ln(treatment/control) and perform a Fisher's randomisation test (and also calculate +/- CI); b) regress the treatment vs control values, then randomise (with or without replacement?) individual values, comparing the true regression coefficient with the distribution of randomised regression coefficients.

I haven't the foggiest idea of what you're trying to do here: introducing artificial variability in order to separate it from variation between groups?
Unless you are in the case of one experiment = one experimental unit per group (see above) with no information about variability, the only information you can use is the *sign* of the difference Experimental - Control: if all or almost all of them go in the same direction, one might be tempted to conclude that this direction is not random (that's the sign test). But this is only valid if the hypothesis "direction is 50/50 random under H0" has validity under the experimental setup, which your story doesn't tell...

Both approaches would appear to be fraught with risks; for example, in the regression approach, it is probable that the error distribution of an individual randomised regression might not be normal; would this then invalidate the whole set of regressions?

Again, you'd be working against an artificial variability that you'd have introduced yourself: what is the point?

HTH,

Emmanuel Charpentier
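The sign test mentioned above is one line in R via binom.test(). A hedged sketch on made-up per-study means (the numbers are invented for illustration; only the sign of each Treatment - Control difference is used):

```r
# Made-up per-study means; real data would come from the studies:
treatment <- c(5.1, 4.8, 6.0, 5.5, 4.9, 5.7, 5.2)
control   <- c(4.2, 4.9, 5.1, 4.7, 4.1, 4.6, 4.3)

s <- sign(treatment - control)   # here: 6 positive, 1 negative
# Exact two-sided test of "direction is 50/50 under H0":
binom.test(sum(s > 0), sum(s != 0), p = 0.5)   # p = 0.125 for 6 of 7
```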
Re: [R] pass lm( ) a char vector as the variables to be included
Gavin Simpson wrote:

On Mon, 2007-11-26 at 14:17 +, [EMAIL PROTECTED] wrote:

Here is the code of the example of lm():

## Annette Dobson (1990) An Introduction to Generalized Linear Models.
## Page 9: Plant Weight Data.
ctl <- c(4.17, 5.58, 5.18, 6.11, 4.50, 4.61, 5.17, 4.53, 5.33, 5.14)
trt <- c(4.81, 4.17, 4.41, 3.59, 5.87, 3.83, 6.03, 4.89, 4.32, 4.69)
group <- gl(2, 10, 20, labels = c("Ctl", "Trt"))
weight <- c(ctl, trt)
anova(lm.D9 <- lm(weight ~ group))
lm.D90 <- lm(weight ~ group - 1)   # omitting intercept

What I am doing is storing the variable name "group" in a vector, say g <- "group". The question is how to strip the quotation marks when we call lm() through g?

Try:

w = "weight"
g = "group"
form = as.formula(paste(w, g, sep = "~"))
lm(form)

Regards, Richie.

For more complicated automation, the ideas and examples from Bill Venables' Programmer's Niche article in the R newsletter from a few years ago might be of use:

[39] Bill Venables. Programmer's niche. R News, 2(2):24-26, June 2002. [bib | PDF | http]

The PDF is available here: http://cran.r-project.org/doc/Rnews/Rnews_2002-2.pdf

Another possibility is to create a macro (library(gtools); ?defmacro). See Thomas Lumley's paper in R News 2001-3 (Programmer's Niche: Macros in R - Overcoming R's virtues).

HTH,

Emmanuel Charpentier
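Putting the quoted pieces together, a self-contained version (data from the lm() help page, Dobson 1990; the variable names are held as character strings, exactly the poster's situation):

```r
# Dobson (1990) plant-weight data, as in the lm() help page:
ctl <- c(4.17, 5.58, 5.18, 6.11, 4.50, 4.61, 5.17, 4.53, 5.33, 5.14)
trt <- c(4.81, 4.17, 4.41, 3.59, 5.87, 3.83, 6.03, 4.89, 4.32, 4.69)
d <- data.frame(weight = c(ctl, trt),
                group  = gl(2, 10, 20, labels = c("Ctl", "Trt")))

# The variable names arrive as character strings...
w <- "weight"
g <- "group"

# ...and become a formula with no quotation-mark "stripping" needed:
form <- as.formula(paste(w, g, sep = "~"))
fit  <- lm(form, data = d)
coef(fit)[["(Intercept)"]]   # 5.032, the Ctl group mean
```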