[R] Followup [Can't install or upgrade the "PKI" package on a Debian testing system]

2016-12-05 Thread Emmanuel Charpentier

Dear list;

Since I posted the message below, Simon Urbanek has fixed the problem 
(in less than 5 hours, phew...). The version available on Rforge *is* 
now installable with OpenSSL 1.1.


Note that you'll have to install it with 
"install.packages('PKI',,'https://www.rforge.net/')", NOT 
install.packages('PKI',,'http://www.rforge.net/'), which, for some 
inscrutable reason (runaway proxying ?) fails the same way as before.
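
For the record, a minimal sketch of the same call with the 'repos' 
argument named explicitly (the empty second argument in the call above 
is just 'lib', left at its default):

install.packages("PKI", repos = "https://www.rforge.net/")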


A big Kudos and Thank You to Simon Urbanek, on behalf of not-so-stable 
distributions users !


    Emmanuel Charpentier

Original message follows :


Dear list,

It seems that recent changes in openssl somehow broke the installation 
or update of the PKI package. This is probably specific to my setup(s) 
(Debian testing, updated frequently).


Copy of a mail to Simon Urbanek (PKI maintainer), sent 5 days ago 
without reply or acknowledgement so far :



It seems that recent changes in openssl rendered the PKI package 
uninstallable :


--


> install.packages("PKI")

essai de l'URL 'http://cran.univ-paris1.fr/src/contrib/PKI_0.1-3.tar.gz'
Content type 'application/x-gzip' length 31058 bytes (30 KB)
==
downloaded 30 KB

* installing *source* package ‘PKI’ ...
** package ‘PKI’ correctement décompressé et sommes MD5 vérifiées
** libs
gcc -I/usr/local/sage-7/local/lib/R//include -DNDEBUG  -fpic  -g -O2 
  -c asn1.c -o asn1.o
gcc -I/usr/local/sage-7/local/lib/R//include -DNDEBUG  -fpic  -g -O2 
  -c init.c -o init.o
gcc -I/usr/local/sage-7/local/lib/R//include -DNDEBUG  -fpic  -g -O2 
  -c pki-x509.c -o pki-x509.o

pki-x509.c: In function ‘PKI_extract_key’:
pki-x509.c:136:26: error: dereferencing pointer to incomplete type 
‘EVP_PKEY {aka struct evp_pkey_st}’

 if (EVP_PKEY_type(key->type) != EVP_PKEY_RSA)
  ^~
pki-x509.c: In function ‘get_cipher’:
pki-x509.c:244:40: error: dereferencing pointer to incomplete type 
‘EVP_CIPHER_CTX {aka struct evp_cipher_ctx_st}’

  ctx = (EVP_CIPHER_CTX*) malloc(sizeof(*ctx));
^~~~
pki-x509.c: In function ‘PKI_RSAkeygen’:
pki-x509.c:550:5: warning: ‘RSA_generate_key’ is deprecated 
[-Wdeprecated-declarations]

 rsa = RSA_generate_key(bits, 65537, 0, 0);
 ^~~
In file included from /usr/include/openssl/rsa.h:13:0,
 from pki.h:13,
 from pki-x509.c:1:
/usr/include/openssl/rsa.h:193:1: note: declared here
 DEPRECATEDIN_0_9_8(RSA *RSA_generate_key(int bits, unsigned long e, void
 ^
/usr/local/sage-7/local/lib/R//etc/Makeconf:132 : la recette pour la 
cible « pki-x509.o » a échouée

make: *** [pki-x509.o] Erreur 1
ERROR: compilation failed for package ‘PKI’
* removing ‘/usr/local/sage-7/local/lib/R/library/PKI’

Les packages source téléchargés sont dans
‘/tmp/Rtmpnmt97E/downloaded_packages’
Warning message:
In install.packages("PKI") :
  l'installation du package ‘PKI’ a eu un statut de sortie non nul

--

This problem blocks the installation of rstanarm, brms, rsconnect and 
shinystan among others. Not exactly trivial.


As far as I know, my libssl libraries are all at 1.1.0c, as well as openssl.


New data point : it turns out that this problem also impedes the update 
of PKI : my running R installation still has PKI 0.1-3, which seems enough 
for rstanarm, brms, rsconnect and shinystan to run and install/upgrade.


However, on a new installation of R (installed in Sage), PKI can't be 
installed. I tried to install PKI 0.1-3 from Rforge, with no success.


- Has anyone already had this problem ?
- Have they been able to work around it ? If so, how ?
- Is my mail to Simon Urbanek sufficient as a bug report ? If not, what
   should I do ?

Sincerely yours,

Emmanuel Charpentier

PS : If possible, I'd appreciate being CC'd on your answers : I'm not on 
the list and have had trouble subscribing "reasonably". I'm following it 
through the mail archives.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Can't install or upgrade the "PKI" package on a Debian testing system

2016-12-05 Thread Emmanuel Charpentier

Dear list,

It seems that recent changes in openssl somehow broke the installation 
or update of the PKI package. This is probably specific to my setup(s) 
(Debian testing, updated frequently).


Copy of a mail to Simon Urbanek (PKI maintainer), sent 5 days ago 
without reply or acknowledgement so far :



It seems that recent changes in openssl rendered the PKI package 
uninstallable :


--

> install.packages("PKI")
essai de l'URL 'http://cran.univ-paris1.fr/src/contrib/PKI_0.1-3.tar.gz'
Content type 'application/x-gzip' length 31058 bytes (30 KB)
==
downloaded 30 KB

* installing *source* package ‘PKI’ ...
** package ‘PKI’ correctement décompressé et sommes MD5 vérifiées
** libs
gcc -I/usr/local/sage-7/local/lib/R//include -DNDEBUG  -fpic  -g -O2 
  -c asn1.c -o asn1.o
gcc -I/usr/local/sage-7/local/lib/R//include -DNDEBUG  -fpic  -g -O2 
  -c init.c -o init.o
gcc -I/usr/local/sage-7/local/lib/R//include -DNDEBUG  -fpic  -g -O2 
  -c pki-x509.c -o pki-x509.o

pki-x509.c: In function ‘PKI_extract_key’:
pki-x509.c:136:26: error: dereferencing pointer to incomplete type 
‘EVP_PKEY {aka struct evp_pkey_st}’

 if (EVP_PKEY_type(key->type) != EVP_PKEY_RSA)
  ^~
pki-x509.c: In function ‘get_cipher’:
pki-x509.c:244:40: error: dereferencing pointer to incomplete type 
‘EVP_CIPHER_CTX {aka struct evp_cipher_ctx_st}’

  ctx = (EVP_CIPHER_CTX*) malloc(sizeof(*ctx));
^~~~
pki-x509.c: In function ‘PKI_RSAkeygen’:
pki-x509.c:550:5: warning: ‘RSA_generate_key’ is deprecated 
[-Wdeprecated-declarations]

 rsa = RSA_generate_key(bits, 65537, 0, 0);
 ^~~
In file included from /usr/include/openssl/rsa.h:13:0,
 from pki.h:13,
 from pki-x509.c:1:
/usr/include/openssl/rsa.h:193:1: note: declared here
 DEPRECATEDIN_0_9_8(RSA *RSA_generate_key(int bits, unsigned long e, void
 ^
/usr/local/sage-7/local/lib/R//etc/Makeconf:132 : la recette pour la 
cible « pki-x509.o » a échouée

make: *** [pki-x509.o] Erreur 1
ERROR: compilation failed for package ‘PKI’
* removing ‘/usr/local/sage-7/local/lib/R/library/PKI’

Les packages source téléchargés sont dans
‘/tmp/Rtmpnmt97E/downloaded_packages’
Warning message:
In install.packages("PKI") :
  l'installation du package ‘PKI’ a eu un statut de sortie non nul

--

This problem blocks the installation of rstanarm, brms, rsconnect and 
shinystan among others. Not exactly trivial.


As far as I know, my libssl libraries are all at 1.1.0c, as well as openssl.


New data point : it turns out that this problem also impedes the update 
of PKI : my running R installation still has PKI 0.1-3, which seems enough 
for rstanarm, brms, rsconnect and shinystan to run and install/upgrade.


However, on a new installation of R (installed in Sage), PKI can't be 
installed. I tried to install PKI 0.1-3 from Rforge, with no success.


- Has anyone already had this problem ?
- Have they been able to work around it ? If so, how ?
- Is my mail to Simon Urbanek sufficient as a bug report ? If not, what
   should I do ?

Sincerely yours,

Emmanuel Charpentier

PS : If possible, I'd appreciate being CC'd on your answers : I'm not on 
the list and have had trouble subscribing "reasonably". I'm following it 
through the mail archives.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] FYI : XML 3.4.2 still breaks odfWeave 0.7.17

2011-08-14 Thread Emmanuel Charpentier
Dear list,

Perusing the GMane archive shows that the issue with XML 3.4.x still bugs 
odfWeave users.

I just checked that the newer XML 3.4.2 version still gives the same 
problem. Using it to weave a bit of documentation written with LibreOffice 
3.3.3 (current in Debian testing) leads me to a 192 KB, 590-page 
document, whereas reinstalling XML 3.2.0 gives me a 4-page, 24 KB file 
(not entirely correct, though, but I haven't yet debugged it).

I can send the relevant files (.RData, source, and good and bad 
compilations) if this can help.

Sincerely yours,

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] odfWeave 0.7.17 stutters on Debian testing 64-bit amd64 systems.

2011-05-14 Thread Emmanuel Charpentier
Dear list,

This is a copy of a mail sent to Max Kuhn, original author and
maintainer of the odfWeave package, which seems not to have received it.
It reports a problem that seems to be very implementation-specific
(reproducible on three Debian testing amd64 machines, does *not* happen
on two i686 Debian testing systems, does *not* happen on an Ubuntu 11.04
amd64 machine) and therefore not attributable to odfWeave itself (which
is pure R) but to a software component it uses (XML and the underlying
libxml2, R itself, etc ...), and I need to know how to track this
problem.

Apologies for cross-posting to r-help and r-debian-sig, but I think
that the issue is ... complicated and might not be as Debian-specific as
it seems at first sight.

Sincerely,

Emmanuel Charpentier

Dear Max,

A few days ago, I started to  have problems with odfWeave 0.7.17 on a
couple of amd64 systems : the compiled files contained numerous copies
of the source files, more or less interlaced, and a few copies of the
target productions.

Then I noticed that using an older 32-bit system resulted in correct
files. An attempt with yet another machine (recent i686 netbook)
confirmed that 32-bit systems gave okay results.

Setup : on all machines, I use Debian testing with updates. My packages
are self-compiled (i. e. installed via install.packages()).

I enclose a very minimalist source and the resulting targets. Logs of
execution on 32- and 64-bit systems are appended after this message.

Since odfWeave is pure R, I doubt that it could be the source of the
problem. This leaves us with two obvious targets : R itself (the Debian
package is current), or the  XML library.

Do you have any idea about how to proceed to find the source of the
problem (and how to fix it) ?

Sincerely,

Emmanuel Charpentier

Execution on a 32-bit system :

> library(odfWeave)
Le chargement a nécessité le package : lattice
Le chargement a nécessité le package : XML
> sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: i486-pc-linux-gnu (32-bit)

locale:
 [1] LC_CTYPE=fr_FR.utf8   LC_NUMERIC=C 
 [3] LC_TIME=fr_FR.utf8LC_COLLATE=fr_FR.utf8
 [5] LC_MONETARY=C LC_MESSAGES=fr_FR.utf8   
 [7] LC_PAPER=fr_FR.utf8   LC_NAME=C
 [9] LC_ADDRESS=C  LC_TELEPHONE=C   
[11] LC_MEASUREMENT=fr_FR.utf8 LC_IDENTIFICATION=C  

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base 

other attached packages:
[1] odfWeave_0.7.17 XML_3.2-0   lattice_0.19-26

loaded via a namespace (and not attached):
[1] grid_2.13.0
> system.time(odfWeave("In1.odt", "Out1-32.odt"))
  Copying  In1.odt 
  Setting wd to  /tmp/RtmpS8lBt8/odfWeave11161739126 
  Unzipping ODF file using unzip -o In1.odt 
Archive:  In1.odt
 extracting: mimetype
   creating: Configurations2/statusbar/
  inflating: Configurations2/accelerator/current.xml  
   creating: Configurations2/floater/
   creating: Configurations2/popupmenu/
   creating: Configurations2/progressbar/
   creating: Configurations2/toolpanel/
   creating: Configurations2/menubar/
   creating: Configurations2/toolbar/
   creating: Configurations2/images/Bitmaps/
  inflating: content.xml 
  inflating: manifest.rdf
  inflating: styles.xml  
 extracting: meta.xml
 extracting: Thumbnails/thumbnail.png  
  inflating: settings.xml
  inflating: META-INF/manifest.xml   

  Removing  In1.odt 
  Creating a Pictures directory

  Pre-processing the contents
  Sweaving  content.Rnw 

  Writing to file content_1.xml
  Processing code chunks ...

  'content_1.xml' has been Sweaved

  Removing content.xml

  Post-processing the contents
  Removing content.Rnw 
  Removing styles.xml
  Renaming styles_2.xml to styles.xml
  Removing manifest.xml
  Renaming manifest_2.xml to manifest.xml
  Removing extra files

  Packaging file using zip -r In1.odt . 
  adding: manifest.rdf (deflated 54%)
  adding: mimetype (stored 0%)
  adding: Pictures/ (stored 0%)
  adding: Configurations2/ (stored 0%)
  adding: Configurations2/images/ (stored 0%)
  adding: Configurations2/images/Bitmaps/ (stored 0%)
  adding: Configurations2/menubar/ (stored 0%)
  adding: Configurations2/progressbar/ (stored 0%)
  adding: Configurations2/toolbar/ (stored 0%)
  adding: Configurations2/floater/ (stored 0%)
  adding: Configurations2/accelerator/ (stored 0%)
  adding: Configurations2/accelerator/current.xml (stored 0%)
  adding: Configurations2/popupmenu/ (stored 0%)
  adding: Configurations2/toolpanel/ (stored 0%)
  adding: Configurations2/statusbar/ (stored 0%)
  adding: content.xml (deflated 75%)
  adding: META-INF/ (stored 0%)
  adding: META-INF/manifest.xml (deflated 83%)
  adding: Thumbnails/ (stored 0%)
  adding: Thumbnails/thumbnail.png (deflated 60%)
  adding: meta.xml (deflated 56%)
  adding: styles.xml (deflated 83

Re: [R] odfWeave 0.7.17 stutters on Debian testing 64-bit amd64 systems.

2011-05-14 Thread Emmanuel Charpentier
Dear Brian,

Thank you very much for this answer. However ...

On Saturday 14 May 2011 at 11:52 +0100, Prof Brian Ripley wrote :
 Note the difference in XML versions.
 
 odfWeave does not work correctly with XML 3.4-x: this has been 
 reported to the maintainer (and can be seen on the CRAN package checks 
 at http://cran.r-project.org/web/checks/check_results_odfWeave.html). 
 I suggest you try downgrading to an earlier version of XML.

Do you mean the XML R package ? Or the libxml2 Debian package ? I am
worried by the difference in behaviour of the i386 and amd64 versions (and,
on amd64, the discrepancies between Ubuntu 11.04 and Debian testing,
which are easier to explain : Debian has a slightly more recent version).

I'll try to downgrade the R XML package (easier) and report results.
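
In case it helps others, a minimal sketch of such a downgrade, assuming
the 3.2-0 source tarball is still available under this (assumed) CRAN
archive path:

url <- "http://cran.r-project.org/src/contrib/Archive/XML/XML_3.2-0.tar.gz"
download.file(url, destfile = "XML_3.2-0.tar.gz")
install.packages("XML_3.2-0.tar.gz", repos = NULL, type = "source")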

Again, thank you very much !

Emmanuel Charpentier


 On Sat, 14 May 2011, Emmanuel Charpentier wrote:
 
  Dear list,
 
  This is a copy of a mail sent to Max Kuhn, original author and
  maintainer of the odfWeave package, which seems not to have received it.
  It reports a problem that seems to be very implementation-specific
  (reproducible on three Debian testing amd64 machines, does *not* happen
  on two i686 Debian testing systems, does *not* happen on an Ubuntu 11.04
  amd64 machine) and therefore not attributable to odfWeave itself (which
  is pure R) but to a software component it uses (XML and the underlying
  libxml2, R itself, etc ...), but I need to know how to track this
  problem.
 
  Apologies for cross-posting to r-help and r-debian-sig, but I think
  that the issue is ... complicated and might not be as Debian-specific as
  it seems at first view.
 
  Sincerely,
 
  Emmanuel Charpentier
 
  Dear Max,
 
  A few days ago, I started to  have problems with odfWeave 0.7.17 on a
  couple of amd64 systems : the compiled files contained numerous copies
  of the source files, more or less interlaced, and a few copies of the
  target productions.
 
  Then I noticed that using an older 32-bit system resulted in correct
  files. An attempt with yet another machine (recent i686 netbook)
  confirmed that 32-bit systems gave okay results.
 
  Setup : in all machines, I use Debian testing with updates. My packages
  are self-compiled (i. e. installed via  install.packages()).
 
  I enclose a very minimalist source and the resulting targets. Logs of
  execution on 32- and 64-bit systems are affixed after this message.
 
  Since odfWeave is pure R, I doubt that it could be the source of the
  problem. This leaves us with two obvious targets : R itself (the Debian
  package is current), or the  XML library.
 
  Do you have any idea about how to proceed to find the source of the
  problem (and how to fix it) ?
 
  Sincerely,
 
 Emmanuel Charpentier
 
  Execution on a 32-bit system :
 
  library(odfWeave)
  Le chargement a nécessité le package : lattice
  Le chargement a nécessité le package : XML
  sessionInfo()
  R version 2.13.0 (2011-04-13)
  Platform: i486-pc-linux-gnu (32-bit)
 
  locale:
  [1] LC_CTYPE=fr_FR.utf8   LC_NUMERIC=C
  [3] LC_TIME=fr_FR.utf8LC_COLLATE=fr_FR.utf8
  [5] LC_MONETARY=C LC_MESSAGES=fr_FR.utf8
  [7] LC_PAPER=fr_FR.utf8   LC_NAME=C
  [9] LC_ADDRESS=C  LC_TELEPHONE=C
  [11] LC_MEASUREMENT=fr_FR.utf8 LC_IDENTIFICATION=C
 
  attached base packages:
  [1] stats graphics  grDevices utils datasets  methods   base
 
  other attached packages:
  [1] odfWeave_0.7.17 XML_3.2-0   lattice_0.19-26
 
  loaded via a namespace (and not attached):
  [1] grid_2.13.0
  system.time(odfWeave("In1.odt", "Out1-32.odt"))
   Copying  In1.odt
   Setting wd to  /tmp/RtmpS8lBt8/odfWeave11161739126
   Unzipping ODF file using unzip -o In1.odt
  Archive:  In1.odt
  extracting: mimetype
creating: Configurations2/statusbar/
   inflating: Configurations2/accelerator/current.xml
creating: Configurations2/floater/
creating: Configurations2/popupmenu/
creating: Configurations2/progressbar/
creating: Configurations2/toolpanel/
creating: Configurations2/menubar/
creating: Configurations2/toolbar/
creating: Configurations2/images/Bitmaps/
   inflating: content.xml
   inflating: manifest.rdf
   inflating: styles.xml
  extracting: meta.xml
  extracting: Thumbnails/thumbnail.png
   inflating: settings.xml
   inflating: META-INF/manifest.xml
 
   Removing  In1.odt
   Creating a Pictures directory
 
   Pre-processing the contents
   Sweaving  content.Rnw
 
   Writing to file content_1.xml
   Processing code chunks ...
 
   'content_1.xml' has been Sweaved
 
   Removing content.xml
 
   Post-processing the contents
   Removing content.Rnw
   Removing styles.xml
   Renaming styles_2.xml to styles.xml
   Removing manifest.xml
   Renaming manifest_2.xml to manifest.xml
   Removing extra files
 
   Packaging file using zip -r In1.odt .
   adding

Re: [R] odfWeave 0.7.17 stutters on Debian testing 64-bit amd64 systems.

2011-05-14 Thread Emmanuel Charpentier
Dear Brian,

You are right (as usual...) : downgrading XML to 3.2.0 (the
next-to-most-recent version) enables me to compile my minimalistic
example correctly.

And yes, there is no other XML than XML. Silly me...

Now, I have to understand why the i386 and Ubuntu machines were *not*
upgraded to 3.4.0 by the update.packages(ask=FALSE) I usually run ...

Thank you again ! Now I can again use a multiprocessor efficiently to run
the same analyses on 18 datasets (thanks to mclapply (multicore)). A
serious win in my case...
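
For what it's worth, a minimal sketch of that parallel step, with
hypothetical names ('datasets' being a list of the 18 data sets and
'analyse_one' the function run on each of them):

library(multicore)
results <- mclapply(datasets, analyse_one, mc.cores = 4)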

Emmanuel Charpentier

On Saturday 14 May 2011 at 12:59 +0100, Prof Brian Ripley wrote :
 On Sat, 14 May 2011, Emmanuel Charpentier wrote:
 
  Dear Brian,
 
  Thank you very much for this answer. However ...
 
  On Saturday 14 May 2011 at 11:52 +0100, Prof Brian Ripley wrote :
  Note the difference in XML versions.
 
  odfWeave does not work correctly with XML 3.4-x: this has been
  reported to the maintainer (and can be seen on the CRAN package checks
  at http://cran.r-project.org/web/checks/check_results_odfWeave.html).
  I suggest you try downgrading to an earlier version of XML.
 
  Do you mean the XML R package  ? Or the libxml2 Debian package ? I am
 
 Only the R package is called XML.  Why would I write XML if I meant 
 libxml2?
 
  worried by the difference in behaviour of the i386 and amd64 versions (and,
  on amd64, the discrepancies between Ubuntu 11.04 and Debian testing,
  which are easier to explain : Debian has a slightly more recent version).
 
  I'll try to downgrade the R XML package (easier) and report results.
 
  Again, thank you very much !
 
  Emmanuel Charpentier
 
 
  On Sat, 14 May 2011, Emmanuel Charpentier wrote:
 
  Dear list,
 
  This is a copy of a mail sent to Max Kuhn, original author and
  maintainer of the odfWeave package, which seems not to have received it.
  It reports a problem that seems to be very implementation-specific
  (reproducible on three Debian testing amd64 machines, does *not* happen
  on two i686 Debian testing systems, does *not* happen on an Ubuntu 11.04
  amd64 machine) and therefore not attributable to odfWeave itself (which
  is pure R) but to a software component it uses (XML and the underlying
  libxml2, R itself, etc ...), but I need to know how to track this
  problem.
 
  Apologies for cross-posting to r-help and r-debian-sig, but I think
  that the issue is ... complicated and might not be as Debian-specific as
  it seems at first view.
 
  Sincerely,
 
Emmanuel Charpentier
 
  Dear Max,
 
  A few days ago, I started to  have problems with odfWeave 0.7.17 on a
  couple of amd64 systems : the compiled files contained numerous copies
  of the source files, more or less interlaced, and a few copies of the
  target productions.
 
  Then I noticed that using an older 32-bit system resulted in correct
  files. An attempt with yet another machine (recent i686 netbook)
  confirmed that 32-bit systems gave okay results.
 
  Setup : in all machines, I use Debian testing with updates. My packages
  are self-compiled (i. e. installed via  install.packages()).
 
  I enclose a very minimalist source and the resulting targets. Logs of
  execution on 32- and 64-bit systems are affixed after this message.
 
  Since odfWeave is pure R, I doubt that it could be the source of the
  problem. This leaves us with two obvious targets : R itself (the Debian
  package is current), or the  XML library.
 
  Do you have any idea about how to proceed to find the source of the
  problem (and how to fix it) ?
 
  Sincerely,
 
 Emmanuel Charpentier
 
  Execution on a 32-bit system :
 
  library(odfWeave)
  Le chargement a nécessité le package : lattice
  Le chargement a nécessité le package : XML
  sessionInfo()
  R version 2.13.0 (2011-04-13)
  Platform: i486-pc-linux-gnu (32-bit)
 
  locale:
  [1] LC_CTYPE=fr_FR.utf8   LC_NUMERIC=C
  [3] LC_TIME=fr_FR.utf8LC_COLLATE=fr_FR.utf8
  [5] LC_MONETARY=C LC_MESSAGES=fr_FR.utf8
  [7] LC_PAPER=fr_FR.utf8   LC_NAME=C
  [9] LC_ADDRESS=C  LC_TELEPHONE=C
  [11] LC_MEASUREMENT=fr_FR.utf8 LC_IDENTIFICATION=C
 
  attached base packages:
  [1] stats graphics  grDevices utils datasets  methods   base
 
  other attached packages:
  [1] odfWeave_0.7.17 XML_3.2-0   lattice_0.19-26
 
  loaded via a namespace (and not attached):
  [1] grid_2.13.0
   system.time(odfWeave("In1.odt", "Out1-32.odt"))
   Copying  In1.odt
   Setting wd to  /tmp/RtmpS8lBt8/odfWeave11161739126
   Unzipping ODF file using unzip -o In1.odt
  Archive:  In1.odt
  extracting: mimetype
creating: Configurations2/statusbar/
   inflating: Configurations2/accelerator/current.xml
creating: Configurations2/floater/
creating: Configurations2/popupmenu/
creating: Configurations2/progressbar/
creating: Configurations2/toolpanel/
creating

Re: [R] Multivariate binary response analysis

2010-12-14 Thread Emmanuel Charpentier
Dear Derek, dear list,

A couple of suggestions (and more) :

0) Check that your problem can't be expressed in terms of what glmer() 
(lme4(a|b)? package(s)) can solve. I use this often, and am quite 
impressed with its abilities (a minimal sketch follows after these 
suggestions).

1) Why not model that directly in BUGS ? That might not be easy, will 
almost certainly not be fast to converge and will require lots of post-
convergence analysis, but it offers a direct solution.

Current BUGS implementations (OpenBUGS 3.1.x, JAGS 2.x) are well-
interfaced to R. My current favorite is JAGS + rjags, but this preference 
might come from years of frustration with (Win|Open)BUGS on Wine on 
Linux, and the current offering of OpenBUGS + BRugs, currently in beta on 
the Community page of the OpenBUGS website, seems quite decent. Memory 
use might have been a problem in the (not so) good old days, but in an 
era of multigigabyte netbooks, I doubt this is still relevant. 
Computation time is more of a concern : although Lunn et al. (2009)(1) 
allude to a parallelized version of WinBUGS(2), I am not aware of any 
publicly-accessible parallelized BUGS interpreter, which is a shame in 
times where even said netbooks get multicore CPUs...

2) The nice MCMCglmm package is able to run a large subclass of DAG 
Bayesian models expressible as generalizations of (possibly 
multidependent) multivariate mixed-model GLMs, *way* faster than current 
BUGS implementations (the author claims an acceleration factor of about 
30 on his test problem, and my (limited) experience does not disprove 
him).

It seems (to me) that your problem *might* be expressible in terms 
palatable to MCMCglmm(). Be aware, however, that using this package will 
*require* some serious understanding of the author's notation used in his 
(excellent) documentation. Furthermore, if you're interested in the 
random effects' values, you're SOL : I have not (yet) been able to 
retrieve traces of their distributions in the resultant object. But 
again, my understanding of this package is limited.
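
As promised above, a minimal sketch of suggestion 0) for the "any defect" 
screen, with assumed column names ('any' the 0/1 response per fish, 
'treat' the chemical, 'vial' the replicate vial) in the zf.dat data frame 
mentioned in your message :

library(lme4)
# random intercept per vial, fixed effect of the chemical treatment
m <- glmer(any ~ treat + (1 | vial), family = binomial, data = zf.dat)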

My personal bias goes to BUGS, as you might have guessed, but the public 
health warning that the Spiegelhalter gang put in front of the WinBUGS 
manual still applies : "MCMC sampling can be dangerous". To which I add 
that model checking, convergence assessment and validation might eat more 
time than the modeling itself.

HTH,

Emmanuel Charpentier

--

1. David Lunn et al., “Rejoinder to commentaries on ‘The BUGS project: 
Evolution, critique and future directions’,” Statistics in Medicine 28, 
no. 25 (11, 2009): 3081-3082.
2. http://www.page-meeting.org/pdf_assets/6232-PARALLEL%20Girgis_POSTER_final.pdf.
Accessed on Dec 14, 2010.

On Mon, 13 Dec 2010 15:31:53 -0800, Janszen, Derek B wrote :

 Greetings ~
 
 I need some assistance determining an appropriate approach to analyzing
 multivariate binary response data, and how to do it in R.
 
 The setting: Data from an assay, wherein 6-hours-post-fertilization
 zebrafish embryos (n=20, say) are exposed in a vial to a chemical
 (replicated 3 times, say), and 5 days later observed for the
 presence/absence (1/0) of defects in several organ systems (yolk sac,
 jaw, snout, body axis, pericardial edema, etc.) for each fish. The assay
 can be used as a screen (in which case a single response 'any' is first
 analyzed) as well as for comparing the responses of different classes of
 chemicals (a MANOVA-type analysis). Interest is focused on where
 response(s) are occurring, any associations among responses, and
 ultimately on possible biological mechanisms (the fish are tested for
 behavioral activity prior to this assay, and then ground up to provide
 RNA for microarray assays. A-lotta-data!).
 
 What I *wish* I could do is something like glm(response.matrix ~
 treat/vial, family=binomial(logit), data=zf.dat) but I know this can't
 be done. I came across the baymvb (Bayesian analysis of multivariate
 binary data) package in the R contributed packages archives, but it is
 no longer supported and the author is incommunicado. In the baymvb
 function the model is specified as single.stacked.response ~
 structure.factor + linear.model.terms. Huh? This looks suspiciously
 similar to analyzing repeated measures data in SAS as a univariate
 response with a split-plot structure (which forces the response
 covariance matrix to have a compound symmetric structure). If this is
 what's happening with this function it is definitely not appropriate.
  How about a GEE approach (I'm not familiar with the literature,
  or with any of the R packages that implement it)? Any other
  recommendations? (NB: just discovered the bayesm package. Don't know if
  this will work for me or not.)
 
 Any help would be greatly appreciated. Derek Janszen, PhD
 Sr Research Biostatistician
 Computational Biology & Bioinformatics, Pacific Northwest National
 Laboratory Richland, WA  99352 USA
 derek.jans...@pnl.gov
 
 
 
   [[alternative HTML version deleted]]

[R] Second summary [(S|odf)weave : how to intersperse (\LaTeX{}|odf) comments in source code ? Delayed R evaluation ?]

2010-12-13 Thread Emmanuel Charpentier
Dear list,

The solution proposed by Duncan Murdoch is an *excellent* starting 
point for Sweave'ing. One can trivially redefine Sweave to start using it 
quasi-transparently (to use it transparently would require patching 
Sweave, which would be worth some failsafes and possibly some options 
tinkering).
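
A minimal sketch of such a redefinition, with a hypothetical name 
('murdochDriver' stands for the driver object proposed by Duncan Murdoch, 
whatever it ends up being called) :

# wrap Sweave so the new driver becomes the default, while any other
# driver can still be passed explicitly
Sweave <- function(file, driver = murdochDriver(), ...)
    utils::Sweave(file, driver = driver, ...)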

Integrating it into odfWeave is not as simple, because the driver has to 
spit out correct XML. I suspect that the odfCat() function of odfWeave 
will be useful, but my (very) cursory glance at the odfWeave source has 
not (yet) convinced me that one could use the shortcuts that the simple 
structure of a \TeX{} file allows. XML *needs* some caution, and the Open 
Document specification is *not* a simple beast. However, I have good 
hopes.

I'll keep the list posted about further enhancements, as my (somewhat 
busy) schedule will allow.

Again, kudos and a big thank you to Duncan Murdoch, who was able to give 
a *correct* solution before I had been able to *correctly* state the 
problem.

Emmanuel Charpentier

On Sat, 11 Dec 2010 22:58:10 +0000, Emmanuel Charpentier wrote :

[ Snip... ]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Summary (Re: (S|odf)weave : how to intersperse (\LaTeX{}|odf) comments in source code ? Delayed R evaluation ?)

2010-12-12 Thread Emmanuel Charpentier
Dear list,

see comment at end.

On Sat, 11 Dec 2010 22:58:10 +0000, Emmanuel Charpentier wrote :

 Dear list,
 
 Inspired by the original Knuth tools, and for pedagogical reasons, I
 wish to produce a document presenting some source code with interspersed
 comments in the source (see Knuth's books rendering TeX and metafont
 sources to see what I mean).
 
 I seemed to remember that a code chunk could be defined piecewise, like
 in
 
 Comments...
 
 <<Chunk1, eval=FALSE, echo=TRUE>>=
 SomeCode
 @
 
 Some other comments...
 
 <<Chunk2, eval=FALSE, echo=TRUE>>=
 MoreCode
 @
 
 And finally,
 
 <<Chunk3, eval=TRUE, echo=TRUE>>=
 <<Chunk1>>
 <<Chunk2>>
 EndOfTheCode
 @
 
 That works ... as long as SomeCode, MoreCode and EndOfTheCode are self-
 standing pieces of R code, but *not* code fragments. You can *not*
 intersperse comments in, say, a function body, or a local() environment
 this way : when Sweaving, *R* complains of an incomplete source (makes
 noise about an unexpected end of input at the end of Chunk1, IIRC, and
 never sees Chunk2).
 
 I hoped that Sweave's alternative syntax could offer a way out : no
 such luck.
 
 There seems to be no way to delay R evaluation of a R chunk passed by
 Sweave ; at least, the eval=FALSE option of chunk declaration is not
 sufficient for that.
 
 Am I missing something in the Sweave and odfWeave documentation (that I
 read till I grew green and moldy) ? Or does this require a fundamental
 change in the relevant Sweave drivers ?
 
 Can you suggest alternative ways of doing what I mean to do ? The only
 workaround I found is to paste a second copy of my code in a \verbatim
 environment (or, in the case of odfWeave, in the text part), and spice
 it with \end{verbatim} comments.. \begin{verbatim} chunks. This way, I
 lose any guarantee of consistency between commented text and effective
 code.
 
 Any other idea ?
 
   Emmanuel Charpentier


To summarize the answers I got so far, there seem to be no satisfactory 
solutions to my current problem with either Sweave or odfWeave :
- every (Sw|odfW)eave code chunk has to be parseable in itself. 
One cannot break it into unparseable pieces in the hope of pasting them 
together later ;
- the (hypothetical) parse=FALSE option, suggested by Duncan 
Murdoch, is, well, hypothetical ;
- the brew package is not integrated in either Sweave or 
odfWeave ;
- R-style comments will get you only so far (where is math markup 
when you need it ?) ;
- subdividing the work into small units is not a practical solution, 
when your big piece of software is a collection of small parts and 
local variable assignments, embedded in a (perforce large) local 
environment to keep things clean...

Therefore, I'll let this problem sleep. However, I Cc this answer (with 
the original question below) to Max Kuhn and Friedrich Leisch, in the 
(faint) hope that this feature, which does not seem to have been missed 
by anybody in 8 years, might be considered sufficiently useful to warrant 
addition someday...

Sincerely yours,

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] (S|odf)weave : how to intersperse (\LaTeX{}|odf) comments in source code ? Delayed R evaluation ?

2010-12-11 Thread Emmanuel Charpentier
Dear list,

Inspired by the original Knuth tools, and for pedagogical reasons, I wish 
to produce a document presenting some source code with interspersed 
comments in the source (see Knuth's books rendering TeX and metafont 
sources to see what I mean).

I seemed to remember that a code chunk could be defined piecewise, like in

Comments...

<<Chunk1, eval=FALSE, echo=TRUE>>=
SomeCode
@

Some other comments...

<<Chunk2, eval=FALSE, echo=TRUE>>=
MoreCode
@

And finally,

<<Chunk3, eval=TRUE, echo=TRUE>>=
<<Chunk1>>
<<Chunk2>>
EndOfTheCode
@

That works ... as long as SomeCode, MoreCode and EndOfTheCode are self-
standing pieces of R code, but *not* code fragments. You can *not* 
intersperse comments in, say, a function body, or a local() environment 
this way : when Sweaving, *R* complains of an incomplete source (makes 
noise about an unexpected end of input at the end of Chunk1, IIRC, and 
never sees Chunk2).

I hoped that Sweave's alternative syntax could offer a way out : no 
such luck.

There seems to be no way to delay R evaluation of a R chunk passed by 
Sweave ; at least, the eval=FALSE option of chunk declaration is not 
sufficient for that.

Am I missing something in the Sweave and odfWeave documentation (that I 
read till I grew green and moldy) ? Or does this require a fundamental 
change in the relevant Sweave drivers ?

Can you suggest alternative ways of doing what I mean to do ? The only 
workaround I found is to paste a second copy of my code in a \verbatim 
environment (or, in the case of odfWeave, in the text part), and spice 
it with \end{verbatim} comments.. \begin{verbatim} chunks. This way, I 
lose any guarantee of consistency between commented text and effective 
code.

Any other idea ?

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Of possible interest for (Win|Open)BUGS users on Linux platforms

2010-10-22 Thread Emmanuel Charpentier
Dear list,

It might be of interest to (Win|Open)BUGS users trying to use it from R
on a Linux platform, and to Linux R users trying to call (Win|Open)BUGS, to
note that two new packages have appeared on the OpenBUGS User
Contributed Code page (http://openbugs.info/w/UserContributedCode).

R2OpenBUGS seems to be an adaptation of the R2WinBUGS package, able to
run native OpenBUGS without Wine contortions (of late, I have been
unable to run WinBUGS from R2WinBUGS correctly without the debug option,
WinBUGS not opening the script file. Probably yet another Wine
regression...).

A version of BRugs able to use the Linux version of OpenBUGS is also
available.

Their author (apparently Chris Jackson from MRC Cambridge, whom I thank
heartily) states that "[p]ending additional user testing", they will be
posted to CRAN. I add that such testing (and reporting !) might hasten
this posting, and therefore the general availability of these packages.

These packages might also be useful to stress-test the other BUGS
implementation (JAGS), which gave me pause recently (in the
deviance/pD/DIC department...).

First data point : I have been able to install these two packages on
Ubuntu Lucid (64 bits, using the g++-multilib package) and Debian
Squeeze (32 bits) and to test BRugs quickly without apparent problems.
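
For those who want to try them, a minimal sketch of a native OpenBUGS run
via R2OpenBUGS, with hypothetical data, inits, parameter names and model
file (the bugs() call mirrors the R2WinBUGS interface) :

library(R2OpenBUGS)
fit <- bugs(data = my.data, inits = my.inits,
            parameters.to.save = c("beta", "sigma"),
            model.file = "model.txt", n.chains = 3, n.iter = 10000)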

HTH,

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Including a text file (JAGS model) in an R package

2010-08-16 Thread Emmanuel Charpentier
Sorry for chiming in a tad late (I was incommunicado for two weeks)

On Monday 02 August 2010 at 18:50 +0000, Ben Bolker wrote :
 Prof Brian Ripley ripley at stats.ox.ac.uk writes:
 
  
  Here is how I do it:
  
   write("
   model {
      for (year in 1:N) {
         D[year] ~ dpois(mu[year])
         log(mu[year]) <- b[1] + step(year - changeyear) * b[2]
      }
      for (j in 1:2) { b[j] ~ dnorm(0.0, 1.0E-6) }
      changeyear ~ dunif(1,N)
   }
   ", "coalmining.jags")
  
  which has the advantage that the model description is right there in 
  the .Rd file.
  
  If you want a file to be part of the package, put it in e.g.
  
  pkg source/inst/JAGSmodels/model.txt
  
  and in the example use
  
   system.file("JAGSmodels", "model.txt", package="whatever")
 
   What Prof Ripley said, but also: for JAGS and WinBUGS models you can
 actually specify
 
  coalmining <- function() {
     for (year in 1:N) {
        D[year] ~ dpois(mu[year])
        log(mu[year]) <- b[1] + step(year - changeyear) * b[2]
     }
     for (j in 1:2) { b[j] ~ dnorm(0.0, 1.0E-6) }
     changeyear ~ dunif(1,N)
  }
 
 which is even more nicely integrated into the R code ...

... but will fail with BUGS truncation/censoring notation, which cannot
pass for R syntax.

The R2WinBUGS package has a write.model() function, which I shoplifted,
simplified (JAGS doesn't seem to be as picky as WinBUGS in the numerical
notation department) and extended (to take advantage of JAGS' data
transform feature) for use with JAGS (which is much more convenient to
use nowadays with R on a Unix-like platform than either WinBUGS or
OpenBUGS). I planned to submit it to Martyn Plummer, but got
interrupted...
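
For reference, a minimal sketch of the write.model() route with the stock
R2WinBUGS function and rjags, assuming the coalmining function above and
that D and N are available to pass in the data list :

library(R2WinBUGS)   # provides write.model()
library(rjags)
write.model(coalmining, "coalmining.jags")
m <- jags.model("coalmining.jags", data = list(D = D, N = N), n.chains = 2)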

HTH,

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Dataframe horizontal scrolling

2010-05-10 Thread Emmanuel Charpentier
Under Unix/X11 (and IIRC under Windows), edit(some.data.frame) will
invoke a smallish spreadsheet-like data editor, which will allow you to
move around your data with no fuss. I probably wouldn't use it for any
serious data entry, but I have found it damn useful in a lot of
data-debugging situations (you know, the sort of ... thing ... that hits
the fan when you've been coaxed into accepting Excel spreadsheets as data
sets).
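
A minimal sketch (keeping the edited copy rather than discarding it) :

# opens the data editor and returns the (possibly) modified data frame
fixed <- edit(some.data.frame)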

HTH,

Emmanuel Charpentier

On Monday 10 May 2010 at 16:06 -0400, Michael H wrote :
 R experts,
  
 I am working with large multivariable data frames (> 50 variables) 
 and I would like to scroll horizontally across my output
 to view each data frame rather than having to scroll down vertically-
 wrapped data frames.I have been using R Commander as a programming 
 interface. If I assign a long character string to a vector I can scroll 
 across its output easily but not a dataframe of equivalent width. I 
 just want my data frame variables to fully output horizontally rather 
 than partially with vertical wrapping, if that is possible. Any help 
 would be appreciated.
  
 Thank you,
  
 Mike

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] glmer with non integer weights

2010-04-22 Thread Emmanuel Charpentier
Sorry for this late answer (I've had a seriously nonmaskable interrupt).

Since I have technical questions not related to R, I take the liberty of
following this up by e-mail.

I might post a followup summary if  another R problem arises...

Emmanuel Charpentier

On Monday 19 April 2010 at 23:40 -0800, Kay Cichini wrote :
 hello,
 
 it's the Morisita Horn Index, which is an ecological index for similarity
 between two multivariate objects (vegetation samples with species and its
 abundance) where a value of one indicates completely same relative
 importance of species in both samples and 0 denotes total absence of any
 same species.
 
 it can be expressed as a probability:
 
  MH-Index = (prob. that an individual drawn from sample j and one from
  sample k belong to the same species) / (prob. that two individuals
  drawn from either sample will belong to the same species)

[ Technical rambling ]

Hmmm ... that's the *ratio* of two probabilities, not a probability.
According to http://www.tnstate.edu/ganter/B412%20Ch%201516%20CommMetric.html,
which I found on the first page of answers to a simple Google query, it can
also be thought of as the ratio of two distances between sites :
(maximal distance - actual distance) / maximal distance (with a massive
(over-?) simplification). There is no reason to think a priori that the logit
transformation (or the asin(sqrt()) transformation) has better properties for
this index than any other mapping from [0 1] to R.

(BTW, a better mapping might be from [0 1] to [0 Inf] or conversely, but
negative distances have no (obvious) meaning. Here asin(sqrt()) might
make more sense than qlogis().)

[ End of technical rambling ]
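
For concreteness, a minimal sketch of the index computation, assuming the
usual Morisita-Horn formula for two abundance vectors x and y (species
counts at two sites) :

morisita.horn <- function(x, y) {
    X <- sum(x); Y <- sum(y)
    # 2 * sum(x_i * y_i) / ((sum(x_i^2)/X^2 + sum(y_i^2)/Y^2) * X * Y)
    2 * sum(x * y) / ((sum(x^2) / X^2 + sum(y^2) / Y^2) * X * Y)
}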

But I have trouble understanding how a similarity or distance index
can characterize *one* site... Your data clearly associate an MH.Index with
each site : what distance or similarity do you measure at this site ?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] glmer with non integer weights

2010-04-19 Thread Emmanuel Charpentier
On Monday 19 April 2010 at 03:00 -0800, Kay Cichini wrote :
 hi emmanuel,
 
 thanks a lot for your extensive answer.
 do you think using the asin(sqrt()) transf. can be justified for publishing
 prurpose or do i have to expect criticism.

Hmmm ... depends on your reviewers. But if a half-asleep dental surgeon
caught that after an insomniac night, you might expect that a fully
caffeinated reviewer will. Add Murphy's law to the mix and ... boom !

 naivly i excluded that possibility, because of violated anova-assumptions,
 but if i did get you right the finite range rather posses a problem here.

No. Your problem is that you model a probability as a smooth (linear)
finite function of finite variables. Under those assumptions, you can't
get a *certitude* (probability 0 or 1). Your model is *intrinsically*
inconsistent with your data.

In other words, I'm unable to believe both your model (linear
whatyoumaycallit regression) and your data (which include certainties)
*simultaneously*.

I'd reconsider your 0 or 1, as meaning *censored* quantities (i. e. no
farther than some epsilon from 0 or 1), with *hard* data (i. e. not a
cooked-up estimate such as the ones I used) to estimate epsilon. There
are *lots* of ways to fit models with censored dependent variables.

 why is it in this special case an advantage? 

It's bloody hell *not* a specific advantage : if you want to fit a
linear model to a probability, you *need* some function mapping R to
the open ]0 1[ (i. e. all reals strictly greater than 0 and strictly
less than 1 ; I think that's denoted (0 1) in English/American usage).
Asin(sqrt()) does that.

However, (asin(sqrt()))^-1 has a big problem (mapping back [0 1], i. e.
*including* 0 and 1, *not* (0 1), to R) which *hides* the (IMHO bigger)
problem of the inadequacy of your model to your data ! In other words,
it lets you shoot yourself in the foot after a nice sciatic nerve
articaine block making the operation painless (but still harmful). On
the other hand, logit (or, as pointed out by Martin Maechler, qlogis) is
kind enough to choke on this (i. e. return Inf values, which
will make the regression program choke).
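
A minimal sketch of that boundary behaviour, showing the two mappings at
0, 0.5 and 1 :

asin(sqrt(c(0, 0.5, 1)))   # finite everywhere: 0, 0.7853982, 1.5707963
qlogis(c(0, 0.5, 1))       # -Inf, 0, Inf : the Inf values make the fit choke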

So please quench my thirst : what exactly is MH.Index supposed to be ?
How is it measured, estimated, guessed or divined ?

HTH,

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] glmer with non integer weights

2010-04-18 Thread Emmanuel Charpentier
1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

 Mean  SD  Naive SE Time-series SE
deviance 534.2639 4.42164 0.0807278   0.122396
pi[1]  0.6396 0.10877 0.0019859   0.001892
pi[2]  0.2964 0.06592 0.0012036   0.001163
pi[3]  0.5083 0.04120 0.0007522   0.000805
pi[4]  0.5702 0.07170 0.0013091   0.001221
sigma.lpi[1]   3.0387 0.36356 0.0066376   0.008803
sigma.lpi[2]   2.0519 0.22665 0.0041381   0.005358
sigma.lpi[3]   1.0928 0.12531 0.0022879   0.003305
sigma.lpi[4]   0.8699 0.26650 0.0048656   0.010146

2. Quantiles for each variable:

 2.5%  25%  50%  75%97.5%
deviance 527.9142 530.9711 533.4347 536.6295 544.7738
pi[1]  0.4130   0.5683   0.6429   0.7195   0.8251
pi[2]  0.1826   0.2511   0.2909   0.3373   0.4390
pi[3]  0.4266   0.4809   0.5093   0.5359   0.5892
pi[4]  0.4183   0.5254   0.5701   0.6150   0.7106
sigma.lpi[1]   2.4362   2.7772   2.9965   3.2600   3.8395
sigma.lpi[2]   1.6606   1.8935   2.0320   2.1811   2.5598
sigma.lpi[3]   0.8803   1.0035   1.0754   1.1711   1.3670
sigma.lpi[4]   0.5022   0.6901   0.8198   0.9906   1.5235

One should also note that the deviance is *extremely* sensitive to the
choice of epsilon, which tends to indicate that, at small epsilon values,
the "sharp" impossible values dominate the evaluation of the
coefficients they are involved in.

Beta models (not shown) give similar results, to a lesser extent.

In short, your "sharp" data are essentially incompatible with your
model, which can't be fitted unless you get at least a rough idea of
their precision.

HTH,

Emmanuel Charpentier, DDS, MSc

 best regards,
 kay

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] glmer with non integer weights

2010-04-18 Thread Emmanuel Charpentier
Addendum to my previous answer :

In that special case, the limited range of the asin(sqrt())
transformation, which is a shortcoming, turns out to be useful. The
fixed-effect coefficients seem semi-reasonable (except for stageB) :

> (sin(coef(lm(asin(sqrt(MH.Index))~0+stage, data=similarity))))^2
   stageA    stageB    stageC    stageD 
0.6164870 0.3389430 0.5083574 0.5672021 

The random effects being nested in the fixed effect, one can't afford to
be lazy in the parameterization of the corresponding random effect :

> summary(lmer(asin(sqrt(MH.Index))~stage+(stage|site),
data=similarity))
Linear mixed model fit by REML 
Formula: asin(sqrt(MH.Index)) ~ stage + (stage | site) 
   Data: similarity 
   AIC BIC logLik deviance REMLdev
 155.3 199 -62.65111.8   125.3
Random effects:
 Groups   NameVariance Std.Dev. Corr 
 site (Intercept) 0.043579 0.20876   
  stageB  0.033423 0.18282  -0.999   
  stageC  0.043580 0.20876  -1.000  0.999
  stageD  0.043575 0.20875  -1.000  0.999  1.000 
 Residual 0.128403 0.35833   
Number of obs: 136, groups: site, 39

Fixed effects:
Estimate Std. Error t value
(Intercept)  0.930360.08431  11.035
stageB  -0.308790.10079  -3.064
stageC  -0.136600.09981  -1.369
stageD  -0.077550.14620  -0.530

Correlation of Fixed Effects:
   (Intr) stageB stageC
stageB -0.836  
stageC -0.845  0.707   
stageD -0.577  0.482  0.487
> v <- fixef(lmer(asin(sqrt(MH.Index))~stage+(stage|site),
data=similarity))
> v[2:4] <- v[1]+v[2:4]
> names(v)[1] <- "stageA"
> (sin(v))^2
   stageA    stageB    stageC    stageD 
0.6429384 0.3390903 0.5083574 0.5672021 

But again, we're exploiting a shortcoming of the asin(sqrt())
transformation.

HTH,

Emmanuel Charpentier

On Friday 16 April 2010 at 00:15 -0800, Kay Cichini wrote :
 thanks thierry,
 
 i considered this transformations already, but variance is not stabilized
 and/or normality is neither achieved.
 i guess i'll have to look out for non-parametrics?
 
 best regards,
 kay

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] [R-pkgs] SOAR - Stored object caches for R

2010-04-08 Thread Emmanuel Charpentier
On Wednesday 07 April 2010 at 18:03 +1000, William Venables wrote :

[ Snip ... ]

 [ ... ]   The effect is
 somewhat like that of the use of the .Data directory in S-PLUS, (a
 program not unlike R), though somewhat more manually driven.

I'm not sure that such old-timers' puns qualify for fortune()ification,
but this is a good one...

William :

Did you consider possible interactions with other caching efforts such
as cacheSweave ? It seems that the idea of alleviating memory/CPU time
requirements is a concern to many people (of course, the intrinsically
interactive development process that S/R allow(s) makes this a big
concern). Unifying those efforts, and maybe incorporating them into the
base interpreter, might be worthwhile...

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] logistic regression in an incomplete dataset

2010-04-05 Thread Emmanuel Charpentier
Dear Desmond,

A somewhat analogous question was posed recently (about 2 weeks
ago) on the R-sig-mixed-models list, and I tried (in two posts) to give
some elements of information (and some bibliographic pointers). To
summarize tersely :

- a model of information missingness (i. e. *why* are some data
missing ?) is necessary to choose the right measures to take. Two
special cases (Missing At Random and Missing Completely At Random) allow
for (semi-)automated compensation. See literature for further details.

- complete-case analysis may give seriously weakened and *biased*
results. Pairwise-complete-case analysis is usually *worse*.

- simple imputation leads to underestimated variances and might also
give biased results.

- multiple imputation is currently thought of as a good way to alleviate
missing data if you have a missingness model (or can honestly bet on
MCAR or MAR), and if you properly combine the results of your
imputations.

- A few missing data packages exist in R to handle this case. My personal
selection at this point would be mice, mi, Amelia, and possibly mitools,
but none of them is fully satisfying (in particular, accounting for a
random effect needs special handling all the way in all packages...) ;
a minimal sketch with mice follows after this list.

- An interesting alternative is to write a full probability model (in
BUGS for example) and use Bayesian estimation ; in this framework,
missing data are naturally modeled in the model used for analysis.
However, this might entail *large* work, be difficult and not always
succeed (numerical difficulties). Furthermore, the results of a Bayesian
analysis might not be what you seek...
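
As announced above, a minimal sketch of the multiple-imputation route with
the mice package, with hypothetical names ('mydata' the data frame, 'y' the
binary outcome, 'x1' and 'x2' two covariates) :

library(mice)
imp  <- mice(mydata, m = 5)                        # 5 imputed data sets
fits <- with(imp, glm(y ~ x1 + x2, family = binomial))
summary(pool(fits))                                # combine by Rubin's rules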

HTH,

Emmanuel Charpentier

On Monday 05 April 2010 at 11:34 +0100, Desmond Campbell wrote :
 Dear all,
 
 I want to do a logistic regression.
 So far I've only found out how to do that in R, in a dataset of complete 
 cases.
 I'd like to do logistic regression via max likelihood, using all the study 
 cases (complete and incomplete). Can you help?
 
 I'm using glm() with family=binomial(logit).
 If any covariate in a study case is missing then the study case is dropped, 
 i.e. it is doing a complete cases analysis.
 As a lot of study cases are being dropped, I'd rather it did maximum 
 likelihood using all the study cases.
 I tried setting glm()'s na.action to NULL, but then it complained about NA's 
 present in the study cases.
 I've about 1000 unmatched study cases and less than 10 covariates so could 
 use unconditional ML estimation (as opposed to conditional ML estimation).
 
 regards
 Desmond
 
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Multilevel modeling with count variables

2010-03-27 Thread Emmanuel Charpentier
Your request might find better answers on the R-SIG-mixed-models
list ...

Anyway, some quick thoughts :

On Friday 26 March 2010 at 15:20 -0800, dadrivr wrote :
 By the way, my concern with lmer and glmer is that they don't produce
 p-values, 

The argumentation of D. Bates is convincing ... A large thread about
these issues exists in this (R-help) list's archive.

I won't get into the (real) debate about the (dubious) value of hypothesis
testing, and I'm aware that there are domains where journal editors
*require* p-values. Sigh... This insistence seems to be weakening,
however.

and the techniques used to approximate the p-values with those
 functions (pvals.fnc, HPDinterval, mcmcsamp, etc.) only apply to Gaussian
 distributions.

(Probably ?) true for pvals.fnc, false for HPDinterval and mcmcsamp. You
can always generate a sufficient number of estimates and assess a
confidence interval and a p-value from such samples. See any good text
on use of simulation in statistics...

The fly in the ointment is that current versions of lmer seem to have
problems with mcmcsamp(). Bootstrap comes to mind, but might be
non-trivial to do efficiently, especially in a hierarchical/multilevel
context.

 Given that I would likely be working with quasi-poisson
 distributions, is there a good alternative mixed effects modeling approach

glmer, or Bayesian modeling through BUGS (whose probability
interpretation is (quite) different from that of p-values).

 that would output significance values?

No. You'd have to compute them yourself ... if you're able to derive the
(possibly asymptotic) distribution of your test statistic under the
hypothesis you aim to refute. This nut might be harder to crack than it
appears...

Now, to come back to your original question, you might try 1) to use
glm/glmer to model your dependent variable as a (quasi-)Poisson
variable, and 2) to use log transformations of the independent count
variables ; a more general solution is given by the Box-Cox
transformation family : see the relevant function in MASS, which also
offers the logtrans family. ISTR that John Fox's car package offers a
function aiming at finding the optimal simultaneous transformations of a
dependent variable and its predictors. Other packages I'm not aware of
might offer other solutions ; in particular, I'd recommend peeking at
Frank Harrell's Hmisc and rms packages' documentation, whose wealth I
have not yet seriously assessed...
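
A minimal sketch of points 1) and 2) combined, with hypothetical names
('count' the dependent count, 'x' an independent count, 'subject' the
grouping factor, all in a data frame 'd') :

library(lme4)
# Poisson mixed model with a log-transformed count covariate
# (log1p avoids log(0) when the covariate can be zero)
m <- glmer(count ~ log1p(x) + (1 | subject), family = poisson, data = d)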

Bayesian modeling with BUGS would also allow you to try to fit any model
you might wish to test (provided that it can be written as a directed
acyclic graph and that the distributions of your variables are either
from a standard family available in BUGS or that you are able to
express the (log-)density of the non-standard distribution you wish to
use). But, again, no p-values in sight. Would you settle for Bayes
factors between two models ? Or DIC comparisons ?

HTH,

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] recommendations on use of -> operator

2010-03-18 Thread Emmanuel Charpentier
An old (Lisp ? C ?) quote, whose author escapes me now, was :

"syntactic sugar causes cancer of the semicolon"

... but nowadays semicolons are rarely used in real-life R.

Emmanuel Charpentier

PS and, BTW, neither C nor Lisp has much use for semicolons... Was that
Pascal (or one of its spawns) ?

On Thursday 18 March 2010 at 12:00 +0000, Prof Brian Ripley wrote :
 -> is usually only used when you suddenly remember you wanted to keep 
 result of a computation, especially in the days before command-line 
 editors were universal.
 
 fn(a,b) ... oh I'd better keep that ... -> z
 
 On Thu, 18 Mar 2010, Dan Kelley wrote:
 
  I have never used -> but I noticed at
 
  http://github.com/jiho/r-utils/blob/master/beamer_colors.R
 
  that some people do.
 
  In fact, the above-named code has a sort of elegance about it
 (except
 for the use of = for assignment...).  To my eye, -> calls to mind a type
 of assignment that is meant to stand out.  For example, perhaps it would make
 sense to use -> to assign to things that are not expected to vary through
 the rest of the code block.
 
  Q1: is there a convention, informal or otherwise, on when to use the ->
  operator?
 
  Q2: if there is no convention, but if people think a convention might help, 
  what would that convention be?
 
  Dan Kelley, PhD
  Professor and Graduate Coordinator
  Dept. Oceanography, Dalhousie University, Halifax NS B3H 4J1
  kelley@gmail.com (1-minute path on US server) or dan.kel...@dal.ca 
  (2-hour path on Cdn server)
  Phone 902 494 1694;  Fax  902 494 3877
 
 
 
 
 
 
 
 
  [[alternative HTML version deleted]]
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] recommendations on use of -> operator

2010-03-18 Thread Emmanuel Charpentier
Le jeudi 18 mars 2010 à 15:49 -0400, Duncan Murdoch a écrit :
 On 18/03/2010 3:10 PM, Emmanuel Charpentier wrote:
  An old (Lisp ? C ?) quote, whose author escapes me now, was :
  
  syntactic sugar causes cancer of the semicolon
  
  .. but nowadays semicolons are rarely used in real-life R.
  
  Emmanuel Charpentier
  
  PS and, BTW, neither C nor Lisp have much use for semicolons... Was that
  Pascal (or one of its spawns) ? 
 
 Not sure what you mean about C:  it's a statement terminator there. 
 Pascal also uses it, but as a statement separator.

Quite right ! And I know I fall into that trap every time I don't
code in C for more than a few weeks (it has been three months, now...).
That's probably because this end-of-statement role doesn't strike my
psyche as important enough... (And maybe because Lisp and, to a lesser
extent, Pascal were my computer mother tongues, APL a serious crush
and Fortran something of an (almost)-read-only acquaintance).

Nice thinko...

Emmanuel Charpentier
Still recovering...
 
 Duncan Murdoch
 
  
  Le jeudi 18 mars 2010 à 12:00 +, Prof Brian Ripley a écrit :
  -> is usually only used when you suddenly remember you wanted to keep
  the result of a computation, especially in the days before command-line
  editors were universal.
 
  fn(a,b) ... oh I'd better keep that ... -> z
 
  On Thu, 18 Mar 2010, Dan Kelley wrote:
 
  I have never used -> but I noticed at
 
http://github.com/jiho/r-utils/blob/master/beamer_colors.R
 
  that some people do.
  In fact, the above-named code has a sort of elegance about it
  (except
  for the use of = for assignment...).  To my eye, -> calls to mind a
  type of assignment that is meant to stand out.  For example, perhaps it
  would make sense to use -> to assign to things that are not expected to
  vary through the rest of the code block.
  Q1: is there a convention, informal or otherwise, on when to use the ->
operator?
 
  Q2: if there is no convention, but if people think a convention might 
  help, what would that convention be?
 
  Dan Kelley, PhD
  Professor and Graduate Coordinator
  Dept. Oceanography, Dalhousie University, Halifax NS B3H 4J1
  kelley@gmail.com (1-minute path on US server) or dan.kel...@dal.ca 
  (2-hour path on Cdn server)
  Phone 902 494 1694;  Fax  902 494 3877
 
 
 
 
 
 
 
 
[[alternative HTML version deleted]]
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide 
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
  
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R on Linux - a primer

2010-03-16 Thread Emmanuel Charpentier
Le dimanche 14 mars 2010 à 18:04 -0400, Axel Urbiz a écrit :
 Hi,
 
 I'm looking to move from Windows into a 64-bit Linux environment. Which is
 the best Linux Flavor to use within R? To install R on this environment, do
 I need to do any compiling?

I'd like to add two cents of folly to the (wise) advice you've received
already.

Indeed, Ubuntu is one very good distribution whose management system
has made the care and feeding of a Linux system a *lot* easier for
not-so-system-oriented people like yours truly. Whereas my first
contacts with a Unix-like system were about 30 years ago (Oh my, how
time flies, and how far away are Xenix and our 68000 systems...), I'm
*still* not fond of system maintenance for its own sake. Ubuntu added
an (almost) fool-proof maintenance system to an excellent distribution
called Debian, thus lowering the Linux entry bar to as low as it can be
humanely made. Some pretended that Ubuntu was a code word for "I'm too
stupid to configure Debian" ; quite untrue ! It only means "I'm too
busy|lazy to configure Debian", which is a Good Thing (TM).

But Debian has its strong points, and one of them is *extremely* strong
for an R user : Dirk Eddelbuettel (whose name I'm almost surely
misspelling (sorry, Dirk !)) has created a marvelous system called
cran2deb, which routinely creates binary Debian packages from (almost)
all of the 2000+ R packages available nowadays.

That might look like small change : the basic tools used for
developing/compiling most R packages are small beer (at least by today's
standards). But some of them might depend on fiendishly
difficult-to-maintain foreign libraries. Dirk's cran2deb takes care of
that and creates any information that Debian's dpkg maintenance system
needs to automate *your* R maintenance chores by integrating them in
Debian's maintenance scheme, which is as automatic as you can get
without becoming an incomprehensible beast.

In fact, cran2deb is so good that I'm seriously tempted to go back to
Debian (after almost 8 years of Debian use, Ubuntu's ease of use, easy
access to not-so-exotic hardware drivers (and the then-incessant
politically correct yack-yacking on some Debian mailing lists...) made
me switch to an early Ubuntu distribution). I have not yet switched back
(mostly for not-so-superficial hardware support reasons), but I
maintain a backup Debian installation for the hell of it and to test
the waters. So far, they have been a lot less rough than they used to be,
but there are still occasional rows (e. g. a recent gotcha with
openoffice.org, which would have rendered me unable to work with those
d*mn Word files for about a month, or forced me to do a manual repair
(which I hate...)).

So consider Debian as a (desirable) alternative to Ubuntu.

HTH,

Emmanuel Charpentier, DDS, MSc
affiliation withdrawn,
notwithstanding Frank Harrell's
whims...

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Use of R in clinical trials

2010-02-23 Thread Emmanuel Charpentier
Le samedi 20 février 2010 à 09:44 -0800, Dieter Menne a écrit :
 If you check
 
 http://n4.nabble.com/R-help-f789696.html
 
 you will note that this thread has the largest number of read since years.
 Looks like an encouragement to Mark to keep the mentioned CRAN document
 updated.
 
 To add a more serious note to my sarcastic comments earlier:
 
 I don't think the FDA or our national agencies in Europe are to blame. I
 know that they have eminent statisticians there who know what they are
 talking of and are much more flexible than the culprits.
 
 The people to blame for the Nobody got fired for using SAS attitude are
 reviewers and bosses with mainly medical background who make decisions. It
 the difference in the use of the term validated which leads to confusion.
 
 A method is considered validated in medicine when it has been compared
 with a gold standard and is non-inferior within reasonable limits. For
 example, in the diagnosis of a gastric ulcer gastroscopy is the gold
 standard, and cheaper or less invasive tests are measured against that. 
 
 However, sometimes gold standards are the easy way out to avoid litigation,
 and statistical evidence against these is brushed aside. Think of year it
 needed to accept the Australian bush doctor's evidence that the bacteria
 Helicobactor pylori is a main determinant for gastric ulcers; everyone
 knew that ulcers were caused by stress alone.
 
 Or gastric emptying: the gold standard is (was?) the use of a radioactive
 marker that was recorded after a meal. Since radioactivity cannot rise out
 of nothing, it was a well known fact that stomach content always goes down
 after a meal. After people started measuring the real volume of the liquid
 in the stomach with MRI imaging, it came out that the content INCREASED
 after the meal due to strong gastric secretion. Despite visible evidence
 from the images, the increase was considered not validated, because is was
 in contradiction of the gold standard.
 
 Validated in medicine means: Some well-known person has made a publication
 on the subject. He or she may be right, but not always.
 
 Mention the word three times in a discussion, and my blood pressure is at
 200 mmHg

Furthermore, the gold standard might have to be replaced by some
platinum one. The FDA itself seems open to some radical changes. See :

http://www.fda.gov/MedicalDevices/DeviceRegulationandGuidance/GuidanceDocuments/ucm071072.htm

The only software products explicitly mentioned in this document
are... WinBUGS, BRugs and OpenBUGS. Not exactly a common component of
your average "validated" toolchain...

By the way, a validated acceptance of Bayesian methods for regulatory
purposes (for devices, admittedly, but the FDA's arguments are *also*
applicable to drug assessment) might well be a window of opportunity
for R, which offers (a tiny illustrative sketch follows the list) :
- interfaces to WinBUGS
- an interface to JAGS, an alternative implementation of the BUGS
language and model
- various packages for alternative MCMC and/or conjugate
estimation of posterior distributions
- a good system for producing formally validatable documents
(Sweave|odfWeave, and possibly RRPM (Bioconductor)).
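
As a purely illustrative sketch of the JAGS route (the model file and the
data are hypothetical) :

library(rjags)                              # one R-to-JAGS interface
m <- jags.model("model.bug", data = list(y = y, N = length(y)), n.chains = 2)
update(m, 1000)                             # burn-in
post <- coda.samples(m, variable.names = "theta", n.iter = 5000)
summary(post)                               # posterior summaries, no p-values in sight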

Looks like R might have a head start over SAS for this new race...

However, Bert Gunter's remarks are *still* valid : a lot of money has
already been invested in SAS production systems, especially in the
pharmaceutical industry. This industry will consider switching systems
if and only if the switch allows them to save *serious* money or to
*seriously* streamline their current process. We all know it's possible,
but the economic|financial case still has to be made. An implementation
of such a system in a new sector (namely medical devices), using the
Bayesian head start and the FDA's incentive to consider Bayesian methods,
might help make this case.

 My message: If you hear not validated  or validated, question it. 

My message : timeo apothicarios et dona ferentes

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] (quite possibly OT) Re: how to make R running on a Linux server display a plot on a Windows machine

2010-02-22 Thread Emmanuel Charpentier
 be smart to
at least consider this seriously.

In short, you should discuss your options (memory, X server, VNC/RDP,
Linux) with your *local* friendly help. Again, the R help list is
*definitely* *NOT* the right place for learning about the care and
feeding of computers.

HTH,

Emmanuel Charpentier
preparing to be shot at dawn
by the list police...

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Data views (Re: (Another) Bates fortune?)

2010-02-07 Thread Emmanuel Charpentier
aimed at various data structure problems (including hierarchical data),
but not *the* solution to the data-representation problem in R.

Any thoughts ?

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] What are Type II or III contrast? (contrast() in contrast package)

2010-02-03 Thread Emmanuel Charpentier
Le mercredi 03 février 2010 à 00:01 -0500, David Winsemius a écrit :
 On Feb 2, 2010, at 11:38 PM, Peng Yu wrote:
 
  ?contrast in the contrast package gives me the following description.
  However, I have no idea what Type II and III contrasts are. Could
  somebody explain it to me? And what does 'type' mean here?
 
 *‘type’*: set ‘type=average’ to average the individual contrasts
 (e.g., to obtain a Type II or III contrast)
 
 In no particular order:
 http://courses.washington.edu/b570/handouts/type3ss.pdf
 http://core.ecu.edu/psyc/wuenschk/SAS/SS1234.doc
 http://n4.nabble.com/a-kinder-view-of-Type-III-SS-td847282.html
 
 Don't expect any follow-up questions to be answered or further  
 citations offered. This is really more in the realm of statistics  
 education than an R specific question.

Notwithstanding David Winsemius' closing remark, I'd like to add
something that should be required reading (and maybe hinted at in
lm()'s help page) :

http://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf

(BTW, despite its age, MASS *is* required reading, and Bill Venables'
exegeses should be part of it).

HTH,

--
Emmanuel Charpentier, DDS, MsC :-)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] What are Type II or III contrast? (contrast() in contrast package)

2010-02-03 Thread Emmanuel Charpentier
Le mercredi 03 février 2010 à 09:23 -0600, Peng Yu a écrit :
 On Wed, Feb 3, 2010 at 2:12 AM, Emmanuel Charpentier
 charp...@bacbuc.dyndns.org wrote:
  Le mercredi 03 février 2010 à 00:01 -0500, David Winsemius a écrit :
  On Feb 2, 2010, at 11:38 PM, Peng Yu wrote:
 
   ?contrast in the contrast package gives me the following description.
   However, I have no idea what Type II and III contrasts are. Could
   somebody explain it to me? And what does 'type' mean here?
  
  *‘type’*: set ‘type=average’ to average the individual contrasts
  (e.g., to obtain a Type II or III contrast)
 
  In no particular order:
  http://courses.washington.edu/b570/handouts/type3ss.pdf
  http://core.ecu.edu/psyc/wuenschk/SAS/SS1234.doc
  http://n4.nabble.com/a-kinder-view-of-Type-III-SS-td847282.html
 
  Don't expect any follow-up questions to be answered or further
  citations offered. This is really more in the realm of statistics
  education than an R specific question.
 
  Notwithstanding David Winsemius' closing remark, I'd like to add
  something that should be required reading (and maybe hinted at in
  lm()'s help page) :
 
  http://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf
 
  (BTW, despite its age, MASS *is* required reading, and Bill Venables'
  exegeses should be part of it).
 
 Do you by any means indicate that MASS describes Type I, II, III, IV
 contrast? Although MASS describes contrasts, but I don't think it
 describes Type I, II, III, IV contrast.

It does not. But it explains (tersely) *WHY* one runs into *serious* logical
difficulties when trying to interpret contrasts that do not abide by the
marginality principle.
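
For what it's worth, the usual R-side entry points look like this (a hedged
sketch ; the two-factor model 'm' is made up) :

m <- lm(y ~ a * b, data = d)
drop1(m, ~ ., test = "F")     # marginality-respecting ("Type II"-like) tests
library(car)                  # John Fox's car package
Anova(m, type = "II")
Anova(m, type = "III")        # only meaningful with suitable contrasts, e.g. contr.sum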

HTH,

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] system vs shell wait command

2010-01-24 Thread Emmanuel Charpentier
Le dimanche 24 janvier 2010 à 04:25 -0800, robbert blonk a écrit :
 Dear All,
 
 I try to invoke a second program (called gencont) within a R script.
 However this second program needs some time to finish. After finishing the
 output should be read into R and used for further calculation.
 
 Under windows, it perfectly works:
 
  result <- shell("gencont < yn.txt", intern=TRUE, wait=TRUE)
  # R waits until the program finishes (approx. 3 min) and uses the result as
  # an object. yn.txt contains the answer to a question prompted by the second
  # program before running
  resultf <- result[length(result)]
  # reads the last line, which contains the required result
 
 However, At our Linux server, shell() doesn't work. Here, I'll need
 system()...
  result <- system("./gencont.exe < yn.txt", intern=TRUE, wait=TRUE) # slightly
  different

You're using a Windows executable through wine (maybe transparently
via binfmt_misc) ? And this executable prints its output on the
console ? Right ?

  resultf <- result[length(result)] # same
 
 The problem now is that R under Linux does not wait for the second program
 to finish, even if wait = TRUE...how come?

Might be wine ; might be, more probably, gencont.exe : I have encountered
some Windows programs whose main purpose was to set things up and launch
a second process which did the real work. Cases in point : the Office 2000
and Office 2003 installers [yuck !], for example.

FWIW, system("wineconsole cmd") opens a new window and waits for its
user to type "exit". It returns 0 as a result.

ISTR that Windows 98's COMMAND.COM had an (internal ?) command called
"start", launching another process, with an option to wait or not for it
to terminate before returning ; this could be true of Windows 2000 and
later, but I can't check right now. Wine's own cmd.exe has a "start"
program :

wineconsole --backend=curses cmd

CMD version 1.1.31

Z:\home\charpentstart
Lance un programme, ou ouvre un document dans le programme normalement
utilisé avec cette extension.
Usage :
start [options] fichier_programme [...]
start [options] fichier_document

Optionss:
/M[inimized] Lance le programme minimisé.
/MAX[imized] Lance le programme maximisé.
/R[estored]  Lance le programme normalement (ni minimisé ni maximisé).
/W[ait]  Attend que le programme lancé se termine, et termine
ensuite avec son code de sortie.
/ProgIDOpen  Open a document using the following progID.
/L   Montre la licence d'utilisation.

start.exe version 0.2 Copyright (C) 2003, Dan Kegel
Start est fourni sans AUCUNE GARANTIE  ; pour les ddtails lancez avec
l'option /L
.
Ceci est un logiciel libre, et vous  êtes invité à le redistribuer
sous certaines condition; lancez 'start /L' pour les détails.
Z:\home\charpent

This might help you ...

 What should I do?

I seriously doubt that system() can read back what is spat onto the
console. Better redirect that to another text file.

Some options come to mind, but beware : I didn't try any of them, not
having your gencont.exe program.
- system("wineconsole cmd gencont.exe < yn.txt > result.txt"), then
result <- readLines("result.txt") (a small sketch of this follows below) ?
- ftp your input to a Windows box, ssh -f yourwindowsbox "gencont.exe
< yn.txt > results.txt", ftp your results back (or, equivalently,
work on a Windows share) ?
- port gencont.exe to Linux ? :-) (this might seem silly, but might be
the easiest solution if this program is (as it seems according to your
partial explanations) a simple filter (read data, munch them, spit
output) not using Windows-specific functions or system calls).
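
A hedged sketch of the first option (the exact wine command line is an
assumption ; adapt it to your setup) :

status <- system("wine gencont.exe < yn.txt > result.txt", wait = TRUE)  # the shell handles the redirections
if (status == 0) {
  result  <- readLines("result.txt")
  resultf <- result[length(result)]      # the last line carries the value of interest
}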

HTH,

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to use SQL code in R

2009-11-16 Thread Emmanuel Charpentier
Le lundi 16 novembre 2009 à 13:14 +0800, sdlywjl666 a écrit :
 Dear All,
   How to use SQL code in R?

Depends on what you mean by "using SQL code in R"...

If you mean "query, update or otherwise mess with an existing database",
look at the RODBC package and/or the various DBI-related packages (hint :
look at BDR's nice manual on importing/exporting data in R, which comes
with the standard R documentation).

If you mean "use standard SQL code to manipulate R data frames", look at
the sqldf package (quite nice, but be aware that it is work in progress).
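
A minimal sqldf sketch (the data frame is made up) :

library(sqldf)
d <- data.frame(g = c("a", "a", "b"), x = 1:3)
sqldf("SELECT g, SUM(x) AS total FROM d GROUP BY g")   # plain SQL against an R data frame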

If you mean "use Oracle's command language or PL/pgSQL or ..." in order to
port to R code written for a specific database, I'm afraid you're
SOL... In that last case, however, note that you may be able to use R
*inside* your external database instead, depending on its ability to use
external code (I'm thinking of PL/R, which allows running R code inside a
PostgreSQL function).

HTH,

Emmanuel Charpentier

BTW, there exists an R database Special Interest Group, with a mailing
list. Look up its archive, and maybe ask a (slightly less general, if
possible) version of your question there...

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] What parts of 'Statistical Models in S' are not applicable to R?

2009-11-11 Thread Emmanuel Charpentier
Le mercredi 11 novembre 2009 à 10:22 +0100, Uwe Ligges a écrit :
 
 Peng Yu wrote:
  According to Amazon review, 'Statistical Models in S' is a key
  reference for understanding the methods implemented in several of
  S-PLUS' high-end statistical functions, including 'lm()', predict()',
  'design()', 'aov()', 'glm()', 'gam()', 'loess()', 'tree()',
  'burl.tree()', 'nls()' and 'ms()'.
  
  But since it is for S, some part of the book may not be applicable to
  R. Some examples (e.g. interaction.plot()) discussed in this book are
  not available in R. Without, working examples, it is sometimes
  difficult for me to understand the materials in the book.
  
  Besides the functions mentioned in the Amazon review, could somebody
  give me a hint on what chapters (or sections) in this book are not
  appropriate to R?
 
 
 They all are appropriate, but nuances differ these days, as some nuances 
 differ for recent S-PLUS versions, 17 years later. It should still be 
 fine to learn some relevant concepts.

You could also note that, at least in the 4th (last) edition of the
book, the authors have marked passages with differences between R and S+
with a marginal "R".

Now this book has grown a bit out of date since its last edition
(2002 ?) : newer R packages implement various things previously not
implemented in R (e.g. multiple comparisons after ANOVA, previously
available in S+ with the multicomp function, and implemented (with a
lot more generality) in the multcomp package).

A 5th edition might be in order, but that would be a *daunting* task :
the R language has grown (e. g. namespaces), the nature and extent of
available tasks has grown *enormously*, and I don't think that producing a
book that would be to 2009 R what VR4 was to 2002 R is doable by a
two-person team, as talented, dedicated and productive as these two
persons might be (modulo Oxford sarcasm :-). Furthermore, these two
persons already give an enormous amount of time and effort to other R
development (search for the R-help activity of BV and BDR, or meditate on
the recently published stats on R source activity...).

Such a document would probably have to be something other than a book to
stay up to date and accurate, and even coordinating such a task would
need serious time... Even if it excluded anything already present in the
various packages' help files, and limited itself to tutorial introductions,
examples and discussions, the sheer volume (1700+ packages last time I
looked) and the difficulty of coordination (how do you discuss 5
different packages implementing various means to solve the same
problem ?) would involve serious organizational difficulties.

So I doubt such a document will get produced in the foreseeable future.
Frequent R-help reading and note-taking is the second-best option...

To come back to the R-vs-S+ topic : unless I'm mistaken, R seems to be
currently the dominant implementation of the S language, and most published
S material will nowadays (implicitly) be aimed at R. This should reduce
the importance of the problem.

Sincerely,

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Models for Discrete Choice in R

2009-11-09 Thread Emmanuel Charpentier
Le dimanche 08 novembre 2009 à 19:05 -0600, Frank E Harrell Jr a écrit :
 Emmanuel Charpentier wrote:
  Le dimanche 08 novembre 2009 à 17:07 -0200, Iuri Gavronski a écrit :
  Hi,
 
  I would like to fit Logit models for ordered data, such as those
  suggested by Greene (2003), p. 736.
 
  Does anyone suggests any package in R for that?
  
   Look up the polr function in package MASS (and read the relevant pages
   in VR4 and some quoted references...) or the slightly more
   sophisticated (larger range of models) lrm function in F. Harrell's
   Design (now rms) package (but be aware that Design is a huge beast which
   carries its own computing universe, based on Harrell's (strong) view
   of what a regression analysis should be : reading his book is, IMHO,
   necessary to understand his choices and agree (or disagree) with them).
   
   If you have a multilevel model (a. k. a. one random-effect grouping),
   the repolr package aims at that, but I've been unable to use it
   recently (numerical exceptions).
  
  By the way, my dependent variable is ordinal and my independent
  variables are ratio/intervalar.
  
  Numeric ? Then maybe some recoding/transformation is in order ... in
  which case Design/rms might or might not be useful.
 
 I'm not clear on what recoding or transformation is needed for an 
 ordinal dependent variable and ratio/interval independent variables, nor 
 why rms/Design would not be useful.

I was thinking about transformations/recoding of the *independent*
variables... 

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lme4 and incomplete block design

2009-11-08 Thread Emmanuel Charpentier
, 0, 0, 0, 1, 
0)) 
   AIC   BIC logLik deviance
 87.33 98.72 -38.6777.33
Random effects:
 Groups NameVariance Std.Dev.
 block  (Intercept) 0.86138  0.9281  
Number of obs: 72, groups: block, 24

Fixed effects:
Estimate Std. Error z value Pr(|z|)   
(Intercept)  -1.9246 0.7117  -2.704  0.00684 **
treatment21.3910 0.8568   1.624  0.10446   
treatment30.4527 0.9163   0.494  0.62124   
treatment40.4526 0.9163   0.494  0.62131   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Correlation of Fixed Effects:
   (Intr) trtmn2 trtmn3
treatment2 -0.775  
treatment3 -0.721  0.598   
treatment4 -0.721  0.598  0.558


In the first case (your original data), treatment is interpreted as
a numeric (quantitative) variable, and what lmer estimates is a logistic
regression coefficient of the outcome on this numeric variable. Probably
nonsensical, unless you have reason to think that your factor is ordered
and should be treated as numeric.

In the second case, treatment is a factor, so you get an estimate for
each treatment level except the first, to be interpreted as a difference
of means with the first level.

I fell into that trap myself a few times, and took the habit of giving
levels to my factors that cannot be interpreted as numbers (such as
f <- paste("F", as.character(v))).
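
For instance (a hedged sketch ; object and variable names are assumptions) :

d$treatment <- factor(paste("T", d$treatment, sep = ""))  # force factor coding, non-numeric labels
library(lme4)
m <- glmer(outcome ~ treatment + (1 | block), data = d, family = binomial)
summary(m)   # one coefficient per treatment level (against the first), not a single slope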

 Any help is appreciated!

HTH,

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Models for Discrete Choice in R

2009-11-08 Thread Emmanuel Charpentier
Le dimanche 08 novembre 2009 à 17:07 -0200, Iuri Gavronski a écrit :
 Hi,
 
 I would like to fit Logit models for ordered data, such as those
 suggested by Greene (2003), p. 736.
 
 Does anyone suggests any package in R for that?

Look up the polr function in package MASS (and read the relevant pages
in VR4 and some quoted references...) or the slightly more
sophisticated (larger range of models) lrm function in F. Harrell's
Design (now rms) package (but be aware that Design is a huge beast which
carries its own computing universe, based on Harrell's (strong) view
of what a regression analysis should be : reading his book is, IMHO,
necessary to understand his choices and agree (or disagree) with them).

If you have a multilevel model (a. k. a. one random-effect grouping),
the repolr package aims at that, but I've been unable to use it
recently (numerical exceptions).
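
A minimal polr() sketch (the data frame 'd', the ordered outcome 'y' and the
predictors are made up) :

library(MASS)
d$y <- ordered(d$y)                      # the response must be an ordered factor
m <- polr(y ~ x1 + x2, data = d, Hess = TRUE)
summary(m)                               # proportional-odds logistic regression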

 By the way, my dependent variable is ordinal and my independent
 variables are ratio/intervalar.

Numeric ? Then maybe some recoding/transformation is in order ... in
which case Design/rms might or might not be useful.

HTH,

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Bootstrap inference for the sample median?

2009-08-30 Thread Emmanuel Charpentier
Le dimanche 30 août 2009 à 18:43 +0530, Ajay Shah a écrit :
 Folks,
 
 I have this code fragment:
 
   set.seed(1001)
   library(boot)
   x <- c(0.79211363702017, 0.940536712079832, 0.859757602692931, 0.82529998629531,
  0.973451006822, 0.92378802164835, 0.996679563355802,
  0.943347739494445, 0.992873542980045, 0.870624707845108, 0.935917364493788)
   range(x)
   # from 0.79 to 0.996

   e <- function(x, d) {
 median(x[d])
   }

   b <- boot(x, e, R = 1000)
   boot.ci(b)
 
 The 95% confidence interval, as seen with `Normal' and `Basic'
 calculations, has an upper bound of 1.0028 and 1.0121.
 
 How is this possible? If I sample with replacement from values which
 are all lower than 1, then any sample median of these bootstrap
 samples should be lower than 1. The upper cutoff of the 95% confidence
 interval should also be below 1.

Nope. "Normal" and "Basic" try to fit (some form of) a normal
distribution to the sample of your statistic. But the median of such a
small sample is quite far from normal (hint : it is a discrete
distribution with only very few possible values, at most as many values
as the sample has elements. Exercise : derive the distribution of median(x)...).

To convince yourself, look at the histogram of the bootstrap
distribution of median(x). Contrast with the bootstrap distribution of
mean(x). Meditate. Conclude...
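
For instance (a quick sketch reusing the objects above) :

library(boot)
b_med  <- boot(x, function(x, d) median(x[d]), R = 1000)
b_mean <- boot(x, function(x, d) mean(x[d]),   R = 1000)
par(mfrow = c(1, 2))
hist(b_med$t,  main = "bootstrap medians", xlab = "")  # a few spikes : a very discrete distribution
hist(b_mean$t, main = "bootstrap means",   xlab = "")  # much closer to a normal shape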

HTH,

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Testing year effect in lm() ***failed first time, sending again

2009-08-06 Thread Emmanuel Charpentier
Le mercredi 05 août 2009 à 09:52 -0700, Mark Difford a écrit :
 Emmanuel,
 
  somewhat incomplete  help pages : what in h*ll are valid arguments to
  mcp() beyond Tukey ??? Curently, you'll have to dig in the source to
  learn that...).
 
 Not so: they are clearly stated in ?contrMat.

Oops... I overlooked that.

With my apologies,

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Testing year effect in lm() ***failed first time, sending again

2009-08-05 Thread Emmanuel Charpentier
Le jeudi 30 juillet 2009 à 16:41 -0600, Mark Na a écrit :
 Dear R-helpers,
 
 I have a linear model with a year effect (year is coded as a factor), i.e.
 the parameter estimates for each level of my year variable have significant
 P values (see some output below) and I am interested in testing:
 
 a) the overall effect of year;
 b) the significance of each year vis-a-vis every other year (the model
 output only tests each year against the baseline year).

install.packages("multcomp")
help.start()

and use the vignettes ! They're good (and are a good complement to the
somewhat incomplete help pages : what in h*ll are valid arguments to
mcp() beyond "Tukey" ??? Currently, you'll have to dig in the source to
learn that...).
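
A hedged sketch of both tests (model and variable names are assumptions) :

library(multcomp)
m <- lm(y ~ year + other_covariates, data = d)  # 'year' coded as a factor
anova(m)                                        # (a) overall year effect
summary(glht(m, linfct = mcp(year = "Tukey")))  # (b) all pairwise year comparisons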

HTH

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] odfWeave : sudden and unexplained error

2009-08-01 Thread Emmanuel Charpentier
Dear list, dear Max,

I am currently working on a report. I'm writing it with OpenOffice.org
and odfWeave, and I'm working incrementally : I write a bit, test
(interactively) some ideas, and cut-and-paste code into the OOo report
when satisfied with it. In the process, I tend to recompile the .odt
source a *lot*.

Suddenly, odfWeave started giving me an incomprehensible error even
before starting the compilation itself (InFile is my input .odt file,
OutFile is the resulting .odt file) :

 odfWeave(InFile, OutFile)
  Copying  SrcAnalyse1.odt 
  Setting wd to  /tmp/RtmphCUkSf/odfWeave01225949667 
  Unzipping ODF file using unzip -o SrcAnalyse1.odt 
Archive:  SrcAnalyse1.odt
 extracting: mimetype
  inflating: content.xml 
  inflating: layout-cache
  inflating: styles.xml  
 extracting: meta.xml
  inflating: Thumbnails/thumbnail.png  
  inflating: Configurations2/accelerator/current.xml  
   creating: Configurations2/progressbar/
   creating: Configurations2/floater/
   creating: Configurations2/popupmenu/
   creating: Configurations2/menubar/
   creating: Configurations2/toolbar/
   creating: Configurations2/images/Bitmaps/
   creating: Configurations2/statusbar/
  inflating: settings.xml
  inflating: META-INF/manifest.xml   

  Removing  SrcAnalyse1.odt 
  Creating a Pictures directory

  Pre-processing the contents
Erreur : cc$parentId == parentId is not TRUE
 
Perusing the documentation and the r-help list archives didn't turn up
anything relevant. This error survived restarting OOo, restarting R,
restarting its enclosing Emacs session and even rebooting the damn
hardware...

Any idea ?

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to perform a Likelihood-ratio test?

2009-07-13 Thread Emmanuel Charpentier
Le lundi 13 juillet 2009 à 13:04 +0200, Lars Bergemann a écrit :
 Hi.
 
  
 
 I would like to use a likelihood-ratio test to compare a linear model (lm()) 
 to another linear model containing the first one to see if the extra factors 
 are needed - but I was unable to find any help on how to do that.
 
 Could you help me and tell me what to do? Or tell me where to find help on 
 this topic?

?anova, maybe ? Check its "test" argument... :-)
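
For instance (a hedged sketch with made-up model terms) :

m0 <- lm(y ~ x1, data = d)
m1 <- lm(y ~ x1 + x2 + x3, data = d)
anova(m0, m1)                     # F test of the extra terms
# a likelihood-ratio flavour : compare 2 * (logLik(m1) - logLik(m0)) to a
# chi-squared distribution, or use lmtest::lrtest(m0, m1)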

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Unix commands on R

2009-07-08 Thread Emmanuel Charpentier
Le mercredi 08 juillet 2009 à 16:36 -0400, suman Duvvuru a écrit :
 I am using R on unix. While in R how do I execute the unix shell commands?
 Is there a way to do it? I am executing a function in R and the matrix
 resulting from the function has to be passed on as an input to unix command.
 
 
 Any directions will be helpful.

?system
?capture.output
?textConnection

These three functions (I'm not quite sure about the exact name of the second)
should allow you to get the standard output of a Unix command line.

Consider also capturing your command output in a file (reusable...).
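
For instance (a hedged sketch ; 'sort' only stands in for your actual
command) :

m <- matrix(rnorm(6), nrow = 3)
write.table(m, file = "input.txt", row.names = FALSE, col.names = FALSE)
out <- system("sort -n input.txt", intern = TRUE)  # stdout captured as a character vector
out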

HTH,

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] odfWeave : problems with odfInsertPlot() and odfFigureCaption()

2009-07-01 Thread Emmanuel Charpentier
Dear Max,

Le mardi 30 juin 2009 à 21:36 -0400, Max Kuhn a écrit :
 Well, on this one, I might bring my record down to commercial software
 bug fix time lines...

Having actually looked at the code before wailing, I had such a hunch...

[ Snip ... ]

 We had to do some interesting things to get figure captions working
 with odfWeave. You would think that the figure caption function just
 write out some xml and are done (like the insert plot code and the
 table caption code). Unfortunately, it doesn't.
 
 It writes the caption to a global variable (I know, I know) and other
 code actually writes the caption. We had good reason to do this,
 mostly related to the complexity associated with the context of the
 code chunk (e.g. was it inside a table, at paragraph, etc). I've been
 thinking about this one for a while since the current implementation
 doesn't allow you to pass the figure caption to other functions (it
 writes a global variable and returns NULL). I haven't come up with a
 good solution yet.

ISTR from my Lisp/Scheme days that a good heuristic in this kind of case
is to try to build a closure and return that to a wrapper.
Unfortunately, R doesn't have anything as simple to use as let/let*, and
you have to play fast and loose with environments...
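
Something like this toy illustration of the pattern (not odfWeave code, just
the closure idea) :

make_caption_writer <- function(caption) {
  function(xml) paste(xml, caption, sep = "\n")  # 'caption' is captured in the closure's environment
}
w <- make_caption_writer("Figure 1 : something")
w("<draw:frame .../>")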

 aside
 I've not been as prompt with odfWeave changes in comparison to other
 packages (many of you have figured this out). This is mostly due to
 the necessary complexity of the code. In a lot of ways, I think that
 we let the feature set be too rich and should have put some constrains
 on the availability of code chunks (e.g. only in paragraphs).

I didn't even think of this. A priori, that's hard to do (whereas
(La)TeX needs \par or *two* newlines to end a paragraph and
start another, oowriter takes one newline as an end of paragraph and
emits the corresponding XML).

Come to think of it, you might try to insert a chunk in a table cell, or
in a footnote...

And I don't know about other ODF tools...

   I want
 to add more features (like the one that you are requesting), but
 seemingly small changes end up being a considerable amount of work
 (and I don't time). I've been thinking about simplifying the code (or
 writing rtfWeave) to achieve the same goals without such a complex
 system.

What about something along the lines of :
foo <- odfTable(something)
bar <- odfInsertPlot(somethingelse)
odfCaptionize(foo, "Something")
odfCaptionize(bar, "Something else")
?

 The ODF format is a double-edged sword. It allows you to do many more
 things than something like RTF, but makes it a lot more difficult to
 program for. Take tables as an example. The last time I looked at the
 RTF specification, this was a simple markup structure. In ODF, the
 table structure is somewhat simple, but the approach to formatting the
 table is complex and not necessarily well documented.

Indeed...

For example,
 some table style elements go into styles.xml whereas others go into
 content.xml (this might be implementation dependent)
 
 So, I could more easily expand the functionality of odfWeave by
 imposing some constraints about where the code chunks reside. The
 earlier versions of odfWeave did not even parse the xml; we got a lot
 done via regular expressions and gsub. However, we went to a xml
 parsing approach when we found out that the context of the code chunk
 matters.
 
 Feedback is welcome: do people actually use code chunks in places
 other than paragraphs?
 /aside

I didn't even think of it. But now that you tell it, you got me a nice
idea for a workaround ... :-)

 So, back to your question, the relevant code for captions is in
 odfWeave:::withCaptionXML (odfInsertPlot uses this to write the xml).
 You can try that and I might be able to look at it in the next few
 days.

I did look at that. The problem is that your code has dire warnings not to
call it with a caption more than once in a figure=TRUE (sic..) chunk.

 Thanks,

*I* thank *you*, Max,

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] odfWeave : problems with odfInsertPlot() and odfFigureCaption()

2009-06-30 Thread Emmanuel Charpentier
Dear Max, dear list,

I have trouble using odfInsertPlot to insert plots outside fig=TRUE
chunks.

My goal is to dump a bunch of (relatively uninteresting, but required)
graphs from within a loop, possibly interspersed with some tables. This
is not possible within a normal fig=TRUE chunk.

odfInsertPlot() is able to insert a previously saved plot. When used in
a fig=TRUE, results=hide chunk, it produces just a blank frame (with a
caption if caption=odfFigureCaption() is used).

Used in a fig=TRUE, results=xml chunk, it produces an error at file
re-assembly (see end of post). However, when wrapped in a cat(), the
same code produces 1) the figure (without caption), then a blank frame
with the caption, or just the figure (without caption) if the caption=
argument is absent.

The same cat(odfInsertPlot()) construct can be used in a results=xml
chunk (no fig=TRUE) ; however, I found no way to insert the caption.

Therefore : I can programmatically create graphs and insert them via
results=xml chunks, but I can't caption them.

Does someone have a workaround (except reprogramming odfFigureCaption()
with a new argument taking the output of odfInsertPlot()) ?

Sincerely,

Emmanuel Charpentier

Annex : log of compilation with fig=true, results=xml.
 odfWeave(Test1Src.odt, Test1.odt)
  Copying  Test1Src.odt 
  Setting wd to  /tmp/RtmpenMTgz/odfWeave3023572231 
  Unzipping ODF file using unzip -o Test1Src.odt 
Archive:  Test1Src.odt
 extracting: mimetype
  inflating: content.xml 
  inflating: layout-cache
  inflating: styles.xml  
 extracting: meta.xml
  inflating: Thumbnails/thumbnail.png  
  inflating: Configurations2/accelerator/current.xml  
   creating: Configurations2/progressbar/
   creating: Configurations2/floater/
   creating: Configurations2/popupmenu/
   creating: Configurations2/menubar/
   creating: Configurations2/toolbar/
   creating: Configurations2/images/Bitmaps/
   creating: Configurations2/statusbar/
  inflating: settings.xml
  inflating: META-INF/manifest.xml   

  Removing  Test1Src.odt 
  Creating a Pictures directory

  Pre-processing the contents
  Sweaving  content.Rnw 

  Writing to file content_1.xml
  Processing code chunks ...
1 : term hide(label=ProlégomènesStandard)
2 : term xml(label=EssaiGraphe1)
3 : echo term verbatim(label=TestChunk)
4 : term xml(label=SecondChunk)

  'content_1.xml' has been Sweaved

  Removing content.xml

  Post-processing the contents
error parsing attribute name
attributes construct error
xmlParseStartTag: problem parsing attributes
Couldn't find end of Start Tag draw:frame line 6
error parsing attribute name
attributes construct error
xmlParseStartTag: problem parsing attributes
Couldn't find end of Start Tag draw:image line 6
Opening and ending tag mismatch: text:p line 6 and draw:frame
Opening and ending tag mismatch: office:text line 2 and text:p
Opening and ending tag mismatch: office:body line 2 and office:text
Opening and ending tag mismatch: office:document-content line 2 and
office:body
Extra content at the end of the document
Erreur : 1: error parsing attribute name
2: attributes construct error
3: xmlParseStartTag: problem parsing attributes
4: Couldn't find end of Start Tag draw:frame line 6
5: error parsing attribute name
6: attributes construct error
7: xmlParseStartTag: problem parsing attributes
8: Couldn't find end of Start Tag draw:image line 6
9: Opening and ending tag mismatch: text:p line 6 and draw:frame
10: Opening and ending tag mismatch: office:text line 2 and text:p
11: Opening and ending tag mismatch: office:body line 2 and office:text
12: Opening and ending tag mismatch: office:document-content line 2 and
office:body
13: Extra content at the end of the document


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] NaiveBayes fails with one input variable (caret and klaR packages)

2009-06-30 Thread Emmanuel Charpentier
Le mardi 30 juin 2009 à 14:51 -0400, Max a écrit :
 I just put a new version on cran...

Now, *that's* support !!! bug report at 17:31, acknowledgement at 20:12,
(probable) bug fix at 20:51. Bravo, bravissimo, Max !

Try that, SAS support !

Emmanuel Charpentier

And I insist that this is not ironic in the least... It's genuine
admiration.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] questions about meta-analysis

2009-06-27 Thread Emmanuel Charpentier
Le samedi 27 juin 2009 à 13:02 +0800, sdzhangping a écrit :
 Dear R users:
 
 In the example of meta-analysis (cochrane, package rmeta), I can not
 found the p-value of Test for overall effect, and some other indices
 (Z, I, weight and et al).  How can I get the these indices listed? 
  library(rmeta)
  data(cochrane)
  cochrane
   name ev.trt n.trt ev.ctrl n.ctrl
 1 Auckland 36   532  60538
 2Block  169   5 61
 3Doran  481  11 63
 4Gamsu 14   131  20137
 5 Morrison  367   7 59
 6 Papageorgiou  171   7 75
 7  Tauesch  856  10 71
  a=meta.MH(n.trt,n.ctrl,ev.trt,ev.ctrl,names=name,data=cochrane)
  summary(a)
 Fixed effects ( Mantel-Haenszel ) meta-analysis
 Call: meta.MH(ntrt = n.trt, nctrl = n.ctrl, ptrt = ev.trt, pctrl =
 ev.ctrl, 
 names = name, data = cochrane)
 
OR (lower  95% upper)
 Auckland 0.580.38   0.89
 Block0.160.02   1.45
 Doran0.250.07   0.81
 Gamsu0.700.34   1.45
 Morrison 0.350.09   1.41
 Papageorgiou 0.140.02   1.16
 Tauesch  1.020.37   2.77
 
 Mantel-Haenszel OR =0.53 95% CI ( 0.39,0.73 ) (where is Z and
 p-value ?)
 Test for heterogeneity: X^2( 6 ) = 6.9 ( p-value 0.3303 )

You might easily recompute them : use the confidence interval to
guesstimate the standard deviation of the (log) OR, then its value and the
damn Z and p you long for... Remember that it is log(OR) that is
(asymptotically) normally distributed. The weights are proportional to the
inverse of the variance and sum to 1. You might peek at the source code for
enlightenment...
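
For instance, with the pooled Mantel-Haenszel result above (a
back-of-the-envelope sketch) :

or <- 0.53 ; lo <- 0.39 ; hi <- 0.73
se_log_or <- (log(hi) - log(lo)) / (2 * qnorm(0.975))  # CI width on the log scale
z <- log(or) / se_log_or
p <- 2 * pnorm(-abs(z))                                # two-sided p-value
c(Z = z, p = p)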

Alternatively, you may grow lazy and use the meta package, whose
function metabin() will give you all that. The recent metafor package
also seems quite interesting.

Another (smarter(?)) alternative is to ask yourself *why* you need Z and
p. Z is just a computing device allowing you to use a known density. And
the real meaning of p and its usefulness has been discussed, disputed
and fought over at exceedingly large lengths in the past 80 years.

Of course, if you have an instructor to please ...

HTH,

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ANOVA with means and SDs as input

2009-06-26 Thread Emmanuel Charpentier
Le jeudi 25 juin 2009 à 12:15 +0200, Sebastian Stegmann a écrit :
 Dear R-community,
 
 I'm struggling with a paper that reports only fragmented results of a
 2by2by3 experimental design. However, Means and SDs for all cells are given.
 
 Does anyone know a package/function that helps computing an ANOVA with only
 Means and SDs as input (resulting in some sort of effect size and
 significance test)?
 
 The reason why I'm interested is simple: I'm conducting a meta-analysis. If
 I included only what the authors would like to show the world, the results
 would be somewhat biased...

What you aim to do is not, as far as I can tell, possible using
already-programmed solutions in R packages. Some packages offer very
close solutions (e. g. lme()) but lack some necessary options (e. g. the
ability to fix the intra-level variance in lme() : varFixed() works as
documented, not as its name suggests...).

Why not use some established meta-analysis package such as meta or
rmeta ? You may have to recompute some variances or effect sizes, but it
can usually be done.

If you aim to go a bit farther (i. e. more than 2-group comparisons),
the newly-released metafor package seems quite interesting. Wolfgang
Viechtbauer aimed to offer a tool for meta-regression, and the current
state of the package (0.5.something IIRC) seems to be quite usable, if
somewhat counter-intuitive in places.

I'd recommend having a look at it. WV didn't think it useful to include
the vignettes and documentation for mima(), the predecessor of metafor,
which explain quite nicely the motivation and algorithms he used. Too
bad... However, this documentation is still available on his Web site.
Be aware that the current main interface to the rma() function (the
workhorse of the package) does not (yet ?) follow the usual format of
regression-like functions common to many R packages (the (formula, data,
subset, etc.) format). For example, you may do meta-regression, but
you'll have to code the model matrix yourself. Similarly, the necessary
bricks for (e. g.) indirect comparisons are here, but you'll have to
assemble them yourself (bring your wheelbarrow and some mortar...).

An alternative to be seriously considered : Bayesian modeling.
Meta-analysis is one of the rare medical domains where journal reviewers
won't immediately reject a paper at the sight of the B***sian word... As
far as I know, psychological journals are not as asinine. Here again,
some manual work will be in order. Some tutorial material is readily
available (google Bayesian meta analysis tutorial, which should lead
you to the various Statistics in Medicine tutorials, among other
things).

 I've combed through the web and my various books on R, but it's not that
 easy to find. Or is it?

Variance analysis through manual decomposition of sums of squares ?
Ugh... I did that when I first learned ANOVA (so long ago that I
am uncomfortable thinking of it ; in those times, even a pocket
calculator was too expensive for a random student (and therefore forbidden
in exams), and I learned to cope with a slide rule...). Tiring and messy.
Furthermore, "manual" algorithms (i. e. simple, non-iterative
algorithms) exist only for the simplest (e. g. one-way ANOVA) or ideal
(e. g. balanced two-way ANOVA with equal group sizes) cases ;
furthermore, modern critiques of the bases of such work cast some
doubt on their foundations (e. g. random- and mixed-effect ANOVA)...

Therefore, the art is fast forgotten, if not totally lost. If you
insist, I might try to unearth my copy of Winer's handbook (ca. 1959)
and look up specific questions.
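
Meanwhile, the simplest (one-way) case can be reconstructed from summary
statistics along these lines (a sketch with made-up numbers) :

m <- c(10.2, 11.5, 9.8)     # group means
s <- c(2.1, 2.4, 1.9)       # group standard deviations
n <- c(20, 22, 19)          # group sizes
k <- length(m)
grand <- sum(n * m) / sum(n)
ss_between <- sum(n * (m - grand)^2) ; df1 <- k - 1
ss_within  <- sum((n - 1) * s^2)     ; df2 <- sum(n) - k
F <- (ss_between / df1) / (ss_within / df2)
pf(F, df1, df2, lower.tail = FALSE)  # p-value of the overall F test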

HTH,

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] User defined GLM?

2009-06-22 Thread Emmanuel Charpentier
Le lundi 22 juin 2009 à 09:35 -0700, francogrex a écrit :
 Hello, I have this generalized linear formula:
 log(x[i]/n[i])=log(sum(x)/sum(n)) + beta[i]

I do not understand the problem as stated. If x[i] and n[i] are known,
and unless sum(n) = 0, your dataset reduces to a set of nrow(dataset)
independent linear equations with nrow(dataset) unknowns (the beta[i]),
whose solution is trivially beta[i] = log(x[i]/n[i]) - log(sum(x)/sum(n)),
except when x[i] = 0 or n[i] = 0, in which case the equation has no solution.
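
In R terms, that trivial solution is just (x and n being your two vectors) :

beta <- log(x / n) - log(sum(x) / sum(n))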

Could you try to re-express your problem ?

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] large numbers of observations using ME() of spdep

2009-06-06 Thread Emmanuel Charpentier
Le dimanche 07 juin 2009 à 02:50 +0900, lucero mariani a écrit :
 Dear All,
 We aim to remove the spatial structure of our data using Moran Eigen
 Vectors and spdep package . Our data has 3694 samples and 13
 variables.
 The computer stop working after almost 4 days of processing (we found
 it emitting a sharp sound and with all colors on the screen. No
 wories, it was restared without problem!).

Cool ! Farm it out to a night club (and get a real computer with the
proceeds).

Sorry. Couldn't resist !

And twice sorry to be unable to help with your *real* problem...

Emmanuel Charpentier

 And we are left with
 nothing: no result file was produced since the calculations were
 interrumpted!
 Consequently, I am looking for a way to accelerate calculations with
 ME() of spdep and to save the results step by step (for each eigen
 value).
 I came accross several promissing ideas: 1)make a compiled program in
 C and call it in R with the command system(); 2)use muliR. However, I
 do not know how to use these tools. I do not understand exactely why
 the calculations are so slow and I do know know how to write a program
 in C...
 I am using Ubuntu 9.04 and R version 2.8.1 .
 
 The code in R is a fallows:
  taimin.xy = taimin[c("dy", "dx")];
  coordinates(taimin.xy) <- ~dy+dx; # dy, dx : names of the respective
  # coordinate columns
  coords <- coordinates(taimin.xy);
  library(spdep);
  TaiminGabrielGraph <- gabrielneigh(coords, nnmult = 12);
  tai.gab.nb <- graph2nb(TaiminGabrielGraph, sym = TRUE);
  nbtaim_distsg <- nbdists(tai.gab.nb, coords);
  nbtaim_simsg <- lapply(nbtaim_distsg, function(x) (1 - ((x/(4*50))^2)));
  MEtaig.listw <- nb2listw(tai.gab.nb, glist = nbtaim_simsg, style = "B");
  sevmtaig <- ME(Presabs
  ~Age+Curv+Zon2005+ZoneMeiji+Wi+Sol+Slope+Seadist+Elev+Basin, data = taimin,
  family = binomial, listw = MEtaig.listw)
 
 Any help is welcome!
 
 Thanks
 
 Lucero Mariani, Yokohama National University, Japan.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] New Meta-Analysis Package (metafor)

2009-06-05 Thread Emmanuel Charpentier
Le vendredi 05 juin 2009 à 14:03 +0200, Viechtbauer Wolfgang (STAT) a
écrit :
 A new package is now available via CRAN, called metafor.
 
 The metafor package consists of a collection of functions for
 conducting meta-analyses in R. Fixed- and random-effects models (with
 and without moderators) can be fitted via the general linear
 (mixed-effects) model. For 2x2 table data, the Mantel-Haenszel and
 Peto's method are also implemented. The package also provides various
 plot functions (for example, for forest, funnel, and radial plots) and
 functions for assessing the model fit and obtaining case diagnostics.

Thank you very much ! I think that this package was expected ...

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Small mystery : passing a subset= argument to lme|lm through ...

2009-06-04 Thread Emmanuel Charpentier
Dear list,

I have problems involving passing a subset= argument through "..." .
I'm trying to augment the set of defined analyses for mice (homonymous
package) with a call to lme. This package creates multiple imputations of
missing data in a "mids" object; each completed data set may be obtained
through the complete(data, set) function.

> sessionInfo()
R version 2.9.0 (2009-04-17) 
x86_64-pc-linux-gnu 

locale:
LC_CTYPE=fr_FR.UTF-8;LC_NUMERIC=C;LC_TIME=fr_FR.UTF-8;LC_COLLATE=fr_FR.UTF-8;LC_MONETARY=C;LC_MESSAGES=fr_FR.UTF-8;LC_PAPER=fr_FR.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=fr_FR.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices datasets  utils methods
base 

other attached packages:
[1] odfWeave_0.7.10 XML_2.3-0   lattice_0.17-25

loaded via a namespace (and not attached):
[1] grid_2.9.0   MASS_7.2-47  nlme_3.1-92  rpart_3.1-44
 

# First attempt : 
foo <- function(formula, data, ...) {
  library(nlme)
  call <- match.call()
  if (!is.mids(data))
    stop("The data must have class mids")
  analyses <- lapply(1:data$m, function(i) lme(formula,
                                               data=complete(data,i),
                                               ...))
  object <- list(call = call, call1 = data$call, nmis = data$nmis, 
                 analyses = analyses)
  oldClass(object) <- c("mira", oldClass(object))
  return(object)
}
> bar1 <- foo(ttotal~nbpradio, data=Data1.Imp, random=~1|ctr)
> class(bar1$analyses[[1]])
[1] "lme" # Fine : the random= argument *HAS BEEN* passed to lme()
# through ... ; therefore it's (usually) possible.
# Further tests (not shown) show that results are reasonable (i. e. 
# compliant with expectations...).
> bar2 <- foo(log(ttotal)~log(nbpradio), data=Data1.Imp, random=~1|ctr,
              subset=nbpradio>0)
Erreur dans eval(expr, envir, enclos) : 
  ..2 utilisé dans un mauvais contexte, aucun ... où chercher
 

Further exploration shows that a subset= argument *DOES NOT GET PASSED*
to lme(), which furthermore complains about not finding a ... argument.
Since lme is a method private to the nlme package, I can't trace the
source of the error.

[Later : Same observation with lm() instead of lme()...]

Now, ISTR that an analogous problem *has already been discussed* on this
list, but even a prolonged use of RSiteSearch() did not retrieve it.
http://finzi.psych.upenn.edu/R/Rhelp02/archive/10139.html hints that
Modelling and graphics functions with formulas all have slightly 
different nonstandard evaluation rules, and it turns out that lme() 
doesn't evaluate extra arguments like subset in the parent environment,
but this does not tell me why a ... argument is *not* found. Different
frame of evaluation ?

May some kind soul point me in the right direction ? (He may even
further my shame by telling me how he retrieved the relevant
information...).

Sincerely,

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Can I Compile R with MS Access 2003 Developer Extensions?

2009-05-31 Thread Emmanuel Charpentier
Le dimanche 31 mai 2009 à 12:08 -0700, Felipe Carrillo a écrit :
 Hi:
 I have an application that uses MS Access as front end interacting
 with Excel,Word,Sigmaplot and R in the background. I use the programs
 to do different tasks but I mainly use R to create graphics on my
 Access forms without having to manually open R.
 To make this short, I was wondering if R can be compiled from Access
 using the Developer Extensions. I know that the MS office programs can
 be compiled with it but don't know if it can be done including R. 

Somehow, I strongly doubt it : R compilation needs tools totally alien
to Microsoft's world vision.

However, ISTR that there exists some sort of R-WinWorld bridge. Have a
look here : http://cran.miroir-francais.fr/contrib/extra/dcom/

Note that I can't vouch for it myself : I didn't touch a Windows
computer for years...

HTH,

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Linear Regression with Constraints

2009-05-27 Thread Emmanuel Charpentier
Le mardi 26 mai 2009 à 14:11 -0400, Stu @ AGS a écrit :
 Hi!
 I am a bit new to R.
 I am looking for the right function to use for a multiple regression problem
 of the form:
 
 y = c1 + x1 + (c2 * x2) - (c3 * x3)
 
 Where c1, c2, and c3 are the desired regression coefficients that are
 subject to the following constraints:
 
 0.0 < c2 < 1.0, and
 0.0 < c3 < 1.0

Sounds rather like an in-the-closet Bayesian problem (with a very
strange prior...). Did you consider to submit it to WinBUGS (or JAGS) ?

If you still want a direct optimization, you could have started :

RSiteSearch("optimization constraint")

Which would have quickly led you to ask :

? constrOptim
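
For box constraints like these, optim(..., method="L-BFGS-B") is also
enough. A minimal sketch, on made-up data, with my own guess at a
least-squares objective for your model (presumably close to the
objfun/objgrad used in the follow-up message) :

objfun <- function(par, data) {
  with(data, sum((y - (par["c1"] + x1 + par["c2"] * x2 - par["c3"] * x3))^2))
}
objgrad <- function(par, data) {
  with(data, {
    r <- y - (par["c1"] + x1 + par["c2"] * x2 - par["c3"] * x3)
    c(c1 = -2 * sum(r), c2 = -2 * sum(r * x2), c3 = 2 * sum(r * x3))
  })
}
toy <- data.frame(y = rnorm(6), x1 = rnorm(6), x2 = rnorm(6), x3 = rnorm(6))
optim(par = c(c1 = 0.5, c2 = 0.5, c3 = 0.5), fn = objfun, gr = objgrad,
      data = toy, method = "L-BFGS-B",
      lower = c(-Inf, 0, 0), upper = c(Inf, 1, 1))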

 y, x1, x2, and x3 are observed data.  
 I have a total of 6 rows of data in a data set.

??? If that's real-life data, I wonder what kind of situation forces you
to estimate 3+1 parameters (c1, c2, c3 and the residual, which is not
really a parameter) with 6 data points ? Your problem can be written as
a system of 6 linear equations with 3 unknowns (c1, c2, c3), leaving you
room to search in (a small piece of) R^3 (the residual is another way to
express your objective function, not an independent parameter).

Of course, if it's homework, get lost !

Emmanuel Charpentier

 Is optim in the stats package the right function to use?
 Also, I can't quite figure out how to specify the constraints.
 Thank you!
 
 -Stu
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Linear Regression with Constraints

2009-05-27 Thread Emmanuel Charpentier

$counts
function gradient 
  12   12 

$convergence
[1] 0

$message
[1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"

 
> system.time(D1.bound <- optim(par=c(c1=0.5, c2=0.5, c3=0.5),
+   fn=objfun,
+   gr=objgrad,
+   data=Data1,
+   method="L-BFGS-B",
+   lower=c(-Inf,0,0),
+   upper=c(Inf,1,1)))
utilisateur système  écoulé 
  0.004   0.000   0.004 
 
> D1.bound
$par
c1 c2 c3 
-0.5052117  0.2478706  0.7626612 

$value
[1] 12.87678

# About the same values as for the unbounded case. Okay

$counts
function gradient 
   99 

$convergence
[1] 0

$message
[1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"

 
 # The real test : what will be found when the real objective is out
of
 # bounds ?
 
> system.time(D2.bound <- optim(par=c(c1=0.5, c2=0.5, c3=0.5),
+   fn=objfun,
+   gr=objgrad,
+   data=Data2,
+   method="L-BFGS-B",
+   lower=c(-Inf,0,0),
+   upper=c(Inf,1,1)))
utilisateur système  écoulé 
  0.004   0.000   0.004 
 
> D2.bound
$par
       c1        c2        c3 
2.0995442 0.2192566     0.000 

# The optimizer bangs its pretty little head on the c3 wall. c1 is out of
# the picture, but c2 happens to be not so bad...

$value
[1] 14.4411

# The residual is (as expected) quite a bit larger than in the unbounded case...

$counts
function gradient 
   88 

$convergence
[1] 0

$message
[1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"

# ... and that's not a computing anomaly.

This also illustrates the hubris necessary to fit 3 coefficients on 6
data points...

HTH,

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to google for R stuff?

2009-05-20 Thread Emmanuel Charpentier
Le mercredi 20 mai 2009 à 09:02 -0400, Kynn Jones a écrit :
 Hi!  I'm new to R programming, though I've been programming in other
 languages for years.
 
 One thing I find most frustrating about R is how difficult it is to use
 Google (or any other search tool) to look for answers to my R-related
 questions.  With languages with even slightly more distinctive names like
 Perl, Java, Python, Matlab, OCaml, etc., usually including the name of the
 language in the query is enough to ensure that the top hits are relevant.
  But this trick does not work for R, because the letter R appears by itself
 in so many pages, that the chaff overwhelms the wheat, so to speak.
 
 So I'm curious to learn what strategies R users have found to get around
 this annoyance.

ISTR having this question or very close ones at least thrice in the last
two months.

Time for a FAQ entry ? (It does not seem to exist : I checked...)

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to save R clean sessions in BATCH mode?

2009-05-18 Thread Emmanuel Charpentier
Le samedi 16 mai 2009 à 17:21 +0200, mcnda...@mncn.csic.es a écrit :
 Thanks a lot to all of you that have replied to me about opening and  
 ending R workspaces in BATCH mode. However replies were kind of general  
 and I’m afraid I could not take the entire message from them.  
 Therefore I chose to expose here a representative fraction of my work.
 
 I have 50 Rdata files (F1,F2,F3,F4,…,F50) with objects inside.
 I need to:
 
 open F1:
 - perform some simple operations with the objects
 - export the solution with write.table
 - end F1 session
 open F2
 repeat procedures as F1
 …
 open F50
 repeat procedures as F1
 …
 
 My difficulty here is to end a workspace and open one from scratch  
 to avoid mixing files from consecutive worksessions, and thus using R  
 memory unnecessarily. I could use rm() to delete objects from the  
 previous sessions but it seems not an efficient task.

And re-loading R, rebuilding a whole process context, re-allocating
memory is an efficient one ?

Hah !

 Any suggestions on how to perform this in Batch Mode? An exemplified  
 help would be nice!

Why not encapsulate your procedures in a function taking the filename as
its argument and looping on the filenames list ? Anything created in the
function, being local to the function, will be (efficiently) cleaned up
at the function exit. Magic...

Example :

> ls()
character(0)
> Foo <- runif(10,0,1)
> ls()
[1] "Foo"
> ?save.image
> save.image("Foo1.RData")
> ls()
[1] "Foo"
> rm(list=ls())
> Foo <- letters[round(runif(10,min=1,max=26))]
> Foo
 [1] "v" "m" "b" "y" "g" "u" "r" "f" "y" "q"
> save.image("Foo2.RData")
> rm(list=ls())
> bar <- edit()
Waiting for Emacs...
> bar
function(filename) {
  load(file=filename)
  print(ls())
  print(Foo)
  invisible(NULL)
}
> ls()
[1] "bar"
> bar("Foo1.RData")
[1] "filename" "Foo"      # Note : by default, ls() lists the function's
                          # environment, not the global one... ** no bar here...
 [1] 0.8030422 0.6326055 0.8188481 0.6161665 0.5917206 0.6631358 0.7290200
 [8] 0.2970315 0.2016259 0.4473244
> ls()
[1] "bar"                 # bar is still in the global environment...
> bar("Foo2.RData")
[1] "filename" "Foo" 
 [1] "v" "m" "b" "y" "g" "u" "r" "f" "y" "q"
> ls()
[1] "bar"
> 

Good enough for you ?
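
For your actual 50 files, the same idea amounts to looping such a
function over the file names. A minimal sketch (assuming the files are
really named F1.RData ... F50.RData ; the "simple operations" are
obviously placeholders) :

process.one <- function(filename) {
  load(file = filename)                     # loaded objects stay local to this call
  # "some simple operations" : here, just summarize every loaded object
  res <- sapply(setdiff(ls(), "filename"), function(nm) length(get(nm)))
  write.table(res, file = sub("\\.RData$", ".txt", filename))
  invisible(NULL)
}
for (f in sprintf("F%d.RData", 1:50)) process.one(f)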

HTH,

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Do you use R for data manipulation?

2009-05-11 Thread Emmanuel Charpentier
Le lundi 11 mai 2009 à 23:20 +0800, ronggui a écrit :

[ Snip... ]

  But, at least in my trade, the ability to handle Excel files is a must
  (this is considered as a standard for data entry. Sigh ...).

[ Re-snip... ]

 I don't think Excel is  a standard tool for data entry. Epidata entry
 is much more professional.

Irony squared ?

This *must* go in the fortunes file !
Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Do you use R for data manipulation?

2009-05-06 Thread Emmanuel Charpentier
Le mercredi 06 mai 2009 à 00:22 -0400, Farrel Buchinsky a écrit :
 Is R an appropriate tool for data manipulation and data reshaping and data
 organizing? 
[ Large Snip ! ... ]

Depends on what you have to do.

I've done what can be more or less termed data management with almost
uncountable tools (from Excel (sigh...) to R with SQL, APL, Pascal, C,
Basic (in 1982 !), Fortran and even Lisp in passing...).

SQL has strong points : join is, to my tastes, more easily expressed in
SQL than in most languages, projection and aggregation are natural.

However, in SQL, there is no natural ordering of table rows, which
makes expressing algorithms that use such an order difficult. Try for example
to express the differences of a time series ... (it can be done, but it
is *not* a pretty sight).
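
For instance (my own illustration, not part of the original exchange),
first differences are a one-liner in R, whereas the SQL equivalent
needs a self-join (or a window function) :

x <- c(3, 5, 9, 12)      # a toy "time series", rows assumed ordered by time
diff(x)                  # 2 4 3 : first differences, one call
# A rough SQL equivalent, via a self-join on an explicit row index :
#   SELECT b.val - a.val FROM series a JOIN series b ON b.idx = a.idx + 1;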

On the other hand, R has some unique expressive possibilities (reshape()
comes to mind).

So I tend to use a combination of tools : except for very small samples,
I tend to manage my data in SQL and with associated tools (think data
editing, for example ; a simple form in OpenOffice's Base is quite easy
to create, can handle anything for which an ODBC driver exists, and
won't crap out for more than a few hundred lines...). Finer manipulation
is usually done in R with native tools and sqldf.

But, at least in my trade, the ability to handle Excel files is a must
(this is considered as a standard for data entry. Sigh ...). So the
first task is usually a) import data in an SQL database, and b) prepare
some routines to dump SQL tables / R dataframes in Excel tor returning
back to the original data author...

HTH

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help with lme4 model specification

2009-05-06 Thread Emmanuel Charpentier
Le mercredi 06 mai 2009 à 15:21 -0700, boeinguy2 a écrit :
 I am new to R and am trying to specify a model for mixed model analysis. 
 When I run the following model I get an error:
 
 AAT <- lmer(Y ~ S + A + (1|S:A/H), data=AT, REML=True)
 
 The error looks like this:
 
 Error in Spar_loc:`:` : NA/NaN argument
 In addition: Warning messages:
 1: In model.matrix.default(mt, mf, contrasts) :
   variable 'Spar_loc' converted to a factor

This is suspicious : Spar_loc doesn't appear in your statement...

 2: In Spar_loc:`:` :
   numerical expression has 720 elements: only the first used

Ditto...

 I am having trouble specifying th random component.  It should reflect the
 random term as H nested within the interaction of S  A.  What am I doing
 wrong?

The precedence of modelling operators is not very clear (and does not
seem to be formally defined in the standard documentation...). When in
doubt, use parentheses.

So try ...+(1|(S:A)/H), just in case...

You might also try to define it outside the model by adding variables :

AT <- within(AT, {SA <- S:A ; SAH <- factor(SA:H)})  # factor() drops unused levels
AAT <- lmer(Y~S+A+(1|SAH), data=AT, REML=TRUE)

 :confused:

Enlightened ?

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Multiple imputations : wicked dataset. Need advice for follow-up to a possible solution.

2009-04-27 Thread Emmanuel Charpentier
Answering to myself (for future archive users' sake), more to come
(soon) :

Le jeudi 23 avril 2009 à 00:31 +0200, Emmanuel Charpentier a écrit :
 Dear list,
 
 I'd like to use multiple imputations to try and save a somewhat badly
 mangled dataset (lousy data collection, worse than lousy monitoring, you
 know that drill... especially when I am consulted for the first time
 about one year *after* data collection).
 
 My dataset has 231 observations of 53 variables, of which only a very
 few has no missing data. Most variables have 5-10% of missing data, but
 the whole dataset has only 15% complete cases (40% when dropping the 3
 worst cases, which might be regained by other means).

[ Big snip ... ]

It turns out that my problems were caused by ... the dataset. Two very
important variables (i. e. of strong influence on the outcomes and
proxies) are ill-distributed :
- one is a modus operandi (two classes)
- the second is center (23 classes, alas...)

My data are quite ill-distributed : some centers have contributed a
large number of observations, some other very few. Furthermore, while
few variables are quite badly known, the missingness pattern is such
as :
- some centers have no directly usable information (= complete cases)
under one of the modi operandi
- some other have no complete case at all...

Therefore, any model-based prediction method using the whole dataset
(recommended for multiple imputations, since one should not use for
inference a richer set of data than what was imputed (seen this
statement in a lot of references)) fails miserably.

Remembering some fascinating readings (incl. VR) and an early (20 years
ago) excursion in AI (yes, did that, didn't even get the T-shirt...), I
have attempted (with some success) to use recursive partitioning for
prediction. This (non-)model has some very interesting advantages in my
case :
- model-free
- distribution-free (quite important here : you should see my density
curves... and I won't speak about the outliers !)
- handles missing data gracefully (almost automagically)
- automatic selection and ranking of the pertinent variables
- current implementation in R has some very nice features, such as
surrogate splits if a value is missing, auto-pruning by
cross-validation, etc ...

It has also some drawbacks :
- no (easy) method for inference
- not easy to abstract (you can't just publish an ANOVA table and a
couple of p-values...)
- not well-established (i. e. acknowledged by journal reviewers) ==>
difficult to publish

These latter points do not bother me in my case. So I attempted to use
this for imputation.

Since mice is based on a chained equations approach and allows the
end-user to write his/her own imputation functions, I wrote a set of such
imputers to be called within the framework of the Gibbs sampler. They
proceed as follows (a rough R sketch follows the list) :
- build a regression or classification tree of the relevant variable
using the rest of the dataset
- predict the relevant variable for *all* the dataset,
- compute residuals from known values of the relevant variable and
their prediction
- impute values to missing data as prediction + a random residual.
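
A minimal sketch of such an imputer for a continuous variable (my
reconstruction of the idea, not the actual functions ; it assumes the
mice convention of a mice.impute.<name>(y, ry, x) function, where y is
the target variable, ry flags the observed values and x holds the
predictors) :

library(rpart)

## hypothetical method name ; selected in mice() via imputationMethod = "rpart.resid"
mice.impute.rpart.resid <- function(y, ry, x) {
  dat  <- data.frame(y = y, x)
  fit  <- rpart(y ~ ., data = dat[ry, , drop = FALSE], method = "anova")
  pred <- predict(fit, newdata = dat)                # predictions for *all* rows
  res  <- y[ry] - pred[ry]                           # residuals on the observed cases
  pred[!ry] + sample(res, sum(!ry), replace = TRUE)  # prediction + a random residual
}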

It works. It's a tad slower than prediction using
normal/logistic/multinomial modelling (about a factor of 3, but for my
first trial, I attempted to err on the side of excessive precision ==>
deeper trees). It does not exhibit any obvious statistical
misfeatures.

But I have questions :

1) What is known of such imputation by regression/classification trees
(aka recursive partitioning) ? A quick search didn't turn up very
much : the idea has been evoked here and there, but I am not aware of
any published solution. In particular, I have no knowledge of any
theoretical (i. e. probability) work on their properties.

2) Where could I find published datasets having been used to validate
other imputation methods ?

3) Do you think that these functions should be published ?

Sincerely yours,

Emmanuel Charpentier

PS :

 Could someone offer an explanation ? Or must I recourse to sacrifying a
 black goat at midnight next new moon ?
 
 Any hint will be appreciated (and by the way, next new moon is in about
 3 days...).

The goat has endured the fear of her life, but is still alive... will
probably start worshipping the Moon, however.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] rounding problem

2009-04-27 Thread Emmanuel Charpentier
Seconded !

Emmanuel Charpentier

Le lundi 02 mars 2009 à 21:06 -0700, Greg Snow a écrit :
  -Original Message-
  From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
  project.org] On Behalf Of Prof Brian Ripley
  Sent: Monday, March 02, 2009 12:38 AM
  To: tedzzx
  Cc: r-help@r-project.org
  Subject: Re: [R] rounding problem
  
 
 I nominate the following for the fortunes package:
 
  R is Open Source and so you can modify it to emulate the bugs in
  other software: that is not one of the aims of its developers so
  please don't expect us to do so for you.
 
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Loess over split data

2009-04-25 Thread Emmanuel Charpentier
Dear Luc,

I stumbled on your unanswered question only this morning. Sorry for this
lateness...

Le jeudi 23 avril 2009 à 15:56 -0400, Luc Villandre a écrit :
 Dear R users,
 
 I am having trouble devising an efficient way to run a loess() function 
 on all columns of a data.frame (with the x factor remaining the same for 
 all columns) and I was hoping that someone here could help me fix my 
 code so that I won't have to resort to using a for loop.
 
 (You'll find the upcoming lines of code in a single block near the end 
 of the message.)
 
 Here's a small sample of my data :

 my.data=data.frame(birth.vec=c(274, 259, 288, 284, 276, 274, 284, 288, 
 273, 278), final.weight= c(2609.328, 2955.701, 4159.837, 4591.258, 
 5144.559, 2877.159, 4384.498, 3348.454, 3323.391, 2918.055)) ;
 
 birth.vec is in days and final.weight is in grams. Now, what I'm 
 trying to do is to output smoothed curves for the 10th, 50th and 90th 
 percentiles over completed weeks. In other words, I transform 
 birth.vec into a factor vector of completed weeks with
 
 completed.weeks = as.factor(floor(my.data$birth.vec/7)) ;
 
 I use these factors to get weight quantiles by completed weeks with
 
 quan.list = tapply(my.data$final.weight,INDEX = 
 completed.weeks,FUN=quantile,probs=c(0.1,0.5,0.9)) ;
 
 I then collapse the list into a data.frame with
 
 quan.frame = data.frame(do.call(rbind,quan.list)) ;
 
 Now comes the time to apply the loess() function to each percentile 
 curve (i.e. to each column in quan.frame). Note that the x values for 
 all percentile columns (i.e. X10., X50., X90.)
 
 x.values = as.numeric(rownames(quan.frame)) ;
 
 Once again, using tapply() (and transposing beforehand):
 
 t.quan.frame = t(quan.frame) ;
 
 ## The following command doesn't work
 
 smooth.result = 
 tapply(t.quan.frame,INDEX=as.factor(rep(1:nrow(t.quan.frame),each=ncol(t.quan.frame))),FUN=loess)
  
 ;
 
 The formula argument of loess() is of course missing, since I have no 
 idea how I could specify it in a way that the function could understand it.

Why not tapply(yadda, yadda, function(x)loess(whatever you mean, x is
data...)) ?

But the whole idea seems ... bizarre :

1) converting time to a *class* variable destroys any ordinal relation
between different times ; a curve has no meaning. At the minimum, keep
this variable numeric.

2) You lose any notion of individual data (e. g. weight of each week).

3) Do you have reason to expect empirical quantiles very much different
from asymptotic quantiles (modulo a possible variable transformation) ?
That forces you to coarsen your data, which is rarely a good idea...
If not, the asymptotic IC80 curves can be obtained much more cheaply. Modulo
some possible typos :

my.loess <- loess(final.weight~birth.vec, data=my.data)
my.pred  <- predict(my.loess,
                    newdata=data.frame(birth.vec=my.pred.times <- with(my.data,
                                           seq(min(birth.vec), max(birth.vec)))),
                    se=TRUE)
my.curve <- with(my.pred,
                 data.frame(time=my.pred.times,
                            center=fit,
                            lb=fit+qnorm(0.1)*se.fit,
                            ub=fit+qnorm(0.9)*se.fit))

# Graphics *will* help :

with(my.curve,{
     plot(x=range(time),
          y=range(center,lb,ub),
          type="n",
          xlab="Achieved time (days)",
          ylab="Final weight (g)",
          main="A suspicious loess() fit");
     lines(final.weight~birth.vec, type="p", data=my.data) ;
     lines(center~time, lty=1) ;
     lines(lb~time, lty=3) ;
     lines(ub~time, lty=3)
   })

Adding your empirical centiles to this graph might help...

NB : on your example data set, there is a big gap between the earliest
point and the rest, thus suspicious predictions in the gap (lower bound
is *negative* ! ) ; maybe a variable change (log ?) might be in order ?
This can be assessed only on the full data.

HTH,

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Multiple Imputation in mice/norm

2009-04-25 Thread Emmanuel Charpentier
Le vendredi 24 avril 2009 à 14:11 -0700, ToddPW a écrit :
 I'm trying to use either mice or norm to perform multiple imputation to fill
 in some missing values in my data.  The data has some missing values because
 of a chemical detection limit (so they are left censored).  I'd like to use
 MI because I have several variables that are highly correlated.  In SAS's
 proc MI, there is an option with which you can limit the imputed values that
 are returned to some range of specified values.  Is there a way to limit the
 values in mice?  

You may do that by writing your own imputation function and assigning it
to the imputation of a particular variable (see the imputationMethod
argument and details in the man page for mice).

  If not, is there another MI tool in R that will allow me to
 specify a range of acceptable values for my imputed data?

In the function amelia (package Amelia), you might specify a bounds
argument, which allows for such a limitation. However, be aware that
this might destroy the basic assumption of Amelia, which is that your
data are multivariate normal. Maybe a change of variable is in order (e.
g. log(concentration) has usually much better statistical properties
than concentration).

Frank Harrell's aregImpute (package Hmisc) has the curtail argument
(TRUE by default) which limits imputations to the range of observed
values.

But if your left-censored variables are your dependent variables (not
covariates), may I suggest to analyze these data as censored data, as
allowed by Terry Therneau's coxph function (package survival) ? code
your missing data as such a variable (use :
coxph(Surv(pmin(x, yourlimit, na.rm=TRUE),
           !is.na(x), type="left") ~ Yourmodel) to do this on-the-fly).

Another possible idea is to split your (supposedly x) variable in two :
observed (logical), and value (observed value if observed, detection
limit if not) and include these two data in your model. You probably
will run into numerical difficulties due to the (built-in total
separation...).

HTH,

Emmanuel Charpentier

 Thanks for the help,
 Todd
 


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Multiple Imputation in mice/norm

2009-04-25 Thread Emmanuel Charpentier
Danke sehr, herr Professor ! This one escaped me (notably because it's a
trifle far from my current interests...).

Emmanuel Charpentier

Le samedi 25 avril 2009 à 08:25 -0500, Frank E Harrell Jr a écrit :
 Emmanuel Charpentier wrote:
  Le vendredi 24 avril 2009 à 14:11 -0700, ToddPW a écrit :
  I'm trying to use either mice or norm to perform multiple imputation to 
  fill
  in some missing values in my data.  The data has some missing values 
  because
  of a chemical detection limit (so they are left censored).  I'd like to use
  MI because I have several variables that are highly correlated.  In SAS's
  proc MI, there is an option with which you can limit the imputed values 
  that
  are returned to some range of specified values.  Is there a way to limit 
  the
  values in mice?  
  
  You may do that by writing your own imputation function and assign them
  for the imputation of particular variable (see argument
  imputationMethod and details in the man page for mice).
  
   If not, is there another MI tool in R that will allow me 
  to
  specify a range of acceptable values for my imputed data?
  
  In the function amelia (package Amelia), you might specify a bounds
  argument, which allows for such a limitation. However, be aware that
  this might destroy the basic assumption of Amelia, which is that your
  data are multivariate normal. Maybe a change of variable is in order (e.
  g. log(concentration) has usually much better statistical properties
  than concentration).
  
  Frank Harrell's aregImpute (package Hmisc) has the curtail argument
  (TRUE by default) which limits imputations to the range of observed
  values.
  
  But if your left-censored variables are your dependent variables (not
  covariates), may I suggest to analyze these data as censored data, as
  allowed by Terry Therneau's coxph function (package survival) ? code
  your missing data as such a variable (use :
  coxph(Surv(pmin(x, yourlimit, na.rm=TRUE),
             !is.na(x), type="left") ~ Yourmodel) to do this on-the-fly).
  
  Another possible idea is to split your (supposedly x) variable in two :
  observed (logical), and value (observed value if observed, detection
  limit if not) and include these two data in your model. You probably
  will run into numerical difficulties due to the (built-in total
  separation...).
  
  HTH,
  
  Emmanuel Charpentier
  
  Thanks for the help,
  Todd
 
 
  
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 All see
 
 @Article{zha09non,
author =   {Zhang, Donghui and Fan, Chunpeng and Zhang, 
 Juan and Zhang, {Cun-Hui}},
title ={Nonparametric methods for measurements below 
 detection limit},
   journal =  {Stat in Med},
year = 2009,
volume =   28,
pages ={700-715},
annote =   {lower limit of detection;left censoring;Tobit 
 model;Gehan test;Peto-Peto test;log-rank test;Wilcoxon test;location 
 shift model;superiority of nonparametric methods}
 }
 
 


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Multiple imputations : wicked dataset ? Wicked computers ? Am I cursed ? (or stupid ?)

2009-04-22 Thread Emmanuel Charpentier
 
 

The textbook example (example for mice) runs almost perfectly
(warnings in the second example)

Package mix (Joseph Schafer, maintained by Brian Ripley) :

> # mix needs dataset columns sorted and counted
> p <- ncol(DTM <- DataTest[sapply(DataTest,is.factor)])
> DTM <- cbind(DTM,DataTest[!sapply(DataTest,is.factor)])
> # Preliminary grouping and sorting
> system.time(s <- prelim.mix(DTM,p))
   user  system elapsed 
  0.012   0.000   0.009 
Warning messages:
1: In storage.mode(w) <- "integer" :
  NAs introduits lors de la conversion automatique
2: In max(w[!is.na(w[, j]), j]) :
  aucun argument pour max ; -Inf est renvoyé
3: NAs introduits lors de la conversion automatique 
4: In max(w[!is.na(w[, j]), j]) :
  aucun argument pour max ; -Inf est renvoyé
5: NAs introduits lors de la conversion automatique 
6: In max(w[!is.na(w[, j]), j]) :
  aucun argument pour max ; -Inf est renvoyé
7: NAs introduits lors de la conversion automatique 
> # Parameter estimation
> system.time(thetahat <- em.mix(s,showits=TRUE))
Erreur dans rep(prior, s$ncells) : argument 'times' incorrect
Timing stopped at: 0 0 0 
### indeed : s$ncells is NA !
### Therefore, the rest crashes :
> # Data augmentation proper
> system.time(newtheta <- da.mix(s, thetahat, steps=100, showits=TRUE))
Erreur dans rep(prior, s$ncells) : argument 'times' incorrect
Timing stopped at: 0 0 0 
> # Single imputation...
> system.time(ximp1 <- imp.mix(s, newtheta))
Erreur dans imp.mix(s, newtheta) : objet 'newtheta' introuvable
Timing stopped at: 0.004 0 0.001 
 

The textbook example (example for da.mix) runs perfectly (and fast !).

=

I might be particularly stupid and misunderstanding the manual and the
textbook examples of one or two packages, but five !

Visual/manual/graphical examination of my dataset does not suggest an
explanation.

I am not aware of anything fishy in my setup (one of the machines is a
bit aging, but the one used for the reported tests is almost new).

Could someone offer an explanation ? Or must I recourse to sacrifying a
black goat at midnight next new moon ?

Any hint will be appreciated (and by the way, next new moon is in about
3 days...).

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] graph with 15 combinations

2009-04-20 Thread Emmanuel Charpentier
Le lundi 20 avril 2009 à 21:04 +0200, Penner, Johannes a écrit :
 Dear R helpers,
 
 I have a data set with 4 types (W, C, E  S). Now I have values for all
 types plus all possible combinations (the order is unimportant): W, C,
 WC, E, WE, CE, WCE, S, WS, CS, WCS, ES, WES, CES  WCES. Ideally I would
 like to represent everything in one graph and as concise as possible.
 Drawing 4 circles and depicting it as overlap just gives me 13 out of
 the 15 possibilities needed (as e.g. depicted here
 http://www.psy.ritsumei.ac.jp/~akitaoka/classic9e.html in the graph
 Four circles surrounding illusion).
 
 Does anybody has a nice solution, ideally with a possible solution in R?

Plot on a torus. Should be trivial in R once you've found the torus
feeder for your printer... :-)

Emmanuel Charpentier

 Thanks in advance!
 Johannes
 
 --
 Project Coordinator BIOTA West Amphibians
 
 Museum of Natural History
 Dep. of Research (Herpetology)
 Invalidenstrasse 43
 D-10115 Berlin
 Tel: +49 (0)30 2093 8708
 Fax: +49 (0)30 2093 8565
 
 http://www.biota-africa.org
 http://community-ecology.biozentrum.uni-wuerzburg.de
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Modelling an incomplete Poisson distribution ?

2009-04-19 Thread Emmanuel Charpentier
Dear Ben,

Le samedi 18 avril 2009 à 23:37 +, Ben Bolker a écrit :
 Emmanuel Charpentier charpent at bacbuc.dyndns.org writes:
 
  
  I forgot to add that yes, I've done my homework, and that it seems to me
  that answers pointing to zero-inflated Poisson (and negative binomial)
  are irrelevant ; I do not have a mixture of distributions but only part
  of one distribution, or, if you'll have it, a zero-deflated Poisson.
  
  An answer by Brian Ripley
  (http://finzi.psych.upenn.edu/R/Rhelp02/archive/11029.html) to a similar
  question leaves me a bit dismayed : if it is easy to compute the
  probability function of this zero-deflated RV (off the top of my head,
  Pr(X=x)=e^-lambda.lambda^x/(x!.(1-e^-lambda))), and if I think that I'm
  (still) able to use optim to ML-estimate lambda, using this to
  (correctly) model my problem set and test it amounts to re-writing some
  (large) part of glm. Furthermore, I'd be a bit embarrassed to test it
  cleanly (i. e. justifiably) : out of the top of my head, only the
 likelihood ratio test seems readily applicable to my problem. Testing
  contrasts in my covariates ... hum !
  
  So if someone has written something to that effect, I'd be awfully glad
  to use it. A not-so-cursory look at the existing packages did not ring
  any bells to my (admittedly untrained) ears...
  
  Of course, I could also bootstrap the damn thing and study the
  distribution of my contrasts. I'd still been hard pressed to formally
  test hypotheses on factors...
  
 
   I would call this a truncated Poisson distribution, related
 to hurdle models.  You could probably use the hurdle function
 in the pscl package to do this, by ignoring the fitting to
 the zero part of the model.  On the other hand, it might break
 if there are no zeros at all (adding some zeros would be a
 pretty awful hack/workaround).

Indeed ... 

   If you defined a dtpoisson() for the distribution of the
 truncated Poisson model, you could probably also use bbmle
 with the formula interface and the parameters argument.

This I was not aware of... Thank you for the tip !

By the way, I never delved into stats4, relying (erroneously) on its
one-liner description in CRAN, which is (more or less) an implementation
of stats with S4 classes. Therefore mle escaped me also... May I suggest
a better one-liner ? (This goes also for bbmle...)

   The likelihood ratio test seems absolutely appropriate for
 this case.  Why not?

Few data, therefore a bit far from the asymptotic condition...

Anyway, a better look at my data after discovering a bag'o bugs in the
original files led me to reconsider my model : I wanted to assess, among
other things, the effect of a boolean (really a two-class variable).
After the *right* graph, I now tend to think that my distribution is
indeed zero-deflated Poisson in one group and ... zero-deflated negative
binomial in the other (still might be zero-deflated Poisson with very
small mean, hard to say graphically...). Which gives me some insights on
the mechanisms at work (yippeee !!) but will require some nonparametric
beast for assessing central value differences (yes, this still has a
meaning...), such as bootstrap.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Modelling an incomplete Poisson distribution ?

2009-04-18 Thread Emmanuel Charpentier
Dear list,

I have the following problem : I want to model a series of observations
of a given hospital activity on various days under various conditions.
among my outcomes (dependent variables) is the number of patients for
which a certain procedure is done. The problem is that, when no relevant
patient is hospitalized on said day, there is no observation (for which
the number of patients item would be 0). 

My goal is to model this number of patients as a function of the
various conditions described by my independent variables, most of
them observed but uncontrolled, some of them unobservable (random
effects). I am tempted to model them along the lines of :

glm(NoP~X+Y+..., data=MyData, family=poisson(link="log"))

or (accounting for some random effects) :

lmer(NoP~X+Y+(X|Center), data=Mydata, family=poisson(link="log"))

While the preliminary analysis suggests that (the right part of) a
Poisson distribution might be reasonable for all real observations, the
lack of observations with count==0 bothers me.

Is there a way to cajole glm (and lmer, by the way) into modelling these
data to an incomplete Poisson model, i. e. with unobserved 0
values ?

Sincerely,

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Modelling an incomplete Poisson distribution ?

2009-04-18 Thread Emmanuel Charpentier
I forgot to add that yes, I've done my homework, and that it seems to me
that answers pointing to zero-inflated Poisson (and negative binomial)
are irrelevant ; I do not have a mixture of distributions but only part
of one distribution, or, if you'll have it, a zero-deflated Poisson.

An answer by Brian Ripley
(http://finzi.psych.upenn.edu/R/Rhelp02/archive/11029.html) to a similar
question leaves me a bit dismayed : if it is easy to compute the
probability function of this zero-deflated RV (off the top of my head,
Pr(X=x)=e^-lambda.lambda^x/(x!.(1-e^-lambda))), and if I think that I'm
(still) able to use optim to ML-estimate lambda, using this to
(correctly) model my problem set and test it amounts to re-writing some
(large) part of glm. Furthermore, I'd be a bit embarrassed to test it
cleanly (i. e. justifiably) : out of the top of my head, only the
likelihood ratio test seems readily applicable to my problem. Testing
contrasts in my covariates ... hum !
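
For the lambda part at least, the ML estimation is indeed a few lines
(my own illustration, on made-up counts, using the truncated pmf given
above) :

ztpois.negll <- function(lambda, x)
  -sum(dpois(x, lambda, log = TRUE) - log(1 - exp(-lambda)))
x   <- c(1, 2, 1, 3, 1, 2, 4, 1)                 # hypothetical observed counts (all >= 1)
fit <- optimize(ztpois.negll, interval = c(1e-6, 50), x = x)
fit$minimum                                       # ML estimate of lambda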

So if someone has written something to that effect, I'd be awfully glad
to use it. A not-so-cursory look at the existing packages did not ring
any bells to my (admittedly untrained) ears...

Of course, I could also bootstrap the damn thing and study the
distribution of my contrasts. I'd still been hard pressed to formally
test hypotheses on factors...

Any ideas ?

Emmanuel Charpentier

Le samedi 18 avril 2009 à 19:28 +0200, Emmanuel Charpentier a écrit :
 Dear list,
 
 I have the following problem : I want to model a series of observations
 of a given hospital activity on various days under various conditions.
 among my outcomes (dependent variables) is the number of patients for
 which a certain procedure is done. The problem is that, when no relevant
 patient is hospitalized on said day, there is no observation (for which
 the number of patients item would be 0). 
 
 My goal is to model this number of patients as a function of the
 various conditions described by my independent variables, most of
 them observed but uncontrolled, some of them unobservable (random
 effects). I am tempted to model them along the lines of :
 
 glm(NoP~X+Y+..., data=MyData, family=poisson(link="log"))
 
 or (accounting for some random effects) :
 
 lmer(NoP~X+Y+(X|Center), data=Mydata, family=poisson(link="log"))
 
 While the preliminary analysis suggests that (the right part of) a
 Poisson distribution might be reasonable for all real observations, the
 lack of observations with count==0 bothers me.
 
 Is there a way to cajole glm (and lmer, by the way) into modelling these
 data to an incomplete Poisson model, i. e. with unobserved 0
 values ?
 
 Sincerely,
 
   Emmanuel Charpentier
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] F test

2009-04-16 Thread Emmanuel Charpentier
Le jeudi 16 avril 2009 à 14:08 -0300, Mike Lawrence a écrit :
 summary(my_lm) will give you t-values, anova(my_lm) will give you
 (equivalent) F-values.

Ahem. Equivalent, my tired foot...

In simple terms (the real real story may be more intricate) :

The F values stated by anova are something entirely different from the t
values in summary. The latter allow you to assess properties of *one*
coefficient in your model (namely, do I have enough support to state that
it is nonzero ?). The former allow you to assess whether you have
support for stating that *ALL* the coefficients related to the same
factor cannot be *SIMULTANEOUSLY* null. Which is a horse of quite
another color...

By the way : if your summary indeed does give you the mean^K^K an
unbiased estimate of your coefficient and a (hopefully) unbiased
estimate of its standard error, the F ratio is the ratio of estimates
of the remaining variabilities with and without the H0 assumption it
tests, that is that *ALL* coefficients of your factor of interest are
*SIMULTANEOUSLY* null.

F and t numbers will be equivalent if and only if your factor of
interest needs only one coefficient to get expressed, i. e. is a
continuous covariable or a two-class discrete variable (such as
boolean). In this case, you can test your factor either by the t value
which, under H0, fluctuates as a Student's t with n_res dof (n_res being
the residual degrees of freedom of the model) or by the F value, which
will fluctuate as a Fisher F statistic with 1 and n_res dof, which
happens (but that's not happenstance...) to be the *square* of a t with
n_res dof.
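
A two-line illustration (my own, not part of the original exchange) of
that single-df equivalence :

set.seed(1)
d   <- data.frame(x = rnorm(20))
d$y <- 1 + 2 * d$x + rnorm(20)
fit <- lm(y ~ x, data = d)
summary(fit)$coefficients["x", "t value"]^2   # t squared for the single-df term...
anova(fit)["x", "F value"]                    # ...equals the F value in the ANOVA table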

May I suggest consulting a textbook *before* flunking ANOVA 101 ?

Emmanuel Charpentier

  summary() might be preferred because it also
 provides the estimates & SE.
 
  a=data.frame(dv=rnorm(10),iv1=rnorm(10),iv2=rnorm(10))
  my_lm=lm(dv~iv1*iv2,a)
  summary(my_lm)
 
 Call:
 lm(formula = dv ~ iv1 * iv2, data = a)
 
 Residuals:
 Min  1Q  Median  3Q Max
 -1.8484 -0.2059  0.1627  0.4623  1.0401
 
 Coefficients:
 Estimate Std. Error t value Pr(>|t|)
 (Intercept)  -0.4864 0.4007  -1.2140.270
 iv1   0.8233 0.5538   1.4870.188
 iv2   0.2314 0.3863   0.5990.571
 iv1:iv2  -0.4110 0.5713  -0.7190.499
 
 Residual standard error: 1.017 on 6 degrees of freedom
 Multiple R-squared: 0.3161,   Adjusted R-squared: -0.02592
 F-statistic: 0.9242 on 3 and 6 DF,  p-value: 0.4842
 
  anova(my_lm)
 Analysis of Variance Table
 
 Response: dv
   Df Sum Sq Mean Sq F value Pr(>F)
 iv11 1.9149  1.9149  1.8530 0.2223
 iv21 0.4156  0.4156  0.4021 0.5494
 iv1:iv21 0.5348  0.5348  0.5175 0.4990
 Residuals  6 6.2004  1.0334
 
 
 On Thu, Apr 16, 2009 at 10:35 AM, kayj kjaj...@yahoo.com wrote:
 
  Hi,
 
 
  How can I find the p-value for the F test for the interaction terms in a
  regression linear model lm ?
 
  I appreciate your help
 
 
  --
  View this message in context: 
  http://www.nabble.com/F-test-tp23078122p23078122.html
  Sent from the R help mailing list archive at Nabble.com.
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 
 
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] F test

2009-04-16 Thread Emmanuel Charpentier
Le jeudi 16 avril 2009 à 13:00 -0500, Jun Shen a écrit :
 Mike,
 
 I kind of have the same question. What if for a mixed effect model, say
 using lme(), how to specify the interaction effect (between a fixed effect
 and a random effect)? 

With lme, you have to specify a *list* of random effects as the
random= argument, which, off the top of my (somewhat tired) limbic
system,  is specified by an lmList, iirc. But I'm pretty sure that
?lme will give you much better information...

lmer is simpler. Say your basic model is Z~Y with another variable X
acting as a possible source of random variation. You express random
effects by adding (1|X) terms (meaning X is a simple (intercept) random
effect) to the model, or (Y|X), which specifies both an intercept random
effect of X and a random effect on the slope of the (supposed) line binding Z
to the independent variable Y. If you want to specify a possible variation
of slopes with no possible intercept effect, say Z~Y+(0+Y|X) (unless
I'm mistaken : I never used that, because I never had to work on such a
model in a context where it would make sense...).
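
In lmer syntax, the three cases above could look like this (a hedged
sketch on simulated data, just to show the formulas ; nothing here comes
from the original question) :

library(lme4)
set.seed(42)
dat   <- data.frame(X = factor(rep(1:10, each = 8)), Y = rnorm(80))
dat$Z <- 1 + 2 * dat$Y + rnorm(10)[dat$X] + rnorm(80)

m1 <- lmer(Z ~ Y + (1 | X),     data = dat)  # random intercept of X only
m2 <- lmer(Z ~ Y + (Y | X),     data = dat)  # random intercept and random slope of Y within X
m3 <- lmer(Z ~ Y + (0 + Y | X), data = dat)  # random slope only, no intercept variation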

 and where to find the result of the interaction?

Ah. That's the (multi-)million dollars question. You're dealing with a
fixed*random interaction, which has a totally different meaning from a
fixed*fixed interaction. The test of the latter means "does the
(possible) effect of the first factor vary with the levels of the second
factor ?", whereas the former reads as "does the random factor increase
variability about what I know of the effect of the fixed factor ?", with
totally different consequences.

Pinheiro and Bates' book (2000), which, among other things, describes the
care, feeding and drinks of lme, gives explanations about 1) why the
problem is computationally hard 2) how to get an approximate answer and
3) in which proportion the previous advice might be misleading. This book
is, IMHO, required reading for anybody with more than a passing interest
in the subject, and I won't paraphrase it...

Since then, Pr Bates started to develop lme4, which has different inner
algorithms. In the course of this work, he started to have doubts about
the conventional wisdom about testing effects in mixed models, and did
not (yet) provide these conventional means of testing. Instead, he
seems to work along the lines of MCMC sampling to assess various aspects
of mixed models. But there I suggest walking through the R-SIG-mixed-models
archives, which are *very* interesting.

Of course, if you want an authoritative answer (i. e. an answer that
will pass a medical journal's reviewer unquestioned), you can always use
SAS' proc mixed. But I wouldn't swear this answer is exact, or even
sensible, as far as I can judge...

Pr Bates seems to answer readily any (sensible) questions on the ME
mailing list, where you will also find folks much more authoritative than
yours truly to answer this and that question...

HTH,

Emmanuel Charpentier

 Thanks.
 
 Jun
 
 On Thu, Apr 16, 2009 at 12:08 PM, Mike Lawrence mike.lawre...@dal.cawrote:
 
  summary(my_lm) will give you t-values, anova(my_lm) will give you
  (equivalent) F-values. summary() might be preferred because it also
  provides the estimates & SE.
 
   a=data.frame(dv=rnorm(10),iv1=rnorm(10),iv2=rnorm(10))
   my_lm=lm(dv~iv1*iv2,a)
   summary(my_lm)
 
  Call:
  lm(formula = dv ~ iv1 * iv2, data = a)
 
  Residuals:
 Min  1Q  Median  3Q Max
  -1.8484 -0.2059  0.1627  0.4623  1.0401
 
  Coefficients:
  Estimate Std. Error t value Pr(>|t|)
  (Intercept)  -0.4864 0.4007  -1.2140.270
  iv1   0.8233 0.5538   1.4870.188
  iv2   0.2314 0.3863   0.5990.571
  iv1:iv2  -0.4110 0.5713  -0.7190.499
 
  Residual standard error: 1.017 on 6 degrees of freedom
  Multiple R-squared: 0.3161, Adjusted R-squared: -0.02592
  F-statistic: 0.9242 on 3 and 6 DF,  p-value: 0.4842
 
   anova(my_lm)
  Analysis of Variance Table
 
  Response: dv
    Df Sum Sq Mean Sq F value Pr(>F)
  iv11 1.9149  1.9149  1.8530 0.2223
  iv21 0.4156  0.4156  0.4021 0.5494
  iv1:iv21 0.5348  0.5348  0.5175 0.4990
  Residuals  6 6.2004  1.0334
 
 
  On Thu, Apr 16, 2009 at 10:35 AM, kayj kjaj...@yahoo.com wrote:
  
   Hi,
  
  
   How can I find the p-value for the F test for the interaction terms in a
   regression linear model lm ?
  
   I appreciate your help
  
  
   --
   View this message in context:
  http://www.nabble.com/F-test-tp23078122p23078122.html
   Sent from the R help mailing list archive at Nabble.com.
  
   __
   R-help@r-project.org mailing list
   https://stat.ethz.ch/mailman/listinfo/r-help
   PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
   and provide commented, minimal, self-contained, reproducible code.
  
 
 
 
  --
  Mike Lawrence
  Graduate Student
  Department of Psychology
  Dalhousie University
 
  Looking

Re: [R] Cross-platforms solution to export R graphs

2009-04-09 Thread Emmanuel Charpentier
Le jeudi 09 avril 2009 à 15:04 +0200, Philippe Grosjean a écrit :
 Hello Rusers,
 
 I have worked on a R Wiki page for solutions in exporting R graphs, 
 especially, the often-asked questions:
 - How can I export R graphs in vectorized format (EMF) for inclusion in 
   MS Word or OpenOffice outside of Windows?
 - What is the best solution(s) for post-editing/annotating R graphs.
 
 The page is at: 
 http://wiki.r-project.org/rwiki/doku.php?id=tips:graphics-misc:export.
 
 I would be happy to receive your comments and suggestions to improve 
 this document.

Well, if you insist ...

The PDF import plugin in OpenOffice is still beta, and some reports deem
it difficult to install correctly and/or flaky.  Having checked that both
MS Word (>=2000) and OpenOffice (>=2.4) import and display correctly (i.
e. vectorially, including fonts) EPS files, I switched to this format,
most notably for use with Max Kuhn's marvellous odfWeave package,
which is a *must* for us working in state/administrative/corporate salt
mines, where \LaTeX is deemed obscene and plain \TeX causes seizures ...
The point is that this format doesn't need any intermediary step, thus
allowing for automation. Be aware, however, that the embedded EPS
images are not editable in-place by OpenOffice nor, as far as I know, by
MS Word. But my point was to *avoid* post-production as much as humanly
possible (I tend to be inhumanly lazy...).
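
For the record, writing such an EPS file from base graphics goes along
these lines (a minimal sketch, not tied to odfWeave ; the file name is
arbitrary) :

postscript("myplot.eps", width = 5, height = 4,
           horizontal = FALSE, onefile = FALSE, paper = "special")
plot(cars)          # any plotting code
dev.off()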

HTH,

Emmanuel Charpentier

 All the best,
 
 PhG

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] [OT ?] rant (was : Re: Conversions From standard to metricunits)

2009-04-04 Thread Emmanuel Charpentier
Le vendredi 03 avril 2009 à 20:01 -0400, Murray Cooper a écrit :
 For science yes. For pleasure I'll still take a pint instead of 570ml!

Snip

Yes, but do you realize that you'll have to pee in fl. oz ? Aie ...

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Basic doubts on the use of several script files in R batch mode

2009-04-04 Thread Emmanuel Charpentier
Le vendredi 03 avril 2009 à 22:30 +0200, mcnda...@mncn.csic.es a écrit :
 I already searched for information regarding the batch file operations  
 within R. But I could not locate the information I need.
 
 Basically I have a doubt regarding the procedures on the batch use of  
 several script files (*.R).
 How can I do this?
 How can I define the order of files?
 
 My intention is to programme openings, math operations and savings of  
 *.Rdata sessions, but in a sequential manner.

From the top of my limbic system :

$ cat file1 file2 file3 | R --vanilla --no-save
# Of course, file3 should call save.image() at some convenient place

Another, slightly more correct solution would be a batch job doing the
relevant R CMD BATCH invocations
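
An R-level alternative (my own sketch, with hypothetical file names) :
source the scripts in a fixed order from one controlling session, saving
and cleaning up between steps.

for (f in c("file1.R", "file2.R", "file3.R")) {
  source(f, echo = TRUE)                     # run the script
  save.image(sub("\\.R$", ".RData", f))      # keep a per-script workspace
  rm(list = setdiff(ls(), "f"))              # start the next script clean
}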

See also the littler package, which allows to use R as a shell language.
I don't know zilch about it ...

HTH

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Discriminant Analysis - Obtaining Classification Functions

2009-04-03 Thread Emmanuel Charpentier
require(MASS) ; ?predict.lda should enlighten you. Glancing at V&R 4
might be a bit more illuminating...
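
A quick sketch on a standard dataset (iris ; my illustration, not your
data), to show where to look :

library(MASS)
fit <- lda(Species ~ Sepal.Length + Sepal.Width, data = iris)
fit$scaling        # the discriminant coefficients (closest thing to "classification functions")
predict(fit, newdata = data.frame(Sepal.Length = 5.8, Sepal.Width = 3.0))$class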

HTH

Emmanuel Charpentier

Le vendredi 03 avril 2009 à 22:29 +0200, Pavel Kúr a écrit :
 Hello!
 
 I need some help with the linear discriminant analysis in R.
 I have some plant samples (divided into several groups) on which I
 measured a few quantitative characteristics. Now, I need to infer some
 classification rules usable for identifying new samples.
 I have used the function lda from the MASS library in a usual fashion:
 
 lda.1 - lda(groups~char1+char2+char3, data=xxx)
 
 I'd like to obtain the classification functions for the particular
 groups, with the aid of which I could classify unknown samples. I know
 I can use predict.lda to classify such samples, but I need to obtain
 some equations into which I could simply put the measured values of an
 unknown sample manually and the result would predict which group the
 sample most probably belongs to (like in eg. STATISTICA).
 I haven't found out how to extract these functions from the lda output.
 Could somebody give me some advice?
 
 Thank you in advance,
 
 Pavel Kur
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] [OT ?] rant (was : Re: Conversions From standard to metric units)

2009-04-03 Thread Emmanuel Charpentier
On Friday 3 April 2009 at 14:17 -0400, stephen sefick wrote :
 I am starting to use R for almost any sort of calculation that I need.
  I am a biologist that works in the states, and there is often a need
 to convert from standard units to metric units.

<rant>

US/Imperial units are *not* standard units. The former metric system
is now called Système International (International System) for a
reason, which is *not* gallocentrism of a few 6e7 frogs, but rather
laziness of about 5.6e9 losers who refuse to load their memories with
meaningless conversion factors...

</rant>

Emmanuel Charpentier
who has served his time with
pounds per cubic feet, furlongs
per fortnight, BTU and other
figments of British/American
sadistic imagination, thank you
very much...

</rant> # Again, didn't work the first time...

  Is there a package in
 R for this already?  If not I believe that I am going to write some of
 the most often used in function form.  My question is should I include
 this in my StreamMetabolism package.  It is not along the same theme
 lines, but could loosely fit.  The reason that I ask is that I don't
 want to clutter CRAN with a small package containing some conversion
 functions because I am to lazy to source them into R every time that I
 use them, but I also don't want the StreamMetabolism package to turn
 into StephenMisc Fuctions.  Thoughts, comments, or suggestions would
 be appreciated.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] LME as part of meta-analysis

2009-03-28 Thread Emmanuel Charpentier
On Friday 27 March 2009 at 15:30 -0700, mrint wrote :
 Hi, I'm having a problem using LME and hopefully I can get some help.  Here
 is what I'm doing:
 
 I have a model like this:
 
 yi = alpha + theta * xi + beta * zi + error, errors are normally distributed
 mean 0, var sigma^2
 
 xi and zi are generated from normal distributions within a specified range.  
 
 True values of alpha, theta, beta and sigma^2 are chosen with a specific
 mean and variance tau^2.
 
 I have a function which generates the yi from the other variables, then a
 function that does a linear regression using lm to create the estimates.  
 
 I then want to use this data to do a meta-analysis with the above repeated
 between 10-20 times.  Within this, I want to use lme to create estimates for
 the average true value, sample mean and average standard error for alpha,
 theta, beta and the respective tau^2 values for each of these.  For the lme
 part, I'm using this
 
 a <- summary(lme(alp~1, random=~1|alp, weights=varFixed(~staalp^2)))
 
 This is the one for alpha.  This isn't producing the type of results I would
 expect, can anyone see where I'm going wrong?  

(I suppose that your simulation aims to assess a specific model)

This, and closely related subjects, have already been discussed on this
very list. To make a long story short : lme doesn't (currently) accept
means and variances of groups as an input ; it needs individual data.
Someone (that should be Wolfgang Viechtbauer, but I may well be
mutilating his name's spelling ; apologies, Wolfgang !) has written,
specifically for meta-regression purposes, a "mima" function that does
what you're requesting. Wolfgang has stated his intention to turn this
function into a full-fledged R package (with calling conventions closer
to what other regression functions use), but the "mima" function
available on his site is still his two-year-old 0.4 version. For further
details, look for "mima" or for "meta-regression" in the list archives.
RSiteSearch() is your friend...

However, if what you're interested in is strictly speaking a
meta-analysis of two-sample comparisons (i. e. your theta is scalar and
your x_i are logicals), (at least) two R packages available on CRAN are
built for this purpose : rmeta and meta. Both offer separate analyses
for boolean or continuous dependent variables (i. e. y_i logical or
continuous).

If your theta is scalar but your x_i is continuous (i. e. you're
meta-analysing a single regression coefficient), both packages offer a
variant for meta-analysis of effects, which might be relevant for you.
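
For the simplest case (one estimate and its standard error per study), the
classical DerSimonian-Laird computation is short enough to sketch by hand
('est' and 'se' below are hypothetical vectors of per-study estimates and
standard errors) :

w    <- 1 / se^2                                  # fixed-effect weights
fe   <- sum(w * est) / sum(w)                     # fixed-effect pooled estimate
Q    <- sum(w * (est - fe)^2)                     # Cochran's heterogeneity statistic
k    <- length(est)
tau2 <- max(0, (Q - (k - 1)) / (sum(w) - sum(w^2) / sum(w)))  # DerSimonian-Laird tau^2
wr   <- 1 / (se^2 + tau2)                         # random-effects weights
c(pooled = sum(wr * est) / sum(wr), se = sqrt(1 / sum(wr)), tau2 = tau2)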

A more general solution would be to enhance the forthcoming lme4 package
to accept an alternative specification of random effects
variances-covariances, which would allow general meta-regression. But
I understand that Douglas Bates has already way too much work and not
too much time on his hands, and I doubt he might be coaxed to work in
this direction right now...

A suggestion : you might forward your question to the r-mixed-models
SIG mailing list with some profit...

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Calculate directions between points

2009-03-28 Thread Emmanuel Charpentier
On Saturday 28 March 2009 at 16:06 +, Wanja Mathar wrote :
 Hi everyone,
 
 i have a matrix of points and want to calculate the direction and length of 
 each vector between the points.
 
 I can easily calculate the length with dist(testdata), but is there a 
 function that returns a matrix of the direction of the vectors between points 
 (angles 0-360°)?
 
 testdata <- matrix(1:8, nrow=4, ncol=2,
                    dimnames=list(c("A","B","C","D"), c("x","y")))

?atan2 ; ?outer
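
A minimal sketch along the lines of those two hints, on your example data ;
angles[i, j] is the direction (in degrees, 0-360) of the vector going from
point j to point i :

testdata <- matrix(1:8, nrow = 4, ncol = 2,
                   dimnames = list(c("A", "B", "C", "D"), c("x", "y")))
dx <- outer(testdata[, "x"], testdata[, "x"], "-")   # pairwise x differences
dy <- outer(testdata[, "y"], testdata[, "y"], "-")   # pairwise y differences
angles <- (atan2(dy, dx) * 180 / pi) %% 360          # map to [0, 360)
angles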

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Mean of difftime vectors : code infelicity or intended behaviour ?

2009-03-17 Thread Emmanuel Charpentier
Dear list,

"+" (and "-") being defined for the "difftime" class, I expected mean() to 
return something sensible. This is only half-true :

> mean(c(1:5, 5:1), na.rm=TRUE)
[1] 3
> mean(as.difftime(c(1:5, 5:1), unit="mins"), na.rm=TRUE)
Time difference of 3 mins

Fine so far. However :

> mean(c(1:5, NA, 5:1), na.rm=TRUE)
[1] 3
> mean(as.difftime(c(1:5, NA, 5:1), unit="mins"), na.rm=TRUE)
Time difference of NA mins

Ouch !

Curiously, var(), max() and min() behave as expected. What's so special 
with mean() ?

> RSiteSearch("mean difftime")
[ ... doesn't return anything relevant ]

NB : this isn't done for the hell of it. I intended to replace some 
missing dates with something computed from other dates and mean time 
intervals.
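
In the meantime, a workaround sketch (dropping to numeric and back, with the
units made explicit) :

x <- as.difftime(c(1:5, NA, 5:1), units = "mins")
as.difftime(mean(as.numeric(x, units = "mins"), na.rm = TRUE), units = "mins")
# Time difference of 3 mins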

Any thoughts ?

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Mean of difftime vectors : code infelicity or intended behaviour ?

2009-03-17 Thread Emmanuel Charpentier
Oops. Clicked Send too fast (don't shoot, Brian !). I forgot :

 sessionInfo()
R version 2.8.1 (2008-12-22) 
i486-pc-linux-gnu 

locale:
LC_CTYPE=fr_FR.UTF-8;LC_NUMERIC=C;LC_TIME=fr_FR.UTF-8;LC_COLLATE=fr_FR.UTF-8;
LC_MONETARY=C;LC_MESSAGES=fr_FR.UTF-8;LC_PAPER=fr_FR.UTF-8;LC_NAME=C;
LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=fr_FR.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices datasets  utils methods   base 

loaded via a namespace (and not attached):
[1] grid_2.8.1 lattice_0.17-20 lme4_0.999375-28   
Matrix_0.999375-21
[5] nlme_3.1-90   
 


On Tue, 17 Mar 2009 06:53:20 +, Emmanuel Charpentier wrote :

 Dear list,
 
 + (and -) being defined for difftime class, I expected mean() to
 return something sensible. This is only half-true :
 
 mean(c(1:5, 5:1),na.rm=TRUE)
 [1] 3
 mean(as.difftime(c(1:5, 5:1), unit="mins"), na.rm=TRUE)
 Time difference of 3 mins
 
 Fine so far. However :
 
 mean(c(1:5, NA,5:1),na.rm=TRUE)
 [1] 3
 mean(as.difftime(c(1:5, NA, 5:1), unit="mins"), na.rm=TRUE)
 Time difference of NA mins
 
 Ouch !
 
 Curiously, var(), max() and min() behave as expected. What's so special
 with mean() ?
 
 RSiteSearch("mean difftime")
 [ ... doesn't return anything relevant ]
 
 NB : this isn't done for the hell of it. I intended to replace some
 missing dates, with something computed from other dates and mean time
 intervals).
 
 Any thoughts ?
 
   Emmanuel Charpentier
 
 __ R-help@r-project.org
 mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting
 guide http://www.R-project.org/posting-guide.html and provide commented,
 minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] popular R packages

2009-03-08 Thread Emmanuel Charpentier
On Sat, 07 Mar 2009 18:04:24 -0500, David Winsemius wrote :

[ Snip ... ]
 Nonetheless, I do think the relative numbers of package downloads might
 be interpretable, or at the very least, the basis for discussions over
 beer.

*Anything* might be the basis for discussions over beer (obvious 
corollary to Thermogoddamics' second principle).

More seriously : I don't think relative numbers of package downloads can 
be interpreted in any reasonable way, because reasons for package 
download have a very wide range from curiosity (what's this ?), fun 
(think fortunes...), to vital need (think lme4 if/when a consensus on 
denominator DFs can be reached :-)...). What can you infer in good faith 
from such a mess ?

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] popular R packages

2009-03-08 Thread Emmanuel Charpentier
Dear Barry,

As far as I understand, you're telling us that a bit of data
mining does no harm, whatever the data. Your example of pop music charts
might support your point (although my ears disagree ...) but I think it
is bad policy to indulge in white-noise analysis without a well-reasoned
motive to do so. It might give bad ideas to potential statistics
patrons (think a bit about the sorry state of financial markets :-().

More generally, I tend to be extremely wary about over-interpretation of
belly grumbles as the Voice of the Spirit ... which is a very powerful
urge of many statisticians and statistician's clients. Data mining can
be fine for exploratory musings, but a serious study needs a model, i.
e. a set of ideas and a way to reality-stress them.

As far as I can see (but I might be nearsighted), I see no model linking
package download to package use(s). Data may or may not become available
with more or less of an effort, but I can't see the point.

Emmanuel Charpentier

On Sunday 8 March 2009 at 16:08 +, Barry Rowlingson wrote :
  I think the situation is worse than messy.  If a client comes in with data
  that doesn't address the question they're interested in, I think they are
  better served to be told that, than to be given an answer that is not
  actually valid.  They should also be told how to design a study that
  actually does address their question.
 
  You (and others) have mentioned Google Analytics as a possible way to
  address the quality of data; that's helpful.  But analyzing bad data will
  just give bad conclusions.
 
  As long as we say 'package Foo is the most downloaded package on
 CRAN', and not 'package Foo is the most used package for R', we can
 leave it to the user to decide if the latter conclusion follows from
 the former. In the absence of actual usage data I would think it a
 good approximation. Not that I would risk my life on it.
 
  Pop music charts are now based on download counts, but I wouldn't
 believe they represent the songs that are listened to the most times.
 Nor would I go so far as to believe they represent the quality of the
 songs...
 
  Should R have a 'Would you like to tell CRAN every time you do
 library(foo) so we can do usage counts (no personal data is
 transmitted blah blah) ?'? I don't think so
 
 Barry

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] popular R packages

2009-03-08 Thread Emmanuel Charpentier
On Sunday 8 March 2009 at 13:22 -0500, Dirk Eddelbuettel wrote :
 On 8 March 2009 at 13:27, Duncan Murdoch wrote:
 | But we don't even have that data, since CRAN is distributed across lots 
 | of mirrors.
 
 On 8 March 2009 at 19:01, Emmanuel Charpentier wrote:
 | As far as I can see (but I might be nearsighted), I see no model linking
 | package download to package use(s). Data may or may not become available
 
 Which is why Debian (and Ubuntu) use the _opt-in package_ popularity-contest
 that collects data on packages used and submits that to a host collecting the
 data.  This drives the so-called 'popcon' statistics.
 
 Yes, and there are many ways in which one can criticise this data collection
 process.   But I fail to see how __not having any data__ leads to more
 informed decisions.
 
 Once you have data, you have an option of using or discarding it. But if you
 have no data, you have no option.  How is that better?

I question 1) the usefulness of the effort necessary to get the data ;
and 2) the very concept of data mining, which seems to be the rationale
for this proposed effort.

Furthermore (but this is seriously off-topic), I seriously despise the
very idea of popularity in scientific debates... "Everybody does it"
is *not* a valid argument. Nor is "Everyone knows...".

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] OT: A test with dependent samples.

2009-02-15 Thread Emmanuel Charpentier
 a control. That's because this trial has a *pragmatic*
goal : checking the acceptability of the administration of a drug on an
a priori set of criteria. It does not allow inferences on the effect of
the drug, and *postulates* that the non-administration of the drug will
result in nothing of interest. This allows us to pull a realistic,
interesting, null hypothesis out of our hats.

On the other hand, the controlled plans, by virtue of having a control,
allow us to be analytic, and separate the effect of the administration
from the effect of the drug itself : this latter one might indeed be
zero, the associated null hypothesis isn't nonsensical and the test of
this null isn't worthless.

<End of the loosely related methodological rant>

In any case, my point is : hypothesis testing is *not* the alpha and
omega of biostatistics, and other methods of describing and analysing
experimental results are often much more interesting, notwithstanding the
fetishes of journal referees. Furthermore, testing of impossible or
worthless hypotheses leads to worthless conclusions. Corollary : do not
test for the sake of testing, because everybody does it or because a
referee threw a tantrum ; test realistic hypotheses, whose rejection
has at least some relation to your subject matter.

The two cents of someone tired of reading utter nonsense in prestigious
journals...

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Request for advice on character set conversions (those damn Excel files, again ...)

2008-09-08 Thread Emmanuel Charpentier
On Mon, 08 Sep 2008 01:45:51 +0200, Peter Dalgaard wrote :

 Emmanuel Charpentier wrote:
 Dear list,
 

[ Snip ... ]

 This looks reasonably sane, I think. The last loop could be d[] <-
 lapply(d, conv1, from, to), but I think that is cosmetic. You can't
 really do much better because there is no simple way of distinguishing
 between the various 8-bit character sets.

Thank you Peter !

Could you point me to some not-so-simple (or even doubleplusunsimple) 
ways ? I get the problem not so rarely, and I'd like to pull this shard 
outta my poor tired foot once and for all... and I suppose that I am not 
alone in this predicament.

   You could presumably setup
 some heuristics. like the fact that the occurrence of 0x82 or 0x8a
 probably indicates cp437, but it gets tricky. (At least, in French, you
 don't have the Danish/Norwegian peculiarity that upper/lowercase o-slash
 were missing in cp437, and therefore often replaced yen and cent symbols
in matrix printer ROMs. We still get the occasional parcel addressed to
 ¥ster Farimagsgade.)

Peter, you're gravely underestimating the ingenuity of some Excel 
l^Husers... (and your story is a possible candidate for a fortune() 
entry...).

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Request for advice on character set conversions (those damn Excel files, again ...)

2008-09-07 Thread Emmanuel Charpentier
Dear list,

I have to read a not-so-small bunch of not-so-small Excel files, which 
seem to have traversed Windows 3.1, Windows 95 and Windows NT versions of 
the thing (with maybe a Mac or two thrown in for good measure...).
The problem is that 1) I need to read strings, and 2) those 
strings may have various encodings. In the same sheet of the same file, 
some cells may be latin1, some UTF-8 and some CP437 (!).

read.xls() allows me to read those things in as sets of dataframes. My 
problem is to convert the encodings to UTF-8 without clobbering those that 
are already (or look like) UTF-8.

I came to the following solution :

foo <- function(d, from="latin1", to="UTF-8") {
  # Semi-smart conversion of a dataframe between charsets.
  # Needed to ease use of those [EMAIL PROTECTED] Excel files
  # that have survived the Win3.1 -- Win95 -- NT transition,
  # usually in poor shape..
  conv1 <- function(v, from, to) {
    condconv <- function(v, from, to) {
      cnv <- is.na(iconv(v, to, to))
      v[cnv] <- iconv(v[cnv], from, to)
      return(v)
    }
    if (is.factor(v)) {
      l <- condconv(levels(v), from, to)
      levels(v) <- l
      return(v)
    }
    else if (is.character(v)) return(condconv(v, from, to))
    else return(v)
  }
  for(i in names(d)) d[,i] <- conv1(d[,i], from, to)
  return(d)
}

Any advice for enhancement is welcome...

Sincerely yours,

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Sending ... to a C external

2008-08-22 Thread Emmanuel Charpentier
On Friday 22 August 2008 at 15:16 -0400, John Tillinghast wrote :
 I'm trying to figure this out with Writing R Extensions but there's not a
 lot of detail on this issue.
 I want to write a (very simple really) C external that will be able to take
 ... as an argument.
 (It's for optimizing a function that may have several parameters besides the
 ones being optimized.)

!!! That's a hard one. I have never undertaken this kind of job, but I expect 
that your ... argument, if you can reach it from C (which I don't know) will 
be bound to a Lisp-like structure, notoriously hard to decode in C. Basically,
you'll have to create very low level code (and duplicate a good chunk of the R
parser-interpreter...). 

I'd rather treat the ... argument in a wrapper that could call the relevant 
C function with all arguments interpreted and bound... This wrapper would
probably be an order of magnitude slower than C code, but two orders of 
magnitude
easier to write (and maintain !). Since ... argument parsing would be done 
*once*
before the grunt work is accomplished by C code, the slowdown would (probably)
be negligible...
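
Something along these lines (the C entry point "my_obj" and its argument list
are purely hypothetical, just to illustrate the shape of such a wrapper) :

obj <- function(par, ...) {
  extra <- unlist(list(...))          # ... evaluated and flattened here, in R
  .C("my_obj",                        # hypothetical C routine
     as.double(par), as.integer(length(par)),
     as.double(extra), as.integer(length(extra)),
     value = double(1))$value         # the C routine fills 'value'
}
# optim(start, obj, alpha = 2, beta = 0.5) would then optimize over 'par',
# with alpha and beta fixed and digested once per call by the wrapper.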

 I got the showArgs code (from R-exts) to compile and install, but it only
 gives me error messages when I run it. I think I'm supposed to pass it
 different arguments from what I'm doing, but I have no idea which ones.
 
 What exactly are CAR, CDR, and CADR anyway? Why did the R development team
 choose this very un-C-like set of commands? 

'Cause they are fundamental to the Lisp-like language that is S/R. Read on...

 They are not explained much in
 R-exts.

At the risk of incurring the R-help deities wrath :

In the beginning, John McCarthy created Lisp 1.5, who begat MACLISP, who begat
..., who begat Scheme ... Notwithstanding a long inheritance story, full of
sound, fury, holy wars and bastardry (sometimes evoking European royalty
lines...), all Lisp-like languages bear one mark (dominant allele) of their
common ancestor, which is the list structure. This structure is implemented
as a pair of pointers, the
first one  pointing to the first element of the list, the second pointing to ...
the list of the rest of the members (or nothing, aka NIL).

In McCarthy's implementation, which was machine code on an IBM 704 (we were in
1957, mind you...), such a pair was split among different parts of a CPU word,
known in the technical documentation of this dinosaur as the address register
and the decrement register respectively. Thus the mnemonic names of Content
of the Address Register and Content of Decrement Register (aka CAR and CDR)
for these two pointers, names which were used for the (Lisp) functions
allowing access to them.

Those names have stuck mainly for historical (sentimental ? hysterical ?)
reasons. Even when reasonable synonyms were introduced (e. g. first and rest
in Common Lisp), all the old hands (and young dummies who wanted to emulate
them) kept car and cdr close to their hearts.

So, fifty-one years after McCarthy's stroke of genius, this piece of CS
snobbery is still with us (and probably will be 'til 64-bit Linux time
counters roll over...).

Despite their C-like syntax, their huge collection of array and array-like
structures, and their loops, S and R are fundamentally Lisp-like languages.
Most notably, the representation of executable code is accessible by the code
itself : one can create a function which computes another function. Lisp was
the first language explicitly created for this purpose, and it is no
happenstance that many (most ?) such languages build on Lisp (or one of its
almost uncountable dialects) fundamentals. Many of R/S's common ways of doing
things have a Lisp smell : for example, it is no accident that {s|t|l}apply(),
outer() and suchlike are *way* more efficient than for(), and they are
strongly reminiscent of Lisp's mapcar...

BTW, this ability to compute on the language is probably the most fundamental
point distinguishing S/R from all the rest of the statistical packages (SPSS,
Stata, SAS and the rest of the crowd...).

Now I have to say that I'm not old enough to have first-hand knowledge of
this history. I was born, but not weaned, when McCarthy unleashed Lisp on an
unsuspecting world ... I learned that ca. 1978-9, while discovering VLisp.

While I can't really help you (I still think that processing ... at C level
is either hubris of the worst sort, pure folly or a desperate case), I hope
to have entertained you.

Sincerely,

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Lattice: problem using panel.superpose and panel.groups

2008-08-20 Thread Emmanuel Charpentier
On Sunday 17 August 2008 at 09:36 +, Dieter Menne wrote :

[ Snip .. ]
 Trellis graphics are a bit like hash functions: you can be close to the 
 target, but get a far-off result.

Nice candidate for a fortune() entry ...

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] FYI: APL in R

2008-08-20 Thread Emmanuel Charpentier
On Tuesday 19 August 2008 at 23:52 -0700, Jan de Leeuw wrote :
 http://idisk.mac.com/jdeleeuw-Public/utilities/apl/apl.R

Thank you ! I was beginning to think I was the last of the dinosaurs...
ISTR that someone else on the list is also an (un)repentant APLer...

 Although this is all prefix and no infix, we could easily
 recreate some of the infamous APL one-liners that nobody
 can possibly understand or reproduce.

:-)!!! Having implemented a (primitive) stats package in APL, I know
what you mean... A liberal use of lamp was ... well, enlightening...

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Possible buglet (wart ?) of odfWeave 0.7.6 (with workaround)

2008-08-12 Thread Emmanuel Charpentier
Dear List,

I have had problems inserting some (not all !) figures via odfWeave
(using print(someLatticeFunction)...). The figure was correctly
displayed in a R device window but the resulting ODF document displayed
the correct space for the figure and an empty frame with a broken
image icon and a read error mention.

Exploration of the odf (.odt, in my case) file showed that the relevant
files (.eps, which is a vector format much more apt than bitmaps to
cutting/pasting on various media) existed in the Pictures directory of
the file and displayed fine.

Then lightning struck : the picture files were named as follows :
"content_" + serial number + chunk name + ".eps" ; it occurred to me that
OpenOffice 2.4.1 (as packaged in Debian) displayed a Read error *when
the chunk name had accented characters in it* (my system works in UTF-8).
Plain ASCII chunk names work OK.

I do not know the ODF specification well enough to be sure that this is an
odfWeave problem (some chunk name mangling might be necessary) rather
than an OpenOffice infelicity (non-implementation of a
specification-requested feature) : checking with another ODF-compliant
word processor might be useful, but I don't have any on hand.

In any case, this problem is not obvious after re-reading the odfWeave
documentation, and does not seem to have been already reported to r-help
(yes, Pr Ripley, I did my homework...). This report might spare some
poor non-English-writer sod some time ...

For what it's worth :

 sessionInfo()
R version 2.7.1 (2008-06-23) 
x86_64-pc-linux-gnu 

locale:
LC_CTYPE=fr_FR.UTF-8;LC_NUMERIC=C;LC_TIME=fr_FR.UTF-8;LC_COLLATE=fr_FR.UTF-8;LC_MONETARY=C;LC_MESSAGES=fr_FR.UTF-8;LC_PAPER=fr_FR.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=fr_FR.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods
base 

other attached packages:
[1] odfWeave_0.7.6  XML_1.96-0  lattice_0.17-13

loaded via a namespace (and not attached):
[1] grid_2.7.1  tools_2.7.1

Hoping this might help,

Emmanuel Charpentier

PS : don't try to answer to my subscribed address (charpent (at) bacbuc
(dot) dyndns (dot) org), which is the apparent source of this message :
bacbuc (dot) dyndns (dot) org is down and will quite likely
remain so for at least two to three more weeks. I'm playing fast and
loose with Gmane by putting my registered address as a Reply-to address
and using my actual present address (emm (dot) charpentier (at) free (dot) fr)
as source. I do not know what will survive Gmane, Mailman and Martin's scripts
mauling...

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to read HUGE data sets?

2008-02-28 Thread Emmanuel Charpentier
Jorge Iván Vélez wrote :
 Dear R-list,
 
 Does somebody know how can I read a HUGE data set using R? It is a hapmap
 data set (txt format) which is around 4GB. After read it, I need to delete
 some specific rows and columns. I'm running R 2.6.2 patched over XP SP2
 using a 2.4 GHz Core 2-Duo processor and 4GB RAM. Any suggestion would be
 appreciated.

Hmmm... Unless you're running a 64-bit version of XP, you might be SOL
(notwithstanding the astounding feats of the R Core Team, which managed
to be able to use about 3.5 GB of memory under 32-bit Windows) : your
*raw* data will eat more than the available memory. You might be lucky
if some of them can be abstracted (e. g. long character strings that can
be reduced to vectors), or get unlucky (large R storage overhead of
nonreducible data).

You might consider changing machines : get a 64-bit machine with gobs of
memory and cross your fingers. Note that, since R pointers are then 64 bits
wide instead of 32, data storage needs will inflate...

Depending on the real meaning of your data and the processing they need,
you might also consider storing your raw data in a SQL DBMS, reducing them
in SQL and reading into R only the relevant part(s). There are also some
contributed packages that might help in special situations : biglm, birch.
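
If the row/column pruning is simple enough, part of it can also be done at
read time ; a minimal sketch (file name and column layout entirely
hypothetical) :

wanted <- rep("NULL", 90)                  # "NULL" makes read.table skip a column
wanted[c(1, 2, 5)] <- c("character", "character", "numeric")  # keep only these three
d <- read.table("hapmap.txt", header = TRUE, sep = "\t",
                colClasses = wanted, comment.char = "", quote = "")
# rows can then be filtered as usual, e.g. d <- subset(d, !is.na(d[[3]]))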

HTH,

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] odfWeave and xtable

2008-01-19 Thread Emmanuel Charpentier
宋时歌 wrote :
 David,
 
 The value of odfWeave is not limited to newbie users. It is vastly
 useful for researchers in fields that do not accept LaTeX for journal
 paper submission (for example, sociology, demography).

... and for those of us living in the salt mines of administration (in
my case), industry or economics|finance, where the damn Microsoft toolchain
is mandatory and anything else anathema : we have to disguise our
outputs into something palatable to these tools, and odfWeave is a boon
to us ! Given one of Max's former mail addresses (pointing to a big
Pharma firm's mail server), I suspect that this was one of its
motivations. Thanks again, Max !

That is not to say that this tool cannot be perfected. I started to work
on cross-reference abilities but got interrupted by some Real Life
nonmaskable interrupts, including some unplanned emergency work, the
loss of a USB key with my current work on this, and some
non-insignificant sickness...

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] problem when extacting columns of a data frame in a new data frame

2008-01-08 Thread Emmanuel Charpentier
Delphine Fontaine wrote :

 Dear R-users,
 
 I would like to create a new data frame composed of 2 columns of another
 data frame. But it does not give me what I want...
 
 casesCNST[1:10,]
case X1 X2 X3 X4 expected
 1A1  0  0  0  0E
 2A2  0  0  0  1C
 3A3  0  0  0  2C
 4A4  0  0  0  3C
 5A5  0  0  0  4C
 6A6  0  0  1  0C
 7A7  0  0  1  1C
 8A8  0  0  1  2C
 9A9  0  0  1  3C
 10  A10  0  0  1  4C
 
 expectedCNST <- data.frame(cbind(casesCNST$case, casesCNST$expected))
 
 expectedCNST[1:10,]
 X1 X2
 11  4
 2  112  3
 3  223  3
 4  334  3
 5  445  3
 6  556  3
 7  593  3
 8  604  3
 9  615  3
 10   2  3
 
 Why does the variables change ?!?

 'Cause you build your new data frame from vectors with no name
(casesCNST$case is a vector with no name). To keep the original names,
you should subset the original data frame, with casesCNST[,c(1,6)] or
casesCNST[,c("case","expected")].
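
For what it's worth, part of what happens is that cbind() coerces the two
factors to their integer codes ; a quick check :

str(data.frame(cbind(casesCNST$case, casesCNST$expected)))  # integer codes, names lost
str(casesCNST[, c("case", "expected")])                     # names and levels kept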

HTH,

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Installing R on ubuntu dapper

2008-01-07 Thread Emmanuel Charpentier
hadley wickham wrote :
 I followed the instructions at
 http://cran.r-project.org/bin/linux/ubuntu/README.html, but I'm
 getting the following error:
 
 ~: sudo apt-get install r-base
 Reading package lists... Done
 Building dependency tree... Done
 Some packages could not be installed. This may mean that you have
 requested an impossible situation or if you are using the unstable
 distribution that some required packages have not yet been created
 or been moved out of Incoming.
 
 Since you only requested a single operation it is extremely likely that
 the package is simply not installable and a bug report against
 that package should be filed.
 The following information may help to resolve the situation:
 
 The following packages have unmet dependencies:
   r-base: Depends: r-base-core (= 2.6.1-1dapper0) but it is not installable
   Depends: r-recommended (= 2.6.1-1dapper0) but it is not
 going to be installed
 E: Broken packages

 Huh ?

Ubuntu i386 packages install with no problem. However, x86_64 is more
problematic : these packages are not built in the CRAN repositories.
IIRC, r-base is an all-architectures package (mostly a meta-package),
while r-base-core and r-recommended are binary packages. Therefore, r-base is
indeed available, while r-base-core and r-recommended are not. Hence the
jam...

I have been able to rebuild x86-64 packages on Ubuntu Gutsy starting
from the Debian sources ; a cursory check lets me think that these
builds are correct. Let me know if you want them (no guarantees : caveat
emptor).

I plan to check them extensively and propose them to the SIG-Debian
mailing list, but this will have to wait...

HTH,

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] odfWeave cross-reference

2007-12-23 Thread Emmanuel Charpentier
Chris.H. Snow wrote :
 How can I insert cross-references to odfWeave generated figures in my
 source odf before the graphic has been created with odfWeave?

In the current (0.6.0) version, you can't. I wrote some modifications to
odfWeave in this direction, planning to send them back to Max Kuhn after
some testing, but am currently unable to work on them.

Basically, you have to separate the creation of a reference to a
table|figure from the creation of the table|figure itself. The simplest
way is to be able 1) to create some form of R object containing the
necessary reference info, 2) to use such an R object in order to insert
the XML code for a reference and|or 3) to pass such an object as an
optional argument to table|figure caption creation.

My current set of functions allows all of the above, but is not
correctly integrated in the package, and has not been tested extensively.

HTH,

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] simple problems

2007-12-06 Thread Emmanuel Charpentier
Gabor Grothendieck wrote :
 On Dec 6, 2007 3:26 PM, marciarr [EMAIL PROTECTED] wrote:

[ Snip ]

 2- how do I solve a simple equation? Considering the equation y= exp(-x)^12,
 I would like to find the values of x for,  for example, y=0.01, so
 exp(-x)^12=0.01. How do I do that using R?
 I know those a probably very, very simple questions, but for which I do not
 seem to find the answer.
 
 Search between 0 and 1 for the root of the indicated function:
 
 uniroot(function(x) 0.1 - exp(-x)^12, 0:1)

I beg your pardon ?

I'd rather use high-school algebra/analysis :
log(exp(-x)^12) = log(0.01)
12*log(exp(-x)) = log(0.01)
-12*x = log(0.01)
x = -log(0.01)/12 = log(100)/12

Rushing for a sophisticated numerical tool without thinking for an
explicit solution is easy, fast, lazy (I do that every day ...), but
deprives you of the process of understanding the problem.

Which was probably the point of this (probable) homework...
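
(For what it's worth, a quick check that the closed form and the numerical
root agree :)

x <- log(100)/12
exp(-x)^12                                              # 0.01, as required
uniroot(function(x) 0.01 - exp(-x)^12, c(0, 1))$root    # numerically the same x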

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] using eval(parse(text)) , gsub(pattern, replacement, x) , to process code within a loop/custom function

2007-12-06 Thread Emmanuel Charpentier
Thomas Pujol wrote :
 R-help users,
   Thanks in advance for any assistance ... I truly appreciate your expertise. 
  I searched help and could not figure this out, and think you can probably 
 offer some helpful tips. I apologize if I missed something, which I'm sure I 
 probably did. 

   I have data for many samples. (e.g. 1950, 1951, 1952, etc.)
 
   For each sample, I have many data-frames. (e.g. temp.1952, births.1952, 
 gdp.1952, etc.)
 
   (Because the data is rather large (and for other reasons), I have chosen 
 to store the data as individual files, as opposed to a list of data frames.)

   I wish to write a function that enables me to run any of many custom 
 functions/processes on each sample of data.
 
   I currently accomplish this by using a custom function that uses:
 eval(parse(t=text.i2)) , and gsub(pat, rep, x) (this changes the sample 
 number for each line of text I submit to eval(parse(t=text.i2)) ).
 
   Is there a better/preferred/more flexible way to do this?

Beware : what follows is the advice of someone used to using RDBMSes and SQL
to work with data ; as anyone should know, everything is a nail to a man
with a hammer. Caveat emptor...

Unless I misunderstand you, you are trying to treat piecewise a large
dataset made of a large number of reasonably-sized independent chunks.

What you're trying to do seems to me a bit like reinventing the SAS macro
language. What's the point ?

IMNSHO, large datasets that are used only piecewise are much better
handled in a real database (RDBMS), queried at runtime via, for example,
Brian Ripley's RODBC.

In your example, I'd create a table births with all your data + the
relevant year. Out of the top of my mind :

# Do that ONCE in the lifetime of your data : an RDBMS is probably more
# apt than R dataframes for this kind of management

library(RODBC)
channel <- odbcConnect("WhateverYouHaveToUseForYourFavoriteDBMS")

sqlSave(channel, tablename="Births",
        rbind(cbind(data.frame(Year=rep(1952, nrow(births.1952))),
                    births.1952),
              cbind(data.frame(Year=rep(1953, nrow(births.1953))),
                    births.1953),
              # ... ^W^Y ad nauseam ...
              ))

rm(births.1951, births.1952, ...) # get back breathing space

Beware : certain data types may be tricky to save ! I got bitten by
Dates recently... See RODBC documentation, your DBMS documentation and
the R Data Import/Export guide...

At analysis time, you may use the result of the relevant query exactly
as one of your dataframes. Instead of :
foo(..., data=births.1952, ...)
type :
foo(..., data=sqlQuery(channel, "select * from \"Births\" where \"Year\"=1952;"),
    ...) # Syntax illustrating talking to a picky DBMS...

Furthermore, the variable Year bears your sample information. Problem
(dis)solved.

You may loop (or even sapply()...) at will on the years :
for(year in 1952:1978) {
  query <- sprintf("select * from \"Births\" where \"Year\"=%d;", year)
  foo(..., data=sqlQuery(channel, query), ...)
  ...
}

If you already use a DBMS with some connection to R (via RODBC or
otherwise), use that. If not, sqlite is a very lightweight library that
enables you to use a (very considerable) subset of SQL92 to manipulate
your data.

I understand that some people of this list have undertaken the creation
of a sqlite-based package dedicated to this kind of large data management.

HTH,

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] odfWeave error

2007-12-03 Thread Emmanuel Charpentier
Cleber N. Borges wrote :
 hello all,
 I trying to use the package 'odfWeave'
 and I get the follow error:
 
 
 
 ###  error message
 #
 ...
   Removing content.xml
 
   Post-processing the contents
 Error in .Call(RS_XML_Parse, file, handlers, as.logical(addContext),  :
   Error in the XML event driven parser for content_1.xml: error parsing 
 attribute name
 
 
 
 The piece of the code is:
 
 ###  code
 
 ...
 
 <<makeGraph, echo=T, results=xml>>=
 fileplot='C:\\DADOS\\tmp\\plotx.emf'
 win.metafile(fileplot)
 plot( rnorm(300), col='red', t='l', lwd=2 ); grid()
 dev.off()
 @

 This chunk is pure R code and shouldn't output anything directly in
the output file. Since you are telling echo=T, whatever is output by
your code is sent to your output file, and since you assert
results=xml, this output is interpreted as XML ; since this isn't XML,
you're getting guff from the XML interpreter.

I'd rather do :
<<makeGraph, echo=FALSE>>=
invisible({
# Whatever you did ...
})
@
instead. Your second chunk :

 <<insertGraph, echo=T, results=xml>>=
 odfInsertPlot(file=fileplot, height=5, width=5 )
 @

should insert your plot in the output file.

[Snip ... ]

BTW : Windows (enhanced) metafiles are Windows-specific. As far as I can
tell, recent versions of Office and OpenOffice.org correctly render
Encapsulated Postscript files, thus freeing you from another Windows
dependency. Unless you *have* to have an EMF output (it happens, I know
...), you'd better use this (EPS) format directly.

HTH

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] meta-analysis on diagnostic tests (bivariate approach)

2007-11-30 Thread Emmanuel Charpentier
Pedro Emmanuel Alvarenga Americano do Brasil wrote :
 Hello friends of R list,
 
 I'm a physician and I'm not that good in statistics. I have posted a similar
 email in the epi-sig list but no one answered so far. I'm conducting a
 systematic review on two diagnostic tests for a particular tropical disease.
 I found a free software package that makes most of the analysis, called MetaDiSc.
 However there is a particular analysis that I would like to try that is
 not implemented in this software. I looked in R for meta-analysis functions
 but the packages available don't work with diagnostic tests.
 
 I'm trying for a while to adapt in R a function developed in SAS and published
 in J.B. Reitsma et al. / Journal of Clinical Epidemiology 58 (2005)
 982-990, which is a bivariate approach to meta-analysis of diagnostic tests.

lmer (in package lme4, IIRC), which *does* treat logistic regression
(among other GLMs) directly, should do the trick. Probably with more ease...

 Since I do not have any experience with SAS, I'm having a hard time trying to
 do so.
 
 There is an appendix to the original text with some step-by-step guidance (in SAS
 syntax) on how to do it but I'm stuck on some more advanced stuff for which I can't
 find a solution. In one step it mentions a Van Houwelingen (Statist. Med.
 2002; 21:589–624) approach to incorporate between and within study variances
 and using the restricted maximum likelihood estimation.

This is a large tutorial paper, which goes well beyond your goals.

 This is the script (or function) so far
 
 bivar <- function(TP,FP,FN,TN){
 
# find a way to sum 0.5 to each null cell if any
 
   Se <- TP/(TP+FN)
   Sp <- TN/(TN+FP)
   logSe <- log(Se/(1-Se))
   logSp <- log(Sp/(1-Sp))
   vlogSe <- 1/(Se*(1-Se)*(TP+FN))
   vlogSp <- 1/(Sp*(1-Sp)*(TN+FP))
   bsvlogSp <- 0
   cblog <- 0
   bsvlogSp <- 0
   sdata <- rbind(bsvlogSp,cblog,bsvlogSp,vlogSe,vlogSp)
 
 Does anyone know if there is any package or function with this bivariate
 random effect model that I could try to use? Nor could I find any
 package or function to apply the restricted maximum likelihood estimation.
 
 Any tip that helps me go on, or a hand in the development of this script
 (function) would be most welcome.


I'll try to have a look at this problem one of these days (I already
have a serious backlog in daylight life...).

Out of the top of (what I use as) my mind (assuming that "technique" and
"study" are factors correctly identifying studies and compared techniques,
and forgetting the +0.5 in empty cells, which can be done along the lines of
TP <- TP+0.5*(TP==0), etc ...) :

data <- rbind(data.frame(suc=TP, fai=FN,
                         situation=rep("Diseased", nrow(TP)),
                         study=study, technique=technique),
              data.frame(suc=TN, fai=FP,
                         situation=rep("NonDiseased", nrow(TN)),
                         study=study, technique=technique))
data$situation <- as.factor(as.character(data$situation))

Now,

model <- lmer(cbind(suc,fai) ~ situation:technique - 1 + (1|situation %in% study),
              data=data, family=binomial(link="logit"))

should be close to what you're aiming at. But asking for Douglas Bates'
advice (in the R-SIG mixed model list) would be much better...

However, I am not as sanguine as Reitsma et al. about the whole ROC
methodology. It has sound foundations for one specific problem :
modeling (a simplified type of) signal perception. Its use in medical
diagnostic evaluation is perfectly justified for problems of this kind
(e. g. choosing, for example, a cutoff point to interpret a biological
measurement in clinical terms, or comparing two numerical diagnostic
indices).

However, this approach ignores the costs and utilities of a false
positive and a false negative, which may lead to very different choices
and judgement criteria (I think that Frank Harrell has much better
credentials than mine to explain this point).

Furthermore, I have reservations about the (ab?)use of this model in
situations where there is no numerical measure for which a cutoff point is
to be chosen (e. g. ROC methodology in radiology) : while this is
current practice, I am not sure what real-world meaning such a ROC
curve has : as far as I can tell, a physician does *not* choose
(reliably) his/her sensitivity level, and therefore cannot move
his/her operating point along this hypothetical ROC curve.

If one accepts the hypothesis that sensitivities and specificities are
the only relevant data, the DOR estimation makes more sense (less
untestable hypotheses...).

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Unweighted meta-analysis

2007-11-26 Thread Emmanuel Charpentier
Roy Sanderson wrote :
 Hello
 
 I'm very much a beginner on meta-analysis, so apologies if this is a
 trivial posting.  I've been sent a set of data from separate experimental
 studies, Treatment and Control, but no measure of the variance of effect
 sizes, numbers of replicates etc.  Instead, for each study, all I have
 is the mean value for the treatment and control (but not the SD). 

With possibly three very special kinds of exceptions, what you've been
sent is insufficient for any kind of analysis (meta- or otherwise) : no
way to assess variability, hence no way to assess the relative importance of
noise to data or the relative importance of different sets of data.

One possible exception is when the very nature of the experiment implies
that your data come from a truly one-parameter distribution. I'm
thinking, for example, of count data of rare events, which might, under
not-so-unreasonable(-in-special-circumstances) conditions, come from a
Poisson distribution.

Another possible set of exceptions is when the second (and
following) parameter(s) can be deduced from obvious general knowledge.
For example (set in a semi-imaginary setup), one may give you the number
of people using a given service at least once during a specified period,
*provided* that in order to use this service, people have to register
with the service provider first. The data might be a simple number (no
valid indication of variability, if service use is too frequent to be
modeled by a Poisson distribution), but looking up the number of
registered users in some data bank might provide you with a valid
proportion and population size, which is enough to meta-analyze.

But the third possibility is of course that your means are indeed the
result of experimenting on *ONE* experimental unit of each group. This
is highly dependent on what is measured and how (an example of this
might be industrial production per unit time with two different sets of
tools/machines in various industrial setups : here, the experimental
unit is the industrial setup, and your mean is *one* measure of
speed). Then, you have *individual* data, that you should analyze
accordingly (e. g. t-test or Wilcoxon test if there is no relationship
between treated and control experimental units, paired t-test or
paired Wilcoxon test if you are told that the means may be related,
etc ...). This is not a meta-analysis, but an analysis.

Outside these possibilities, I see no point of meta-analysing anything
that isn't analysable by itself.

 As
 far as I can tell, this forces me into an unweighted meta-analysis, with
 all the caveats and dangers associated with it. 

As far as I can tell, you're forced to tell your sponsor/tutor/whatever
 either that he doesn't know what he asks for or that he's trying to
fool you (and you saw it !) ; which might lead you to ask him to rethink
his question, give you more information about the measurements and
experimental setup, to provide you (or help you find) the missing data,
to stop playing silly games or to go fly a kite...

  Two possible approaches
 might be:
 
 a) Take the ln(treatment/control) and perform a Fisher's randomisation
 test (and also calculate +/- CI).
 b) Regress the treatment vs control values, then randomise (with or
 without replacement?) individual values, comparing the true regression
 coefficient with the distribution of randomisation regression
 coefficients.

I haven't the foggiest idea of what you're trying to do here :
introducing artificial variability in order to separate it from variation
between groups ?

Unless you are in the case of one experiment = one experimental unit
per group (see above) with no information about variability, the only
information you can use is the *sign* of the difference
Experimental-Control : if all or almost all of them go in the
same direction, one might be tempted to conclude that this direction is
not random (that's the sign test). But this is only valid if the
hypothesis "the direction is 50/50 random under H0" has validity under the
experimental setup, which your story doesn't tell...
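
(If that hypothesis were defensible, the computation itself is a one-liner ;
'treatment' and 'control' below stand for the per-study means, hypothetical
names :)

d <- treatment - control
binom.test(sum(d > 0), n = sum(d != 0), p = 0.5)   # sign test : is the direction 50/50 ?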

 Both approaches would appear to be fraught with risks; for example in
 the regression approach, it is probable that the error distribution of
 an individual randomised regression might not be normal - would this
 then invalidate the whole set of regressions?

Again, you'd work against an artificial variability that you'd have
introduced yourself : what is the point ?

HTH,

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] pass lm( ) a char vector as the variables to be included

2007-11-26 Thread Emmanuel Charpentier
Gavin Simpson wrote :
 On Mon, 2007-11-26 at 14:17 +, [EMAIL PROTECTED] wrote:
 Here are the codes of the example of lm( ):

 ## Annette Dobson (1990) An Introduction to
 Generalized Linear Models.
 ## Page 9: Plant Weight Data.
 ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
 trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
 group <- gl(2,10,20, labels=c("Ctl","Trt"))
 weight <- c(ctl, trt)
 anova(lm.D9 <- lm(weight ~ group))
 lm.D90 <- lm(weight ~ group - 1) # omitting intercept

 What I am doing is letting the variable name "group" be
 stored in a (character) vector, say, g <- "group". The question is
 how to strip the quotation marks when we call lm( )
 through g?
 Try: 
 w = "weight"
 g = "group"
 form = as.formula(paste(w, g, sep="~"))
 lm(form)

 Regards,
 Richie.
 
 For more complicated automation, the ideas and examples from Bill
 Venables Programmer Niche article in the R newsletter from a few years
 ago might be of use:
 
 [39] Bill Venables. Programmer's niche. R News, 2(2):24-26, June 2002.
 [ bib | PDF | http ]
 
 The PDF is available here:
 
 http://cran.r-project.org/doc/Rnews/Rnews_2002-2.pdf

Another possibility is to create a macro (library(gtools) ; ?defmacro).
See Thomas Lumley's paper in R News 2001-3 (Programmer’s Niche: Macros
in R – Overcoming R’s virtues).
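
(Or, staying in base R, reformulate() does the string-to-formula work ; a
minimal sketch on the example above :)

g <- "group" ; w <- "weight"
lm(reformulate(g, response = w))   # builds and fits  weight ~ group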

HTH,

Emmanuel Charpentier

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

