Re: [R] symbols in a data frame

2014-07-09 Thread Claudia Beleites
Hi Sam,

 But this may not be the important issue here at all. If k means the
 value is left censored at k -- i.e. we know it's less than k but not
 how much less -- then Sarah's proposal is not what you want to do.
 Exactly what you do want to do depends on context, and as it concerns
 statistical methodology, is not something that should be discussed
 here. Consult a local statistician if this is a correct guess.
I'd like to chime in with Bert's advice here. Unless the <LOQ values are
very few*, they have the potential to seriously mess up any further data
analysis. 

Actually, I'd recommend you go one step back and ask the analysis lab
whether they can supply you with the uncensored data, specifying the
LOQ separately. 

A while ago I posted some illustrations about such censoring
at LOQ situations on cross validated, which may help you in forming a
decision how to go on:
http://stats.stackexchange.com/a/30739/4598
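
A minimal sketch (simulated data; survival's Surv() supports left
censoring) of what a censored-data treatment can look like instead of
substituting k or k/2 -- the concentrations below are made up:

library(survival)
set.seed(42)
conc  <- rlnorm(50)          # made-up true concentrations
loq   <- 0.5
obs   <- pmax(conc, loq)     # reported values: left censored at the LOQ
event <- conc >= loq         # TRUE = quantified, FALSE = <LOQ

fit <- survreg(Surv(obs, event, type = "left") ~ 1, dist = "lognormal")
exp(coef(fit))               # estimate of the geometric mean concentration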

Claudia (Analytical Chemist & Chemometrician)


*or you know that they'll not matter for the particular data analysis
you want to do




-- 
Claudia Beleites, Chemist
Spectroscopy/Imaging
Leibniz Institute of Photonic Technology 
Albert-Einstein-Str. 9
07745 Jena
Germany

email: claudia.belei...@ipht-jena.de
phone: +49 3641 206-133
fax:   +49 2641 206-399

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Needed: Beta Testers and Summer Camp Students

2013-04-23 Thread Claudia Beleites
Hi Paul,

I skimmed over the pdf.

I have comments on the discussion about centering. I'm from a
completely different field (chemometrics). Of course, I also have to
explain centering. However, the argumentation I use is somewhat
different from the one you give in your pdf. 

One argument I have in favour of (mean) centering is numerical
stability, depending on the algorithm of course.  

I generally recommend that if data is centered, there should be an
argument why the *chosen* center is *meaningful*, emphasizing that
centering actually involves decisions, and that the center can have a
meaning. 
While I agree that a centered model with the center chosen without any
thought about its meaning is exactly the same in every important way
compared to not centering, I disagree with the generality of your
claim. 

A natural center of the data may exist. And in this case, using this
appropriate center will ease the interpretation. Examples:
- In analytical chemistry / chemometrics e.g. we can often use blanks
  (samples without analyte) as coordinate origin. Centering to the
  blank removes the influence of some parts of the instrumentation,
  like sample holders, cuvettes, etc.
- Many of our samples (sample in the sense of physical specimen) have
  a so-called matrix (a common composition/substance in which different
  other substances/things are observed), or are measured in a solvent.
- I also work with biological specimens. There we often have controls
  (either control specimens/patients or for example normal tissue [vs.
  diseased tissue]) which are another type of natural coordinate
  origin.
- I can even imagine problems where mean centering is meaningful:
  if the problem involves modeling properties that are deviations from a
  mean (I'm thinking of process analytics).  However, mean centering
  will always need careful attention about the sampling procedure.

Looking from the opposite point of view, some problems of *mean*
centering become apparent. If the data comes from different groups, the
mean may not be meaningful (I once heard a biologist arguing that the
average human has one ovary and one testicle - this gets your audience
awake and usually convinces immediately). And the mean may be
influenced by the different proportions of the groups in your data.
Which is what you do *not* want: what you want is a stable center.
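
A minimal sketch (simulated numbers) of the difference between centering
to a meaningful origin (here: the mean blank) and mean centering:

set.seed(1)
spc   <- matrix(rnorm(30 * 5, mean = 10), nrow = 30)  # 30 "spectra", 5 channels
blank <- matrix(rnorm(10 * 5, mean = 10), nrow = 10)  # 10 blank measurements

## center to the blank: the origin has a physical meaning,
## and it does not move with the group proportions in spc
spc.blank <- scale(spc, center = colMeans(blank), scale = FALSE)

## mean centering: the origin depends on the composition of spc
spc.mean  <- scale(spc, center = TRUE, scale = FALSE)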

Best,

Claudia

-- 
Claudia Beleites
Spectroscopy/Imaging
Institute of Photonic Technology 
Albert-Einstein-Str. 9
07745 Jena
Germany

email: claudia.belei...@ipht-jena.de
phone: +49 3641 206-133
fax:   +49 2641 206-399

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] OrgMassSpecR peak area issue

2013-03-18 Thread Claudia Beleites
Hi Chris,

 I am having an issue with the OrgMassSpecR package.  I run my HPLC
 using a DAD detector. 
You are on a statistics & IDE related mailing list. Have mercy on
people from other fields and tell them that you are using a diode array
to measure UV/VIS absorption. (And possibly let them know that you
expect the absorbance A = lg I_0 - lg I ~ c.)

 My raw data is exported from Chemstation as a
 csv file.  I then upload the csv into Rstudio no problem.  Using the
 DrawChromatogram function, I get a nice chromatogram, and my
 retention time, peak area, and apex intensity values are given as
 well.
 
 The problem comes with the peak area value given. The peak area is
 much smaller than a value that would make sense.
How do you know that (see next comment)?

 My peak area value is actually less than my apex intensity value. 
This is not a good criterion to determine what area value would
actually make sense: area and intensity have different units!

Possible solution: a glance at the code in DrawChromatogram reveals
that really the polygon area is calculated (as the manual specifies). 

Thus the area will be in counts*s or counts*min, and of course 1
count*min = 60 counts*s.  How long does your analyte
take to elute? Unless it is > 2 min (if time is in min) or > 2 s (for
time scale in s), the numeric value of the area should be < A_max
(approximating the peak as a triangle).

Your apex (max) absorbance should ideally be a bit below 1, so a rough
guesstimate for the peak area would be 1/2 A_max * Δt, which will be quite
a bit below 1 if you measure time in minutes.

If you detect by mass spec, you get ion counts which are large
numbers, so areas are likely to be > 1 (regardless of min or s time
scale).
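
A rough sketch of this unit argument with a synthetic Gaussian peak
(made-up numbers, trapezoidal integration):

t.min <- seq(0, 5, by = 0.01)                  # retention time in minutes
A     <- 0.8 * exp(-(t.min - 2.5)^2 / 0.02)    # apex absorbance approx. 0.8

area.min <- sum(diff(t.min) * (head(A, -1) + tail(A, -1)) / 2)
area.s   <- area.min * 60      # identical peak, time axis in seconds

area.min   # approx. 0.2 -- well below the apex value
area.s     # 60 times larger: only the unit changed, not the chemistry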


 Is this because I am using a DAD detector rather than an MS? If so,
 is there a simple way to edit the peak area equation so that it will
 also work with absorbance values?
Most probably you just want to get your units right!

Hope that helps, 

Claudia

PS: for future questions of this sort, you may want to consider asking
on stackoverflow.com (or chemistry.stackexchange.com) where you can
post nicely formatted code, calculation results and images with your
question.

-- 
Claudia Beleites
Spectroscopy/Imaging
Institute of Photonic Technology 
Albert-Einstein-Str. 9
07745 Jena
Germany

email: claudia.belei...@ipht-jena.de
phone: +49 3641 206-133
fax:   +49 2641 206-399

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] OrgMassSpecR peak area issue

2013-03-18 Thread Claudia Beleites
Hi Bryan,

Division by 2 is correct; it comes from the trapezoid (shoelace) area
calculation. The modulo line is a funny way of producing c(2:n, 1).
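
A quick check of that index trick (the successor index of the shoelace
formula):

n <- 5L
k <- (seq_len(n) %% n) + 1L
identical(k, c(2:n, 1L))   # TRUE: point j is paired with its successor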

Best,

Claudia


On Mon, 18 Mar 2013 15:00:06 -0400,
Bryan Hanson han...@depauw.edu wrote:

 If you type DrawChromatogram you can see the method used to calculate
 the peak area.  Looks to me like you could easily hack it if you
 wanted.  The relevant part about peak areas is this:
 
 for (j in 1:n) {
     k <- (j %% n) + 1
     x[j] <- peakTime[j] * peakIntensity[k] - peakTime[k] * peakIntensity[j]
 }
 peakArea[i] <- abs(sum(x) / 2)
 
 which looks pretty standard to me, though I'm not clear right off the
 top of my head why they are dividing by 2.  You can always contact
 the maintainer.
 
 Bryan
 
 On Mar 18, 2013, at 1:34 PM, Christopher Beaver
 christopher.bea...@gmail.com wrote:
 
  Hello!
  
  I am having an issue with the OrgMassSpecR package.  I run my HPLC
  using a DAD detector.  My raw data is exported from Chemstation as
  a csv file.  I then upload the csv into Rstudio no problem.  Using
  the DrawChromatogram function, I get a nice chromatogram, and my
  retention time, peak area, and apex intensity values are given as
  well.
  
  The problem comes with the peak area value given. The peak area is
  much smaller than a value that would make sense.  My peak area
  value is actually less than my apex intensity value.  Is this
  because I am using a DAD detector rather than an MS? If so, is
  there a simple way to edit the peak area equation so that it will
  also work with absorbance values?
  
  Any help is greatly appreciated.
  
  Thanks for your time.
  
  Chris Beaver
  
  [[alternative HTML version deleted]]
  
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html and provide commented,
  minimal, self-contained, reproducible code.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html and provide commented,
 minimal, self-contained, reproducible code.



-- 
Claudia Beleites
Spectroscopy/Imaging
Institute of Photonic Technology 
Albert-Einstein-Str. 9
07745 Jena
Germany

email: claudia.belei...@ipht-jena.de
phone: +49 3641 206-133
fax:   +49 2641 206-399

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Transpose a big data file and write to a new file

2013-03-07 Thread Claudia Beleites
Hi Yao He,

this doesn't sound like R to me. I'd go for perl (or awk).

See e.g. here:
http://stackoverflow.com/questions/1729824/transpose-a-file-in-bash
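
If it has to stay in R, here is a minimal sketch (file names taken from
your code below) that avoids growing a list: scan() into a character
vector -- much lighter than read.table() -- and reshape, assuming the
data still fits into memory as one character matrix:

nc <- length(strsplit(readLines("silygenotype.txt", n = 1), "\t")[[1]])
x  <- scan("silygenotype.txt", what = character(), sep = "\t")
m  <- matrix(x, ncol = nc, byrow = TRUE)   # one row per input line
write.table(t(m), "xxx.txt", quote = FALSE,
            row.names = FALSE, col.names = FALSE)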

HTH

Claudia

On Wed, 6 Mar 2013 22:37:14 +0800,
Yao He yao.h.1...@gmail.com wrote:

 Dear all:
 
 I have a big data file of 6 columns and 6 rows like that:
 
 AA AC AA AA ...AT
 CC CC CT CT...TC
 ..
 .
 
 I want to transpose it and the output is a new like that
 AA CC 
 AC CC
 AA CT.
 AA CT.
 
 
 AT TC.
 
 The key point is I can't read it into R by read.table() because the
 data is too large, so I try this:
 c <- file("silygenotype.txt", "r")
 geno_t <- list()
 repeat {
   line <- readLines(c, n = 1)
   if (length(line) == 0) break  # end of file
   line <- unlist(strsplit(line, "\t"))
   geno_t <- cbind(geno_t, line)
 }
 write.table(geno_t, "xxx.txt")
 
 It works but it is too slow ,how to optimize it???
 
 Thank you
 
 Yao He
 —
 Master candidate in 2nd year
 Department of Animal Genetics & Breeding
 Room 436, College of Animal Science & Technology,
 China Agricultural University, Beijing, 100193
 E-mail: yao.h.1...@gmail.com
 ——
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html and provide commented,
 minimal, self-contained, reproducible code.



-- 
Claudia Beleites
Spectroscopy/Imaging
Institute of Photonic Technology 
Albert-Einstein-Str. 9
07745 Jena
Germany

email: claudia.belei...@ipht-jena.de
phone: +49 3641 206-133
fax:   +49 2641 206-399

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] set working directory to current source directory

2012-08-14 Thread Claudia Beleites
Hi Sachin,

 Is there a way to get cran R to set the working directory to be
 wherever the source file is? Each time I work on a project on
 different computers I keep having to set the working directory which
 is getting quite annoying.

a while ago I asked a somewhat similar question on stackoverflow:

http://stackoverflow.com/questions/8835426/get-filename-and-path-of-sourced-file

You may want to have a look at the suggestions I got.
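
One of the tricks from that thread, roughly (a sketch: it relies on
source()'s ofile variable and only works while the file is being
source()d, not interactively):

src <- tryCatch(sys.frame(1)$ofile, error = function(e) NULL)
if (!is.null(src)) setwd(dirname(normalizePath(src)))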

Best,

Claudia 




-- 
Claudia Beleites
Spectroscopy/Imaging
Institute of Photonic Technology 
Albert-Einstein-Str. 9
07745 Jena
Germany

email: claudia.belei...@ipht-jena.de
phone: +49 3641 206-133
fax:   +49 2641 206-399

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] hyperSpec user survey

2012-07-25 Thread Claudia Beleites
Dear all,

I'm looking for users of the hyperSpec package (which I maintain) for
handling (hyper)spectral or spectroscopic data in R.

First of all, I made a few announcements concerning the further
development, which can be found in the hyperSpec-help mailing list and on
which I hope to get user feedback: see
http://lists.r-forge.r-project.org/pipermail/hyperspec-help/2012-July/thread.html.

My second issue is:
Once in a while I have to convince people (administration of the
institute) that hyperSpec now has a considerable user base. Obviously,
this is important for getting funding to go on with the development.

It would be extremely helpful if the hyperSpec users among you could
drop me a short email saying
- what kind of spectroscopy you use hyperSpec for
- where you are (country, if possible city and institution/company)
Of course, I'll treat the answers confidentially and I won't use names
etc. and I won't sell any information.

The goal is to have a few slides (geographical distribution of users -
always a nice and fancy thing to show, statistics on the kind of
spectroscopy etc.) which I'll also put on the hyperSpec homepage so you
can use them as well.

Thanks a lot,

Claudia Beleites
hyperSpec.r-forge.r-project.org

PS: please excuse if you get this request multiple times, I try to get
to my users in different ways...

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to know perfect execution of function ? if error occurred in execution, how to report it?

2012-03-23 Thread Claudia Beleites
In addition, if you need to dig down why the error occurs:

?traceback
?recover
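
A sketch combining both hints with Jim's ?try (x and the loop shape stand
in for the poster's code below):

for (i in 1:5) {
  z <- try(arima(x[i]), silent = TRUE)
  if (inherits(z, "try-error"))
    message("arima() failed at i = ", i)   # report and carry on
  else
    print(z)
}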

HTH Claudia


On 23.03.2012 10:29, Jim Holtman wrote:
 ?try
 
 Sent from my iPad
 
 On Mar 23, 2012, at 3:32, sagarnikam123 sagarnikam...@gmail.com wrote:
 
 I have one for loop, in which I am dealing with a time series & the arima
 function; while iterating, at some stage there is an error, like

 Error in arima(x, c(p, 0, q)) : non-stationary AR part from CSS

 I want to know at which step this error occurred & print that iteration
 number

 e.g.
 x <- c(1:10)
 for (i in 1:5) {
   z <- arima(x[i])
   print(z)
 }

 If the error occurs in the arima function at step i=3, it should report it
 & execute the complete loop until i=5.

 --
 View this message in context: 
 http://r.789695.n4.nabble.com/how-to-know-perfect-execution-of-function-if-error-occurred-in-execution-how-to-report-it-tp4498037p4498037.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


-- 
Claudia Beleites
Spectroscopy/Imaging
Institute of Photonic Technology
Albert-Einstein-Str. 9
07745 Jena
Germany

email: claudia.belei...@ipht-jena.de
phone: +49 3641 206-133
fax:   +49 2641 206-399

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] remoting ESS/R with tramp

2012-01-13 Thread Claudia Beleites
Tom,

what happens with:

(Emacs)
M-x ssh
t
(you should have the remote shell buffer now)
R
(once R is started)
M-x ess-remote
r

?



Claudia


-- 
Claudia Beleites
Spectroscopy/Imaging
Institute of Photonic Technology
Albert-Einstein-Str. 9
07745 Jena
Germany

email: claudia.belei...@ipht-jena.de
phone: +49 3641 206-133
fax:   +49 2641 206-399

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Base function for flipping matrices

2012-01-02 Thread Claudia Beleites
Hadley,

I have started to collect some functions that I needed extended to arrays
in the package arrayhelpers. If you consider that a good home for
the new functions, they would be more than welcome.

Currently I have the package at r-forge, but I wouldn't mind github,
either (so far I just use git-svn). Unit tests use svUnit, not testthat,
though.

Happy new year to everyone,

Claudia




On 02.01.2012 18:38, Richard M. Heiberger wrote:
 Hadley,
 
 Your request reminds me of the analysis of array functions in Philip S.
 Abrams' dissertation
 http://www.slac.stanford.edu/cgi-wrap/getdoc/slac-r-114.pdf
 AN APL MACHINE
 
 The section that starts on page 17 with this paragraph is the one that
 immediately applies
 
 C. The Standard Form for Select Expressions
 
 In this section the selection operators considered are take, drop, reversal,
 transpose, and subscripting by scalars or J-vectors. Because of the
 similarity
 among the selection operators, we might expect that an expression
 consisting only
 of selection operators applied to a single array could be expressed
 equivalently in
 terms of some simpler set of operators. This expectation is fulfilled in the
 standard form for select expressions, to be discussed below.
 
 I look forward to seeing where you take this in R.
 
 Rich
 
 On Mon, Jan 2, 2012 at 8:38 AM, Hadley Wickham had...@rice.edu wrote:
 
 But if not,  it seems to me that it should be added as an array method
 to ?rev with an argument specifying which indices to rev() over.

 Yes, agreed.  Sometimes arrays seem like something bolted onto R that
 is missing a lot of functionality.

 Hadley

 --
 Assistant Professor / Dobelman Family Junior Chair
 Department of Statistics / Rice University
 http://had.co.nz/

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.htmlhttp://www.r-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


-- 
Claudia Beleites
Spectroscopy/Imaging
Institute of Photonic Technology
Albert-Einstein-Str. 9
07745 Jena
Germany

email: claudia.belei...@ipht-jena.de
phone: +49 3641 206-133
fax:   +49 2641 206-399

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to get intecerpt standard error in PLS

2011-10-24 Thread Claudia Beleites

On 24.10.2011 09:07, Jeff Newmiller wrote:

Insufficient problem specification. Read the posting guide and try again with 
reproducible code and platform identification.
---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries, /Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.

arunkumarakpbond...@gmail.com  wrote:

Hi

how do we get the intercept's standard error? I'm using the package pls.
I got the coefficient but am not able to get the standard error.


I think the answer is just along the lines of Bjørn-Helge Mevik's answer 
to your previous question.


That being said, maybe you could report the variation (std. dev, IQR, 
...) of the intercept observed during bootstrap or iterated (repeated) 
cross validation/jackknife instead of the standard error.
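
A sketch of the bootstrap idea (X and y are hypothetical predictor
matrix and response; pls' coef() does accept intercept = TRUE):

library(pls)
b <- replicate(200, {
  i   <- sample(nrow(X), replace = TRUE)
  fit <- plsr(y[i] ~ X[i, ], ncomp = 3)
  drop(coef(fit, intercept = TRUE))[1]    # intercept of the resampled fit
})
sd(b)   # bootstrap standard deviation of the intercept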


Claudia


--
View this message in context: 
http://r.789695.n4.nabble.com/How-to-get-intecerpt-standard-error-in-PLS-tp3932104p3932104.html
Sent from the R help mailing list archive at Nabble.com.

__

R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Claudia Beleites
Spectroscopy/Imaging
Institute of Photonic Technology
Albert-Einstein-Str. 9
07745 Jena
Germany

email: claudia.belei...@ipht-jena.de
phone: +49 3641 206-133
fax:   +49 2641 206-399

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Package snow: is there any way to check if a cluster is active

2011-10-13 Thread Claudia Beleites

Sören,

have a look at package snowfall which provides sfIsRunning.
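
A sketch of both routes (isClusterActive is a made-up helper; it probes
the node connections just like the summary.connection error below):

library(snowfall)
sfInit(parallel = TRUE, cpus = 2)
sfIsRunning()   # TRUE
sfStop()
sfIsRunning()   # FALSE

## low-level check on a plain snow SOCK cluster object:
isClusterActive <- function(cl)
  all(vapply(cl, function(node)
      !inherits(try(summary(node$con), silent = TRUE), "try-error"),
    logical(1)))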

HTH

Claudia


On 13.10.2011 06:34, Søren Højsgaard wrote:

Is there a 'proper' way of checking if a cluster is active? For example, I create 
a cluster called .PBcluster


> str(.PBcluster)

List of 4
 $ :List of 3
  ..$ con : Classes 'sockconn', 'connection' atomic [1:1] 3
  .. .. ..- attr(*, "conn_id")=<externalptr>
  ..$ host: chr "localhost"
  ..$ rank: int 1
  ..- attr(*, "class")= chr "SOCKnode"
 $ :List of 3


Then I stop it with

stopCluster(.PBcluster)
.PBcluster

[[1]]
$con
Error in summary.connection(x) : invalid connection


> str(.PBcluster)

List of 4
 $ :List of 3
  ..$ con : Classes 'sockconn', 'connection' atomic [1:1] 3
  .. .. ..- attr(*, "conn_id")=<externalptr>
  ..$ host: chr "localhost"
  ..$ rank: int 1
  ..- attr(*, "class")= chr "SOCKnode"

- but is there a way in which I can check if the cluster is active??

Regards
Søren
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Claudia Beleites
Spectroscopy/Imaging
Institute of Photonic Technology
Albert-Einstein-Str. 9
07745 Jena
Germany

email: claudia.belei...@ipht-jena.de
phone: +49 3641 206-133
fax:   +49 2641 206-399

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] speed up this algorithm (apply-fuction / 4D array)

2011-10-06 Thread Claudia Beleites

here's another one - which is easier to generalize:

x <- array(rnorm(50 * 50 * 50 * 91, 0, 2), dim = c(50, 50, 50, 91))
y <- x[,,, 1:90]  # decide yourself what to do with slice 91, but
                  # 91 is not divisible by 3
system.time({
  dim(y) <- c(50, 50, 50, 3, 90 %/% 3)
  y <- aperm(y, c(4, 1:3, 5))
  v2 <- colMeans(y)
})
   user  system elapsed
   0.32    0.08    0.40

(my computer is a bit slower than Bill's:)
> system.time(v1 <- f1(x))
   user  system elapsed
  0.360   0.030   0.396

Claudia


On 05.10.2011 20:24, William Dunlap wrote:

I corrected your code a bit and put it into a function, f0, to
make testing easier.  I also made a small dataset to make
testing easier.  Then I made a new function f1 which does
what f0 does in a vectorized manner:

   x <- array(rnorm(50 * 50 * 50 * 91, 0, 2), dim=c(50, 50, 50, 91))
   xsmall <- array(log(seq_len(2 * 2 * 2 * 91)), dim=c(2, 2, 2, 91))

   f0 <- function(x) {
       data_reduced <- array(0, dim=c(dim(x)[1:3], trunc(dim(x)[4]/3)))
       reduce <- seq(1, dim(x)[4]-1, by=3)
       for( i in 1:length(reduce) ) {
           data_reduced[ , , , i] <- apply(x[ , , , reduce[i] : (reduce[i]+2) ],
                                           1:3, mean)
       }
       data_reduced
   }

   f1 <- function(x) {
       reduce <- seq(1, dim(x)[4]-1, by=3)
       data_reduced <- (x[, , , reduce] + x[, , , reduce+1] + x[, , , reduce+2]) / 3
       data_reduced
   }

The results were:

> system.time(v1 <- f1(x))
   user  system elapsed
  0.280   0.040   0.323
> system.time(v0 <- f0(x))
   user  system elapsed
 73.760   0.060  73.867
> all.equal(v0, v1)
[1] TRUE


I thought apply would already vectorize, rather than loop over every 
coordinate.

No, you have that backwards.  Use *apply functions when you cannot figure
out how to vectorize.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Martin Batholdy
Sent: Wednesday, October 05, 2011 10:40 AM
To: R Help
Subject: [R] speed up this algorithm (apply-fuction / 4D array)

Hi,


I have this sample code (see below) and I was wondering whether it is possible 
to speed things up.



What this code does is the following:

x is 4D array (you can imagine it as x, y, z-coordinates and a time-coordinate).

So x contains 50x50x50 data-arrays for 91 time-points.

Now I want to reduce the 91 time-points.
I want to merge three consecutive time-points into one time-point by calculating 
the mean of these three
time-points for every x,y,z coordinate.

The reduce-sequence defines which time-points should get merged.
And the apply-function in the for-loop calculates the mean of the three 
3D-Arrays and puts them into a
new 4D array (data_reduced).



The problem is that even in this example it takes really long.
I thought apply would already vectorize, rather than loop over every coordinate.

But for my actual data-set it takes a really long time ... So I would be really 
grateful for any
suggestions how to speed this up.




x <- array(rnorm(50 * 50 * 50 * 90, 0, 2), dim=c(50, 50, 50, 91))

data_reduced <- array(0, dim=c(50, 50, 50, 90/3))

reduce <- seq(1, 90, 3)

for( i in 1:length(reduce) ) {

    data_reduced[ , , , i] <- apply(x[ , , , reduce[i] : (reduce[i]+3) ],
                                    1:3, mean)
}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Claudia Beleites
Spectroscopy/Imaging
Institute of Photonic Technology
Albert-Einstein-Str. 9
07745 Jena
Germany

email: claudia.belei...@ipht-jena.de
phone: +49 3641 206-133
fax:   +49 2641 206-399

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Package for Neural Network Classification Quality

2011-09-27 Thread Claudia Beleites

Alejandro,


Hi, does somebody know about an R package with which I can evaluate the quality
(recall, precision, accuracy, etc.) of a neural network classification
with more than 2 classes? I found the ROCR package,
http://cran.r-project.org/web/packages/ROCR/index.html, but it only
works with binary classifications.

I guess that is because strictly these measures are defined for binary
problems (though I expand them to multi-class situations by using
class-A vs. not-class-A binary measures, which comes quite naturally for
my classes).

In case you need something that takes soft or fuzzy class measures: I
put my ideas about that into package softclassval and would much
appreciate feedback.

Best,

Claudia




Best regards,

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Claudia Beleites
Spectroscopy/Imaging
Institute of Photonic Technology
Albert-Einstein-Str. 9
07745 Jena
Germany

email: claudia.belei...@ipht-jena.de
phone: +49 3641 206-133
fax:   +49 2641 206-399

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] predictive accuracy

2011-05-26 Thread Claudia Beleites

Al,

I'd redo everything and report in the paper that your peculiar predictor 
was contributing strongly to models that were built without excluding 
this predictor. This is important information: your models get 
confused by the predictor (I'd consider this a lack of a certain kind 
of robustness, but I'm not a statistician).


HTH

Claudia

Am 26.05.2011 14:42, schrieb El-Tahtawy, Ahmed:

I am trying to develop a prognostic model using logistic regression.  I
built full and approximate models with the use of penalization (Design
package). Also, I tried chi-square criteria and step-down techniques, and
used bootstrap for model validation.



The main purpose is to develop a predictive model for a future patient
population.  One of the strong predictors pertains to the study design
and would not mean much for a clinician/investigator in a real clinical
situation, and I have been asked to remove it.



Can I propose a model and nomogram without that strong -irrelevant
predictor?? If yes, do I need to redo model calibration, discrimination,
validation, etc...?? or just have 5 predictors instead of 6 in the
prognostic model??



Thanks for your help

Al

.




[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Claudia Beleites
Spectroscopy/Imaging
Institute of Photonic Technology
Albert-Einstein-Str. 9
07745 Jena
Germany

email: claudia.belei...@ipht-jena.de
phone: +49 3641 206-133
fax:   +49 2641 206-399

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help with 2-D plot of k-mean clustering analysis

2011-05-18 Thread Claudia Beleites

Hi Meng,


  I would like to use R to perform k-means clustering on my data, which
includes 33 samples measured with ~1000 variables. I have already used
the kmeans function for this analysis, and showed that there are 4 clusters in my
data. However, it's really difficult to plot these clusters in 2-D
given the huge number of variables. One possible way is to project the
multidimensional space onto a 2-D plane, but I could not find any good way
to do that. Any suggestions or comments will be really helpful!
For suggestions it would be extremely helpful to tell us what kind of 
variables your 1000 variables are.


Parallel coordinate plots plot values over (many) variables. Whether 
this is useful, depends very much on your variables: E.g. I have 
spectral channels, they have an intrinsic order and the values have 
physically the same meaning (and almost the same range), so the parallel 
coordinate plot comes naturally (it produces in fact the spectra).
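
If you just need a quick 2-D look, a PCA score plot coloured by cluster
is a common projection (a sketch with simulated data of your size):

X  <- matrix(rnorm(33 * 1000), nrow = 33)   # 33 samples, 1000 variables
km <- kmeans(X, centers = 4)
pc <- prcomp(X)
plot(pc$x[, 1:2], col = km$cluster, pch = 19,
     xlab = "PC 1", ylab = "PC 2")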


Claudia




Thanks,

Meng

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Claudia Beleites
Spectroscopy/Imaging
Institute of Photonic Technology
Albert-Einstein-Str. 9
07745 Jena
Germany

email: claudia.belei...@ipht-jena.de
phone: +49 3641 206-133
fax:   +49 2641 206-399

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Outlier removal by Principal Component Analysis : error message

2011-05-05 Thread Claudia Beleites

Dear Boule,

thank you for your interest in hyperSpec.
In order to look into your *problem* I need some more information.

I suggest that we solve the error off-list. Please note also that 
hyperSpec has its own help mailing list:

hyperspec-h...@lists.r-forge.r-project.org
(due to the amount of spam I got to moderate, you need to subscribe 
first here: 
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/hyperspec-help)


- Which version of hyperSpec do you use? If it is the version from CRAN, 
could you please update to the development version at r-forge with

install.packages("hyperSpec", repos = "http://R-Forge.R-project.org")
?

- Next, if the problem persists with the latest build, could you send me 
the raw data file so that I can exactly reproduce your problem?


- Also, for tracking down the exact source of the error, please execute
traceback ()
after you got the error and email me its output.

It is basically impossible to give general recommendations about 
*outlier detection*: a few spectra that are very different from all 
other spectra may be outliers, or they may be the target of a study...
This is also why the example in the vignette uses a two step procedure: 
PCA only identifies suspects, i.e. spectra that have very different 
scores than all others for some principal components. The second step is 
a manually supervised decision whether the spectrum is really an outlier.


The first step could be replaced by other measures that however depend 
on your data. E.g. if you expect/know your data to consist of different 
clusters, suspects could be spectra that are too far away from any 
cluster. If your data comes from a mixture of a few components, spectra 
that cannot be modeled decently by a few PLS components could be 
suspicious. Or spectra that require an own component, ...
Some kinds of outliers are actually well-defined in a spectroscopic 
sense, e.g. contamination by fluorescent lamp light.


The second step could be replaced by an automatic decision, e.g. with a 
distance threshold.
Personally, I rather use the term filtering for such automatic rules. 
And there you can think about any number of rules your spectra must 
comply with in order to be acceptable: signal to noise ratio, minimal 
and maximal intensity, original offset (baseline) less than, ...


Hope that helps,

Claudia



I am currently analysing Raman spectroscopic data with the hyperSpec package.
I consulted the documentation on this package and I found an example
work-flow dedicated to Raman spectroscopy (see the address:
http://hyperspec.r-forge.r-project.org/chondro.pdf)

I am currently trying to remove outliers thanks to PCA just as they did in
the documentation, but I get a message error I can't explain. Here is my
code :

#import the data:
T = read.table('bladder bis concatenation colonne.txt', header=TRUE)
spec = new("hyperSpec", wavelength=T[,1], spc=t(T[,-1]),
           data=data.frame(sample=colnames(T[,-1])),
           label=list(.wavelength="Raman shift (cm-1)", spc="Intensity (a.u.)"))

#baseline correction of the spectra
spec = spec[,,500~1800]
bl = spc.fit.poly.below(spec)
spec = spec - bl

#normalization of the spectra
spec = sweep(spec, 1, apply(spec, 1, mean), '/')

#PCA
pca = prcomp(~ spc, data=spec$., center=TRUE)
scores = decomposition(spec, pca$x, label.wavelength="PC",
                       label.spc="score / a.u.")
loadings = decomposition(spec, t(pca$rotation), scores=FALSE,
                         label.spc="loading I / a.u.")

#plot the scores of the first 20 PC against all other to have an idea where
to find the outliers
pairs(scores[[,,1:20]],pch=19,cex=0.5)

#identify the outliers thanks to map.identify
out=map.identify(scores[,,5])
Error in `[.data.frame`(x@data, , j, drop = FALSE) :
   undefined columns selected

Does anybody understand where the problem comes from ?
And does anybody know another mean to find spectra outliers ?

Thank you in advance.

Boule

--
View this message in context: 
http://r.789695.n4.nabble.com/Outlier-removal-by-Principal-Component-Analysis-error-message-tp3496023p3496023.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Claudia Beleites
Spectroscopy/Imaging
Institute of Photonic Technology
Albert-Einstein-Str. 9
07745 Jena
Germany

email: claudia.belei...@ipht-jena.de
phone: +49 3641 206-133
fax:   +49 2641 206-399

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ROCR - best sensitivity/specificity tradeoff?

2011-04-07 Thread Claudia Beleites

Christian,


My questions concerns the ROCR package and I hope somebody here on
the list can help - or point me to some better place.

When evaluating a model's performane, like this:


 pred1 <- predict(model, ..., type="response")
 pred2 <- prediction(pred1, binary_classifier_vector)
 perf <- performance(pred2, "sens", "spec")

(Where prediction and performance are ROCR-functions.)

How can I then retrieve the cutoff value for the
sensitivity/specificity tradeoff with regard to the data in the model
(e.g. model = glm(binary_classifier_vector ~ data, family=binomial,
data=some_dataset)? Perhaps I missed something in the manual? Or do I
need an entirely different approach for this? Or is there an
alternative solution?


a) look into the performance object: you will find all the values there
(see the sketch below the list)

b) have a look at this thread
https://stat.ethz.ch/pipermail/r-help/attachments/20100523/51ec813f/attachment.pl
http://finzi.psych.upenn.edu/Rhelp10/2010-May/240021.html
http://finzi.psych.upenn.edu/Rhelp10/2010-May/240043.html
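
For point (a), a sketch (using pred2 from your code above; the slot names
are those of ROCR's performance class):

perf   <- performance(pred2, "sens", "spec")
cutoff <- perf@alpha.values[[1]]
sens   <- perf@y.values[[1]]
spec   <- perf@x.values[[1]]
cutoff[which.max(sens + spec)]   # e.g. Youden-style best tradeoff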

Claudia

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Element by element mean of a list of matrices

2011-03-15 Thread Claudia Beleites

Peter,

as the matrices in the list have the same shape, you can unlist them 
into an array and then use rowMeans.


HTH

Claudia

Am 15.03.2011 21:17, schrieb hihi:

Hi All,
is there any effective and dense/compact method to calculate the mean of a 
list of - of course coincident - matrices on an element-by-element basis? The 
resulting matrix's [i, j]-th element is the mean of the list's matrices' [i, 
j]-th elements respectively...
Iterating with a for statement is quite straightforward, but I am seeking a 
more elegant solution, and my attempt with the apply family of functions did 
not work.

Thank you,
by
Peter

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Large dataset operations

2011-03-11 Thread Claudia Beleites

Haakon,

as replicates imply that they all have the same data type, you can put 
them into a matrix, which is often faster and needs less memory (though 
whether that really matters depends on the number of replicates you 
have: for a small number of replicates you won't see much effect anyway).

But I find it handy to have the matrix of replicates available as data$rep.

data <- data.frame(plateNo = a, Well = b, rep = I(cbind(c, d, e)))
> data
   plateNo Well rep.c rep.d rep.e
1        1  A01  1312   963  1172
2        1  A02 10464  6715  5628
3        1  A03  3301  3257  3281
4        1  A04  3895  3350  3496
5        1  A05  8731  7389  5701
6        2  A01  7893  6748  5920
7        2  A02  2912  2385  2586
8        2  A03   985   785   809
9        2  A04  1346  1018  1001
10       2  A05   794   314   486
> dim(data)
[1] 10  3

Then:
data$norm <- data$rep / apply(data$rep, 2, ave, plateNo = data$plateNo)

you can also do the division inside the apply:
data$norm <- apply(data$rep, 2,
                   function(x) x / ave(x, plateNo = data$plateNo))



If you always have the same number of wells per plate, you could also 
fold the data$rep matrix into an array:

arep <- array(data$rep, dim = c(2, 5, 3))
anorm <- arep / rep(colMeans(arep), each = 2)
dim(anorm) <- dim(data$rep)
data$norm <- anorm


Here are some microbenchmark results:
Unit: nanoseconds
         min      lq  median      uq     max
[1,] 1525160 1561280 1627620 1685020 3575719
[2,] 1505641 1539500 1560301 1649081 3538001
[3,]  113321  115041  115821  116881  155681
[4,] 2589800 2627280 2662540 2794920 4646399

1 and 2 are the two apply versions above.
3 is the array
4 are your loops

HTH

Claudia


On 11.03.2011 18:38, hi Berven wrote:


Hello all,

I'm new to R and trying to figure out how to perform calculations on a large
dataset (300 000 datapoints). I have already written some code to do this, but
it is awfully slow. What I want to do is add a new column for each rep_ column,
where I take each value and divide it by the mean of all values where PlateNo
is the same. My data is in the following format:


> data

   PlateNo Well rep_1 rep_2 rep_3
1        1  A01  1312   963  1172
2        1  A02 10464  6715  5628
3        1  A03  3301  3257  3281
4        1  A04  3895  3350  3496
5        1  A05  8731  7389  5701
6        2  A01  7893  6748  5920
7        2  A02  2912  2385  2586
8        2  A03   985   785   809
9        2  A04 13462  1018  1001
10       2  A05   794   314   486

To generate it copy:
a <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
b <- c("A01", "A02", "A03", "A04", "A05", "A01", "A02", "A03", "A04", "A05")
c <- c(1312, 10464, 3301, 3895, 8731, 7893, 2912, 985, 1346, 794)
d <- c(963, 6715, 3257, 3350, 7389, 6748, 2385, 785, 1018, 314)
e <- c(1172, 5628, 3281, 3496, 5701, 5920, 2586, 809, 1001, 486)
data <- data.frame(plateNo = a, Well = b, rep_1 = c, rep_2 = d, rep_3 = e)

Here is the code I have come up with:

 rows <- length(data$plateNo)
 reps <- 3
 norm <- list()
 for (rep in 1:reps) {
     x <- paste("rep_", rep, sep="")
     normx <- paste("normalised_", rep, sep="")
     for (row in 1:rows) {
         plateMean <- mean(data[[x]][data$plateNo == data$plateNo[row]])
         wellData <- data[[x]][row]
         norm[[normx]][row] <- wellData / plateMean
     }
 }


Any help or tips would be greatly appreciated!
Thanks,
Haakon  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Plot with same font like in LaTeX

2011-03-02 Thread Claudia Beleites

Jonas,
have a look at tikzDevice

Claudia

--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Incorrectness of mean()

2011-02-28 Thread Claudia Beleites

On 02/28/2011 11:07 AM, zbynek.jano...@gmail.com wrote:

I have found the following problem: I have a vector:

a <- c(1.04, 1.04, 1.05, 1.04, 1.04)

I want the mean of this vector:

mean(a)

[1] 1.042 which is correct, but:

mean(1.04, 1.04, 1.05, 1.04, 1.04)

[1] 1.04 gives an incorrect value. How is this possible?

the x that is averaged is only the first 1.04; the other numbers are matched
to mean's further arguments (trim, na.rm) and to its ... argument, and are
thus ignored.

Claudia

--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] re-arranging data to create an image (or heatmap)

2011-02-28 Thread Claudia Beleites

Pierz,

- easy approximation: use colours with an alpha value for plotting points
 x <- runif (1000)
 y <- runif (1000) * x + x
 plot (x, y)
 plot (x, y, pch = 20, col = "#FF000020")  # translucent red

- more sophisticated:
 have a look at hexbin (package hexbin), levelplot (package lattice), package 
ggplot2 with stat_sum, stat_binhex, or stat_density2d (e.g. 
http://had.co.nz/ggplot2/stat_sum.html)


HTH Claudia

On 02/28/2011 03:21 PM, pierz wrote:

Let me start by introducing myself as a biologist with only a little
knowledge about programming in Matlab and R. In the past I have successfully
created my figures in Matlab using the hist3d command, but I do not have access
to Matlab right now and would like to switch to R.

I have used the plot command to create a figure of my data and it does
almost what I want it to do.

My data matrix looks like this (these are the first few lines from it,
copied from R console):

            Time Abs
 [1,] 0.09714286  24
 [2,] 0.19428571  24
 [3,] 0.19428571  24
 [4,] 0.29142857  24
 [5,] 0.38857143  23
 [6,] 0.38857143  22
 [7,] 0.48571429  23
 [8,] 0.58285714  21
 [9,] 0.58285714  21
[10,] 0.68000000  23
[11,] 0.68000000  25
[12,] 0.68000000  23
[13,] 0.77714286  23
[14,] 0.77714286  23
[15,] 0.87428571  21
[16,] 0.87428571  20
[17,] 0.87428571  22
[18,] 1.06857143  23
[19,] 1.06857143  25

The example shows that some of the plotted points appear more than once. I
would like to use a heatmap to show that these points have more weight, but
I have difficulty arranging the data to be plotted correctly using the
image() or heatmap() command.

So what I would want to do is to get the same figure as when I use the plot
command, but have colors representing the weight of the plotted points
(whether they occur once, twice or multiple times).

I have tried searching this forum and also used google, but I seem to keep
going in circles. I think that the image() command fits my needs, but that
my input data is not in the correct format.

Attached I have an image example from R and an image example from matlab.
This is how far I got using R:
http://r.789695.n4.nabble.com/file/n3327986/example_R.jpg

This is the result I am aiming for:
http://r.789695.n4.nabble.com/file/n3327986/example_matlab.jpg





--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problems using unique function and !duplicated

2011-02-28 Thread Claudia Beleites

Jon,

you need to combine the conditions into one logical value, e.g. cond1 & cond2: 
here, !duplicated(test$date) & !duplicated(test$var2)


However, I doubt that this is what you want: you remove too many rows (rows 
whose single values appeared already, even if the combination is unique).


Have a look at the wiki, though: 
http://rwiki.sciviews.org/doku.php?id=tips:data-frames:count_and_extract_unique_rows
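
What I suspect you actually want is to drop duplicates of the date/var2
*combination* (a sketch, cf. the wiki link; duplicated() works row-wise
on a data.frame):

test[!duplicated(test[c("date", "var2")]), ]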


Claudia


On 02/28/2011 04:51 PM, JonC wrote:

Hi, I am trying to simultaneously remove duplicate variables from two or more
variables in a small R data.frame. I am trying to reproduce the SAS
statements from a Proc Sort with Nodupkey for those familiar with SAS.

Here's my example data :

test <- read.csv("test.csv", sep=",", as.is=TRUE)

> test

      date var1 var2 num1 num2
1 28/01/11    a    1  213   71
2 28/01/11    b    1  141   47
3 28/01/11    c    2  867  289
4 29/01/11    a    2  234   78
5 29/01/11    b    2  666  222
6 29/01/11    c    2  912  304
7 30/01/11    a    3  417  139
8 30/01/11    b    3  108   36
9 30/01/11    c    2  288   96

I am trying to obtain the following, where duplicates of date AND var2 are
removed from the above data.frame.

date        var1  var2  num1  num2
28/01/2011  a     1      213    71
28/01/2011  c     2      867   289
29/01/2011  a     2      234    78
30/01/2011  c     2      288    96
30/01/2011  a     3      417   139



If I use the !duplicated function with one variable everything works fine.
However I wish to remove duplicates of both Date and var2.

> test[!duplicated(test$date),]
        date var1 var2 num1 num2
1 0011-01-28    a    1  213   71
4 0011-01-29    a    2  234   78
7 0011-01-30    a    3  417  139

test2 <- test[!duplicated(test$date), !duplicated(test$var2), ]
Error in `[.data.frame`(test, !duplicated(test$date),
  !duplicated(test$var2),  : undefined columns selected

I get an error ?
I got different errors when using the unique() function.

Can anybody solve this ?

Thanks in advance.

Jon





--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] The L Word

2011-02-24 Thread Claudia Beleites

On 02/24/2011 11:20 AM, Prof Brian Ripley wrote:

On Thu, 24 Feb 2011, Tal Galili wrote:


Thank you all for the answers.

So if I may extend on the question -
When is it important to use 'Literal integer'?
Under what situations could not using it cause problems?
Is it a matter of efficiency or precision or both?


Efficiency: it avoids unnecessary type conversions. For example

length(x) > 1

has to coerce the lhs to double. We have converted the base code to use integer
constants because such small efficiency gains can add up.

Integer vectors can be stored more compactly than doubles, but that is not going
to help for length 1:


> object.size(1)
48 bytes
> object.size(1L)
48 bytes

(32-bit system).

see:
n <- 0L:100L
szi <- sapply(n, function(n) object.size(integer(n)))
szd <- sapply(n, function(n) object.size(double(n)))
plot(n, szd)
points(n, szi, col = "red")








Thanks,
Tal




Contact Details:
Contact me: tal.gal...@gmail.com | 972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
www.r-statistics.com (English)
--





On Wed, Feb 23, 2011 at 6:15 PM, Tsjerk Wassenaar tsje...@gmail.com wrote:


Hi Gene,

It means 'Literal integer'.
So 1L is a proper integer 1, and 0L is a proper integer 0.

Hope it helps,

Tsjerk

On Wed, Feb 23, 2011 at 5:08 PM, Gene Leynes gleyne...@gmail.com wrote:

I've been wondering what L means in the R computing context, and was
wondering if someone could point me to a reference where I could read about
it, or tell me what it's called so that I can search for it myself. (L by
itself is a little too general for a search term).

I encounter it in strange places, most recently in the save documentation.


save(..., list = character(0L),
     file = stop("'file' must be specified"),
     ascii = FALSE, version = NULL, envir = parent.frame(),
     compress = !ascii, compression_level,
     eval.promises = TRUE, precheck = TRUE)



I remember that you can also find it when you step inside an apply

function:



sapply(1:10, function(x)browser())
Called from: FUN(1:10[[1L]], ...)



I apologize for being vague, it's just something that I would like to
understand about the R language (the R word).

Thank you!

Gene

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide

http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.





--
Tsjerk A. Wassenaar, Ph.D.

post-doctoral researcher
Molecular Dynamics Group
* Groningen Institute for Biomolecular Research and Biotechnology
* Zernike Institute for Advanced Materials
University of Groningen
The Netherlands

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.






--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] The L Word

2011-02-24 Thread Claudia Beleites

On 02/24/2011 05:14 PM, Hadley Wickham wrote:

Note however that I've never seen evidence for a *practical*
difference in simple cases, and also of such cases as part of a
larger computation.
But I'm happy to see one if anyone has an interesting example.

E.g., I would typically never use  0L:100L  instead of 0:100
in an R script because I think code readability (and self
explainability) is of considerable importance too.


But : casts to integer anyway:
I know - I just thought that on _this_ thread I ought to write it with L ;-) and 
I don't think I write 1L : 100L in real life.


I use the L far more often as a reminder than for performance. Particularly in 
function definitions.
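
For illustration, such a reminder could look like this (a made-up sketch; 'take'
is not a function from this thread):

take <- function (x, n = 1L) x [seq_len (n)]  # n = 1L signals that n is meant to be a whole number
take (letters, 3L)  # "a" "b" "c"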





str(0:100)

  int [1:101] 0 1 2 3 4 5 6 7 8 9 ...

And performance in this case is (obviously) negligible:


> library(microbenchmark)
> microbenchmark(as.integer(c(0, 100)), times = 1000)
Unit: nanoseconds
                       min  lq median  uq   max
as.integer(c(0, 100))  712 791    813 896 15840

(mainly included as an opportunity to try out microbenchmark)



So you save ~800 ns but typing two letters probably takes 0.2 s (100
wpm, ~ 5 letters per word + space = 0.1s per letter), so it only saves
you time if you're going to be calling it more than 125000 times ;)
calling 125000 times happens in my real life. I have e.g. one data set with 2e5 
spectra (and another batch of that size waiting for me), so anything done for 
each spectrum reaches this number each time the function is needed.

Also of course, the conversion time goes with the length of the vector.
On the other hand, in > 95 % of the cases taking an hour to think about the 
algorithm will have much larger effects ;-).


Also, I notice that the first few measures of microbenchmark are often much 
longer (for fast operations). Which may just indicate that the total speed 
depends much more on whether the code allows caching or not. And that may mean 
that any such coding details may or may not help at all: A single such 
conversion may take disproportionally much more time.


I just (yesterday) came across a situation where the difference between numeric 
and integer does matter (considering that I do that with ≈ 3e4 x 125 x 6 array 
size): as.factor

> microbenchmark (i = as.factor (1:1e3), d = as.factor ((1:1e3)+0.0))
Unit: nanoseconds
      min      lq  median      uq     max
i  884039  891106  895847  901630 2524877
d 2698637 2770936 2778271 2807572 4266197

but then:
> microbenchmark (
+    sd = structure ((1:1e3)+0.0, .Label = 1:100, class = "factor"),
+    si = structure ((1:1e3)+0L, .Label = 1:100, class = "factor"))
Unit: nanoseconds
      min     lq median     uq     max
sd  52875  53615  54040  54448 1385422
si  45904  46936  47332  47778   65360



Cheers,

Claudia



--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] The L Word

2011-02-23 Thread Claudia Beleites

On 02/23/2011 05:08 PM, Gene Leynes wrote:

I've been wondering what L means in the R computing context, and was
wondering if someone could point me to a reference where I could read about
it, or tell me what it's called so that I can search for it myself.  (L by
itself is a little too general for a search term).

It means that the number is an integer (a _L_ong integer of 32 bit actually)
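
A quick check in an R session shows the difference directly:

> typeof (1)
[1] "double"
> typeof (1L)
[1] "integer"
> is.integer (1:3)  # the colon operator returns integer anyway
[1] TRUE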





I encounter it in strange places, most recently in the save documentation.

save(..., list = character(0L),

  file = stop("'file' must be specified"),
  ascii = FALSE, version = NULL, envir = parent.frame(),
  compress = !ascii, compression_level,
  eval.promises = TRUE, precheck = TRUE)



I remember that you can also find it when you step inside an apply function:


sapply(1:10, function(x)browser())
Called from: FUN(1:10[[1L]], ...)



I apologize for being vague, it's just something that I would like to
understand about the R language (the R word).

Thank you!

Gene

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sort a 3 dimensional array across third dimension ?

2011-02-18 Thread Claudia Beleites

Dear James,

this is what I understood your sorting along the third dimension to be:
> x <- array(c(9, 9, 7, 9, 6, 5, 4, 6, 2, 1, 3, 2), dim = list(2, 2, 3))

> y <- apply (x, 1:2, sort)
> y
, , 1

     [,1] [,2]
[1,]    2    1
[2,]    6    5
[3,]    9    9

, , 2

     [,1] [,2]
[1,]    3    2
[2,]    4    6
[3,]    7    9


The result of apply has dimensions c (length (result of FUN), dim (x)[MARGIN]):
the length of the function's result, followed by those dimensions of x that you
hand to apply as MARGIN.


Thus, your specified result needs rearranging the dimensions:

> y <- aperm (y, c(2, 3, 1))
> y
, , 1

     [,1] [,2]
[1,]    2    3
[2,]    1    2

, , 2

     [,1] [,2]
[1,]    6    4
[2,]    5    6

, , 3

     [,1] [,2]
[1,]    9    7
[2,]    9    9


HTH Claudia

--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] {Spam?} Re: sort a 3 dimensional array across third dimension ?

2011-02-18 Thread Claudia Beleites

On 02/18/2011 04:11 PM, Maas James Dr (MED) wrote:

Hi Claudia,

It does help a lot, but not quite there yet ... I'm sure you are correct and
it is much appreciated. I need some sort of generalized form; the actual arrays
in my case are 3x3x1000.  Do you suspect it could be done in one step with
sapply?

Why sapply?

Sure you can do it in one step:
y <- aperm (apply (x, 1:2, sort), c(2, 3, 1))
I just think two lines are more readable.

Note that all these numbers refer to the dimensions of the array and don't have 
anything to do with the actual size. Just try it out with different array sizes.


> a <- array (runif (9000), c (3, 3, 1000))
> a [,,1:2]
, , 1

       [,1]   [,2]    [,3]
[1,] 0.8721 0.5102 0.47370
[2,] 0.7721 0.5744 0.98281
[3,] 0.9357 0.1969 0.08784

, , 2

       [,1]   [,2]   [,3]
[1,] 0.1485 0.6878 0.1018
[2,] 0.3784 0.3864 0.9814
[3,] 0.9219 0.5664 0.4565

> y <- aperm (apply (a, 1:2, sort), c(2, 3, 1))
> y [,,1:2]
, , 1

          [,1]      [,2]      [,3]
[1,] 1.121e-03 1.517e-03 0.0008285
[2,] 7.118e-05 3.303e-04 0.0003870
[3,] 7.445e-04 2.461e-05 0.0005980

, , 2

         [,1]      [,2]     [,3]
[1,] 0.001375 0.0049272 0.004581
[2,] 0.002204 0.0004947 0.001148
[3,] 0.004214 0.0006355 0.001610

> y [,,999:1000]
, , 1

       [,1]   [,2]   [,3]
[1,] 0.9989 0.9980 0.9998
[2,] 0.9982 0.9973 0.9994
[3,] 0.9994 0.9978 0.9993

, , 2

       [,1]   [,2]   [,3]
[1,] 0.9997 0.9992 0.
[2,] 0.9986 0.9981 0.9997
[3,] 0.9998 0.9988 0.9996


BTW: as your MARGIN extents are small, only 3 x 3 = 9 calls to FUN are necessary.
I don't think you can gain much time here. The calculation with 3 x 3 x 1000 took
3 ms elapsed time on my computer, and increasing every direction by a factor of
10 still needs 1/3 s.




Claudia





Regards

J


=== Dr. Jim Maas Research Associate in Network
Meta-Analysis School of Medicine, Health Policy and Practice CD Annex, Room
1.04 University of East Anglia Norwich, UK NR4 7TJ

+44 (0) 1603 591412






--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] segfault during example(svm)

2011-02-18 Thread Claudia Beleites

Dear Jürgen,

did you update.packages (checkBuilt = TRUE) ?
I recently had segfaults, too, on 64bit linux (with rgl, though), and they
disappeared only after updating with checkBuilt (including also the packages
originally installed via Dirk's .deb packages).
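
In one line, that is (ask = FALSE merely skips the per-package prompts):

update.packages (checkBuilt = TRUE, ask = FALSE)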


HTH,

Claudia



On 02/18/2011 09:32 PM, Juergen Rose wrote:

On Friday, 2011-02-18, at 11:53 -0800, Peter Ehlers wrote:

On 2011-02-18 11:16, Juergen Rose wrote:

If I do:

library(e1071)
example(svm)


I get:


svm> data(iris)

svm> attach(iris)

svm> ## classification mode
svm> # default with factor response:
svm> model <- svm(Species ~ ., data = iris)

svm> # alternatively the traditional interface:
svm> x <- subset(iris, select = -Species)

svm> y <- Species

svm> model <- svm(x, y)

svm> print(model)

Call:
svm.default(x = x, y = y)


Parameters:
   SVM-Type:  C-classification
 SVM-Kernel:  radial
       cost:  1
      gamma:  0.25

Number of Support Vectors:  51


svm> summary(model)

Call:
svm.default(x = x, y = y)


Parameters:
   SVM-Type:  C-classification
 SVM-Kernel:  radial
       cost:  1
      gamma:  0.25

Number of Support Vectors:  51

   ( 8 22 21 )


Number of Classes:  3

Levels:
   setosa versicolor virginica




svm> # test with train data
svm> pred <- predict(model, x)

svm> # (same as:)
svm> pred <- fitted(model)

svm> # Check accuracy:
svm> table(pred, y)
            y
pred         setosa versicolor virginica
  setosa         50          0         0
  versicolor      0         48         2
  virginica       0          2        48

svm> # compute decision values and probabilities:
svm> pred <- predict(model, x, decision.values = TRUE)

svm> attr(pred, "decision.values")[1:4,]
  setosa/versicolor setosa/virginica versicolor/virginica
1          1.196152         1.091460            0.6705626
2          1.064621         1.056332            0.8479934
3          1.180842         1.074534            0.6436474
4          1.110699         1.053143            0.6778595

svm> # visualize (classes by color, SV by crosses):
svm> plot(cmdscale(dist(iris[,-5])),
svm+      col = as.integer(iris[,5]),
svm+      pch = c("o","+")[1:150 %in% model$index + 1])

   *** caught segfault ***
address (nil), cause 'unknown'

Traceback:
   1: .Call("La_rs", x, only.values, PACKAGE = "base")
   2: eigen(-x/2, symmetric = TRUE)
   3: cmdscale(dist(iris[, -5]))
   4: plot(cmdscale(dist(iris[, -5])), col = as.integer(iris[, 5]),
        pch = c("o", "+")[1:150 %in% model$index + 1])
   5: eval.with.vis(expr, envir, enclos)
   6: eval.with.vis(ei, envir)
   7: source(tf, local, echo = echo, prompt.echo = paste(prompt.prefix,
        getOption("prompt"), sep = ""), continue.echo = paste(prompt.prefix,
        getOption("continue"), sep = ""), verbose = verbose, max.deparse.length
        = Inf, encoding = "UTF-8", skip.echo = skips, keep.source = TRUE)
   8: example(svm)

Possible actions:
1: abort (with core dump, if enabled)
..

I already did update.packages(); what else can I do?


Works just fine for me. What's your sessionInfo()?
Here's mine:
sessionInfo()
R version 2.12.1 Patched (2010-12-27 r53883)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_Canada.1252  LC_CTYPE=English_Canada.1252
[3] LC_MONETARY=English_Canada.1252 LC_NUMERIC=C
[5] LC_TIME=English_Canada.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] e1071_1.5-24 class_7.3-3

loaded via a namespace (and not attached):
[1] tools_2.12.1




> sessionInfo()

R version 2.12.1 (2010-12-16)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=C  LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
  [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods
base

It works on some of my systems and fails on most.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] segfault during example(svm)

2011-02-18 Thread Claudia Beleites
 ...
** testing if installed package can be loaded

* DONE (e1071)

The downloaded packages are in
‘/tmp/RtmpRJM5aT/downloaded_packages’
Updating HTML index of packages in '.Library'
Making packages.html  ... done

--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] amount of data R can handle in a single file

2011-02-17 Thread Claudia Beleites

On 02/17/2011 10:16 AM, Nasila, Mark wrote:

Dear Sir/Madam,



   I would like to know the maximum number of observations a
single file can have when using R. I am asking this because I am trying

Dear Mark,


to do research on banking transactions and I have around 49 million
records. Can R handle this? Please advise with regard to this.

I think R can address vectors up to a length of 2^31 - 1 ≈ 2.1e9 elements.
2^31 elements (numeric) ≈ 16 GB per vector (matrix, array).

For me, the available RAM is the more important limit:
I work without problem with (numeric) matrices of size 2e5 x 250 = 5e7 elements 
(380 MB) that were produced from 5e4 x 2500 = 1.25e8 elements (≈ 1GB) raw data. 
The raw data is the practical limit on my 8 GB (64 bit linux) machine:
During the processing it becomes complex, thus ≈ 2 GB, and with that I had to be 
very careful not to copy the matrix too often. This and a bunch of gc() calls 
let me process the data without swapping. :-)
Note that 2 GB corresponds quite nicely to the rule of thumb that the end of fun 
is reached with variable sizes of 1/3 of the RAM.


If you are concerned about your data set, I'd recommend reading a fraction of 
the data set, having a look at its object.size (), and also at how the RAM use 
develops during data analysis of that partial data set. Then extrapolate to the 
complete data set.
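
As a sketch (file name and sample size are made up; the read.csv call may need
adjusting to your file format):

part <- read.csv ("transactions.csv", nrows = 1e5)     # a sample of the 49e6 rows
object.size (part)                                     # footprint of 1e5 rows
as.numeric (object.size (part)) * 49e6 / 1e5 / 1024^2  # extrapolated size in MB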


HTH Claudia














Mark Nasila
Quantitative Analyst
CBS Risk Management

Personal Banking
7th Floor, 2 First Place,
Cnr Jeppe and Simmonds Street,
Johannesburg,
2000
Tel (011) 371-2406, Fax (011) 352-9812, Cell 083 317 0118
e-mail mnas...@fnb.co.zamailto:mnas...@fnb.co.za

www.fnb.co.zahttp://www.fnb.co.za/   www.howcanwehelpyou.co.za
http://www.howcanwehelpyou.co.za/

First National Bank - a division of FirstRand Bank Limited.
An Authorised Financial Services and Credit Provider (NCRCP20).

'Consider the effect on the environment before printing this email.'




To read FirstRand Bank's Disclaimer for this email click on the following 
address or copy into your Internet browser:
https://www.fnb.co.za/disclaimer.html

If you are unable to access the Disclaimer, send a blank e-mail to
firstrandbankdisclai...@fnb.co.za and we will send you a copy of the Disclaimer.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] When is *interactive* data visualization useful to use?

2011-02-11 Thread Claudia Beleites
 (which could easily lead to data dredging
(where the scope of the multiple comparison needed for correction is not even
clear).

Sure, yet:
- Isn't that what validation was invented for (I mean with a proper, new, 
[double] blind test set after you decided your parameters)?
- Summarizing a whole data set into a few numbers, without having looked at the 
data itself may not be safe, either:
- The few comparisons shouldn't come at the cost of risking a bad modelling 
strategy and badly chosen fitting parameters because the data was not properly 
examined.


My 2 ct,

Claudia (who in practice warns far more frequently of multiple comparisons and 
validation sets being compromised (not independent) than of too little data 
exploration ;-) )


--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] spline interpolation

2011-02-05 Thread Claudia Beleites

Hi,

Just pure curiosity:
may I ask why you want to do spline interpolation on fluorescence intensity as a 
function of concentration?

Particularly as it looks quite typical for the calibration plot of an 
unknown-concentration problem?

Claudia






On 02/05/2011 03:29 PM, Asan Ramzan wrote:

Hello R-help
I have the following data for a standard curve
concentration(nM),fluorescence
0,48.34
2,58.69
5,70.83
10,94.73
20,190.8
50,436.0
100,957.9

(1) Is there a function in R to plot a spline?
(2) How can I interpolate, say, 1000 points from 0nM-100nM and store this as a
data frame of concentration, fluorescence?
(3) How can I modify the code below so that instead of retrieving a concentration
with the exact value of fluorescence, it gives me the concentration for the value
that is closest to that fluorescence.

subset(df, fluorescence == 200.3456, select = concentration)



[[alternative HTML version deleted]]




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Matrix' with Functions

2011-02-03 Thread Claudia Beleites

Seems funny to me:

> f <- list (mean, sd, median, sum)
> dim (f) <- c (2, 2)

or in one line:
> f <- structure (.Data = list (mean, sd, median, sum), dim = c (2, 2))
> f
     [,1] [,2]
[1,] ?    ?
[2,] ?    ?
> f [1,1]
[[1]]
function (x, ...)
UseMethod("mean")
<environment: namespace:base>

> f [[1,1]] (1:3)
[1] 2
> f [[2,1]] (1:3)
[1] 1
> f [[1,2]] (1:3)
[1] 2
> f [[2,2]] (1:3)
[1] 6
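
To get Alex's F(i, j) notation, a thin wrapper would do (a sketch: the wrapper
is my invention; also mind that F abbreviates FALSE, so pick another name in
real code):

> F <- function (i, j, ...) f [[i, j]] (...)
> F (2, 2, 1:3)  # same as sum (1:3)
[1] 6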

HTH

Claudia



On 02/03/2011 05:33 PM, Alaios wrote:

Dear R members,
I have a quite large set of functions that are named like that:
f11,f12,...f15
f21,f22,...f25
..
f51,f52,...f55

These are static (hard-coded) functions whose only commonality is that
they take the same number and type of input fij(a,b,c,d). As you might
understand, this is really close to the notion of a matrix, only that my 'matrix'
contains functions. It would be great if I could address all these functions using
a numbering-scheme like F(i,j) where for example
F(1,1) will return f11(a,b,c,d).

I am sure that this might be quite complex to implement, so could you please 
refer me to some book/tutorial that addresses this kind of topic?

I would like to thank you in advance for your help
Best Regards
Alex

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] slightly off topic...

2011-02-02 Thread Claudia Beleites

The fact that the attachment to someone with a .de domain has them labeled
mit de Wort Teil suggests that the labeling is being done at the final
destination, since the server is set up with English messages.


wie er scharfsinnig bemerkte[1] (as he astutely remarked)

:-)

Looking at the ASCII version of one such email reveals:
- the "part" or "Teil" is how my thunderbird (English at work, German at home)
announces the parts of multipart emails. Neither "part 1.2" nor "Teil 1.2" is
written anywhere in the source.

- the r-help list footer somehow ended up as separate part of the email

Cheers,

Claudia

--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] pdf greek letter typos

2011-01-27 Thread Claudia Beleites

Eduardo,

On 01/27/2011 12:53 PM, Philipp Pagel wrote:

caused by a problem with font substitution in some version of
the poppler library which is used by many Linux PDF viewers. Try to
view the file in acrobat reader and possibly other viewers.


I'm running Ubuntu, and uninstalling package ttf-symbols-replacement did the 
trick for evince & Co. on my system (acrobat reader was never affected, but 
used to show pdfs with transparency quite ugly - there was a discussion with 
solutions to both problems on the ggplot2 list last fall).


HTH,

Claudia



--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] puzzled with plotmath

2011-01-20 Thread Claudia Beleites

Dear all,

I'm puzzled with matrix indices in plotmath.

I'm plotting matrix elements: Z [i, i], and I'd like to put that as label. I'll 
describe what I want and what I get in LaTeX-notation.


The output should look like Z_{i, i}, and my first try was
plot (1, 1, ylab = expression (Z[i, i]))

That, however, gives me Z_{i} (no comma, no second i) although the expression 
looks OK to me:

> a <- expression (Z[i, i])
> a [[1]]
Z[i, i]
> str (as.list (a [[1]]))
List of 4
 $ : symbol [
 $ : symbol Z
 $ : symbol i
 $ : symbol i

I'm able to tweak the output to look as I want:
plot (1, 1, ylab = expression (Z[i][", "][i]))
which is, however, logically very far from what I want to express.

What am I missing?

I'm almost sure this has been discussed before, but I can't find it: can anyone 
point me to good search terms? Is it possible to search for the terms being 
close to each other in RSiteSearch and/or RSeek? I get lots of introductory 
documents as they point to plotmath and discuss matrices...


Thanks a lot for your help,

Claudia


--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] puzzled with plotmath II

2011-01-20 Thread Claudia Beleites

sorry, I forgot my sessionInfo: please see below.


 Original Message 
Subject: puzzled with plotmath
Date: Thu, 20 Jan 2011 12:48:18 +0100
From: Claudia Beleites cbelei...@units.it
To: R Help r-help@r-project.org


> sessionInfo ()
R version 2.12.1 (2010-12-16)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.utf8   LC_NUMERIC=C  LC_TIME=en_US.utf8
 [4] LC_COLLATE=en_US.utf8 LC_MONETARY=C LC_MESSAGES=en_US.utf8
 [7] LC_PAPER=en_US.utf8   LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=CLC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] grid  stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] ggplot2_0.8.8 proto_0.3-8   reshape_0.8.3 plyr_1.2.1

loaded via a namespace (and not attached):
[1] digest_0.4.2 tools_2.12.1

--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] puzzled with plotmath for matrix indices

2011-01-20 Thread Claudia Beleites

Gerrit,

thanks & viele Grüße nach Oberhessen (many greetings to Oberhessen) :-)


plot (1, 1, ylab = expression (Z[list(i,i)]))
though that cannot be evaluated, either (due to [ not knowing what to do with an 
index list)


for future searches: probably the easiest cheat is, of course,
plot (1, 1, ylab = expression ("Z[i, i]"))

Anyways, I put the how-to into the R Wiki page on plotmath.

And I suggest that it should be mentioned in the plotmath help => email to 
r-devel.

Claudia

--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] puzzled with plotmath for matrix indices

2011-01-20 Thread Claudia Beleites

On 01/20/2011 02:11 PM, Uwe Ligges wrote:



On 20.01.2011 14:08, Claudia Beleites wrote:

Gerrit,

thanks  viele Grüße nach Oberhessen :-)


plot (1, 1, ylab = expression (Z[list(i,i)]))

though that cannot be evaluated, either (due to [ not knowing what to do
with an index list)



Works for me with a recent R version.
Sorry, my comment wasn't clear: sure it produces the desired output, what I 
meant is:

> Z
     [,1] [,2]
[1,]    1    3
[2,]    2    4
> i <- 2
> eval (expression (Z[list(i,i)]))
Error in Z[list(i, i)] : invalid subscript type 'list'

whereas:
> eval (expression (Z[i,i]))
[1] 4

(and of course all the text-based solutions also lack the beauty of the 
expression actually meaning in R what the output looks like)



for future searches: probably the easiest cheat is, of course,
plot (1, 1, ylab = expression ("Z[i, i]"))


which is less convenient since you could not replace i by a dynamically
calculated number, for example.

good point.
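
For the record, a minimal sketch of the dynamic variant: bquote substitutes the
current value of i into the expression.

i <- 2
plot (1, 1, ylab = bquote (Z [list (.(i), .(i))]))  # label renders as Z with subscript 2, 2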

Thanks, I learn a lot here :-)

Claudia

--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] puzzled with plotmath II

2011-01-20 Thread Claudia Beleites

Peter,


Look for 'comma-separated list' on the help page!!

Yes, seeing the solution I also understand why list is the solution.
The special meaning of list () in plotmath was only in my passive vocabulary - 
and after this discussion I think it is upgraded to active ;-)


I have to admit that my coming from matlab (as opposed to lisp) still catches me 
once in a while: though I was aware that I would somehow need to change the tree 
of the expression, I went astray because c() still feels to me the more basic 
function to put things together than list ().
A second aspect that put me a bit off the track is that both create expressions 
that do have a meaning but don't mean in R what I want to express:

Z [c(a, b)] is meaningful, but not the same as Z [a, b]
Z [list (a, b)] is syntactically correct, but `[.matrix` doesn't accept lists 
for parameter i.


Anyways, thanks a lot for the patience everyone:
problem is solved, solutions (including bquote) are to be found in the Wiki, and 
instead of creating more fuss by unclear emails I'll fetch a coffee before I go 
on plotting my confusion matrix elements...


Claudia

--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity

2011-01-07 Thread Claudia Beleites

On 01/07/2011 06:13 AM, Spencer Graves wrote:
  A more insidious problem, that may not affect the work of Jonah 
Lehrer, is political corruption in the way research is funded, with 
less public and more private funding of research 
Maybe I'm too pessimistic, but the term _political_ corruption reminds 
me that I can just as easily imagine a funding bias* in public 
funding. And I'm not sure it is (or would be) less of a problem just 
because the interests of private funding are easier to spot.


* I think of bias on both sides: the funding agency selecting the 
studies to support and the researcher subconsciously complying to the 
expectations of the funding agency.


On 01/07/2011 08:06 AM, Peter Langfelder wrote:

 From a purely statistical and maybe somewhat naive point of view,
published p-values should be corrected for the multiple testing that
is effectively happening because of the large number of published
studies. My experience is also that people will often try several
statistical methods to get the most significant p-value but neglect to
share that fact with the audience and/or at least attempt to correct
the p-values for the selection bias.
Even if the number of all the tests were known, I have the impression 
that the corrected p-value would be kind of the right answer to the 
wrong question. I'm not particularly interested in the probability of 
arriving at  the presented findings if the null hypothesis were true. 
I'd rather know the probability that the conclusions are true. Switching 
to the language of clinical chemistry, this is: I'm presented with the 
sensitivity of a test, but I really want to know the positive predictive 
value. What is still missing with the corrected p-values is the 
prevalence of good ideas of the publishing scientist (not even known 
for all scientists).  And I'm not sure this is not decreasing if the 
scientist generates and tests more and more ideas.
I found my rather hazy thoughts about this much better expressed in the 
books of Beck-Bornholdt and Dubben (which I'm afraid are only available 
in German).


Conclusion: try to be/become a good scientist: with a high prevalence of 
good ideas. At least with a high prevalence of good ideas among the 
tested hypotheses. Including thinking first which hypotheses are the 
ones to test, and not giving in to the temptation to try out more and 
more things as one gets more familiar with the experiment/data set/problem.
The latter I find very difficult. Including the experience of giving a 
presentation where I explicitly talked about why I did not do any 
data-driven optimization of my models. Yet in the discussion I was very 
prominently told I need to try in addition these other pre-processing 
techniques and these other modeling techniques - even by people whom I 
know to be very much aware and concerned about optimistically biased 
validation results. Which were of course very valid questions (and easy 
to comply), but I conclude it is common/natural/human to have and want 
to try out more ideas.
Also, after several years in the field and with the same kind of samples 
of course I run the risk of my ideas being overfit to our kind of 
samples - this is a cost that I have to pay for the gain due to 
experience/expertise.


Some more thoughts:
- reproducibility: I'm an analytical chemist. We have huge amounts of work 
going into round robin trials in order to measure the natural 
variability of different labs on very defined systems.
- we also have huge amounts of work going into calibration transfer, 
i.e. making quantitative predictive models work on a different 
instrument. This is always a whole lot of work, and for some fields of 
problems at the moment considered basically impossible even between two 
instruments of the same model and manufacturer.

The quoted results on the mice are not very astonishing to me... ;-)

- Talking about (not so) astonishing differences between between 
replications of experiments:
I find myself moving from reporting ± 1 standard deviation to reporting 
e.g. the 5th to 95th percentiles. Not only because my data distributions 
are often not symmetric, but also because I find I'm not able to directly 
perceive the real spread of the data from a standard deviation error 
bar. This is all about perception, of course I can reflect about the 
meaning. Such a reflection also tells me that one student having a 
really unlikely number of right guesses is unlikely but not impossible. 
There is no statistical law stating that unlikely events happen only 
with large sample sizes/number of tests. Yet the immediate perception is 
completely different.


- I happily agree with the ideas of publishing findings (conclusions) as 
well as the data and data analysis code I used to arrive there. But I'm 
aware that part of this agreement is due to the fact that I'm quite 
interested in the data analytical methods (I'd say as well as in the 
particular chemical-analytical problem at hand, but rather 

Re: [R] Removing Corrupt file

2010-12-17 Thread Claudia Beleites

Vikrant,

if you execute the code inside a function like

jpegplotfun <- function (a, b){
   jpeg ("mygraph.jpeg")
   plot (a, b)
   dev.off ()
}

the dev.off () is not executed if an error occurs before. So the problem is 
basically that the jpeg file is still open (you may have noticed open devices in 
R as left-overs of these errors).


See ?try and ?on.exit for ways to deal with situations where you need to 
clean up after errors.
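
A minimal sketch of the on.exit variant of the function above:

jpegplotfun <- function (a, b){
   jpeg ("mygraph.jpeg")
   on.exit (dev.off ())  # closes the device even if plot () throws an error
   plot (a, b)
}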


HTH,

Claudia

On 12/17/2010 08:24 AM, vikrant wrote:


Hi
  I am generating a graph jpeg file using a function R.  I m using this
script
a <- 1:10
b <- 1:10
jpeg("mygraph.jpeg")
{
  plot(a,b)
}
dev.off()


If by some chance I do miss some values, suppose for a, the file gets
created initially and then we do not plot anything in it. This file now
becomes corrupted and we cannot delete this file from the current R session.
I have tried commands like file.remove() and unlink() to remove the corrupt
file from the current R session.
Is there any other way to remove such files?



Thanks & Regards,
Vikrant





--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Summary (Re: (S|odf)weave : how to intersperse (\LaTeX{}|odf) comments in source code ? Delayed R evaluation ?)

2010-12-13 Thread Claudia Beleites

Dear Emmanuel and dear list,

Therefore, I let this problem sleep. However, I Cc this answer (with
the original question below) to Max Kuhn and Friedrich Leisch, in the
(faint) hope that this feature, which does not seem to have been missed
by anybody in 8 years,
I've been missing it every once in a while, but till now I could always rephrase 
the problem with expand = FALSE or functions, and the chunk that does the actual 
calculation at the end.


Most often, however, I'm just lazy and use R comments. If math should go in 
there, I use listings instead of fancyvrb with the modified Sweave.sty that 
hopefully is attached (if not, see below).


Here's an example chunk:
<<keep.source=TRUE>>=
1 / 2 # $\frac{1}{x}$
4 + 4 # Here may come lots of explanations, that are in a \LaTeX\ 
paragraph\footnote{blabla}: even long lines are properly broken.\\ Though the 
new lines start at the beginning of the line. \\[6pt] And a line break in the 
chunk source will of course be interpreted as R again: so no new paragraphs 
inside the same comment.

# But there can be new commented lines.
3 + 6
# Note that comment only lines at the end of a code chunk seem to be lost.
# Not only one but all that aren't followed by R code
@
(the second line should be very long, I somehow can't keep thunderbird from 
inserting line breaks)



Hope that helps a bit,

Claudia

=== modified Sweave.sty ===
\NeedsTeXFormat{LaTeX2e}
\ProvidesPackage{Sweave}{}

\RequirePackage{ifthen}
\newboolean{Sweave@gin}
\setboolean{Sweave@gin}{true}
\newboolean{Sweave@ae}
\setboolean{Sweave@ae}{true}

\DeclareOption{nogin}{\setboolean{Sweave@gin}{false}}
\DeclareOption{noae}{\setboolean{Sweave@ae}{false}}
\ProcessOptions

\RequirePackage{graphicx,listings}
\IfFileExists{upquote.sty}{\RequirePackage{upquote}}{}

\ifthenelse{\boolean{Sweave@gin}}{\setkeys{Gin}{width=0.8\textwidth}}{}%
\ifthenelse{\boolean{Sweave@ae}}{%
  \RequirePackage[T1]{fontenc}
  \RequirePackage{ae}
}{}%

\lstnewenvironment{Sinput}{\lstset{language=R,basicstyle=\sl,texcl,
commentstyle=\upshape}}{}
\lstnewenvironment{Soutput}{\lstset{language=R}}{}
\lstnewenvironment{Scode}{\lstset{language=R,basicstyle=\sl}}{}

\newenvironment{Schunk}{}{}

\newcommand{\Sconcordance}[1]{%
  \ifx\pdfoutput\undefined%
  \csname newcount\endcsname\pdfoutput\fi%
  \ifcase\pdfoutput\special{#1}%
  \else\immediate\pdfobj{#1}\fi}


--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it
\documentclass{article}
\begin{document}
<<keep.source=TRUE>>=
1 / 2 # $\frac{1}{x}$
4 + 4 # Here may come lots of explanations, that are in a \LaTeX\ 
paragraph\footnote{blabla}: even long lines are properly broken.\\ Though the 
new lines start at the beginning of the line. \\[6pt] And a line break in the 
chunk source will of course be interpreted as R again: so no new paragraphs 
inside the same comment.
# But there can be new commented lines.
3 + 6
# Note that comment only lines at the end of a code chunk seem to be lost.
# Not only one but all that aren't followed by R code
@
\end{document}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] using "<-" in function argument

2010-12-03 Thread Claudia Beleites

On 12/03/2010 06:54 AM, Berwin A Turlach wrote:

On Thu, 2 Dec 2010 23:34:02 -0500
David Winsemiusdwinsem...@comcast.net  wrote:


[...] Erik is telling you that your use of ncol <- 4 got evaluated to
4 and that the name of the resulting object was ignored, however the
value of the operation was passed on to matrix which used positional
matching since = was not used.


Sounds like a fair summary of what Erik said, but it is subtly wrong.
R has lazy evaluation of its arguments.  There is nothing that forces
the assignment to be evaluated and to pass the result into the
function.  On the contrary, the assignment takes place when the
function evaluates the argument.
Let's say: as no argument name was given, the positional matching applied. And 
evaluation took place when argument no. 2 was required.


Of course, you could give an argument name:
> matrix(ncol <- 4)
     [,1]
[1,]    4
> matrix(nrow = ncol <- 4)
     [,1]
[1,]   NA
[2,]   NA
[3,]   NA
[4,]   NA


Claudia


--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Performance tuning tips when working with wide datasets

2010-11-24 Thread Claudia Beleites

Dear Richard,


Does anyone have any performance tuning tips when working with datasets that
are extremely wide (e.g. 20,000 columns)?

The obvious one is: use matrices – and take care that they don't get converted
back to data.frames.


In particular, I am trying to perform a merge like below:

merged_data <- merge(data1, data2,
    by.x="date", by.y="date", all=TRUE, sort=TRUE);

This statement takes about 8 hours to execute on a pretty fast machine.  The
dataset data1 contains daily data going back to 1950 (20,000 rows) and has 25
columns.  The dataset data2 contains annual data (only 60 observations),
however there are lots of columns (20,000 of them).

I have to do a lot of these kinds of merges so need to figure out a way to
speed it up.

I have tried  a number of different things to speed things up to no avail.
I've noticed that rbinds execute much faster using matrices than dataframes.
However the performance improvement when using matrices (vs. data frames) on
merges were negligible (8 hours down to 7).

which is astonishing, as merge (matrix) uses merge.default, which boils down to
merge(as.data.frame(x), as.data.frame(y), ...)


 I tried casting my merge field
(date) into various different data types (character, factor, date).  This
didn't seem to have any effect. I tried the hash package, however, merge
couldn't coerce the class into a data.frame.  I've tried various ways to
parallelize computation in the past, and found that to be problematic for a
variety of reasons (runaway forked processes, doesn't run in a GUI
environment, doesn't run on Macs, etc.).

I'm starting to run out of ideas, anyone?  Merging a 60 row dataset shouldn't
take that long.


Do I understand correctly that the result should be a 2e4 x 20025 matrix,
where the additional 2e4 columns are from data2 and end up in the rows of e.g.
every 1st of January?

In that case, you may be much faster producing tmp <- matrix (NA, 2e4, 2e4),
filling the values of data2 into the correct rows, and then cbinding data1 and
tmp. Make sure you have enough RAM available: tmp is about 1.5 GB. If you manage
to do this without swapping, it should be reasonably fast.
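
A sketch of that idea (assuming data1 and data2 are data.frames whose first
column is the date; the names and details are invented):

tmp <- matrix (NA_real_, nrow = nrow (data1), ncol = ncol (data2) - 1)
colnames (tmp) <- colnames (data2) [-1]
i <- match (data2$date, data1$date)   # the rows of e.g. every 1st of January
tmp [i, ] <- as.matrix (data2 [, -1])
merged <- cbind (as.matrix (data1 [, -1]), tmp)
rownames (merged) <- as.character (data1$date)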

If you end up writing a proper merge function for matrics, please let me know:
I'd be interested in using it...

Claudia



Thanks, Richard __
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html and provide commented, minimal,
self-contained, reproducible code.



--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] plot inside function does not work

2010-11-22 Thread Claudia Beleites

Alex, this may be FAQ 7.22 ("Why do lattice/trellis graphics not work?").

Claudia

On 11/22/2010 02:19 PM, Alaios wrote:

Hello everyone,
when I create a plot from the console (command line?), the plot works fine. I have 
created a function that plots based on the input. This function is called 
plot_shad. When I call this function alone in the command line I get my plot.

Then I tried to use another function, as depicted below, to do some calculation 
before calling the function that does the plotting.

plot_shad_map <- function(f, CRagent, agentid){
  for (i in c(1:nrow(shad_map))){
    for (j in c(1:ncol(shad_map))){
      # Do something
    }
  }
  plot_shad_f(shad_map) # This plots fine when used in the command line,
                        # but inside this function it does not.
  return(shad_map)
}

Unfortunately I get no plot . What might be the problem?

One more question: how do I get more plots at the same time? It seems that when I 
issue a new plot, it replaces the old plot.

I would like to thank you in advance for you help
Regards
Alex




[[alternative HTML version deleted]]




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] a rounding option for \Sexpr{}?

2010-11-19 Thread Claudia Beleites

Dear all,


I don't like the current behaviour, but that change could break a lot of
existing documents. Since you can easily wrap your Sexpr arguments in a call to
whatever formatting function you want, why force all of those users to change
their documents?
I'm someone who would change a whole lot of \Sexpr{}s: I could get rid of all 
those round() and format()s...


Currently I almost don't use \Sexpr as I find the advantage of just having a tiny 
little R expression in the text is lost if half a line of formatting code is 
required. Particularly, as one has to be careful not to have a line break in the 
\Sexpr{} as Sweave doesn't recognize those.
At the moment, I tend to use chunks with results=tex instead – which is not the 
nicest thing to read in the source as it breaks the flow of a sentence quite 
badly. But currently, it is much faster to type for me. On the other hand, maybe 
it's just about time to write a template/snippet for \Sexpr{format (, digits = 
3)}...


An alternative of course would be introducing a new kind of those commands. If 
that's going to happen, I'd vote for something really short like the brew 
syntax. But maybe I just didn't understand the advantage of \Sexpr{} and 
\VignetteXXX{} looking like Latex commands although they aren't (particularly as 
Latex source code highlighting without taking into account Sweave syntax is 
anyways messed up by $ in the \Sexpr{}).
Also, very subjectively, I'd find a syntax with angle brackets more consistent 
as the code chunks start with angle brackets anyways.


My 2 ct,

Claudia


--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] stacking consecutive columns

2010-11-17 Thread Claudia Beleites

Dear Gregory,


Is there an easier, cleaner way to do this?  Thanks.

There are of course several ways...

(assuming yearmonth to be a data.frame)

--- 1 ---

year <- colnames (yearmonth) [-1]
year <- gsub ("^[^[:digit:]]*([[:digit:]]*)[^[:digit:]]*$", "\\1", year)
year <- as.numeric (year)

month <- yearmonth$month

precip <- as.matrix (yearmonth [, -1])

long.df <- data.frame (month = rep (month, length (year)),
                       year = rep (year, each = nrow (yearmonth)),
                       precipitation = as.numeric (precip))
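
A toy example (invented numbers) to make the reshaping concrete:

yearmonth <- data.frame (month = 1:3, X1950 = c (10, 20, 30), X1951 = c (40, 50, 60))
# the steps above then yield a long.df like
#   month year precipitation
# 1     1 1950            10
# 2     2 1950            20
# 3     3 1950            30
# 4     1 1951            40
# ...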


If you're about to do this more often:
--- 2 ---
package hyperSpec (outdated on CRAN, if you want to install it use the version 
on rforge)

has a function array2df which helps with this transformation:

long.df <- array2df (precip, label.x = "precipitation",
                     levels = list (month = month, year = year))

--- 3 ---
depending on your file (are the column names numbers without the Xs?)
you may be able to abuse a hyperSpec object to read your data easily:
x <- read.txt.wide (filename, ... more options ...)
then
as.long.df (x)
is about what you want.
(You'd probably want to rename the columns)

HTH Claudia



Gregory A. Graves, Lead Scientist
Everglades REstoration COoordination and VERification (RECOVER)
Restoration Sciences Department
South Florida Water Management District
Phones:  DESK: 561 / 682 - 2429
CELL:  561 / 719 - 8157

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Xapply question

2010-11-12 Thread Claudia Beleites

Dear list,

I'm stuck with looking for a function of the *apply family, which I suppose 
exists already – just I can't find it:


What I'm looking for is somewhere between sweep and mapply that does a 
calculation vectorized over a matrix and a vector:


It should work complementary to sweep: for each row of the matrix, a different 
value of the vector should be handed over.
Close to mapply because I need to go through different variables in a parallel 
fashion (at the moment a matrix and a vector).


Kind of a mapply that hands over array slices.

Maybe it is easiest with an example. This loop does what I want, but
> A <- matrix (rnorm (12), 3)
> A
        [,1]    [,2]    [,3]     [,4]
[1,]  0.1286  0.2888 -0.4435 -0.90966
[2,] -1.6000 -1.0884  1.3736  0.07754
[3,]  0.4581  1.5413  0.6133 -0.12131
> v <- 1 : 3

> f <- function (x, y) { # some function depending on vector x and scalar y
+    c (sum (x^2), y)
+ }

> result <- matrix (NA, nrow = nrow (A), ncol = 2)
> for (r in 1 : nrow (A))
+    result [r,] <- f (A [r,], v [r])
> result
      [,1] [,2]
[1,] 1.124    1
[2,] 5.637    2
[3,] 2.976    3

The matrix will easily be in the range of 1e4 - 1e5 rows x 1e2 - 1e3 columns, so 
I do not want to split it into a list and combine it afterwards.


The reason why I ask for a function is partly also because I want to overload 
the functionality for a specific class and I don't think it's a good idea to 
invent a name for something that probably already exists.


If this function does not exist, any ideas how I should call it?
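
(In the meanwhile, a sketch of a workaround: it loops over row indices, so the
matrix is not split into a list of rows.)

result <- t (vapply (seq_len (nrow (A)),
                     function (r) f (A [r, ], v [r]),
                     FUN.VALUE = numeric (2)))  # t (): vapply binds the results as columns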

Thanks a lot,

Claudia

--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] arrays of arrays

2010-11-10 Thread Claudia Beleites

Hi Sachin,

I guess there are several different possibilities that are more or less handy 
depending on your data:


- lists were mentioned already, and I think they are the most natural 
representation of ragged arrays. Also very flexible, e.g. you can introduce more 
dimensions. But they can get terribly slow and very memory consuming if you have 
many rows.


- If you have many rows and they have almost the same number of elements, you 
may be better off using a normal matrix and setting the unused elements to NA.


- There are also sparse matrices in package Matrix. I've never used them, but I 
guess they may be what you are after.


This here:
new("dgCMatrix"
, i = c(0L, 1L, 2L, 3L, 0L, 1L, 2L, 0L, 1L, 0L, 1L, 2L, 3L, 4L, 5L)
, p = c(0L, 4L, 7L, 9L, 15L)
, Dim = c(6L, 4L)
, Dimnames = list(NULL, NULL)
, x = c(0, 0, 1, 1, 1, 3, 5, 4, 4, 7, -1, 8, 9, 10, 6)
, factors = list()
)

is the transpose of your example:
6 x 4 sparse Matrix of class "dgCMatrix"

[1,] 0 1 4  7
[2,] 0 3 4 -1
[3,] 1 5 .  8
[4,] 1 . .  9
[5,] . . . 10
[6,] . . .  6

The numeric versions do not store the zeros, and will return 0 for the 
elements marked with '.' in the print.


You won't get any benefit from this representation in terms of memory (over a 
normal matrix) unless the total number of elements is smaller than

nrow * max (elements per row) / 2 - nrow - some more overhead

The Matrix () function will give you a hint: check whether it produces a dense 
or a sparse matrix.
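
E.g. a quick sketch (I'm not sure about the exact classes you'll get, but 
sparse vs. dense is visible in the first line of the print):

library (Matrix)
Matrix (diag (4))                # mostly zeros: should print as a sparse class
Matrix (matrix (rnorm (16), 4))  # no zeros: should print as dense "dgeMatrix"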


- if you are terribly tight with memory you'll program your own representation 
that just stores a vector of your values and start indices for each row.

You index then with rowstart [i] + j

Here's a comparison:

# list
> l <- structure(list(V1 = c(0, 0, 1, 1), V2 = c(1, 3, 5), V3 = c(4,
4), V4 = c(7, -1, 8, 9, 10, 6)), .Names = c("V1", "V2", "V3",
"V4"))
> str (l)
List of 4
 $ V1: num [1:4] 0 0 1 1
 $ V2: num [1:3] 1 3 5
 $ V3: num [1:2] 4 4
 $ V4: num [1:6] 7 -1 8 9 10 6

> object.size (l)
736 bytes

# sparse matrix
> s <- new("dgCMatrix"
, i = c(0L, 1L, 2L, 3L, 0L, 1L, 2L, 0L, 1L, 0L, 1L, 2L, 3L, 4L, 5L)
, p = c(0L, 4L, 7L, 9L, 15L)
, Dim = c(6L, 4L)
, Dimnames = list(NULL, NULL)
, x = c(0, 0, 1, 1, 1, 3, 5, 4, 4, 7, -1, 8, 9, 10, 6)
, factors = list()
)
> s
6 x 4 sparse Matrix of class "dgCMatrix"

[1,] 0 1 4  7
[2,] 0 3 4 -1
[3,] 1 5 .  8
[4,] 1 . .  9
[5,] . . . 10
[6,] . . .  6
> object.size (s)
1640 bytes
# there's a lot of overhead for the sparse matrix

# matrix
> m <- structure(c(0, 1, 4, 7, 0, 3, 4, -1, 1, 5, NA, 8, 1, NA, NA,
9, NA, NA, NA, 10, NA, NA, NA, 6), .Dim = c(4L, 6L))
> m
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    0    0    1    1   NA   NA
[2,]    1    3    5   NA   NA   NA
[3,]    4    4   NA   NA   NA   NA
[4,]    7   -1    8    9   10    6
> object.size (m)
392 bytes


# own representation
> o <- structure(c(0, 0, 1, 1, 1, 3, 5, 4, 4, 7, -1, 8, 9, 10, 6), rowstart =
c(0, 4, 7, 9)) # index of the end of the preceding row: saves subtracting 1 all the time

> o
 [1]  0  0  1  1  1  3  5  4  4  7 -1  8  9 10  6
attr(,"rowstart")
[1] 0 4 7 9
> object.size (o)
352 bytes

> o [attr (o, "rowstart") [2] + 3]
[1] 5

Claudia

--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] force apply not to drop the dimensions of FUN results ?

2010-11-10 Thread Claudia Beleites

Dear Yves,

You may not need to do more than set the dim attribute correctly:

dim (test) <- c (dim (myArray) [c (3 : 4, 1 : 2)])
or
dim (test) <- c (dim (myArray) [c (4 : 3, 1 : 2)])

Claudia


--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] prcomp function

2010-11-10 Thread Claudia Beleites

I think PCA decomposes matrix A according to A'A, not to COV (A).

But if A is centered then A'A = (n - 1) COV (A).

So for non-centered A, you want to look at A'A instead:

> crossprod(A) %*% evec[,1] / (nrow (A) - 1) - eval [1] * evec [,1]
  [,1]
[1,] 0.000e+00
[2,] 0.000e+00
[3,] 1.066e-14
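
You can also cross-check against eigen() directly (a quick sketch along the 
lines of the code above):

ev <- eigen (crossprod (A) / (nrow (A) - 1))
ev$values - eval                 # should be ~ 0
abs (ev$vectors) - abs (evec)    # eigenvectors agree up to sign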

If I'm telling crap, someone please correct me!

Hope that helps,

Claudia



On 11/10/2010 02:41 PM, kicker wrote:


Hello,

I have a short question about the prcomp function. First I cite the
associated help page (help(prcomp)):

Value:
...
SDEV the standard deviations of the principal components (i.e., the square
roots of the eigenvalues of the covariance/correlation matrix, though the
calculation is actually done with the singular values of the data matrix).
ROTATION the matrix of variable loadings (i.e., a matrix whose columns
contain the eigenvectors). The function princomp returns this in the element
loadings.
...

Now please take a look at the following easy example:

first I  define a matrix A

A <- matrix(c(0,1,4,1,0,3,4,3,0),3,3)

then I apply PCA on A

trans <- prcomp(A,retx=T,center=F,scale.=F,tol=NULL)



eval <- trans$sdev*trans$sdev #eval is the vector of the eigenvalues of

cov(A) (according to the cited help text above)

evec <- trans$rotation #evec is the matrix with the eigenvectors of cov(A) as

columns  (according to the cited help text above)

now the eigenvalue equation should be valid, i.e. it should hold that
cov(A)%*%evec[,1] = eval[1]*evec[,1]. But it doesn't; my result:
cov(A)%*%evec[,1] = t(-0.8244927, -0.8325664, 0.8244927)
eval[1]*evec[,1] = t(-8.695427, -7.129314, -10.194816)

So my question is : why does the eigenvalue equation not hold ?

The eigenvalue equation holds when I set center=T in the options of the
prcomp function. But as far as I know and as I understand the help text it
should have no influence on the eigenvalue equation whether the data are
centered or not. I know about the advantages of centered date but I want to
understand how the prcomp function works in the case of uncentered data.

Thank you very much for your efforts.




--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R and Matlab

2010-10-29 Thread Claudia Beleites

Dear Henrik,

sorry for bothering you with a report hastily pasted together and not 
particularly nice for you, as I used my toy data flu from a non-standard package. 
I should rather have used e.g. the iris data.


I'm aware that writeMat doesn't deal with S4 objects. In fact, even if I had 
overlooked the error message, there was a second chance to notice: the file 
size is 0 B. The attempt to save flu directly was a classical autopilot error; 
that's why I tried to save the x afterwards.


So the problem here was the unnamed storing of x.


I intentionally do not try to infer the name x from
writeMat("flu.mat", x), basically because I think using substitute()
should be avoided as far as possible, but also because it is unclear
what the name should be in cases such as writeMat("flu.mat", 1:10).
I was just going to suggest a patch that assigns the names of type Vnumber to 
the unnamed objects - but when I wanted to get the source I realized your 
version with the warning is already out.


I think, however, you may have forgotten an nchar(): any (nchar (names) == 0)

So here's my suggestion for l. 775-777 of writeMat.R:

  if (is.null(names) || any (nchar (names) == 0L)) {
    names [nchar (names) == 0L] <- paste ("V", which (nchar (names) == 0L),
                                          sep = "")

    names (args) <- names
    warning("All objects written have to be named, e.g. use writeMat(..., x=a, ",
            "y=y) and not writeMat(..., x=a, y): ", deparse(sys.call()),
            "\nDummy names have been assigned.");
  }


After all, e.g. data.frame () will also rather create dummy names for unnamed 
columns. And, I think, a warning should make the user aware that he's doing 
something that _may_ not work out as intended. But here I think it is _most 
likely_ not working as intended.




MISCELLANEOUS:
Note that writeMat() cannot write compressed MAT files.  It is
documented in help(readMat), and will be so in help(writeMat) in
the next release.  Package Rcompression, loaded or not, has no effect
on writeMat().  It is only readMat() that can read them, if
Rcompression is installed.  You do not have to load it
explicitly/yourself - if readMat() detects a compress MAT file, it
will automatically try to load it;

OK, good to know.

Thanks a lot for your explanation in spite of my bad report.

Claudia


--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R and Matlab

2010-10-28 Thread Claudia Beleites

I am looking for ways to use R and Matlab. Doing the data transformations in
R and using the data in Matlab to analyze with some pre-defined scripts.
Any good ways to transfer the data into matlab in its most recent version?
I tried using R.matlab but the writeMat output is not readable by Matlab.
It used to work, but I didn't need it for quite a while (a year or so ago, and 
with Matlab either 2007 or 2008a).


I just tried, and it doesn't work for me either.
You should notify the maintainer of R.matlab and include an example (code and 
data, e.g. with dput).


I noticed that library (R.matlab) does not load the Rcompression package, but 
also after library (Rcompression), the resulting file was not read by Matlab.


I tried loading a saved data.frame in Matlab 2008b on a Win XP computer: it 
doesn't find any variables inside the .mat file (and whos -file ... doesn't 
show a variable).


The other way round with a stupid little vector it worked.

An R session (with only the 2nd try, after library (Rcompression)) is attached 
below.




I just need to output a data.frame and read it as is into matlab where I can
do any needed transformations on the variables.

If you need to transfer the data right NOW, there's always csv.
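
A minimal sketch (untested; df standing for your data.frame):

write.csv (df, "df.csv", row.names = FALSE)
# and in Matlab then something like: M = importdata ('df.csv')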

Claudia



> library (hyperSpec)
Loading required package: lattice
Package hyperSpec, version 0.95

To get started, try
   vignette ("introduction", package = "hyperSpec")
   package?hyperSpec
   vignette (package = "hyperSpec")

If you use this package please cite it appropriately.
   citation("hyperSpec")
will give you the correct reference.

The project is hosted on http://r-forge.r-project.org/projects/hyperspec/

> sessionInfo ()
R version 2.12.0 (2010-10-15)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.utf8   LC_NUMERIC=C  LC_TIME=en_US.utf8
 [4] LC_COLLATE=en_US.utf8 LC_MONETARY=C LC_MESSAGES=en_US.utf8
 [7] LC_PAPER=en_US.utf8   LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C            LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] hyperSpec_0.95    lattice_0.19-13   R.matlab_1.3.3    R.oo_1.7.4
[5] R.methodsS3_1.2.1


loaded via a namespace (and not attached):
[1] grid_2.12.0
> library (Rcompression)
> x = flu[[]]
> writeMat ("flu.mat", flu)
Error in dim(x) <- length(x) : invalid first argument
> writeMat ("flu.mat", x)
> sessionInfo ()
R version 2.12.0 (2010-10-15)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.utf8   LC_NUMERIC=C  LC_TIME=en_US.utf8
 [4] LC_COLLATE=en_US.utf8 LC_MONETARY=C LC_MESSAGES=en_US.utf8
 [7] LC_PAPER=en_US.utf8   LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C            LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] Rcompression_0.8-0 hyperSpec_0.95     lattice_0.19-13    R.matlab_1.3.3     R.oo_1.7.4
[6] R.methodsS3_1.2.1

loaded via a namespace (and not attached):
[1] grid_2.12.0



--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R and Matlab

2010-10-28 Thread Claudia Beleites

On 10/28/2010 03:16 PM, Thomas Levine wrote:

Is there a particular reason you can't use csv?

(Not sure whether I'm the one meant - as I also suggested csv to Santosh)

But:

- It used to work, so there may be code existing that is broken now (e.g. I do
have such code, but at least for the moment it doesn't matter). Thus the 
information may very well be of interest for the maintainer.


- csv is fine for a matrix (or a vector) or a data.frame. How about arrays,
lists, more than one variable?

I think the default file format changed to v7.3 (though I'm not sure whether
that is just for large variables).
Unfortunately -v switch of load (that used e.g. to allow reading of V4 files) is
gone, and I can't see anything to specify the .mat file format version.
The curious thing is that readMat does accept the file produced by Matlab 2008b.
If it is a matter of writeMat writing an old file format, I'd have expected that
rather load should still be able to read the writeMat generated file than
readMat being able to read Matlab's .mat file.

my 2 ct

Claudia



write.csv() in R

It seems that you can read csv in Matlab with this
http://www.mathworks.com/help/techdoc/ref/importdata.html

Tom

2010/10/28 Claudia Beleitescbelei...@units.it:

I am looking for ways to use R and Matlab. Doing the data transformations
in
R and using the data in Matlab to analyze with some pre-defined scripts.
Any good ways to transfer the data into matlab in its most recent version?
I tried using R.matlab but the writeMat output is not readable by Matlab.


It used to work, but I didn't need it for quite a while (a year or so ago,
and with Matlab either 2007 or 2008a).

I just tried, and neither does it work for me.
You should notify the maintainer of R.matlab and include an example (code
and data, e.g. with dput).

I noticed that library (R.matlab) does not load the Rcompression package,
but also after library (Rcompression), the resulting file was not read by
Matlab.

I tried loading a saved data.frame in Matlab 2008b on an Win XP computer: it
doesn't find any variables inside the .mat file (and whos -file ...) doesn't
show a variable.

The other way round with a stupid little vector it worked.

An R session (with only the 2nd try, after library (Rcompression)) is
attached below.



I just need to output a data.frame and read it as is into matlab where I
can
do any needed transformations on the variables.


If you need to transfer the data right NOW, there's always csv.

Claudia




library (hyperSpec)

Loading required package: lattice
Package hyperSpec, version 0.95

To get started, try
   vignette (introduction, package = hyperSpec)
   package?hyperSpec
   vignette (package = hyperSpec)

If you use this package please cite it appropriately.
   citation(hyperSpec)
will give you the correct reference.

The project is hosted on http://r-forge.r-project.org/projects/hyperspec/


sessionInfo ()

R version 2.12.0 (2010-10-15)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
  [1] LC_CTYPE=en_US.utf8   LC_NUMERIC=C  LC_TIME=en_US.utf8
  [4] LC_COLLATE=en_US.utf8 LC_MONETARY=C
LC_MESSAGES=en_US.utf8
  [7] LC_PAPER=en_US.utf8   LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=CLC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] hyperSpec_0.95lattice_0.19-13   R.matlab_1.3.3R.oo_1.7.4
R.methodsS3_1.2.1

loaded via a namespace (and not attached):
[1] grid_2.12.0

library (Rcompression)
x = flu[[]]
writeMat ("flu.mat", flu)

Error in dim(x) <- length(x) : invalid first argument

writeMat ("flu.mat", x)
sessionInfo ()

R version 2.12.0 (2010-10-15)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
  [1] LC_CTYPE=en_US.utf8   LC_NUMERIC=C  LC_TIME=en_US.utf8
  [4] LC_COLLATE=en_US.utf8 LC_MONETARY=C
LC_MESSAGES=en_US.utf8
  [7] LC_PAPER=en_US.utf8   LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=CLC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] Rcompression_0.8-0 hyperSpec_0.95 lattice_0.19-13R.matlab_1.3.3
R.oo_1.7.4
[6] R.methodsS3_1.2.1

loaded via a namespace (and not attached):
[1] grid_2.12.0



--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127

Re: [R] clustering on scaled dataset or not?

2010-10-28 Thread Claudia Beleites

John,



Hi, just a general question: when we do hierarchical clustering, should we
compute the dissimilarity matrix based on scaled dataset or non-scaled dataset?




daisy() in cluster package allow standardizing the variables before calculating
dissimilarity matrix;


I'd say that should depend on your data.

- if your data is all (physically) different kinds of things (and thus 
different orders of magnitude), then you should probably scale.


- On the other hand, I cluster spectra. Thus my variates are all the 
same unit, and moreover I'd be afraid that scaling would blow up 
noise-only variates (i.e. the spectra do have low or no intensity 
regions), thus I usually don't scale.


- It also depends on your distance. E.g. Mahalanobis should do the 
scaling by itself, if I think correctly at this time of the day...


What I do frequently, though, is subtracting something like the minimum 
spectrum (in practice, I calculate the 5th percentile for each variate - 
it's less noisy). You can also center, but I'm strongly for having a 
physical meaning, and for my samples the minimum spectrum is 
better interpretable (it represents the matrix composition).
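
In code that is roughly (a sketch; spc standing for a matrix with one 
spectrum per row):

min.spc <- apply (spc, 2, quantile, probs = 0.05)  # robust "minimum spectrum"
spc <- sweep (spc, 2, min.spc)                     # default FUN is "-"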



but dist() doesn't have that option at all. Appreciate if
you can share your thoughts?

but you could call scale () and then dist ().
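
E.g. (minimal sketch, x being your data matrix):

hc <- hclust (dist (scale (x)))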

Claudia




Thanks

John





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Random Forest AUC

2010-10-23 Thread Claudia Beleites

Dear List,

Just curiosity (disclaimer: I never used random forests till now for 
more than a little playing around):


Is there no out-of-bag estimate available?
I mean, for a (one) given sample there are already ca. 1/e of the trees where 
it is out-of-bag, as Andy explained. If now the voting is done only over the 
oob trees, I should get a classical oob performance measure.
Or is the oob estimate internally used up by some kind of optimization 
(what would that be, given that the trees are grown till the end?)?


Hoping that I do not spoil the pedagogic efforts of the list in teaching 
Ravishankar to reason through his homework himself...


Claudia

Am 23.10.2010 20:49, schrieb Changbin Du:

I think you should use 10 fold cross validation to judge your performance on
the validation parts. What you did will be overfitted for sure: you test on
the same training set used for your model building.


On Sat, Oct 23, 2010 at 6:39 AM, mxkuhnmxk...@gmail.com  wrote:


I think the issue is that you really can't use the training set to judge
this (without resampling).

For example, k nearest neighbors are not known to over fit, but  a 1nn
model will always perfectly predict the training data.

Max

On Oct 23, 2010, at 9:05 AM, Liaw, Andyandy_l...@merck.com  wrote:


What Breiman meant is that as the model gets more complex (i.e., as the
number of trees tends to infinity) the generalization error (test set
error) does not increase.  This does not hold for boosting, for example;
i.e., you can't boost forever, which necessitates finding the
optimal number of iterations.  You don't need that with RF.


-Original Message-
From: r-help-boun...@r-project.org
[mailto:r-help-boun...@r-project.org] On Behalf Of vioravis
Sent: Saturday, October 23, 2010 12:15 AM
To: r-help@r-project.org
Subject: Re: [R] Random Forest AUC


Thanks Max and Andy. If the Random Forest is always giving an
AUC of 1, isn't
it over fitting??? If not, how do you differentiate this from over
fitting??? I believe Random forests are claimed to never over
fit (from the
following link).

http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#features


Ravishankar R
--
View this message in context:
http://r.789695.n4.nabble.com/Random-Forest-AUC-tp3006649p3008157.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Notice:  This e-mail message, together with any attachme...{{dropped:11}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide

http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.







__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] superscript characters in title with '+'

2010-10-22 Thread Claudia Beleites

On 10/22/2010 03:15 PM, DrCJones wrote:


Hi, Thanks for all of your replies!

David, a slightly modified version of what you gave did the trick:

hist(X,main = expression("["*Ca**""^paste(2,"+")*"]i"~'onsets'))

here you put the 2+ into the superscript of a superscript.

compare these four:
hist(X,main = expression(Ca**2~Ca^2~Ca**""^2~Ca^""^2))

** and ^ both mean power


But I prefer the way '2+' is italicized in the solution Dennis gave:

hist(X, main = bquote('[Ca'^'2+'*']i'~'onsets'), xlab = 'sec')

I think I understand it now - the '^' symbol must be followed by the '*'
symbol to signify the end of font italicization;

no, the ^ is the power function infix operator, as in x^2 = x².
* is multiplication, but the multiplication is written without dot:
x * y = x ⋅ y = xy

It is abused here to connect terms into one expression.

and '~' must be used to

signify spaces.

yes


The only thing I still don't get is why square brackets
rather than quotation marks surround the 'i' in the solution Claudia gave:

hist(X, main=expression("[" * Ca^"2+" * "]" [i]~'onsets'),  xlab = 'sec')

the square bracket operator marks indices, which are written as subscript. So
x[i] produces what in LaTeX would be x_i

Being a chemist, it seemed natural to me to put the i after the concentration 
brackets into a subscript - though you didn't say you want that.


A more correct expression would be:

group ("[", Ca^'2+', "]") [i]~onsets

where you can see more easily that the [ and ] are special left and right 
delimiters. Note that the only term that needs to be hidden as a character 
string is the charge, as R doesn't know this way of writing ion charges and 
supposes + to be an infix operator.



Cheers,

Claudia



--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Accuracy/Goodness of fit of nnet

2010-10-21 Thread Claudia Beleites

Raji,

you first need to tell us what kind of accuracy you mean.

The term accuracy has different meanings in different areas of science.
However, in classification it usually refers to something along the lines of 
"number of correctly predicted samples / total number of samples" (possibly 
weighted according to the number of samples per class).
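
E.g. (sketch only; truth and pred standing for your reference labels and 
hardened predictions):

cm <- table (truth, pred)
sum (diag (cm)) / sum (cm)   # fraction of correctly predicted samples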


Procedures:
You can calculate that for different types of test samples:

- prediction of the training samples gives you a goodness of fit. If you have 
(too) many variates in your model, this measure is close to useless. 
Useless, because most people are not interested in the goodness of fit anyways 
but want to know the performance for new samples.


- prediction of unknown (statistically independent) samples: this is usually 
what is of interest. You may use resampling schemes (out-of-bootstrap & Co., 
(iterated) cross validation).

There's package boot (though I never used it as it does not properly fit my 
data)

- Resampling schemes usually cannot tell you the performance for /future/ 
samples: for that you need a test set that is acquired later (and as close as 
possible to the real data to predict).
You need to do this if you want to take into account things like instrument 
drift etc.


There's tons of literature around, what to do depends somewhat on your field. I 
can point you to chemometric literature.


Calculating:
- package ROCR calculates all sorts of classifier performance measures for 
binary classification.
- I'm developing a package that gives performance measures directly for 
continuous predictions (such as predict.multinom with type = "probs"). You are 
welcome to be a test user: just let me know if you want to try it out.



Hope that helps,

Claudia




On 10/21/2010 05:37 AM, Raji wrote:


Hi R-Helpers , am working on nnet package.Multinom() has an option for
finding the goodness of fit by giving the AIC value. Does nnet also gives
some value to determine the accuracy. If not, can you guide me with some
procedure to figure out the accuracy/goodness of fit of nnet model?

Thanks in advance.



--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question about Density Plot

2010-10-18 Thread Claudia Beleites

Dear Ignacio,

if you want it hexagonal (as I gather from the hexbin_demo), have a look at the 
hexbin package.
Otherwise, lattice's levelplot is your friend. Or, if you prefer ggplot: 
geom_tile or geom_hex.


If you play a bit with findFn from package sos, e.g.

findFn ("plot 2d density")
findFn ("plot 2d histogram")

you'll find more related functions.

Claudia




I've attached an example about something I want to do in R. This example was
done in a Fortran application called ASGL. Here's an example in matplotlib

http://matplotlib.sourceforge.net/examples/pylab_examples/hexbin_demo.html

Basically, it's like a scatter plot, but have several additional things. One
thing are the grids inside the graph, and the other is a density bar used as a
reference to evaluate the frequency of the points.

The command that I've always used in R for scatter plots is.

  plot(l1, l2)

I need to know if there is something similar in a library of R, or if I could
implement it on my own.

Greetings

Ignacio



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] confusion matrix

2010-10-08 Thread Claudia Beleites

Dear Greg,

If it is only the NA that worries you: function table can deal with that.
? table
and:
example (table)
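
A minimal sketch (useNA is the argument to look for):

table (ref, pred, useNA = "ifany")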

If you want to make a confusion matrix that works also with fractional answers 
(e.g. 50% A, 50% B, a.k.a soft classification) then you can contact me and 
become test user of a package that I'm just writing (you can also wait until it 
is published to CRAN, but that will take a while).


Best regards,

Claudia

Gregory Ryslik wrote:
Hi Everyone, 


In follow up to my previous question, I wrote some code that correctly makes a confusion 
matrix as I need it. However, it only works when the numbers are between 1 and n. If the 
possible outcomes are between 0 and n, then I can't reference row 0 of the 
matrix and the code breaks. Does anyone have any easy fixes for this? I've attached the 
entire code to this email.


As always, thank you for your help!

Greg

Code:

answers <- matrix(c(4,2,1,3,2,1),nrow =6)
mat1 <- matrix(c(3,3,4,NA,4,2),nrow = 6)
mat2 <- matrix(c(3,2,1,4,2,3),nrow = 6)
mat3 <- matrix(c(4,2,2,2,1,1),nrow = 6)
mat4 <- matrix(c(4,2,1,3,1,4),nrow = 6)
mat5 <- matrix(c(2,3,1,4,2,3),nrow = 6)

matrixlist <- list(mat1,mat2,mat3,mat4,mat5)
predicted.values <- matrix(unlist(matrixlist),nrow = dim(mat1)[1])

confusion.matrix <- matrix(0, nrow = length(as.vector(unique(answers))),
                           ncol = length(as.vector(unique(answers))))

for(i in 1:dim(predicted.values)[1]){
    for(j in 1: dim(predicted.values)[2]){

        predicted.value <- predicted.values[i,j]
        if(!is.na(predicted.value)){
            true.value <- answers[i,]
            confusion.matrix[true.value, predicted.value] <-
                confusion.matrix[true.value,predicted.value]+1
        }
    }
}

class.error <- diag(1- prop.table(confusion.matrix,1))
confusion.matrix <- cbind(confusion.matrix,class.error)
confusion.data.frame <- as.data.frame(confusion.matrix)
names(confusion.data.frame)[1:length(as.vector(unique(answers)))] <-
    1:length(as.vector(unique(answers)))
names(confusion.data.frame)[length(as.vector(unique(answers)))+1] <-
    "class.error"

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] confusion matrix

2010-10-08 Thread Claudia Beleites

Gregory Ryslik wrote:

Hi,

Thank you for the help! Would this imply then that if my answers and
predicted are both matrices, I need to first make them into factors? I was
hoping to avoid that step...
Why are they matrices? What is the additional dimension? And: what should become 
of the additional dimension? With 2d reference and prediction, do you want to 
produce 3d or 4d confusion matrices?



Thank you again!

You are welcome.

Claudia



Kind regards, Greg On Oct 8, 2010, at 10:04 AM, Claudia Beleites wrote:


Gregory Ryslik wrote:

Hi, I played with the table option but it seems to be only able to give me
counts for numbers that exist. For example, if I don't have any 4's that
are predicted, that number is skipped!

Well, you need to tell the function that there _could_ be a 4 :


> ref <- factor (1 : 3)
> ref <- factor (1 : 4)
> pred <- factor (c (1 : 3, 1), levels = levels (ref))
> ref
[1] 1 2 3 4
Levels: 1 2 3 4
> pred
[1] 1 2 3 1
Levels: 1 2 3 4
> table (ref, pred)
   pred
ref 1 2 3 4
  1 1 0 0 0
  2 0 1 0 0
  3 0 0 1 0
  4 1 0 0 0

Claudia


Thanks, Greg Sent via BlackBerry by ATT -Original Message- From:
Claudia Beleites cbelei...@units.it Date: Fri, 08 Oct 2010 15:38:31 To:
Gregory Ryslikrsa...@comcast.net Cc: R Helpr-help@r-project.org
Subject: Re: [R] confusion matrix Dear Greg, If it is only the NA that
worries you: function table can deal with that. ? table and: example
(table) If you want to make a confusion matrix that works also with
fractional answers (e.g. 50% A, 50% B, a.k.a soft classification) then
you can contact me and become test user of a package that I'm just
writing (you can also wait until it is published to CRAN, but that will
take a while). Best regards, Claudia Gregory Ryslik wrote:

Hi Everyone, In follow up to my previous question, I wrote some code
that correctly makes a confusion matrix as I need it. However, it only
works when the numbers are between 1 and n. If the possible outcomes
are between 0 and n, then I can't reference row 0 of the matrix and
the code breaks. Does anyone have any easy fixes for this? I've
attached the entire code to this email. As always, thank you for your
help! Greg Code: answers-matrix(c(4,2,1,3,2,1),nrow =6) mat1-
matrix(c(3,3,4,NA,4,2),nrow = 6) mat2-matrix(c(3,2,1,4,2,3),nrow = 6)
mat3-matrix(c(4,2,2,2,1,1),nrow = 6) mat4-matrix(c(4,2,1,3,1,4),nrow
= 6) mat5-matrix(c(2,3,1,4,2,3),nrow = 6) matrixlist-
list(mat1,mat2,mat3,mat4,mat5) predicted.values- 
matrix(unlist(matrixlist),nrow = dim(mat1)[1]) 
confusion.matrix-matrix(0, nrow =

length(as.vector(unique(answers))),ncol =
length(as.vector(unique(answers for(i in
1:dim(predicted.values)[1]){ for(j in 1: dim(predicted.values)[2]){
predicted.value- predicted.values[i,j] if(!is.na(predicted.value)){
true.value- answers[i,] confusion.matrix[true.value, predicted.value]
- confusion.matrix[true.value,predicted.value]+1 } } } class.error-
diag(1- prop.table(confusion.matrix,1))
confusion.matrix-cbind(confusion.matrix,class.error)
confusion.data.frame-as.data.frame(confusion.matrix)
names(confusion.data.frame)[1:length(as.vector(unique(answers)))]- 
1:length(as.vector(unique(answers)))
names(confusion.data.frame)[length(as.vector(unique(answers)))+1]- 
class.error [[alternative HTML version deleted]] 
__ R-help@r-project.org

mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do
read the posting guide http://www.R-project.org/posting-guide.html and
provide commented, minimal, self-contained, reproducible code.


-- Claudia Beleites Dipartimento dei Materiali e delle Risorse Naturali 
Università degli Studi di Trieste Via Alfonso Valerio 6/a I-34127 Trieste


phone: +39 0 40 5 58-37 68 email: cbelei...@units.it





--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ROCR predictions

2010-08-17 Thread Claudia Beleites

Dear Assa,



I am having a problem building a ROC curve with my data using the ROCR
package.

I have 10 lists of proteins such as attached (proteinlist.xls). each of the

your file didn't make it to the list.



lists was calculated with a different p-value.
The goal is to find the optimal p-value for the highest number of true
positives as well as lowest number of false positives.



As far as I understood the explanations from the vignette of ROCR, my data
of TP and FP are the labels of the prediction function. But I don't know how
to assign the right predictions to these labels.


I assume the p-values are different cutoffs that you use for hardening (= 
making yes/no predictions) from some soft (= continuous class membership) output 
of your classifier.


Usually, ROCR calculates the curves as a function of the cutoff/threshold itself 
from the continuous predictions. If you have these soft predictions, let ROCR do 
the calculation for you.
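
E.g. (minimal sketch; score = soft predictions, lab = binary reference labels):

library (ROCR)
pred <- prediction (score, lab)
plot (performance (pred, "tpr", "fpr"))  # the ROC curve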


If you don't have them, ROCR can calculate your characteristics (sens, spec, 
precision, recall, whatever) for each of the p-values. While you could combine 
the results by hand into a ROCR-performance object and let ROCR do the 
plotting, it is then probably easier if you plot directly yourself.


Don't be shy to look into the prediction and performance objects, I find them 
pretty obvious. Maybe start with the objects produced by the examples.


Also, note ROCR works with binary validation data only. If your data has more 
than one class, you need to make two-class-problems first (e.g. protein xy ./. 
not protein xy).




BTW, Is there a way of finding the optimum in the curve? I mean to find the
exact value in the ROC curve (see sheet 2 in the excel file for the ROC
curve).


Someone asked for optimum on ROC a couple of months ago, RSiteSearch on the 
mailing list with ROC and optimal or optimum should get you answers.
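
One common choice is the cutoff maximizing the Youden index; a sketch, 
continuing from a prediction object pred as above:

sens <- performance (pred, "sens")
spec <- performance (pred, "spec")
cutoffs <- sens@x.values [[1]]
cutoffs [which.max (sens@y.values [[1]] + spec@y.values [[1]] - 1)]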




I would like to thank for any help in advance

You're welcome.

Claudia

--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ROCR data input

2010-08-17 Thread Claudia Beleites

Anneley,


Sorry, I'm new to R, and relatively new to statistics too so I'm still a bit
unclear.

That's OK - everyone started some time and was new.

However, it is really important to post a reproducible example here. If you are 
so new that you don't know how to do that exactly, you should probably write 
into your email that you tried but don't know how. Your chances to get an 
answer will probably increase quite a bit by that.


Also, I'd suggest you to go thoroughly through some introduction for R. There's 
a lot available on cran, the web and in many libraries.

E.g. a collection divided into documents with more or less than 100 pages:
http://cran.r-project.org/other-docs.html
r-project.org also has links to books, and to non-english material.


The values in the post were only a sample of around 8400 rows. The
label has 1 or 0 (I thought these were the two classes needed).

yes.


Each label row
has an equivalent probability. This is the data that I output from the
logistic regression analysis, but it is seemingly not the right format for
ROC curve analysis.

It is the right format.


There is a difference in how R displays the data, when I
type ROCR.simple it is in the format:

$predictions
  [1] 0.612547843 0.364270971 0.432136142...
$labels
  [1] 1 1 0 0 0 1 1 1 1 0 1 0 1 0 0 0 1 1 1 0 0 0 0 ... etc.

whereas mine is in columns, e.g.

ID, labels, probs
8930 0 0.00070
8931 0 0.00036
8932 1 0.0
8933 1 0.2
8934 0 0.1
etc.

Look up the difference between list and data.frame.
Also: you can find out a lot about variables with class () and str (), and maybe 
 summary ()



That is why I think it is a format issue, but being new to R, I'm not sure
what I need to do to rectify it.
 I have attached the text file if this helps.
No, we don't need it to reproduce your error - I think it's all more or less 
about typos:


> prediction("prob$probabilities", "prob$label")
Error in prediction("prob$probabilities", "prob$label") :
  Number of classes is not equal to 2.
ROCR currently supports only evaluation of binary classification tasks.

Now, if you need to trace down such an error, it is really a good idea to check 
what the arguments are that you hand over:


As many errors come from typos, it is a good idea to copy and paste literally 
what you put into the function:

> "prob$probabilities"
[1] "prob$probabilities"
> "prob$label"
[1] "prob$label"

See the difference between what your argument evaluates to and
what you thought to hand over?

Does this get you on the right track? I don't want to be nasty, but if you 
discover the mistakes yourself, you'll be much faster finding such things next time.


So: try with these hints, and if it doesn't work, you can ask again.

HTH,

Claudia
--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ROCR predictions

2010-08-17 Thread Claudia Beleites

Dear Assa,

you need to call prediction with continuous predictions and a _binary_ true 
class label.


You are the only one who can tell whether the p-values are actually predictions 
and what the class labels are. For the list readers p is just the name of 
whatever variable, and you didn't even vaguely say what you are trying to 
classify, nor did you offer any explanation of what the columns are.


The only information we get from your table is that p-value has small and 
continuous values. From what I see the p-values could also be fitting errors of 
the predictions (e.g. expressed as a probability that the similarity to the 
predicted class is random).


Claudia

Assa Yeroslaviz wrote:

Dear Claudia,

thank you for your fast answer.
I add again the table of the data as an example.

Protein ID   Pfam Domain      p-value      Expected         Is Expected  TP  FN  FP  TN
NP_11.2      APH              1.15E-05     APH              TRUE          1   0   0   0
NP_11.2      MutS_V           0.0173       APH              FALSE         0   0   1   0
NP_62.1      CBS              9.40E-08     CBS              TRUE          1   0   0   0
NP_66.1      APH              3.83E-06     APH              TRUE          1   0   0   0
NP_66.1      CobU             0.009        APH              FALSE         0   0   1   0
NP_66.1      FeoA             0.3975       APH              FALSE         0   0   1   0
NP_66.1      Phage_integr_N   0.0219       APH              FALSE         0   0   1   0
NP_000161.2  Beta_elim_lyase  6.25E-12     Beta_elim_lyase  TRUE          1   0   0   0
NP_000161.2  Glyco_hydro_6    0.002        Beta_elim_lyase  FALSE         0   0   1   0
NP_000161.2  SurE             0.0059       Beta_elim_lyase  FALSE         0   0   1   0
NP_000161.2  SapB_2           0.0547       Beta_elim_lyase  FALSE         0   0   1   0
NP_000161.2  Runt             0.1034       Beta_elim_lyase  FALSE         0   0   1   0
NP_000204.3  EGF              0.004666118  EGF              TRUE          1   0   0   0
NP_000229.1  PAS              3.13E-06     PAS              TRUE          1   0   0   0
NP_000229.1  zf-CCCH          0.2067       PAS              FALSE         0   1   1   0
NP_000229.1  E_raikovi_mat    0.0206       PAS              FALSE         0   0   0   0
NP_000388.2  NAD_binding_1    8.21E-24     NAD_binding_1    TRUE          1   0   0   0
NP_000388.2  ABM              1.40E-08     NAD_binding_1    FALSE         0   0   1   0
NP_000483.3  MMR_HSR1         1.98E-05     MMR_HSR1         TRUE          1   0   0   0
NP_000483.3  DEAD             2.30E-05     MMR_HSR1         FALSE         0   0   1   0
NP_000483.3  APS_kinase       1.80E-09     MMR_HSR1         FALSE         0   0   1   0
NP_000483.3  CbiA             0.0003       MMR_HSR1         FALSE         0   0   1   0
NP_000483.3  CoaE             1.28E-07     MMR_HSR1         FALSE         0   0   1   0
NP_000483.3  FMN_red          4.61E-08     MMR_HSR1         FALSE         0   0   1   0
NP_000483.3  Fn_bind          0.3855       MMR_HSR1         FALSE         0   0   1   0
NP_000483.3  Invas_SpaK       0.2431       MMR_HSR1         FALSE         0   0   1   0
NP_000483.3  PEP-utilizers    0.127        MMR_HSR1         FALSE         0   0   1   0
NP_000483.3  NIR_SIR_ferr     0.1661       MMR_HSR1         FALSE         0   0   1   0
NP_000483.3  AAA              0.0031       MMR_HSR1         FALSE         0   0   1   0
NP_000483.3  DUF448           0.0021       MMR_HSR1         FALSE         0   0   1   0
NP_000483.3  CBF_beta         0.1201       MMR_HSR1         FALSE         0   0   1   0
NP_000483.3  zf-C3HC4         0.0959       MMR_HSR1         FALSE         0   0   1   0
NP_000560.5  ig               5.69E-39     ig               TRUE          1   0   0   0
NP_000704.1  Epimerase        4.40E-21     Epimerase        TRUE          1   0   0   0
NP_000704.1  Lipase_GDSL      6.63E-11     Epimerase        FALSE         0   0   1   0

(TP = true positive, FN = false negative, FP = false positive, TN = true negative)

...

this is a shortened list from one of the 10 lists I have for different 
p-values.


As you can see I have separate p-value experiments and probably need to 
calculate for each of them a separate ROC. But I don't know how to 
calculate these characteristics for the p-values.

How do I assign the predictions to each of the single p-value experiments?

I would appreciate any help

Thanks
Assa


On Tue, Aug 17, 2010 at 12:55, Claudia Beleites cbelei...@units.it wrote:


Dear Assa,



I am having a problem building a ROC curve with my data using
the ROCR
package.

I have 10 lists of proteins such as attached (proteinlist.xls).
each of the

your file didn't make it to the list.



lists was calculated with a different p-value.
The goal is to find the optimal p-value for the highest number
of true
positives as well as lowest number of false positives

Re: [R] cacheSweave / pgfSweave driver for package vignette

2010-08-13 Thread Claudia Beleites


Dear all,

Maybe we should move the discussion to r-devel? So please excuse the 
cross-posting, it is to tell people at r-help where to find the rest of the 
discussion (in case you agree with me).



I've been wondering about that, too.

Gabor, I use fake vignettes along your lines, too.
In order to provide meaningful samples, I have both bulky data and bulky 
calculations (at least too long to have any fun in running R CMD check frequently).


As I do not want to burden my package with lots (> 60 MB) of raw data in various 
file formats, two vignettes do their real work externally (and the source is 
available for separate download).


So for the development work it would be good to have caching for speed-up.
For the testing purposes of R CMD CHECK, however, the whole thing needs to be 
calculated: afaik the caching mechanism checks for changes in the respective 
chunks. Which is great for data-analysis work. However, in a package development 
scenario the changes are rather expected in the package. I suspect that the 
caching cannot check this. Thus a cached vignette does greatly reduce the 
calculation time, but also knocks out part of the testing.
This would be without concern, if the package is well behaved and does its 
testing in the tests and has the vignettes as manuals. I have to admit, though, 
that my package is not (yet) at this point.


So I personally find myself with a shell script that automatically builds all 
vignettes first, transfers some files into the package (the data sets coming 
with the package are constructed in vignettes), and then check and build the 
package.
In the end, this dependency of the package on the results of its vignettes needs 
much more calculation. I'm talking of ca. 10 - 15 min for the whole process 
(i.e. 5 - 7 min for one check cycle). This is awkward for development, but I 
think it's OK for something to be done occasionally on a nightly check on the 
server.


My conclusion is, that a cached Sweave driver should only be specified in 
certain situations. I.e. it would be very helpful for developing to do this at 
home, but I'm afraid it is not the best idea to reduce the work in checking 
the package in general (e.g. during nightly checks).
I also say this because I have been running into trouble with the nightly build 
on r-forge (due to some LaTeX packages that I thought to be fairly standard, 
which they weren't).
Another error I like to produce is to forget adding a new source file to the 
version control. Both cases are only found in checks during the nightly build on 
the server. There may be other mistakes that would be masked by the caching.


Of course, it is also not nice to keep the servers calculating examples for 
hours. I presume, however, that this case is quite rare (compared to situations 
where the regular building and checking is too long for a fluent development 
cycle), and I'd say that in this case Gabor's procedure is OK.


For my work it would be much more helpful, if R CMD CHECK had also positive 
flags (e.g. --tests as abbreviation for --no-codoc --no-examples --no-install 
--no-vignettes --no-latex)


I know hardly anything about make files and never wrote one myself. I think they 
could be helpful here to switch between the development checks and a complete 
build & check. So I'd be very curious to see some make files.


HTH,

Claudia




--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] sweep / mapply question

2010-07-22 Thread Claudia Beleites

Dear list,

I have a matrix, spc, that should row-wise be interpolated:

E.g.

spc <- matrix (1:1e6, ncol = 250, nrow = 4000, byrow = TRUE)
spc [1:10, 1:10]
shifts <- seq_len (nrow (spc))
wl <- seq_len (ncol (spc))

interpolate <- function (spc.row, shift, wl)
  spline (wl + shift, spc.row, xout = wl, method = "natural")$y

interpolate (spc [1,], shift = shifts [1], wl = wl) [1:10] # works


naively, I wanted to use sweep to vectorize this:

sweep (spc, 1, shifts, interpolate, wl = wl)

This doesn't work, as sweep basically repeats the STATS in the correct way, and 
hands two matrices (arrays) to the function. This is fine and fast for + - 
* / etc., but doesn't help my interpolation.



Of course, I can calculate what I need:

system.time (
    t (mapply (interpolate, as.data.frame (t (spc)),
               shift = shifts,
               MoreArgs = list (wl = wl)))
)

system.time (
    sapply (1 : nrow (spc),
            function (i) interpolate (spc [i, ], shifts [i], wl = wl))
)

tmp <- spc
system.time ({
    for (i in 1 : nrow (spc))
        tmp [i,] <- interpolate (spc [i, ], shifts [i], wl = wl)
})

On my computer the for loop is fastest (slightly faster than sapply, a bit less 
than half of the time of mapply).


However, as I expect this to be a fairly common situation, I want to share this 
experience, and ask: is there a better / faster / nicer / more elegant way to do 
this?

Comments?

Thanks,

Claudia






--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Sweave: infelicities with lattice graphics

2010-07-15 Thread Claudia Beleites

Dear Michael,

I know this situation from writing vignettes, and I usually cheat a bit:

I redefine the functions along these lines:

plotmap <- function (...) print (hyperSpec:::plotmap (...))

   (plotmap is a lattice-function for hyperSpec objects)

plotmap can then be used without the print in the vignettes - this works 
fine for almost all cases. Only if one of the 
redefined functions calls another one of them, you get e.g. a pdf with 2 
pages.


Have a look at the vignettes and particularly the file vignettes.defs 
of package hyperSpec 
(https://r-forge.r-project.org/scm/viewvc.php/Vignettes/?root=hyperspec).


BTW: I find it polite to mention that some definitions etc. are executed 
silently, and to say where people can find them.


Cheers,

Claudia

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Backslash \ in string

2010-07-15 Thread Claudia Beleites

Jannis,


You are right, it does not seem to matter. When the R string contains
two \\, xtable prints it as only one \. I should have looked into the
Latex output before posting!



'\\' is just _one_ character in R:
> nchar ("\\")
[1] 1

Just like '\n' etc.

It is just the `print`ed (as opposed to cat) output that mislead you:
the print function displays a bunch of special characters in their 
backslash-escaped fashion:


> print ("someting\tblah\\blubb\n")
[1] "someting\tblah\\blubb\n"
> cat ("someting\tblah\\blubb\n")
someting        blah\blubb
> print ("\12")
[1] "\n"

Claudia

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Sweave: infelicities with lattice graphics

2010-07-15 Thread Claudia Beleites

Dear David,

you can use Gin:

\setkeys{Gin}{width=0.5\linewidth}

just before the chunk that actually produces the figure.



and, cool, I hadn't realized that

<<fig = TRUE, echo = FALSE>>=
print (
<<chunk-with-lattice-function>>
)
@

works.

with {} inside the print there can even be more than one statement in 
the chunk-with-lattice-function.
However, that's not a good idea: there may be surprises due to the 
question of how often the chunk-with-lattice-function is actually executed.


Claudia


I have wondered about this too.  The approach I use isn't pretty but does
have a couple of advantages - there is only one set of code to run and I
have control over the figure size.

The first part of the code below is what is shown in the document (but not
run), and the second part actually runs the code and makes the plot.

<<no2hist, eval=FALSE>>=
hist(mydata$no2)
<<no2hist1, echo = FALSE, results=hide>>=
pdf("no2hist.pdf")
<<no2hist>>
dev.off()
@
\begin{figure}
 \centering
 \includegraphics[width=0.5\textwidth]{no2hist}
 \caption{The caption.}
\label{fig:hist}
\end{figure}

I'd be interested to know if there are neater ways of doing this.

Regards,

David


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] merge

2010-06-15 Thread Claudia Beleites

apropos ("merge")
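
For your example below, something along these lines may already do (untested 
sketch; df1, df2 standing for your first and second data frame):

merged <- merge (df2 [c ("CLUSTER", "year")], df1,
                 by = c ("CLUSTER", "year"), all.x = TRUE)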

Cheers, Claudia

n.via...@libero.it wrote:

Dear list, I have two different data frames.
The first one is like this:


CLUSTER year  variable  value
m1      2006  EC01      4
m1      2007  EC01      5
m2      2006  EC01      42
m2      2007  EC01      9


and other variables. This data frame has 800 rows and 14 columns.


The second data frame has more or less the same structure:
CLUSTER year
m1      2005
m1      2006
m1      2007
m2      2005
m2      2006
m2      2007

This data frame has 548833 rows and 18 columns.

What I'm trying to do is to merge the year column of the second data frame
with the whole first data frame, in order to get the following new data frame:

CLUSTER year  variable  value
m1      2005  EC01      /
m1      2006  EC01      4
m1      2007  EC01      5
m2      2005  EC01      /
m2      2006  EC01      42
m2      2007  EC01      9

Could someone help me?
Thanks a lot

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] logistic regression with 50 varaibales

2010-06-14 Thread Claudia Beleites

Dear all,

(this first part of the email I sent to John earlier today, but forgot to put it 
to the list as well)

Dear John,

 Hi, this is not R technical question per se. I know there are many excellent
 statisticians in this list, so here my questions: I have dataset with ~1800
 observations and 50 independent variables, so there are about 35 samples per
 variable. Is it wise to build a stable multiple logistic model with 50
 independent variables? Any problem with this approach? Thanks

First: I'm not a statistician, but a spectroscopist.
But I do build logistic Regression models with far less than 1800 samples and 
far more variates (e.g. 75 patients / 256 spectral measurement channels). Though 
I have many measurements per sample: typically several hundred spectra per sample.


Question: are the 1800 real, independent samples?

Model stability is something you can measure.
Do a honest validation of your model with really _independent_ test data and 
measure the stability according to what your stability needs are (e.g. stable 
parameters or stable predictions?).




(From here on reply to Joris)

 Marc's explanation is valid to a certain extent, but I don't agree with
 his conclusion. I'd like to point out the curse of
 dimensionality (Hughes effect) which starts to play a role rather quickly.
No doubt.

 The curse of dimensionality is easily demonstrated looking at the
 proximity between your datapoints. Say we scale the interval in one
 dimension to be 1 unit. If you have 20 evenly-spaced observations, the
 distance between the observations is 0.05 units. To have a proximity
 like that in a 2-dimensional space, you need 20^2=400 observations. in
 a 10 dimensional space this becomes 20^10 ~ 10^13 datapoints. The
 distance between your observations is important, as a sparse dataset
 will definitely make your model misbehave.

But won't also the distance between groups grow?
No doubt, that high-dimensional spaces are _very_ unintuitive.

However, the required sample size may grow substantially slower, if the model 
has appropriate restrictions. I remember the recommendation of at least 5 
samples per class and variate for linear classification models. I.e. not to get 
a good model, but to have a reasonable chance of getting a stable model.


 Even with about 35 samples per variable, using 50 independent
 variables will render a highly unstable model,
Am I wrong thinking that there may be a substantial difference between stability 
of predictions and stability of model parameters?


BTW: if the models are unstable, there's also aggregation.

At least for my spectra I can give toy examples with physical-chemical 
explanation that yield the same prediction with different parameters (of course 
because of correlation).


 as your dataspace is
 about as sparse as it can get. On top of that, interpreting a model
 with 50 variables is close to impossible,
No, not necessarily. IMHO it depends very much on the meaning of the variables.
E.g. for the spectra a set of model parameters may be interpreted like spectra 
or difference spectra. Of course this has to do with the fact, that a parallel 
coordinate plot is the more natural view of spectra compared to a point in so 
many dimensions.


 and then I didn't even start
 on interactions. No point in trying I'd say. If you really need all
 that information, you might want to take a look at some dimension
 reduction methods first.

Which brings to mind a question I've had for a long time:
I assume that all variables that I know beforehand to be without information are 
already discarded.
The dimensionality is then further reduced in a data-driven way (e.g. by PCA or 
PLS). The model is built in the reduced space.


How many fewer samples are actually needed, considering the fact that the
dimension reduction is a model estimated on the data?
...which of course also means that the honest validation embraces the 
data-driven dimensionality reduction as well...


Are there recommendations about that?


The other curious question I have is:
I assume that it is impossible for him to obtain the 10^xy samples required for 
comfortable model building.

So what is he to do?


Cheers,

Claudia



--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ROC curve

2010-05-24 Thread Claudia Beleites

Dear Changbin,


I want to know how to select the optimal decision threshold from the ROC
curve? 

Depends on what "optimal" means. I think there are a bunch of different criteria
used:

- point closest to the ideal model
- point furthest from the guessing model
- these criteria may include costs, i.e. a FP/FN ratio != 1
- ...

More practical:
If you use ROCR: the help for the performance class explains the slots of the
object. There you find the data of the curve, including the thresholds.



At what threshold will give the highest accuracy?

To know that, optimize the accuracy as a function of the threshold.

Remember: finding the optimal threshold from a ROC curve is a data-driven 
optimization. You need to validate the resulting model with independent test 
data afterwards.
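
A sketch with ROCR (the scores and labels are made up):

library (ROCR)

scores <- c (0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1)  # classifier output
labels <- c (1,   1,   0,   1,   0,   1,   0,   0)    # true classes

pred <- prediction (scores, labels)
acc  <- performance (pred, "acc")    # accuracy as a function of the cutoff

cutoffs    <- acc@x.values [[1]]
accuracies <- acc@y.values [[1]]
cutoffs [which.max (accuracies)]     # threshold with the highest accuracy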




--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help on getting help from manuals

2010-03-12 Thread Claudia Beleites

ManInMoon wrote:

Hi,

A number of people have suggested I read the manuals...

Could someone help me by telling me where the primary starting point is, please?

In R, type
help.start ()
this should open a browser window with links to
- the packages
- the manuals
- a search engine
Please note: this is written in section 1.7 "Getting help with functions and
features" of "An Introduction to R".

In the same section, you learn about
help.search

Note also:
? help
leads you to the man page describing the help system. In section "See also" you
find a list of other useful commands for finding help.


If you look them up, look again at what alternatives they suggest, and actually
try them out (again with topic "help"), you will come across all the information
about finding help on R topics that is written in this email.



- There also exists apropos ().
- In addition, e.g. reading this mailing list, you learn about the sos package.
- You can also use the internet resources: on r-project.org -> manuals

- I personally use a lot:
http://finzi.psych.upenn.edu/cgi-bin/namazu.cgi (which is where RSiteSearch () 
gets you). You can nicely decide where to search: documentation of R and CRAN 
packages, and/or the mailing list archives.


Homework: try out & read the results of:
RSiteSearch ("help")



For example, I am interested in writing functions with a variable number of
arguments - where should I start to look?

"An Introduction to R" only shows a brief example - with no pointer to where
to find further information.

I can't do ?xxx from the R console in most cases - as I don't know what the
function name is that I am looking for!!!

Then do
??xxx
or
???xxx (needs sos)
or
RSiteSearch ("xxx")
or
apropos ("xxx")
...
which you could have found out by reading
? help




People have helped me find substitute to get some metadata out - BUT how
could I have found that without guidance from the nice people on Nabble?

Any help on this very much appreciated.


Sometimes it _is_ difficult to find the correct search terms.
However, I think that people in this list will appreciate if you
- show that you did search before asking, and also tell them which terms
you used for the search

- particularly for questions about the meaning of commands:
  Try them out!
  Split the command into pieces and look at what each piece does
- people will appreciate it if you ask what the correct search terms are for your
problem (as opposed to asking them to do your homework)
  Learning R is learning a language. Including vocabulary (i.e. terms for the 
different concepts).
  Asking for help with searching is like asking "How do you say concept xyz in
R?" instead of "Could anyone do the translation I got as homework?"


HTH,

Claudia



--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Getting multiple matrix-values using a single command

2010-03-12 Thread Claudia Beleites
Use a matrix of n x 2 to index. For details: sec. 5.3 "Index matrices" in "An
Introduction to R".
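
Spelled out for the example quoted below:

A <- matrix (1 : 9, nrow = 3)
idx <- cbind (c (1, 3),    # row indices
              c (2, 3))    # column indices
A [idx]                    # c (A [1, 2], A [3, 3]), i.e. 4 9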


HTH Claudia

Nils Rüfenacht wrote:

Dear all!

I'm trying to get multiple values from a matrix by using a single command.

Given a matrix A

A <- matrix(seq(1,9),nrow=3,ncol=3)

How can I get e.g. the values A[1,2] = 4 and A[3,3] = 9 with a single 
command and without using any loop? My first idea was to generate a row- 
and a column vector for the indices, i.e. c(1,3) indicating row number 1 
(for A[1,2]) and row number 3 (for A[3,3]) and similar for 
column-indices. Then I've tried to call


A[c(1,3),c(2,3)]

but instead of 4 , 9 the result is

     [,1] [,2]
[1,]    4    7
[2,]    6    9

Any suggestions?

Regards, Nils

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.



--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Data frame question

2010-03-12 Thread Claudia Beleites

Andy,

Did you run into any kind of trouble?
I'm asking because I'm maintaining a package for spectroscopic data that heavily 
uses I (spectra.matrix) ...


However, once you have the matrix safely inside the data.frame, you can delete
the "AsIs":


> a <- matrix (1:9, 3)
> str (a)
 int [1:3, 1:3] 1 2 3 4 5 6 7 8 9
> df <- data.frame (a = I (a))
> str (df)
'data.frame':   3 obs. of  1 variable:
 $ a: 'AsIs' int [1:3, 1:3] 1 2 3 4 5 6 7 8 9
> df$a <- unclass (df$a)
> str (df)
'data.frame':   3 obs. of  1 variable:
 $ a: int [1:3, 1:3] 1 2 3 4 5 6 7 8 9
> df$a
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
> dim (df)
[1] 3 1

However, I don't know whether something can now trigger a conversion to 
data.frame that the AsIs would have stopped.


Cheers,

Claudia

apjawor...@mmm.com wrote:

Hi,

I have the following question about creating data frames.  I want to 
create a data frame with 2 components: a vector and a matrix.


Let me use a simple example:

y <- rnorm(10)
x <- matrix(rnorm(150), nrow=10)

Now if I do

dd <- data.frame(x=x, y=y)

I get a data frame with 16 columns, but if, according to the documentation,
I do


dd <- data.frame(x=I(x), y=y)

then str(dd) gives:

'data.frame':   10 obs. of  2 variables:
 $ x: AsIs [1:10, 1:15] 0.700073 -0.44371 -0.46625 0.977337 0.509786 ...

 $ y: num  0.4676 -1.4343 -0.3671 0.0637 -0.231 ...

This looks and works OK.

Now, there exists a CRAN package called pls.  It has a yarn data set in 
it.



data(yarn)
str(yarn)

'data.frame':   28 obs. of  3 variables:
 $ NIR: num [1:28, 1:268] 3.07 3.07 3.08 3.08 3.1 ...
  ..- attr(*, dimnames)=List of 2
  .. ..$ : NULL
  .. ..$ : NULL
 $ density: num  100 80.2 79.5 60.8 60 ...
 $ train  : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...

This looks almost the same, except the matrix component in my example has 
the AsIs instead of num.


Is this just some older behavior of the data.frame function producing this 
difference?  If not, how can I get my data frame (dd) to look like yarn?


I read the help pages for data.frame and as.data.frame and found this 
paragraph


If a list is supplied, each element is converted to a column in the data 
frame. Similarly, each column of a matrix is converted separately. This 
can be overridden if the object has a class which has a method for 
as.data.frame: two examples are matrices of class model.matrix (which 
are included as a single column) and list objects of class POSIXlt which 
are coerced to class POSIXct. 

If I do 


methods(as.data.frame)
 [1] as.data.frame.aovproj*        as.data.frame.array
 [3] as.data.frame.AsIs            as.data.frame.character
 [5] as.data.frame.complex         as.data.frame.data.frame
 [7] as.data.frame.Date            as.data.frame.default
 [9] as.data.frame.difftime        as.data.frame.factor
[11] as.data.frame.ftable*         as.data.frame.integer
[13] as.data.frame.list            as.data.frame.logical
[15] as.data.frame.logLik*         as.data.frame.matrix
[17] as.data.frame.model.matrix    as.data.frame.numeric
[19] as.data.frame.numeric_version as.data.frame.ordered
[21] as.data.frame.POSIXct         as.data.frame.POSIXlt
[23] as.data.frame.raw             as.data.frame.table
[25] as.data.frame.ts              as.data.frame.vector

so it looks like there is a matrix method for as.data.frame.  The question 
then is how can I override the default behavior for the matrix object 
(converting columns separately).



Any hint will be appreciated,

Andy


__
Andy Jaworski
518-1-01
Process Laboratory
3M Corporate Research Laboratory
-
E-mail: apjawor...@mmm.com
Tel:  (651) 733-6092
Fax:  (651) 736-3122
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Data frame question

2010-03-12 Thread Claudia Beleites

apjawor...@mmm.com wrote:


Thanks for the quick reply.

No, I did not run into any problems so far.  I have been using the PLS 
package and the modelling functions seem to work just fine.


In fact, even if I let the data.frame convert the x matrix to separate 
column, the y ~ x modeling syntax still seems to work fine.



I don't see that behaviour:

library (pls)

rm (x)  # make sure there is no leftover x in the workspace
mat <- matrix (1 : 9, 3)
df <- data.frame (y = 1 : 3, x = mat)
str (df)
df
coef (plsr (y ~ x, data = df, ncomp = 1)) # error
coef (plsr (y ~ x.1 + x.2 + x.3, data = df, ncomp = 1)) # works

df$x <- I (-mat)
str (df)
df
coef (plsr (y ~ x, data = df, ncomp = 1)) # works

Claudia

PS: If I may ask: what kind of data do you analyze with PLS?



Thanks again,

Andy

__
Andy Jaworski
518-1-01
Process Laboratory
3M Corporate Research Laboratory
-
E-mail: apjawor...@mmm.com
Tel:  (651) 733-6092
Fax:  (651) 736-3122


From:   Claudia Beleites cbelei...@units.it
To: apjawor...@mmm.com
Cc: r-help@r-project.org
Date:   03/12/2010 02:13 PM
Subject:Re: [R] Data frame question





Andy,

Did you run into any kind of trouble?
I'm asking because I'm maintaining a package for spectroscopic data that 
heavily

uses I (spectra.matrix) ...

However, once you have the matrix safe inside the data.frame, you can 
delete the

 AsIs:

 > a <- matrix (1:9, 3)
 > str (a)
  int [1:3, 1:3] 1 2 3 4 5 6 7 8 9
 > df <- data.frame (a = I (a))
 > str (df)
 'data.frame': 3 obs. of  1 variable:
  $ a: 'AsIs' int [1:3, 1:3] 1 2 3 4 5 6 7 8 9
 > df$a <- unclass (df$a)
 > str (df)
 'data.frame': 3 obs. of  1 variable:
  $ a: int [1:3, 1:3] 1 2 3 4 5 6 7 8 9
 > df$a
      [,1] [,2] [,3]
 [1,]    1    4    7
 [2,]    2    5    8
 [3,]    3    6    9
 > dim (df)
 [1] 3 1

However, I don't know whether something can now trigger a conversion to
data.frame that the AsIs would have stopped.

Cheers,

Claudia

apjawor...@mmm.com wrote:
  Hi,
 
  I have the following question about creating data frames.  I want to
  create a data frame with 2 components: a vector and a matrix.
 
  Let me use a simple example:
 
  y <- rnorm(10)
  x <- matrix(rnorm(150), nrow=10)

  Now if I do

  dd <- data.frame(x=x, y=y)

  I get a data frame with 16 columns, but if, according to the documentation,
  I do

  dd <- data.frame(x=I(x), y=y)
 
  then str(dd) gives:
 
  'data.frame':   10 obs. of  2 variables:
   $ x: AsIs [1:10, 1:15] 0.700073 -0.44371 -0.46625 0.977337 0.509786 ...
   $ y: num  0.4676 -1.4343 -0.3671 0.0637 -0.231 ...
 
  This looks and works OK.
 
  Now, there exists a CRAN package called pls.  It has a yarn data set in
  it.
 
  data(yarn)
  str(yarn)
  'data.frame':   28 obs. of  3 variables:
   $ NIR: num [1:28, 1:268] 3.07 3.07 3.08 3.08 3.1 ...
..- attr(*, dimnames)=List of 2
.. ..$ : NULL
.. ..$ : NULL
   $ density: num  100 80.2 79.5 60.8 60 ...
   $ train  : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
 
  This looks almost the same, except the matrix component in my example 
has

  the AsIs instead of num.
 
  Is this just some older behavior of the data.frame function producing 
this

  difference?  If not, how can I get my data frame (dd) to look like yarn?
 
  I read the help pages for data.frame and as.data.frame and found this
  paragraph
 
  If a list is supplied, each element is converted to a column in the data
  frame. Similarly, each column of a matrix is converted separately. This
  can be overridden if the object has a class which has a method for
  as.data.frame: two examples are matrices of class model.matrix (which
  are included as a single column) and list objects of class POSIXlt 
which

  are coerced to class POSIXct.
 
  If I do
 
  methods(as.data.frame)
   [1] as.data.frame.aovproj*        as.data.frame.array
   [3] as.data.frame.AsIs            as.data.frame.character
   [5] as.data.frame.complex         as.data.frame.data.frame
   [7] as.data.frame.Date            as.data.frame.default
   [9] as.data.frame.difftime        as.data.frame.factor
  [11] as.data.frame.ftable*         as.data.frame.integer
  [13] as.data.frame.list            as.data.frame.logical
  [15] as.data.frame.logLik*         as.data.frame.matrix
  [17] as.data.frame.model.matrix    as.data.frame.numeric
  [19] as.data.frame.numeric_version as.data.frame.ordered
  [21] as.data.frame.POSIXct         as.data.frame.POSIXlt
  [23] as.data.frame.raw             as.data.frame.table
  [25] as.data.frame.ts              as.data.frame.vector
 
  so it looks like there is a matrix method for as.data.frame.  The 
question

  then is how can I override the default behavior for the matrix object
  (converting columns separately).
 
 
  Any hint will be appreciated,
 
  Andy
 
 
  __
  Andy Jaworski
  518-1-01
  Process Laboratory
  3M Corporate Research Laboratory
  -
  E-mail: apjawor...@mmm.com
  Tel:  (651) 733-6092

Re: [R] colname of ... arguments

2010-03-11 Thread Claudia Beleites

what about:

niceplot <- function(...) {
  arg.names <- as.list (match.call () [-1])
  for (a in seq_along (arg.names))
    cat (as.character (as.expression (arg.names [[a]])), "\n\n")
}

niceplot (greeneye, log (greeneye), 1:3)

note that this works also if there is no greeneye

Disclaimer: I don't know whether I'm suggesting something bad, but I'd like to 
learn about better ways. So I really appreciate comments.


Claudia

ManInMoon wrote:

That is quite helpful David

niceplot <- function(...) {
  parms = list(...)
  for (x in parms) {
    xname <- paste(deparse(substitute(x), 500), collapse = "\n")
    cat(xname)
  }
}


GreenEyes=c(1,2,3,4)
niceplot(GreenEyes)

c(1, 2, 3, 4)

BUT what I want is:
  GreenEyes=c(1,2,3,4)

niceplot(GreenEyes)

GreenEyes

I will use the vector for plotting too, but I need it's name to produce a
legend automatically




On 10 March 2010 23:32, David Scott-6 [via R] wrote:



ManInMoon wrote:


I have writtn a function where I pass a variable number of arguments.

I They are vectors and I can manipulate them, but I need to get hold of

the

name for a legend.

niceplot <- function(...) {
  parms = list(...)

  for (x in parms) {
    DoSomethingWith(x)
  }
}

BUT how how can I get something like namestring(...) of nameofvector(x)?


I use the following syntax to get the name of a data object to use in a
title, label or whatever.

xname <- paste(deparse(substitute(x), 500), collapse = "\n")

This is taken from hist.default so at least has some provenance as an
appropriate method.

David Scott

--
_
David Scott Department of Statistics
The University of Auckland, PB 92019
Auckland 1142,NEW ZEALAND
Phone: +64 9 923 5055, or +64 9 373 7599 ext 85055
Email: [hidden email], Fax: +64 9 373 7018

Director of Consulting, Department of Statistics

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] modifying the dots argument - how?

2010-03-11 Thread Claudia Beleites

Mark,

dots <- list (...) gives you a list with the dots arguments

if innerFoo expects normal arguments rather than a list, do.call is your friend:
do.call (innerFoo, dots)
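
Put together, a minimal sketch (the function and argument names are made up):

foo <- function (...) {
  innerFoo <- function (x, y) x + y

  dots <- list (...)
  dots$troublemaker <- NULL    # delete an element innerFoo cannot handle

  do.call (innerFoo, dots)
}

foo (x = 1, y = 2, troublemaker = "boom")  # 3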

HTH & have a nice day,

Claudia


Mark Heckmann wrote:

Is there a way to modify the dots-argument?
Let's consider I have a function that passes on its ... arguments to 
another function.

For some reason I know that some elements in ... will cause problems.
Can I modify the ... structure somehow, e.g. delete elements?

foo <- function(...){
  innerFoo <- function(...){
    ...
  }

  ## AT THIS POINT I WANT TO MODIFY THE CONTENT OF ... BEFORE IT IS
  ## PASSED ON

  innerFoo(...)
}

Thanks,
Mark

–––
Mark Heckmann
Dipl. Wirt.-Ing. cand. Psych.
Vorstraße 93 B01
28359 Bremen
Blog: www.markheckmann.de
R-Blog: http://ryouready.wordpress.com




--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] help with lattice boxplots...

2010-03-01 Thread Claudia Beleites

you can use parameters:

trellis.par.set (box.rectangle = modifyList (trellis.par.get ("box.rectangle"),
                                             list (col = "black")))


bwplot(y~x, data=ex, pch = "|")


I think the others go along the same lines. Look into panel.bwplot to see which
parameters are used to produce what.


HTH Claudia


Kim Jung Hwa wrote:

Hi All,

I need some help with the following code: I'm trying to convert dashed
lines to regular ones, and to change the default blue border color to,
say, black... but I'm doing it wrong and it's not working. Can anyone help,
please? Thanks,

Code:
require(lattice)
ex <- data.frame(x=1:10, y=rep(c("A","B"), 5))
bwplot(y~x, data=ex,
  panel=function(x,y,...) {
    panel.bwplot(x, y, pch="|", border="black", lty=1,...)
  }
)

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R help question: How can we enable useRs to contribute corrections to help files faster ?

2010-02-28 Thread Claudia Beleites

What about the short-term solution of having a function
package.bug.report - along the lines of bug.report?

E.g. see attachment

Claudia


--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Valerio 2
I-34127 Trieste
ITALY

email: cbelei...@units.it
phone: +39 (0 40) 5 58-34 68
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] two questions for R beginners

2010-02-26 Thread Claudia Beleites

Dear Patrick (and all)

I've been working with R for a couple of years now, after working mostly in Matlab.
Lazy & impatient are both true for me :-)


* What were your biggest misconceptions or
stumbling blocks to getting up and running
with R?


 * What documents helped you the most in this
 initial phase?

 I especially want to hear from people who are
 lazy and impatient.

 Feel free to write to me off-list.  Definitely
 write off-list if you are just confirming what
 has been said on-list.


Stumbling:

* It took me a long time to remember
getwd () and setwd () (instead of pwd and cd / chdir or the like)

* I still discover very useful functions that I would have needed for a long
time. Latest discoveries: mapply and ave.
I knew aggregate, and was always a little annoyed that it needs a grouping list. I
even decided that the aggregate method for my hyperSpec class should work with
factors as well as with lists. One day I read on this mailing list that ave
does what I need...
I like the crosslinks in the help ("See Also") very much. Maybe I rely too much on
them. So, not lazy today: I attach a patch for aggregate.Rd that adds the
\seealso to ave.
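
For readers who don't know ave yet, a quick illustration of the difference
(toy data):

df <- data.frame (g = c ("a", "a", "b"), x = c (1, 3, 5))

aggregate (df$x, by = list (g = df$g), FUN = mean)  # one row per group
ave (df$x, df$g)      # group means, recycled to the length of the input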


Reading this mailing list once in a while gives me nice new ideas. However, > 50
emails a day is somewhat scary for me, so I read only occasionally.


* Vectorization: I like the *apply functions,
but I'd really appreciate a comprehensive page/vignette here.
I remember that it took me a while to realize that the rule for MARGIN in sweep
is "use the same number as in the apply that created the STATS".
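
As a quick illustration of that rule (toy matrix):

m  <- matrix (1 : 6, nrow = 2)
cm <- apply (m, 2, mean)   # column means: MARGIN = 2
sweep (m, 2, cm)           # same MARGIN: subtract them column-wise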


* I never found the pdf manuals helpful (help pages are easier to access, and
there is nothing in the pdf that the help doesn't have).

At the beginning I expected the pdf manuals to be something like what the
vignettes are.

* I did not arrive at a comfortable debugging cycle for a long time. But now
there's the debug package and setBreakpoint, and I'm happy.


* As I now start teaching, I notice that many students react to error messages
with "uhh! an error!" (panic), few realizing that the error message actually gives
information on what went wrong.
A list of common causes of different error messages would be helpful here, I
think.
In case someone agrees: I started one at the Wiki: 
http://rwiki.sciviews.org/doku.php?id=tips:errormessages



Cheers,

Claudia



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] help me about data format.

2009-07-07 Thread Claudia Beleites
 I want to make a matrix and a vector in the same data frame.

You need to protect your matrix by I ()

Btw: I'm currently writing a package for handling spectra that I plan to
release in a few weeks.
It contains a vignette showing how PLS calibration can be done.
If you want to give it a try, let me know.

Claudia Beleites

-- 
Claudia Beleites
DMRN, Università degli Studi di Trieste 
Via Alfonso Valerio 6/a 
I-34127 Trieste

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ggplot2 and lattice

2008-12-16 Thread Claudia Beleites
On Tuesday, 16 December 2008 17:13:33, Wayne F wrote:
 stephen sefick wrote:
  yes a parallel coordinates plot- I understand that it is for
  multivariate data, but I am having a hard time figuring out what it is
  telling me.  Thanks for your help.

 In the lattice book, the author mentions that static parallel plots aren't
 very useful, in general.
While for some data they are just natural: e.g. when spectra are treated as 
multidimensional data. Then the parallel coordinate plot just gives you the 
spectrum. 
Of course, in this situation it is maybe the treatment as high-dimensional 
data that is somewhat weird for spectra. 

However, this offers a way, that might help understanding what's going on. 

I have a data set of p dimensions. E.g. spectra measured with p channels.
Now, we can either think of such a spectrum as a point in p-d. E.g. a spectrum 
consisting of red, green, blue intensity is at a certain point in rgb-space.

On the other hand, here the p dimensions have something to do with each other 
(e.g. an intrinsic order, let's say, by the wavelength). So it does make sense 
to plot the intensity over the p dimensions. That's the parallel coordinate 
plot. 

What you can tell from such a plot, depends very much on your data, and how 
you treated it. 

Claudia



-- 
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 (0 40) 5 58-34 47
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] The end of Matlab

2008-12-12 Thread Claudia Beleites
Dear list,

 Learning to use the power of R's indexing and functions like head() and
 tail() (which are just syntactic sugar) will probably lead you not to miss
 this.
However, how do I exclude the last columns of a data.frame or matrix (or, in 
general, head and tail for given dimensions of an array)?

I.e. something nicer than 
t (head (t (x), -n))
for excluding the last n columns of matrix x

THX, Claudia


-- 
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 (0 40) 5 58-34 47
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] The end of Matlab

2008-12-12 Thread Claudia Beleites
On Friday, 12 December 2008 13:10:20, Patrick Burns wrote:
 How about:

 x[, -seq(to=ncol(x), length=n)]
Doing it is not my problem. I just agree with Mike in that I would like it if I
could do it shorter than:

x[, 1 : (ncol(x) - n)] 

which I btw prefer to your solution.

Also, I don't have a problem writing generalized versions of head and tail
that work along other/more dimensions, or a combined function taking first and
last arguments.

Still, they would not be as convenient to use as matlab's:
3 : end - 4
which btw. also does not need parentheses.

I guess the general problem is that there is only one thing with integers that 
can easily be (ab)used as a flag: the negative sign. 

But there are (at least) 2 possibly useful special ways of indexing: 
- exclusion (as in R)
- using -n for end - n (as in perl)

Now we enjoy having a shortcut for exclusion (at least I do), but still feel 
that marking from the end would be useful.

As no other signs (in the sense of flag) are available for integers, we won't 
be able to stop typing somewhat more in R.

Wacek:
 x[3:]
 instead of
 x[3:length(x)]
 x[3:end]
I don't think that would help: 
what to use for end - 3 within the convention that negative values mean 
exclusion?




--- now I start dreaming ---

However, it is possible to define new binary operators (operators are great for 
lazy typing...).

Let's say %:% should be a new operator to generate proper indexing sequences 
to be used inside [ :
e.g. an.array [ 1:3, -2 %:% -5, ...]

If we now find an.array which is x inside [ (and also inside [[) - which is 
possible but maybe a bit fiddly

and if we can also find out which of the indices is actually evaluated (which I 
don't know how to do)

then we could use something* as a flag for from the end and calculate the 
proper sequence.

something* could e.g. be 
either an attribute to the operators (convenient if we can define an unary 
operator that allows setting it, e.g. § 3 [§ is the easy-to-type sign on my 
keyboard that is not yet used...])

or i (the imaginary one) if there is no other convenient unary operator e.g. 
3i

= 
easy part of the solution:
make.index <- function (x, along.dim = 1, from, to){
  if (is.null (dim (x)))
    dim <- length (x)
  else
    dim <- dim (x)[along.dim]

  if (is.complex (from)){
    from <- dim - from  # 0i means end
    ## warning if Re (from) != 0 ?
  }
  if (is.complex (to)){
    to <- dim - to      # 0i means end
    ## warning if Re (to) != 0 ?
  }

  from : to
}

"%:%" <- function (e1, e2)  ## using a new operator does not mess up ":"
  make.index (x = find.x (), along.dim = find.dim (), e1, e2)

now, the heavy part are the still missing find.x () and find.dim () functions...

I'm not sure whether this would be worth the work, but maybe someone is around 
who just knows how to do this.


Claudia

-- 
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 (0 40) 5 58-34 47
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Fwd: Re: The end of Matlab (sorry, I messed up a sentence)

2008-12-12 Thread Claudia Beleites
--  Forwarded Message  --

Subject: Re: [R] The end of Matlab
Date: Friday, 12 December 2008
From: Claudia Beleites cbelei...@units.it
To: r-help@r-project.org

On Friday, 12 December 2008 13:10:20, Patrick Burns wrote:
 How about:

 x[, -seq(to=ncol(x), length=n)]
Doing it is not my problem. I just agree with Mike in that I would like it if I
could do it shorter than:

x[, 1 : (ncol(x) - n)] 

which I btw prefer to your solution.

Also, I don't have a problem writing generalized versions of head and tail
that work along other/more dimensions, or a combined function taking head-n and
tail-n arguments.

Still, they would not be as convenient to use as matlab's:
3 : end - 4
which btw. also does not need parentheses.

I guess the general problem is that there is only one thing with integers that 
can easily be (ab)used as a flag: the negative sign. 

But there are (at least) 2 possibly useful special ways of indexing: 
- exclusion (as in R)
- using -n for end - n (as in perl)

Now we enjoy having a shortcut for exclusion (at least I do), but still feel 
that marking from the end would be useful.

As no other signs (in the sense of flag) are available for integers, we won't 
be able to stop typing somewhat more in R.

Wacek:
 x[3:]
 instead of
 x[3:length(x)]
 x[3:end]
I don't think that would help: 
what to use for end - 3 within the convention that negative values mean 
exclusion?




--- now I start dreaming ---

However, it is possible to define new binary operators (operators are great for 
lazy typing...).

Let's say %:% should be a new operator to generate proper indexing sequences 
to be used inside [ :
e.g. an.array [ 1:3, -2 %:% -5, ...]

If we now find an.array which is x inside [ (and also inside [[) - which is 
possible but maybe a bit fiddly

and if we can also find out which of the indices is actually evaluated (which I 
don't know how to do)

then we could use something* as a flag for from the end and calculate the 
proper sequence.

something* could e.g. be 
either an attribute to the operators (convenient if we can define an unary 
operator that allows setting it, e.g. § 3 [§ is the easy-to-type sign on my 
keyboard that is not yet used...])

or i (the imaginary one) if there is no other convenient unary operator e.g. 
3i

= 
easy part of the solution:
make.index <- function (x, along.dim = 1, from, to){
  if (is.null (dim (x)))
    dim <- length (x)
  else
    dim <- dim (x)[along.dim]

  if (is.complex (from)){
    from <- dim - from  # 0i means end
    ## warning if Re (from) != 0 ?
  }
  if (is.complex (to)){
    to <- dim - to      # 0i means end
    ## warning if Re (to) != 0 ?
  }

  from : to
}

"%:%" <- function (e1, e2)  ## using a new operator does not mess up ":"
  make.index (x = find.x (), along.dim = find.dim (), e1, e2)

now, the heavy part are the still missing find.x () and find.dim () functions...

I'm not sure whether this would be worth the work, but maybe someone is around 
who just knows how to do this.


Claudia

-- 
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 (0 40) 5 58-34 47
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

---
-- 
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 (0 40) 5 58-34 47
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] The end of Matlab

2008-12-12 Thread Claudia Beleites
I just realized that my idea of doing something without going into the
extraction functions themselves won't work.
:-( It was a nice dream, though.

The reason is that there is no general way to find out what the needed length 
is: At least I'm just writing a class where 2 kinds of columns are involved. I 
don't give a dim attribute, though. But I could, and then: how to know how it 
should be interpreted?

 on the other hand, another possible solution would be to have ':' mean,
 inside range selection expressions, not the usual sequence generation,
 but rather specification of start and end indices:
...
 this is daydreaming, of course, because such modifications would break
 much old code,
nothing would break if some other sign were used instead of ":". Maybe
something like end...

 and the benefit may not outweigh the effort.
This might be true in any case.

If I only think of how many lines of nrow, ncol, length & Co I could have
written instead of posting wrong proposals...

Claudia

-- 
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 (0 40) 5 58-34 47
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] character count

2008-12-12 Thread Claudia Beleites

nchar (c("convert this to 47 because it has 47 characters", "this one has 26
characters", "13 characters"))

HTH Claudia
-- 
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 (0 40) 5 58-34 47
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] The end of Matlab

2008-12-12 Thread Claudia Beleites
 evens() & last(5)
wouldn't x[evens()][last(5)] do the & already?

"or" is different, though.

Claudia

-- 
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 (0 40) 5 58-34 47
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] for loop query

2008-12-09 Thread Claudia Beleites
Hi,

 Why isn't my loop incrementing i - the outer loop to 2 and then resetting
 j=3?
It is. It runs out of bounds with j > 26.

 Am I missing something obvious?
   for (i in 1:25)
  + {
  +   for (j in i+1:26)
You are missing parentheses.

i + 1 : 26 is i + (1 : 26), as the vector 1 : 26 is calculated first.

What happens is that for i = 1, j goes over 2 : 27, for i = 2 over 3 : 28, ...

What you want is (i + 1) : 26:

for (i in 1 : 25)
   for (j in (i + 1) : 26)
      cat (i, j, "\n")

HTH Claudia

-- 
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 (0 40) 5 58-34 47
email: [EMAIL PROTECTED]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] 1-Pearson's R Distance

2008-11-27 Thread Claudia Beleites
Hi Rodrigo, 

afaik, (1 - r_Pearson)/2 is used rather than 1 - r_Pearson. This gives a
distance measure ranging between 0 and 1 rather than 0 and 2. But after all,
this does not change anything substantial.
See e.g. Theodoridis & Koutroumbas: Pattern Recognition.

I didn't know of the proxy package, but the calculation is straightforward
(though a bit wasteful, I suspect: first the whole correlation matrix is
produced, and as.dist cuts it down again to a triangular matrix):

as.dist (0.5 - cor (t (x)) / 2)

Take care whether you want to use x or t(x).
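
A quick sanity check of that formula (toy matrix, observations in rows):

x  <- matrix (rnorm (40), nrow = 4)    # 4 observations in rows
d1 <- as.dist (0.5 - cor (t (x)) / 2)
d2 <- as.dist ((1 - cor (t (x))) / 2)  # literally (1 - r) / 2
all.equal (d1, d2)                     # TRUE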

HTH Claudia



-- 
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 (0 40) 5 58-34 47
email: [EMAIL PROTECTED]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] simplify this instruction

2008-11-19 Thread Claudia Beleites
in addition to %in%, and depending on what the general principle behind the
setup is:

you may want to have a look into switch (e.g. if there happen to be "C"s
and "D"s...).

or, of course, you can check for B being between 0 and 9 rather than being the
respective integers:
ifelse ((B >= 0) & (B <= 9), A, B)
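
As runnable one-liners (the values for A and B are made up):

B <- c (0, 5, 12, 9, -3)
A <- 99

ifelse (B %in% 0 : 9, A, B)             # 99 99 12 99 -3
ifelse ((B >= 0) & (B <= 9), A, B)      # same result here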

HTH
Claudia

 Is there a way to simplify this instruction:
 ifelse(B==0,A,
 ifelse(B==1,A,
 ifelse(B==2,A,
 ifelse(B==3,A,
 ifelse(B==4,A,
 ifelse(B==5,A,
 ifelse(B==6,A,
 ifelse(B==7,A,
 ifelse(B==8,A,
ifelse(B==9,A,B))))))))))

 i am looking for something like this:

 ifelse(B==(0:9),A,B)

 Best regards



-- 
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 (0 40) 5 58-34 47
email: [EMAIL PROTECTED]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Fourier Transform with irregularly spaced x

2008-11-03 Thread Claudia Beleites
Dear all,

I work with (vibrational) spectra: some kind of intensity (I) over frequency 
(nu), wavelength or the like.
I want to do fourier transform for interpolation, smoothing, etc. 

My problem is that the spectra are often irregularly spaced in nu: the 
difference between 2 neighbouring nu varies across the spectrum, and data 
points may be missing. 

Searching for "discrete fourier transform" I found lots of information and
functions - but I didn't see anything that just works with irregularly spaced
signals: all the functions I found take only the signal, not its x-axis.

Where should I look? 
Or am I lacking some math that tells me how to do it without the frequency axis?

Thanks a lot for your help,

Claudia
 

-- 
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 (0 40) 5 58-34 47
email: [EMAIL PROTECTED]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Fourier Transform with irregularly spaced x

2008-11-03 Thread Claudia Beleites
 Try http://finzi.psych.upenn.edu/R/library/nlts/html/spec.lomb.html or
 http://finzi.psych.upenn.edu/R/library/cts/html/spec.ls.html (do
 RSiteSearch("Lomb periodogram")) --
 the Lomb periodogram does a discrete (although not fast) Fourier
 transform of unevenly sampled (1D/time-series) data, accounting for
 the sampling distribution of points (which will bias the results
 if you try to do a naive Fourier sum).
Thanks Ben, that looks like a good starting point.

Stephen, my aim is neither spline nor linear approximation but something along
the lines of Matlab's interpft.

I do have the vibrational spectrum. Such spectra are frequently computed by FT
from their (measured) interferograms, i.e. if you use an FT spectrometer.
However, the spectra can also be measured directly with a dispersive
instrument. The difference between neighbouring frequencies of such spectra
varies over the spectrum. E.g. I measure from 600 cm^-1 to 1800 cm^-1: at 600
cm^-1 I have a data point spacing of 1.04 cm^-1, while at 1800 cm^-1 it is
only 0.85 cm^-1. So doing an FT (like spec.pgram ()) only on the signal means
that I do not use periodic functions (sin x), but rather something like sin
(x^2) - the sine changes its frequency. This does not help.

The idea is to calculate the interferogram (space or time domain) taking into 
account this variation of delta nu. Then do a backtransform to evenly spaced 
frequencies. 
The next step will then be to do other interesting things like downsampling, 
denoising etc. using the interferogram.

Thanks,

Claudia



-- 
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 (0 40) 5 58-34 47
email: [EMAIL PROTECTED]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Row and Column positions

2008-10-31 Thread Claudia Beleites
On Friday, 31 October 2008 12:17:30, Shubha Vishwanath Karanth wrote:
 m=data.frame(a=c(1,NA,5,5),b=c(4,5,6,7),c=c(NA,NA,NA,5))

? which

HTH Claudia

-- 
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 (0 40) 5 58-34 47
email: [EMAIL PROTECTED]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Trying to pass arrays as arguments to a function

2008-10-20 Thread Claudia Beleites
 I'd like to avoid looping through an array in order to change values in
 the array, as it takes too long.
 I read in an earlier post that it can be done by do.call but never got it
 to work. The idea is to change the value of y according to values in
 x: wherever x holds the value 3, the corresponding value in y
 should be set to 1.
y [x == 3] <- 1
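
For concreteness, with toy vectors:

x <- c (1, 3, 2, 3)
y <- c (0, 0, 0, 0)

y [x == 3] <- 1   # logical indexing, no loop needed
y                 # 0 1 0 1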


-- 
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 (0 40) 5 58-34 47
email: [EMAIL PROTECTED]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


  1   2   >