Re: [R] introducing R to high school students

2012-04-22 Thread Indrajit Sengupta
Bert,
 
What you are describing is a problem with the people who use Excel. It is not 
Excel's fault that people send data in an unstructured way. I agree that 
Excel may not be the right tool for complicated data 
analysis (e.g. statistical modeling), but that is not what Excel was 
built for. The power of Excel lies in being able to use it to explore data and 
to represent and present your analysis. When exploring data, yes, it may not be 
very useful beyond univariates and bivariates, but that is your starting point 
in EDA, where you need to generate hypotheses about your data. 
 
I have been in the field of analytics for almost 7 years now. Though we have 
embraced technologies like SAS, R, SPSS, Spotfire, etc., the power and 
importance of Excel in our lives has never been lost on us. It's a question of 
whether you are capable enough to use it.
 
Regards,
Indrajit
 



From: Bert Gunter gunter.ber...@gene.com

Cc: Rolf Turner rolf.tur...@xtra.co.nz 
Sent: Sunday, April 22, 2012 11:07 AM
Subject: Re: [R] introducing R to high school students

I would like to slightly clarify and echo  Rolf's comment:

Excel is a terrible tool for data analysis. Maybe it's a good tool for
keeping track of your car's repair history... but not for data
analysis.

I could go on at great length why, but let me just focus on one aspect
that drives me and other statisticians in my group crazy when we deal
with scientists who send us data in Excel: the data are frequently a
mess!  By this I mean that they are often stored in crazy ways, with
plots and summaries sprinkled around, capital letters and small
letters mixed, missing values coded arbitrarily e.g.(9 ), and so
forth. As someone I know once commented, it's a puzzle to get the data
extracted in a form susceptible to analysis.

Why is this? -- because Excel enforces no structure. It's
**cell-based** (duh), so users can throw in the data any way they
see fit, which frequently is pretty unfit.

This is not just a minor issue, imho. Not having data in a reasonable
structure limits what one can do for data analysis and graphics. This
promulgates the inadequate and frequently awful paradigms that one
sees throughout science (e.g. bar charts with 1 se bars sticking up
out of them).

The widespread use of Excel for 'serious' scientific and engineering
data analysis is a near tragedy. All IMHO, of course.

Cheers,
Bert

On Sat, Apr 21, 2012 at 9:45 PM, Indrajit Sengupta

 Why do you think Excel is a terrible tool? In what ways have you tried to use 
 Excel and it has failed you?

 Regards,
 Indrajit


 
 From: Rolf Turner rolf.tur...@xtra.co.nz

 Cc: R-help R-help@r-project.org
 Sent: Sunday, April 22, 2012 9:25 AM
 Subject: Re: [R] introducing R to high school students

 On 22/04/12 15:29, Indrajit Sengupta wrote:

 SNIP
 1. At school we seldom deal with lot of data - the focus is more on 
 concepts. Excel is an excellent tool
     That is at best debatable, and IMHO just plain incorrect.  I firmly 
 believe
     that Excel is a ***TERRIBLE*** tool.
 and no matter how much we love or hate it - we will be using Excel a lot in 
 our lives.

     This is not (unfortunately IMHO) debatable.  It is all too sadly true.  
 For most
     people at least.  (Not for my very good self.  I can get away with 
 eschewing
     Excel.  Most people are not lucky enough to have that option.)

 SNIP

     I think much of the remainder of the post was highly disputable as well,
     but I will desist at this point.

         cheers,

             Rolf Turner
        [[alternative HTML version deleted]]


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm


Re: [R] How to remove $ (Dollar sign) from string

2012-04-22 Thread Giuseppe Marinelli
On Tuesday 10 April 2012 13:34:13, Nevil Amos wrote:
 How do I remove a $ character from a string? sub() and gsub() with "$" or
 "\$" as pattern do not work.
 
  sub("$","","ABC$DEF")
 
 [1] "ABC$DEF"
 
  sub("\$","","ABC$DEF")
 
 Error: '\$' is an unrecognized escape in character string starting "\$"
 
  sub(\$,"","ABC$DEF")
 
 Error: unexpected input in "sub(\"
 
 Thanks

You just need a double backslash:
 sub("\\$","","ABC$DEF")
[1] "ABCDEF"
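
A small sketch of why the double backslash works, using only base R. In a regular expression, $ is the end-of-string anchor, so a literal dollar sign has to be escaped, and the R parser consumes one level of backslashes itself, hence "\\$". The fixed = TRUE and [$] variants are alternatives that were not mentioned in the thread.

```r
x <- "ABC$DEF"

# "$" on its own is the end-of-string anchor, so nothing visible is removed
unescaped <- sub("$", "", x)

# "\\$" in R source is the two-character regex \$ : a literal dollar sign
escaped <- sub("\\$", "", x)

# Alternatives: treat the pattern as plain text, or use a character class
as_fixed <- sub("$", "", x, fixed = TRUE)
as_class <- sub("[$]", "", x)

# gsub() removes every occurrence rather than only the first
all_gone <- gsub("$", "", "A$B$C", fixed = TRUE)
```

sub() replaces only the first match, so gsub() is the one to use when a string may contain several dollar signs.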



Re: [R] unexpected plot behavior

2012-04-22 Thread Martin Renner
Thank you for the replies, Uwe and Marc. These are explanations that make 
perfect sense. However, shouldn't the behavior of plot.factor include the 
option of type = "n" for consistency with the default plot function? 

Best,
Martin



On 21 Apr 2012, at 08:18 , Marc Schwartz wrote:

 On Apr 21, 2012, at 9:49 AM, Martin Renner wrote:
 
 When plotting a numerical vector against a factor, type = "n" seems to have 
 no effect, e.g. 
 plot (1:10~factor (1:10), type = "n")
 
 looks just like
 plot (1:10~factor (1:10))
 
 Plotting a numerical against itself works as expected: 
 plot (1:10, type = "n")
 
 I see the same behavior under debian gnu/linux, Mac OS X, and Win7 (all 
 current versions, see below). Is this a bug? 
 
 Regards,
 Martin
 
 
 
 This has to do with method dispatch. See ?plot.formula, which is the plot 
 method called when you pass a formula, as opposed to passing a vector as in 
 your third example. 
 
 In this case, ?plot.factor is called when the 'x' part of the formula (RHS) 
 is a factor. When plot.factor is called, it internally calls ?boxplot and of 
 course, there is no type = "n" for boxplots, hence it is ignored.
 
 Regards,
 
 Marc Schwartz
 



Re: [R] contour algorithm

2012-04-22 Thread Duncan Murdoch

On 12-04-21 9:21 PM, Stoch astic wrote:

First time user, so sorry if I don't understand protocol.. Anyway, I have
created a data frame consisting of pearson's R values at various x and y
coordinates and then plotted this using filled.contour. My data is similar
to fMRI data except that it is a surface map reconstructed from
histological sections. I like the results but would like to know how
contours were detected. Google search provides me various sources claiming
the algorithm used is undocumented. For example:


http://wipaed.wiso.uni-goettingen.de/~holdenb1/R/library/base/html/contour.html
Draws contour lines for the desired levels. There is currently no
documentation about the algorithm. The source code is in
`$RHOME/src/main/plot.c'.


That's a very old copy of the help page.  You're generally better off 
using the ones installed with R, or the ones on CRAN 
(cran.r-project.org/web/packages) rather than what Google finds somewhere.


Duncan Murdoch




Does anyone know where or how I can find the method by which contours are
calculated?



[R] how to avoid newlines tabs in file opening?

2012-04-22 Thread sagarnikam123
I have uploaded a file, but when I open it in R using

u <- file(file.choose(), "r")
k <- readLines(u)
k
k[1:120]

it has all the \t (tabs) and newlines. How can I avoid that?
Can I take only the first 3 columns in table form (lines starting with # are
not important for me)?

uploaded file:-
http://r.789695.n4.nabble.com/file/n4577757/rabata.txt rabata.txt 

--
View this message in context: 
http://r.789695.n4.nabble.com/how-to-avoid-newlines-tabs-in-file-opening-tp4577757p4577757.html
Sent from the R help mailing list archive at Nabble.com.



[R] how to cut files from any folder to another folder?

2012-04-22 Thread sagarnikam123
I want to cut a file from one folder (e.g. abc) and put it into another
location with a folder name such as xyz.
How should I proceed?

--
View this message in context: 
http://r.789695.n4.nabble.com/how-to-cut-files-from-any-folder-to-another-folder-tp4577818p4577818.html
Sent from the R help mailing list archive at Nabble.com.



[R] Standard error

2012-04-22 Thread Christopher Kelvin
Hello,
I have tried obtaining the value of standard error from the code below but i 
get different values when i compare it with the 
standard error obtained from the hessian matrix. Can somebody help me out?
Thank you

n=100;rr=1000
p1=1.2;b=1.5
sq11=sq21=0
for (i in 1:rr){
t<-rweibull(n,shape=p1,scale=b)
meantrue<-gamma(1+(1/p1))*b
meantrue
d<-meantrue/0.40
cen<- runif(n,min=0,max=d)
s<-ifelse(t<=cen,1,0)
q<-c(t,cen)

z<-function(data, p){ 
beta<-p[1]
eta<-p[2]
log1<-(n*sum(s)*log(p[1])-n*sum(s)*(p[1])*log(p[2])+sum(s)*(p[1]-1)*sum(log(t))-n*sum((t/(p[2]))^(p[1])))
return(-log1)
}

start <- c(1,1)
zz<-optim(start,fn=z,data=q,hessian=T)
zz
m1<-zz$par[2]
p<-zz$par[1]

sq11<-sq11+(1/rr*(sum((q-m1)^2)))
sq21<-sq21+(1/rr*(sum((q-Lm1)^2)))

}

se11<-sqrt(sq11)/(rr-1)
se11
se21<-sqrt(sq21)/(rr-1)
se21

f<-solve(zz$hessian)
se<-sqrt(diag(f))
se
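
For comparison, a minimal sketch of obtaining standard errors from the Hessian for a Weibull fit. It is not a corrected version of the code above: it assumes uncensored data, uses dweibull() for the log-likelihood, and optimises on the log scale (all choices made here for illustration), then converts the standard errors back with the delta method.

```r
set.seed(1)
x <- rweibull(500, shape = 1.2, scale = 1.5)

# Negative log-likelihood; lp holds log(shape) and log(scale) so that
# the optimiser can never propose a non-positive parameter
negloglik <- function(lp, data) {
  -sum(dweibull(data, shape = exp(lp[1]), scale = exp(lp[2]), log = TRUE))
}

fit <- optim(c(0, 0), fn = negloglik, data = x, hessian = TRUE)

est <- exp(fit$par)                        # shape and scale estimates
se_log <- sqrt(diag(solve(fit$hessian)))   # SEs of the log-parameters
se <- est * se_log                         # delta method: SEs on the original scale
```

The inverse Hessian of the negative log-likelihood at the optimum estimates the covariance matrix of the parameters that were optimised, which is why the conversion back to the original scale is needed in this sketch.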


Chris Guure
Researcher
Institute for Mathematical Research
UPM




Re: [R] How to remove $ (Dollar sign) from string

2012-04-22 Thread Patrick Burns

Why you need a double backslash is alluded
to in Circle 8.1.23 of 'The R Inferno'.

http://www.burns-stat.com/pages/Tutor/R_inferno.pdf

Pat

On 22/04/2012 10:18, Giuseppe Marinelli wrote:

On Tuesday 10 April 2012 13:34:13, Nevil Amos wrote:

How do I remove a $ character from a string? sub() and gsub() with "$" or
"\$" as pattern do not work.


sub("$","","ABC$DEF")


[1] "ABC$DEF"


sub("\$","","ABC$DEF")


Error: '\$' is an unrecognized escape in character string starting "\$"


sub(\$,"","ABC$DEF")


Error: unexpected input in "sub(\"

Thanks


You just need a double backslash:

sub("\\$","","ABC$DEF")

[1] "ABCDEF"




--
Patrick Burns
pbu...@pburns.seanet.com
twitter: @portfolioprobe
http://www.portfolioprobe.com/blog
http://www.burns-stat.com
(home of 'Some hints for the R beginner'
and 'The R Inferno')



[R] How to take ID of number 7.

2012-04-22 Thread Yellow
I figured out something new, and I would like to see if I can do this more
easily with R than with Excel. 

I have these huge files with data. 
For example: 

DataFile.csv 
ID Name log2 
1 Fantasy 5.651 
2 New 7.60518 
3 Finding 8.9532 
4 Looeka -0.248652 
5 Vani 0.3548 

The headers are: header 1: ID, header 2: Name, header 3: log2 

Now I need to get the $ID values of the rows that have a log2 value of 7 or higher. 

I know how to grab the $log2 values that are 7 or more: 

Log2HigherSeven = DataFile$log2[DataFile$log2 >= 7]  

But how can I also get the matching ID numbers? 

--
View this message in context: 
http://r.789695.n4.nabble.com/How-to-take-ID-of-number-7-tp4577998p4577998.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] How to take ID of number 7.

2012-04-22 Thread Berend Hasselman

On 22-04-2012, at 13:03, Yellow wrote:

 I figured out something new that I would like to see if I can do this more
 easy with R then Excel. 
 
 I have these huge files with data. 
 For example: 
 
 DataFile.csv 
 ID Name log2 
 1 Fantasy 5.651 
 2 New 7.60518 
 3 Finding 8.9532 
 4 Looeka -0.248652 
 5 Vani 0.3548 
 
 With like header1: ID, header 2: Name, header 3: log2 
 
 Now I need to get the $ID out who have a log2 value higher then 7. 
 
 I know ho to grab the $log2 values with 7+ numbers. 
 
 Log2HigherSeven = DataFile$log2[DataFile$log2 >= 7]  
 
How about

DataFile[DataFile$log2 >= 7, c("ID","log2")]

to get a dataframe with two columns ID and log2.

Berend



Re: [R] compare mean

2012-04-22 Thread Yellow
I am also comparing 2 things with each other. 

x = c(1, 5, 7, 9) 
y = c(2, 7, 9, 10, 11) 

intersect(x, y) 

Output will be: 7, 9. 

Hope it helped. :) 

--
View this message in context: 
http://r.789695.n4.nabble.com/compare-mean-tp4576372p4578007.html
Sent from the R help mailing list archive at Nabble.com.



[R] RE how to cut files from any folder to another folder?

2012-04-22 Thread Carl Witthoft



?file.copy

Or ?system


From: sagarnikam123 sagarnikam123_at_gmail.com
Date: Sun, 22 Apr 2012 01:25:21 -0700 (PDT)

i want to cut file from e.g. abc folder and put it into another location 
with folder name e.g. xyz

how should i proceed?


--

Sent from my Cray XK6
Quidvis recte factum, quamvis humile, praeclarum.



Re: [R] How to take ID of number 7.

2012-04-22 Thread Rui Barradas
Hello,


Berend Hasselman wrote
 
 On 22-04-2012, at 13:03, Yellow wrote:
 
 I figured out something new that I would like to see if I can do this
 more
 easy with R then Excel. 
 
 I have these huge files with data. 
 For example: 
 
 DataFile.csv 
 ID Name log2 
 1 Fantasy 5.651 
 2 New 7.60518 
 3 Finding 8.9532 
 4 Looeka -0.248652 
 5 Vani 0.3548 
 
 With like header1: ID, header 2: Name, header 3: log2 
 
 Now I need to get the $ID out who have a log2 value higher then 7. 
 
 I know ho to grab the $log2 values with 7+ numbers. 
 
 Log2HigherSeven = DataFile$log2[DataFile$log2 >= 7]  
 
 How about
 
 DataFile[DataFile$log2 >= 7, c("ID","log2")]
 
 to get a dataframe with two columns ID and log2.
 
 Berend
 
 

Or maybe create an index vector into the rows of the data frame.
This would be more flexible, later any columns could be extracted.
The index can be a logical or integer vector.

inx.log <- DataFile$log2 >= 7
inx.int <- which(DataFile$log2 >= 7)

DataFile[inx.one.of.them, needed.cols]

As a side effect, it might also save some memory. Both indexes are
internally integers.
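
The index-vector idea can be sketched with the five sample rows from the original post. The data frame is rebuilt by hand here, since the actual DataFile.csv is not available, and the names ids_from_logical, ids_from_integer and both_cols are only illustrative.

```r
# The sample rows from the original post
DataFile <- data.frame(
  ID   = 1:5,
  Name = c("Fantasy", "New", "Finding", "Looeka", "Vani"),
  log2 = c(5.651, 7.60518, 8.9532, -0.248652, 0.3548)
)

inx.log <- DataFile$log2 >= 7          # logical index, one entry per row
inx.int <- which(DataFile$log2 >= 7)   # integer index, matching rows only

# Either index can be reused to pull out whichever columns are needed
ids_from_logical <- DataFile[inx.log, "ID"]
ids_from_integer <- DataFile[inx.int, "ID"]
both_cols <- DataFile[inx.int, c("ID", "log2")]
```

Both indexes select rows 2 and 3 here, so the extracted IDs are the same either way.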

Hope this helps,

Rui Barradas


--
View this message in context: 
http://r.789695.n4.nabble.com/How-to-take-ID-of-number-7-tp4577998p4578162.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] introducing R to high school students

2012-04-22 Thread Christopher W. Ryan
I have to agree that Excel is a poor tool for serious scientific and 
engineering data analysis (love the phrase.) I too have spent way too 
much time beating Excel files into submission, with workarounds and 
manipulations, just to be able to do anything useful with them. I'm told 
that one can to some degree impose structure on Excel data entry, but I 
don't know how, and no users ever seem to set up their spreadsheets that 
way.


Somehow, a reasonable tool for business (I suppose, not being a 
businessman), has infiltrated the scientific world as well.


That's really the motivation for my proposal to my science teacher 
colleague. I want to introduce budding scientists to the idea that there 
is a better tool for data analysis, even for exploratory analysis and 
univariates and bivariates, which R does very handily. Why start an 
analysis in Excel only to have to switch to something else for the 
latter half?


And this will lead inevitably into conversations about better ways to 
record, store, and share data. And it ties into concepts of 
collaboration and reproducible research.


--Chris Ryan
SUNY Upstate Clinical Campus
Binghamton, NY



Re: [R] unexpected plot behavior

2012-04-22 Thread Marc Schwartz

On Apr 22, 2012, at 1:25 AM, Martin Renner wrote:

 Thank you for the replies, Uwe and Marc. These are explanations that make 
 perfect sense. However, shouldn't the behavior of plot.factor include the 
 option of type = "n" for consistency with the default plot function? 
 
 Best,
 Martin


I don't believe so.

The use of type = "n" is to facilitate the creation of a plotting environment, 
into which you can, in a piecemeal fashion, create a new plot from a blank 
canvas. The nature of that plot could be virtually anything with symbols, 
lines/curves, shapes and perhaps even pure text.

Since plot.factor() internally calls one of several specific plot functions 
(e.g. boxplot(), barplot(), spineplot() or plot()) depending upon the nature of 
the argument(s) passed, you need to understand exactly what you intend to do 
with a blank plotting device, as each of those functions has its own set of 
characteristics, defaults and intents.

Thus, having plot(), or more specifically plot.default(), support the 
type = "n" paradigm is sufficient for creating a plot device with the desired 
axis ranges, parameters and so forth, to enable you to then add whatever 
additional content you require.

There is no a priori expectation that a function's child methods inherit all 
of the parent's functionality, because the generic default method's 
functionality may not be apropos to the child classes. Similarly, the child 
classes may implement specific functionality not apropos to the generic parent 
class because more specific information is known about the structure of the 
child. 
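
A minimal sketch of the blank-canvas approach when the x variable is a factor: convert the factor to its integer codes so that plot.default() is dispatched, suppress the x axis, and label it with the factor levels. The data and axis labels are made up for illustration, and pdf(NULL) is used only so that no file is written.

```r
f <- factor(c("a", "b", "c", "a", "b", "c"))
y <- c(1, 4, 2, 3, 5, 6)

pdf(NULL)                                   # null device: nothing is written to disk
plot(as.integer(f), y, type = "n",          # plot.default(): coordinates and box, no points
     xaxt = "n", xlab = "group", ylab = "y")
axis(1, at = seq_along(levels(f)), labels = levels(f))  # label the codes with the levels
points(as.integer(f), y, pch = 19)          # then add content piece by piece
dev.off()
```

Because the first argument is an integer vector rather than a factor or formula, the boxplot path described above is never taken.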

Regards,

Marc

 On 21 Apr 2012, at 08:18 , Marc Schwartz wrote:
 
 On Apr 21, 2012, at 9:49 AM, Martin Renner wrote:
 
 When plotting a numerical vector against a factor, type = "n" seems to have 
 no effect, e.g. 
 plot (1:10~factor (1:10), type = "n")
 
 looks just like
 plot (1:10~factor (1:10))
 
 Plotting a numerical against itself works as expected: 
 plot (1:10, type = "n")
 
 I see the same behavior under debian gnu/linux, Mac OS X, and Win7 (all 
 current versions, see below). Is this a bug? 
 
 Regards,
 Martin
 
 
 
 This has to do with method dispatch. See ?plot.formula, which is the plot 
 method called when you pass a formula, as opposed to passing a vector as in 
 your third example. 
 
 In this case, ?plot.factor is called when the 'x' part of the formula (RHS) 
 is a factor. When plot.factor is called, it internally calls ?boxplot and of 
 course, there is no type = "n" for boxplots, hence it is ignored.
 
 Regards,
 
 Marc Schwartz




Re: [R] how to avoid newlines tabs in file opening?

2012-04-22 Thread Jeff Newmiller
How about, don't avoid them, use them?

dta <- read.table( "http://r.789695.n4.nabble.com/file/n4577757/rabata.txt",
                   as.is=TRUE, skip=4, sep="\t" )
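
A self-contained sketch of the same idea, using a made-up stand-in for the uploaded file (its real contents are not shown in the thread). It relies on read.table() treating # as the comment character by default, and then keeps the first three columns as the original poster asked.

```r
# Hypothetical stand-in for the uploaded file: tab-separated values
# with comment lines starting with "#"
raw <- c("# generated by some tool",
         "1\tA\t0.5\textra",
         "2\tB\t1.5\textra",
         "# another comment",
         "3\tC\t2.5\textra")

# comment.char = "#" (the default) drops the comment lines;
# sep = "\t" splits each remaining line on tabs
dta <- read.table(text = raw, sep = "\t", as.is = TRUE, comment.char = "#")

first3 <- dta[, 1:3]    # keep only the first three columns
```

With a real file, the text = argument would be replaced by the file name or URL.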

---
Jeff NewmillerThe .   .  Go Live...
DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

sagarnikam123 sagarnikam...@gmail.com wrote:

i have uploaded file,but when i am opening it in R,using

 u <- file(file.choose(), "r")
k <- readLines(u)
k
 k[1:120]

is has all /t (tabs) newlines, how to avoid it,
can i take first 3 columns only in table form (lines starts with # not
important for me)

uploaded file:-
http://r.789695.n4.nabble.com/file/n4577757/rabata.txt rabata.txt 

--
View this message in context:
http://r.789695.n4.nabble.com/how-to-avoid-newlines-tabs-in-file-opening-tp4577757p4577757.html
Sent from the R help mailing list archive at Nabble.com.



[R] Transform dataframe

2012-04-22 Thread David Studer
Hi everyone!

I have the following question: I have three items that had
to be ordered (e.g. three persons ranked var1 first):

var1 var2 var3
1    2    3
2    1    3
1    3    2
1    2    3

Now I'd like to have the data.frame the other way round, so that
the ranks are in the columns:

rank1 rank2 rank3
var1  var2  var3
var2  var1  var3
var1  var3  var2
var1  var2  var3

Can anyone help me achieve this?

# code:

var1<-c(1,2,1,1)
var2<-c(2,1,3,2)
var3<-c(3,3,2,3)
df<-as.data.frame(cbind(var1,var2,var3))

??

Thank you very much!
David



Re: [R] how to avoid newlines tabs in file opening?

2012-04-22 Thread Rui Barradas
Hello,


sagarnikam123 wrote
 
 i have uploaded file,but when i am opening it in R,using
 
 u <- file(file.choose(), "r")
k <- readLines(u)
k
 k[1:120]
 
 is has all /t (tabs) newlines, how to avoid it,
 can i take first 3 columns only in table form (lines starts with # not
 important for me)
 
 uploaded file:-
  http://r.789695.n4.nabble.com/file/n4577757/rabata.txt rabata.txt 
 



Try the following.


# Read from the link you gave
fl <- file("http://r.789695.n4.nabble.com/file/n4577757/rabata.txt", "rb")
bin <- readBin(fl, what="character")
close(fl)

# Get rid of tabs and '\r' if any
bin <- gsub("[[:blank:]]", " ", bin)
bin <- gsub("\\r", "", bin)
# Split in lines of text and keep those not starting with '#'
txt <- unlist(strsplit(bin, "\\n"))
txt <- txt[substr(txt, 1, 1) != "#"]
# Now make a data.frame of it, cols 1:3 only
lst <- lapply(strsplit(txt, " "), function(x) x[1:3])
df1 <- data.frame(do.call(rbind, lst), stringsAsFactors=FALSE)
# See what we have
str(df1)
head(df1)
# And revamp col 1
df1$X1 <- as.integer(df1$X1)
str(df1)
head(df1)

# Final clean-up
# rm(bin, txt, lst)


Hope this helps,

Rui Barradas


--
View this message in context: 
http://r.789695.n4.nabble.com/how-to-avoid-newlines-tabs-in-file-opening-tp4577757p4578360.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] how to cut files from any folder to another folder?

2012-04-22 Thread Jeff Newmiller
The cut/copy/paste paradigm is not common in programmed file manipulation under 
various operating systems... due to cross-platform compatibility, be prepared 
to work on files with a copy(=duplicate)/remove approach.

?files
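
The copy/remove approach can be sketched as follows. The folders abc and xyz are created under a temporary directory purely for illustration, and example.txt is a made-up file name.

```r
# Hypothetical folders "abc" and "xyz" created under a temporary directory
src_dir <- file.path(tempdir(), "abc")
dst_dir <- file.path(tempdir(), "xyz")
dir.create(src_dir, showWarnings = FALSE)
dir.create(dst_dir, showWarnings = FALSE)

src <- file.path(src_dir, "example.txt")
dst <- file.path(dst_dir, "example.txt")
writeLines("some content", src)

# "Cut and paste" as copy followed by remove; only delete the
# original once the copy is known to have succeeded
copied <- file.copy(src, dst, overwrite = TRUE)
if (copied) file.remove(src)
```

file.rename() can also move a file in one step, though it may fail when source and destination are on different file systems.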

---
Jeff NewmillerThe .   .  Go Live...
DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

sagarnikam123 sagarnikam...@gmail.com wrote:

i want to cut file from e.g. abc folder and put it into another
location
with folder name e.g. xyz
how should i proceed?

--
View this message in context:
http://r.789695.n4.nabble.com/how-to-cut-files-from-any-folder-to-another-folder-tp4577818p4577818.html
Sent from the R help mailing list archive at Nabble.com.



[R] difficulty in Formatting time series data

2012-04-22 Thread Raghuraman Ramachandran
Dear R-Gurus

I have a data frame (from CSV file) which has its first column called Date.
The Date is in the format mm/dd/. I was trying to get the weekday for
these dates and I tried using wday() and day.of.week() functions and both
of them gave me precisely the wrong answers. I think the issue lies in the
proper formatting of dates. The class of this column is a factor class and
hence I tried converting into POSIXlt, xts, zoo objects and yet I could not
get the weekday correctly. Anyone has any suggestions please?

Many thanks
Raghu



Re: [R] difficulty in Formatting time series data

2012-04-22 Thread R. Michael Weylandt
Yes: please use dput() to give a reproducible example, along with some minimal
reproducible code (and say which packages day.of.week() and wday() come
from...)

x <- xts(10, Sys.Date())
wday(x)

seems fine for me.

precisely the wrong answers  -- interesting turn of phrase.

Michael

On Sun, Apr 22, 2012 at 12:53 PM, Raghuraman Ramachandran
optionsra...@gmail.com wrote:
 Dear R-Gurus

 I have a data frame (from CSV file) which has its first column called Date.
 The Date is in the format mm/dd/. I was trying to get the weekday for
 these dates and I tried using wday() and day.of.week() functions and both
 of them gave me precisely the wrong answers. I think the issue lies in the
 proper formatting of dates. The class of this column is a factor class and
 hence I tried converting into POSIXlt, xts, zoo objects and yet I could not
 get the weekday correctly. Anyone has any suggestions please?

 Many thanks
 Raghu



Re: [R] Transform dataframe

2012-04-22 Thread Jeff Newmiller

On Sun, 22 Apr 2012, David Studer wrote:


Hi everyone!

I have the following question: I have three items that had
to be ordered (e.g. three persons ranked var1 first):

var1 var2 var3
1    2    3
2    1    3
1    3    2
1    2    3

Now I'd like to have the data.frame the other way round, so that
the ranks are in the columns:

rank1 rank2 rank3
var1  var2  var3
var2  var1  var3
var1  var3  var2
var1  var2  var3

Can anyone help me achieve this?

# code:

var1<-c(1,2,1,1)
var2<-c(2,1,3,2)
var3<-c(3,3,2,3)
df<-as.data.frame(cbind(var1,var2,var3))

??

Thank you very much!
David

[[alternative HTML version deleted]]


Please fix your email settings to send text to this list...

tc <- textConnection(
"var1 var2 var3
1 2 3
2 1 3
1 3 2
1 2 3
")
dta <- read.table( tc, as.is=TRUE, header=TRUE )
close( tc )
dta$respondent <- letters[1:4]

library(reshape2)
dtalong <- melt( dta, id="respondent" )
# levels specified to guard against sorting problems if more
# than 9 rankings
dtalong$rank <- factor( paste( "rank", dtalong$value, sep="" )
                      , levels=paste( "rank"
                                    , sort( unique( dtalong$value ) )
                                    , sep="" )
                      , ordered=TRUE )
dta2 <- dcast( dtalong, respondent ~ rank, value.var="variable" )
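
As an alternative sketch in base R only (not the approach used above), order() can do the inversion row by row. It assumes that every row is a complete ranking without ties; the names ranked and dta2 are only illustrative.

```r
dta <- data.frame(var1 = c(1, 2, 1, 1),
                  var2 = c(2, 1, 3, 2),
                  var3 = c(3, 3, 2, 3))

# For each row, order() gives the positions of rank 1, rank 2, ...;
# indexing the column names with it lists the variables in rank order
ranked <- t(apply(dta, 1, function(r) names(dta)[order(r)]))
colnames(ranked) <- paste0("rank", seq_len(ncol(ranked)))
dta2 <- as.data.frame(ranked, stringsAsFactors = FALSE)
```

For the third respondent, for example, the ranks 1, 3, 2 become var1, var3, var2, which matches the layout requested in the question.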


---
Jeff NewmillerThe .   .  Go Live...
DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k



Re: [R] difficulty in Formatting time series data

2012-04-22 Thread Hasan Diwan
Raghu,

On 22 April 2012 09:53, Raghuraman Ramachandran optionsra...@gmail.com wrote:

 I have a data frame (from CSV file) which has its first column called Date.
 The Date is in the format mm/dd/. I was trying to get the weekday for
 these dates and I tried using wday() and day.of.week() functions and both
 of them gave me precisely the wrong answers. I think the issue lies in the
 proper formatting of dates. The class of this column is a factor class and
 hence I tried converting into POSIXlt, xts, zoo objects and yet I could not
 get the weekday correctly. Anyone has any suggestions please?


Try this:
# assume dataIn is where the CSV file's data is...
dataIn$Date <- as.POSIXct(dataIn$Date, format='%m/%d/%y')
dataIn <- cbind(dataIn, day.of.week = format(dataIn$Date, format='%A'))
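
A self-contained sketch of the same conversion with made-up dates. It assumes the column holds four-digit years, which the thread does not confirm; if so, the format needs %Y rather than %y. The column name day.number is only illustrative.

```r
# Hypothetical Date column, stored as a factor as it would be after
# read.csv() with stringsAsFactors = TRUE; four-digit years are assumed
dataIn <- data.frame(Date = factor(c("04/20/2012", "04/21/2012", "04/22/2012")))

# Convert via character; %Y is the four-digit year, %y the two-digit one
dataIn$Date <- as.Date(as.character(dataIn$Date), format = "%m/%d/%Y")

dataIn$day.of.week <- weekdays(dataIn$Date)     # names depend on the locale
dataIn$day.number <- format(dataIn$Date, "%u")  # "1" = Monday ... "7" = Sunday

# Pitfall: with %y only the first two digits of "2012" are read as the year
wrong <- as.Date("04/22/2012", format = "%m/%d/%y")
```

A mismatch of this kind between %y and %Y silently produces dates in the wrong year, and therefore plausible-looking but wrong weekdays.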

-- 
Sent from my mobile device
Envoyait de mon portable



Re: [R] How to take ID of number 7.

2012-04-22 Thread Yellow
O_o This is kinda interesting 
I have 267 log2 values >= 7. 
And 295 ID numbers. 

I don't see any problems in my code also: 

ID_Log2_Above_7 = DataFile[DataFile$log2 >= 7, c("ID", "Log2"] 

# Take ID out. 

ID_Above_7 = ID_Log2_Above_7$ID 

# Only numbers, no na or inf. 

ID_Above_7_NO_NA = ID_Above_7[is.na(ID_Above_7)]
ID_Above_7_FINAL = ID_Above_7_NO_NA[is.finite(ID_Above_7_NO_NA)] 

 

I also did the same thing for these log2, and those are 267, as it should
be. 
But why do I have 295 ID numbers? 

I seriously don't get it? 





--
View this message in context: 
http://r.789695.n4.nabble.com/How-to-take-ID-of-number-7-tp4577998p4578532.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] difficulty in Formatting time series data

2012-04-22 Thread Jeff Newmiller

On Sun, 22 Apr 2012, Hasan Diwan wrote:

> Raghu,
>
> On 22 April 2012 09:53, Raghuraman Ramachandran optionsra...@gmail.com wrote:
>
>> I have a data frame (from CSV file) which has its first column called Date.
>> The Date is in the format mm/dd/yyyy. I was trying to get the weekday for
>> these dates and I tried using wday() and day.of.week() functions and both
>> of them gave me precisely the wrong answers. I think the issue lies in the
>> proper formatting of dates. The class of this column is a factor class and
>> hence I tried converting into POSIXlt, xts, zoo objects and yet I could not
>> get the weekday correctly. Anyone has any suggestions please?
>
> Try this:
> # assume dataIn is where the CSV files data is...
> dataIn$Date <- as.POSIXct(dataIn$Date, format='%m/%d/%y')


By far the most common error I see is failing to import the Date column as 
character, instead allowing the import function to convert it to factor, 
after which computations (such as the above suggestion) use the hidden 
factor index instead of the visible character representation, which 
further mystifies beginners.  The conversion above will only work 
correctly if the column was imported as character.  E.g.


dataIn <- read.csv( file="yourdatafile", as.is=TRUE )

OP: Use the str() function to see what types you are working with, and in 
future R-help queries send dput() of the data and code you have tried if 
we are to be able to reproduce your attempts effectively rather than 
reading your mind.
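
To make the factor pitfall concrete, here is a minimal sketch. The date strings are hypothetical (month-first, as the OP originally described them); the point is only that a factor carries integer codes underneath its labels, and that going through as.character() recovers the visible text before parsing.

```r
# Hypothetical mm/dd/yyyy dates, as they might appear in a CSV column.
dates_chr <- c("04/20/2012", "04/19/2012", "04/18/2012")
dates_fac <- factor(dates_chr)

# A factor stores integer codes that index its (sorted) levels.
codes <- as.integer(dates_fac)

# Converting to character first recovers the visible text for parsing.
parsed <- as.Date(as.character(dates_fac), format = "%m/%d/%Y")

str(dates_fac)
codes
parsed
```

Any computation that reaches the codes instead of the labels works on values that have nothing to do with the calendar, which is why str() is worth running before anything else.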



> dataIn <- cbind(dataIn, day.of.week = format(dataIn$Date, format='%A'))


Why not just

dataIn$day.of.week <- weekdays( dataIn$Date )

?

---
Jeff NewmillerThe .   .  Go Live...
DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k



Re: [R] How to take ID of number 7.

2012-04-22 Thread Jeff Newmiller

Please provide self-contained, reproducible examples.

On Sun, 22 Apr 2012, Yellow wrote:

> O_o This is kinda interesting
> I have 267 log2 values >= 7.
> And 295 ID numbers.
>
> I don't see any problems in my code also:
>
> ID_Log2_Above_7 = DataFile[DataFile$log2 >= 7, c("ID", "Log2"]

Missing a parenthesis, and see below.

> # Take ID out.
>
> ID_Above_7 = ID_Log2_Above_7$ID
>
> # Only numbers, no na or inf.
>
> ID_Above_7_NO_NA = ID_Above_7[is.na(ID_Above_7)]
> ID_Above_7_FINAL = ID_Above_7_NO_NA[is.finite(ID_Above_7_NO_NA)]
>
> I also did the same thing for these log2, and those are 267, as it should
> be.
> But why do I have 295 ID numbers?
>
> I seriously don't get it?


You stopped working with data frames midway through, and now there is no 
well-defined correspondence between ID numbers and log2 numbers (whatever 
they are).  The troublesome values you eliminate in one column must also 
eliminate values in the other column.


Modify the line above to select rows in the data frame whose values are 
not NA and are finite, and you will end up with a single data frame of 
data that meets your quality criteria.
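
A minimal sketch of that row-wise approach follows. The data frame is a hypothetical stand-in for DataFile, and the column names ID and log2 are assumed, since the original data was not posted.

```r
# Hypothetical stand-in for DataFile; column names ID and log2 are assumed.
DataFile <- data.frame(ID   = c(101, 102, NA, 104, 105, 106),
                       log2 = c(7.5, 6.2, 8.1, NA, Inf, 9.0))

# Build one logical vector of rows to keep. is.finite() is FALSE for NA,
# NaN, Inf and -Inf, so it covers both the "no NA" and "no Inf" checks.
keep <- is.finite(DataFile$log2) & DataFile$log2 >= 7 & is.finite(DataFile$ID)

# Subset rows once, keeping both columns together.
Above_7 <- DataFile[keep, c("ID", "log2")]
Above_7
```

Because the rows are dropped from the data frame as a whole, the ID and log2 columns always have the same length and stay paired.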


---
Jeff NewmillerThe .   .  Go Live...
DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k



Re: [R] difficulty in Formatting time series data

2012-04-22 Thread Raghuraman Ramachandran
I tried downloading using as.is and have also provided the dput below. The
date for example is 20/4/2012 and wday gives 2 instead of 6? Thanks for all
your help.

> str(test1)
'data.frame':   1825 obs. of  7 variables:
 $ Date     : chr  "20/04/2012" "19/04/2012" "18/04/2012" "17/04/2012" ...
 $ Open     : num  2.33 2.35 2.35 2.34 2.32 2.34 2.3 2.28 2.29 2.28 ...
 $ High     : num  2.34 2.36 2.38 2.34 2.35 2.36 2.32 2.29 2.33 2.3 ...
 $ Low      : num  2.31 2.33 2.34 2.3 2.31 2.32 2.29 2.25 2.28 2.28 ...
 $ Close    : num  2.31 2.35 2.36 2.33 2.31 2.32 2.31 2.26 2.3 2.29 ...
 $ Volume   : int  5366000 5382000 9606000 9596000 5941000 10332000 700
9636000 6019000 3279000 ...
 $ Adj.Close: num  2.31 2.35 2.36 2.33 2.31 2.32 2.31 2.26 2.3 2.29 ...
> wday(test$Date[1])
[1] 2
> wday(test1$Date[1])
[1] 2
> test1%Date[1]
Error: unexpected input in "test1%Date[1]"
> test1$Date[1]
[1] "20/04/2012"

> dput(test1)
structure(list(Date = c(20/04/2012, 19/04/2012, 18/04/2012,
17/04/2012, 16/04/2012, 13/04/2012, 12/04/2012, 11/04/2012,
10/04/2012, 09/04/2012, 05/04/2012, 04/04/2012, 03/04/2012,
02/04/2012, 30/03/2012, 29/03/2012, 28/03/2012, 27/03/2012,
26/03/2012, 23/03/2012, 21/03/2012, 20/03/2012, 19/03/2012,
16/03/2012, 15/03/2012, 14/03/2012, 13/03/2012, 12/03/2012,
09/03/2012, 08/03/2012, 07/03/2012, 06/03/2012, 05/03/2012,
02/03/2012, 01/03/2012, 29/02/2012, 28/02/2012, 27/02/2012,
24/02/2012, 23/02/2012, 22/02/2012, 21/02/2012, 20/02/2012,
17/02/2012, 16/02/2012, 15/02/2012, 14/02/2012, 13/02/2012,
10/02/2012, 09/02/2012, 08/02/2012, 07/02/2012, 06/02/2012,
03/02/2012, 02/02/2012, 01/02/2012, 31/01/2012, 30/01/2012,
27/01/2012, 26/01/2012, 25/01/2012, 20/01/2012, 19/01/2012,
18/01/2012, 17/01/2012, 16/01/2012, 13/01/2012, 12/01/2012,
11/01/2012, 10/01/2012, 09/01/2012, 06/01/2012, 05/01/2012,
04/01/2012, 03/01/2012, 30/12/2011, 29/12/2011, 28/12/2011,
27/12/2011, 23/12/2011, 22/12/2011, 21/12/2011, 20/12/2011,
19/12/2011, 16/12/2011, 15/12/2011, 14/12/2011, 13/12/2011,
12/12/2011, 09/12/2011, 08/12/2011, 07/12/2011, 06/12/2011,
05/12/2011, 02/12/2011, 01/12/2011, 30/11/2011, 29/11/2011,
28/11/2011, 25/11/2011, 24/11/2011, 23/11/2011, 22/11/2011,
21/11/2011, 18/11/2011, 17/11/2011, 16/11/2011, 15/11/2011,
14/11/2011, 11/11/2011, 10/11/2011, 09/11/2011, 08/11/2011,
04/11/2011, 03/11/2011, 02/11/2011, 01/11/2011, 31/10/2011,
28/10/2011, 27/10/2011, 25/10/2011, 24/10/2011, 21/10/2011,
20/10/2011, 19/10/2011, 18/10/2011, 17/10/2011, 14/10/2011,
13/10/2011, 12/10/2011, 11/10/2011, 10/10/2011, 07/10/2011,
06/10/2011, 05/10/2011, 04/10/2011, 03/10/2011, 30/09/2011,
29/09/2011, 28/09/2011, 27/09/2011, 26/09/2011, 23/09/2011,
22/09/2011, 21/09/2011, 20/09/2011, 19/09/2011, 16/09/2011,
15/09/2011, 14/09/2011, 13/09/2011, 12/09/2011, 09/09/2011,
08/09/2011, 07/09/2011, 06/09/2011, 05/09/2011, 02/09/2011,
01/09/2011, 31/08/2011, 29/08/2011, 26/08/2011, 25/08/2011,
24/08/2011, 23/08/2011, 22/08/2011, 19/08/2011, 18/08/2011,
17/08/2011, 16/08/2011, 15/08/2011, 12/08/2011, 11/08/2011,
10/08/2011, 08/08/2011, 05/08/2011, 04/08/2011, 03/08/2011,
02/08/2011, 01/08/2011, 29/07/2011, 28/07/2011, 27/07/2011,
26/07/2011, 25/07/2011, 22/07/2011, 21/07/2011, 20/07/2011,
19/07/2011, 18/07/2011, 15/07/2011, 14/07/2011, 13/07/2011,
12/07/2011, 11/07/2011, 08/07/2011, 07/07/2011, 06/07/2011,
05/07/2011, 04/07/2011, 01/07/2011, 30/06/2011, 29/06/2011,
28/06/2011, 27/06/2011, 24/06/2011, 23/06/2011, 22/06/2011,
21/06/2011, 20/06/2011, 17/06/2011, 16/06/2011, 15/06/2011,
14/06/2011, 13/06/2011, 10/06/2011, 09/06/2011, 08/06/2011,
07/06/2011, 06/06/2011, 03/06/2011, 02/06/2011, 01/06/2011,
31/05/2011, 30/05/2011, 27/05/2011, 26/05/2011, 25/05/2011,
24/05/2011, 23/05/2011, 20/05/2011, 19/05/2011, 18/05/2011,
16/05/2011, 13/05/2011, 12/05/2011, 11/05/2011, 10/05/2011,
09/05/2011, 06/05/2011, 05/05/2011, 04/05/2011, 03/05/2011,
29/04/2011, 28/04/2011, 27/04/2011, 26/04/2011, 25/04/2011,
21/04/2011, 20/04/2011, 19/04/2011, 18/04/2011, 15/04/2011,
14/04/2011, 13/04/2011, 12/04/2011, 11/04/2011, 08/04/2011,
07/04/2011, 06/04/2011, 05/04/2011, 04/04/2011, 01/04/2011,
31/03/2011, 30/03/2011, 29/03/2011, 28/03/2011, 25/03/2011,
24/03/2011, 23/03/2011, 22/03/2011, 21/03/2011, 18/03/2011,
17/03/2011, 16/03/2011, 15/03/2011, 14/03/2011, 11/03/2011,
10/03/2011, 09/03/2011, 08/03/2011, 07/03/2011, 04/03/2011,
03/03/2011, 02/03/2011, 01/03/2011, 28/02/2011, 25/02/2011,
24/02/2011, 23/02/2011, 22/02/2011, 21/02/2011, 18/02/2011,
17/02/2011, 16/02/2011, 15/02/2011, 14/02/2011, 11/02/2011,
10/02/2011, 09/02/2011, 08/02/2011, 07/02/2011, 02/02/2011,
01/02/2011, 31/01/2011, 28/01/2011, 27/01/2011, 26/01/2011,
25/01/2011, 24/01/2011, 21/01/2011, 20/01/2011, 19/01/2011,
18/01/2011, 17/01/2011, 14/01/2011, 13/01/2011, 12/01/2011,
11/01/2011, 10/01/2011, 07/01/2011, 06/01/2011, 05/01/2011,
04/01/2011, 03/01/2011, 31/12/2010, 30/12/2010, 29/12/2010,
28/12/2010, 27/12/2010, 24/12/2010, 23/12/2010, 22/12/2010,
21/12/2010, 20/12/2010, 17/12/2010, 16/12/2010, 15/12/2010,
14/12/2010, 

Re: [R] how to cut files from any folder to another folder?

2012-04-22 Thread cberry
sagarnikam123 sagarnikam...@gmail.com writes:

> i want to cut file from e.g. abc folder and put it into another location
> with folder name e.g. xyz
> how should i proceed?

See

?files
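
As a rough illustration of what that help page covers, a "cut and paste" can be sketched with file.rename(), falling back to file.copy() plus file.remove() if the rename fails (for instance across file systems). The folder names abc and xyz come from the question; the file name and the use of tempdir() are assumptions made so the sketch is self-contained.

```r
# Set up two hypothetical folders and one file to move.
src_dir <- file.path(tempdir(), "abc")
dst_dir <- file.path(tempdir(), "xyz")
dir.create(src_dir, showWarnings = FALSE)
dir.create(dst_dir, showWarnings = FALSE)

src <- file.path(src_dir, "example.txt")
writeLines("hello", src)
dst <- file.path(dst_dir, basename(src))

# Try a plain rename first; it returns FALSE (with a warning) on failure.
moved <- file.rename(src, dst)
if (!moved) {
  # Fallback: copy the file, then delete the original.
  moved <- file.copy(src, dst) && file.remove(src)
}
moved
```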




 --
 View this message in context: 
 http://r.789695.n4.nabble.com/how-to-cut-files-from-any-folder-to-another-folder-tp4577818p4577818.html
 Sent from the R help mailing list archive at Nabble.com.


-- 
Charles C. BerryDept of Family/Preventive Medicine
cberry at ucsd edu  UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901



Re: [R] difficulty in Formatting time series data

2012-04-22 Thread Raghuraman Ramachandran
I also tried:

> test$Date=as.POSIXct(test$Date,format="%m%d%y")
> test=cbind(test,day.of.week=format(test$Date,format="%A"))
> head(test)
  Date Open High  Low Close   Volume Adj.Close day.of.week
1 <NA> 2.33 2.34 2.31  2.31  5366000      2.31        <NA>
2 <NA> 2.35 2.36 2.33  2.35  5382000      2.35        <NA>
3 <NA> 2.35 2.38 2.34  2.36  9606000      2.36        <NA>
4 <NA> 2.34 2.34 2.30  2.33  9596000      2.33        <NA>
5 <NA> 2.32 2.35 2.31  2.31  5941000      2.31        <NA>
6 <NA> 2.34 2.36 2.32  2.32 10332000      2.32        <NA>

It didnt help.

Thx
Raghu



Re: [R] difficulty in Formatting time series data

2012-04-22 Thread Berend Hasselman

On 22-04-2012, at 20:12, Raghuraman Ramachandran wrote:

> I tried downloading using as.is and have also provided the dput below. The
> date for example is 20/4/2012 and wday gives 2 instead of 6? Thanks for all
> your help.

dt <- "20/04/2012"

> as.Date(dt)
[1] "0020-04-20"
> as.Date(dt,format="%d/%m/%Y")
[1] "2012-04-20"

> weekdays(as.Date(dt,format="%d/%m/%Y"))
[1] "Friday"


Read the help for as.Date
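
A small sketch of why the format string matters, using the single date from the thread. The variable names are made up for illustration; the three calls differ only in the format argument.

```r
dt <- "20/04/2012"   # day/month/year, as in the posted data

# Format string that matches the data.
ok <- as.Date(dt, format = "%d/%m/%Y")

# Month-first format: "20" is not a valid month, so the result is NA.
bad <- as.Date(dt, format = "%m/%d/%Y")

# Two-digit year (%y) on a four-digit year: only "20" is read as the year
# and the trailing "12" is silently ignored.
us_style <- "04/20/2012"
short_year <- as.Date(us_style, format = "%m/%d/%y")

format(ok, "%Y-%m-%d")
is.na(bad)
format(short_year, "%Y-%m-%d")
```

The last case is the quiet one: it does not return NA, it returns a valid date in the wrong year, so a mismatch between %y and %Y is easy to miss.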

Berend



Re: [R] difficulty in Formatting time series data

2012-04-22 Thread David Winsemius


On Apr 22, 2012, at 2:12 PM, Raghuraman Ramachandran wrote:

> I tried downloading using as.is and have also provided the dput below. The
> date for example is 20/4/2012 and wday gives 2 instead of 6? Thanks for all
> your help.
>
> > str(test1)
> 'data.frame':   1825 obs. of  7 variables:
> $ Date     : chr  "20/04/2012" "19/04/2012" "18/04/2012" "17/04/2012" ...
> $ Open     : num  2.33 2.35 2.35 2.34 2.32 2.34 2.3 2.28 2.29 2.28 ...
> $ High     : num  2.34 2.36 2.38 2.34 2.35 2.36 2.32 2.29 2.33 2.3 ...
> $ Low      : num  2.31 2.33 2.34 2.3 2.31 2.32 2.29 2.25 2.28 2.28 ...
> $ Close    : num  2.31 2.35 2.36 2.33 2.31 2.32 2.31 2.26 2.3 2.29 ...
> $ Volume   : int  5366000 5382000 9606000 9596000 5941000 10332000 700
> 9636000 6019000 3279000 ...
> $ Adj.Close: num  2.31 2.35 2.36 2.33 2.31 2.32 2.31 2.26 2.3 2.29 ...
> > wday(test$Date[1])
> [1] 2


You are skipping a couple of essential steps. The wday() function is
unable to infer that you are using a non-standard date format. (I
don't think it would even work if you were using YYYY-MM-DD.) Read up
on:


?DateTimeClasses
?as.Date
?strptime
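
As a pointer into those help pages, here is a minimal sketch of what strptime() returns for the date in question. The tz = "UTC" argument is an assumption added only to keep the example independent of the local time zone.

```r
# Parse a day/month/year string into a POSIXlt object.
lt <- strptime("20/04/2012", format = "%d/%m/%Y", tz = "UTC")

class(lt)        # "POSIXlt" "POSIXt"

# POSIXlt is a list of broken-down fields; note the offsets.
lt$mday          # day of the month
lt$mon + 1       # months are stored 0-11
lt$year + 1900   # years are stored as years since 1900
lt$wday          # day of the week, 0 = Sunday
```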

--
David.



Re: [R] difficulty in Formatting time series data

2012-04-22 Thread David Winsemius


On Apr 22, 2012, at 2:18 PM, Raghuraman Ramachandran wrote:

> I also tried:
>
> > test$Date=as.POSIXct(test$Date,format="%m%d%y")

Well, as became apparent when you eventually offered an example, you
have dates in dd/mm/yyyy format, so it's hardly surprising that it
didn't work with a format that didn't match your data.

?strptime
?as.Date

> > test=cbind(test,day.of.week=format(test$Date,format="%A"))
> > head(test)
>   Date Open High  Low Close   Volume Adj.Close day.of.week
> 1 <NA> 2.33 2.34 2.31  2.31  5366000      2.31        <NA>
> 2 <NA> 2.35 2.36 2.33  2.35  5382000      2.35        <NA>
> 3 <NA> 2.35 2.38 2.34  2.36  9606000      2.36        <NA>
> 4 <NA> 2.34 2.34 2.30  2.33  9596000      2.33        <NA>
> 5 <NA> 2.32 2.35 2.31  2.31  5941000      2.31        <NA>
> 6 <NA> 2.34 2.36 2.32  2.32 10332000      2.32        <NA>
>
> It didnt help.
>
> Thx
> Raghu

Thx
Raghu



David Winsemius, MD
West Hartford, CT



[R] need advice on using excel to check data for import into R

2012-04-22 Thread Markus Weisner
I have created an S4 object type for conducting fire department data
analysis.  The object includes validity check that ensures certain fields
are present and that duplicate records don't exist for certain combinations
of columns (e.g. no duplicate incident number / incident data / unit ID
ensures that the data does not show the same fire engine responding twice
on the same call).
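
For readers following along, the two checks described above (required fields present, no duplicate key combinations) can be sketched as a plain function that returns human-readable messages. This is not the poster's S4 validity method, which was not shown; the function name and the column names are hypothetical stand-ins for the incident number, incident date and unit ID fields.

```r
# Sketch of a pre-submission check. The column names are hypothetical.
check_incident_data <- function(df,
                                key = c("incident_number", "incident_date", "unit_id")) {
  problems <- character(0)

  # 1. Required columns must be present.
  missing_cols <- setdiff(key, names(df))
  if (length(missing_cols) > 0) {
    return(paste("Missing column(s):", paste(missing_cols, collapse = ", ")))
  }

  # 2. No two rows may share the same key combination.
  dup <- duplicated(df[, key, drop = FALSE])
  if (any(dup)) {
    problems <- c(problems,
                  paste("Duplicate record(s) in row(s):",
                        paste(which(dup), collapse = ", ")))
  }
  problems
}

# Made-up data: row 3 repeats the key of row 1.
incidents <- data.frame(
  incident_number = c("F001", "F001", "F001", "F002"),
  incident_date   = c("2012-04-20", "2012-04-20", "2012-04-20", "2012-04-21"),
  unit_id         = c("E1", "E2", "E1", "E1"),
  stringsAsFactors = FALSE
)
msgs <- check_incident_data(incidents)
msgs
```

An empty character vector means the data passed; anything else is a list of messages that could be sent back to the client as-is.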

I am finding that I spend a lot of time taking client data, converting it
to my S4 object, and then sending it back to the client to correct data
validity issues.

I am trying to figure out a clever way to have excel (typically the program
used by my clients) check client data prior to them submitting it to me.  I
have been working with somebody on trying to develop an excel toolbar
add-in with limited success.

My question is whether anybody can think of clever alternatives for clients
to validate their data … for example, is there an R Excel plugin (that would
be easily installed by a client) where I might be able to write some lines of
R to check the data and output messages … or maybe some sort of server
where they could upload their data and I could have some lines of R code
that would check the data and send back potential error messages?

I realize this is a fairly open ended question … just looking for some
general ideas and directions to go. Getting a little frustrated with
spending most of my work time dealing with data cleaning issues … guessing
this is a problem shared by many of us that use R!

Thanks,
Markus



Re: [R] difficulty in Formatting time series data

2012-04-22 Thread Rui Barradas
Hello,

SMALL, reproducible examples...

Anyway, it's not that difficult. Try this

d.of.w <- as.integer(format(as.Date(test1$Date, format="%d/%m/%Y"), "%w"))
str(d.of.w)
head(d.of.w)


Note that the format '%w' gives days in 0-6, where Sunday == 0. See
?strftime.
Your Friday is therefore 5.
(Or use d.of.w <- d.of.w + 1)
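
A small self-contained check of that numbering, using three hypothetical dates in the same dd/mm/yyyy layout as the posted data (a Friday, a Sunday and a Monday):

```r
dates <- c("20/04/2012", "22/04/2012", "23/04/2012")  # Fri, Sun, Mon
d <- as.Date(dates, format = "%d/%m/%Y")

# "%w" gives the weekday as a number, 0 = Sunday ... 6 = Saturday.
d.of.w <- as.integer(format(d, "%w"))

# Shifted by one so that Sunday = 1 ... Saturday = 7.
d.of.w.1 <- d.of.w + 1L

d.of.w
d.of.w.1
```

With the shift, 20/04/2012 comes out as 6, which is the value the original poster was expecting.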

Hope this helps,

Rui Barradas


--
View this message in context: 
http://r.789695.n4.nabble.com/difficulty-in-Formatting-time-series-data-tp4578461p4578655.html
Sent from the R help mailing list archive at Nabble.com.



[R] Assignment problems

2012-04-22 Thread phillip03
The text below is part of some work I have to do, which is due in 2 days.
I am tied up with a lot of other things, so I was hoping someone could
take 5 minutes and help me.

Here is a part of my data.frame:

   year country1 country2 contig comlang       pop1       gdp1        pop2        gdp2 rta    dist      avgflow
1  1992      AUS      AUT      0       0 17.4950008 321708.281   7.7825189  194684.078   0 15608.4 1.075999e+02
2  1992      AUS      BEL      0       0 17.4950008 321708.281  10.0450001  231762.094   0 16319.2 4.767162e+02
3  1992      AUS      CAN      0       1 17.4950008 321708.281  28.5195980  570291.188   0 15391.1 7.456945e+02
4  1992      AUS      CHE      0       0 17.4950008 321708.281       6.875  249471.422   0 16170.1 4.625214e+02
5  1992      AUS      DEU      0       0 17.4950008 321708.281  80.6240005 2062141.500   0 15935.1 2.047573e+03
6  1992      AUS      DNK      0       0 17.4950008 321708.281       5.171  150195.484   0 15725.5 1.453406e+02
7  1992      AUS      ESP      0       0 17.4950008 321708.281  39.0677490  612585.250   0 17072.9 2.106880e+02
8  1992      AUS      FIN      0       0 17.4950008 321708.281   5.0419998  109859.438   0 14849.5 2.025125e+02
9  1992      AUS      FRA      0       0 17.4950008 321708.281  57.2422981 1371706.000   0 16513.0 1.070802e+03
10 1992      AUS      GBR      0       1 17.4950008 321708.281  57.9023476 1071537.375   0 16602.3 2.279130e+03
11 1992      AUS      GRC      0       0 17.4950008 321708.281      10.369  102022.352   0 14845.6 4.164985e+01
12 1992      AUS      IRL      0       1 17.4950008 321708.281   3.5490999   54272.410   0 16895.0 1.076323e+02
13 1992      AUS      ISL      0       0 17.4950008 321708.281   0.2611000    6976.168   0 16443.6 2.190602e+01
14 1992      AUS      ITA      0       0 17.4950008 321708.281  56.7976494 1265800.125   0 15855.4 9.683720e+02
15 1992      AUS      JPN      0       0 17.4950008 321708.281 124.2289963 3766884.000   0  7827.1 1.026065e+04
16 1992      AUS      NLD      0       0 17.4950008 321708.281  15.1780005  348224.562   0 16227.5 6.510009e+02
17 1992      AUS      NOR      0       0 17.4950008 321708.281   4.2863998  127170.328   0 15646.2 9.357240e+01
18 1992      AUS      NZL      0       1 17.4950008 321708.281   3.5316999   40706.199   1  2736.4 2.267670e+03
19 1992      AUS      PRT      0       0 17.4950008 321708.281   9.9630003  102890.258   0 17625.3 2.611476e+02
20 1992      AUS      SWE      0       0 17.4950008 321708.281   8.6680002  264822.875   0 15385.4 4.653388e+02


There are 3400 observations.

3.1.1. Construct a dummy variable, EMU, that in any given year takes the
value 1 if both countries are members of the EMU and 0 otherwise. How big a
proportion of the observations are among EMU member countries?

This problem is solved with:
> euro<-c("AUT","BEL","DEU","ESP","FIN","FRA","GRC","IRL","ITA","NLD","PRT")
> countries<-data.frame(country1,country2,stringsAsFactors=FALSE)
> data1<-cbind(data,EMU=Reduce(`&`, lapply(countries, function(x) x %in%
euro)))
> 
> data1[EMU==TRUE,13]

> a<-table(EMU)
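
For the proportion asked about in 3.1.1, a minimal sketch is given below. The four-row data frame is made up for illustration; only the euro vector is taken from the post. Like the posted code, it ignores the year, so it treats membership as fixed over the whole sample.

```r
euro <- c("AUT", "BEL", "DEU", "ESP", "FIN", "FRA",
          "GRC", "IRL", "ITA", "NLD", "PRT")

# Hypothetical country pairs standing in for the real 3400 rows.
trade <- data.frame(country1 = c("AUT", "AUS", "DEU", "FRA"),
                    country2 = c("BEL", "AUT", "ITA", "JPN"),
                    stringsAsFactors = FALSE)

# EMU is 1 only when both countries of the pair are in the euro vector.
trade$EMU <- as.integer(trade$country1 %in% euro & trade$country2 %in% euro)

# The mean of a 0/1 variable is the proportion of ones.
prop_emu <- mean(trade$EMU)

trade
prop_emu
```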


3.1.2. Are the member and non-member country-pairs alike? 

What I need here is:
I want to find the mean of avgflow, but only for the data where 2 countries
are in the euro vector/if EMU=TRUE ?
I have tried with:
avgflowONLY<-cbind(avgflow,EMU)

> NEWavgflow<-rep(0,nrow(avgflowONLY))

> for (i in 1:nrow(avgflowONLY)){if
> (EMU==1){NEWavgflow[i]<-mean(avgflow[i])}}

BUT it gives me: 
Warning messages:
1: In if (EMU == 1) { ... :
  the condition has length > 1 and only the first element will be used
etc. ???
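
The warning arises because if() expects a single TRUE or FALSE, while EMU == 1 is a whole vector (recent versions of R treat this as an error rather than a warning). A loop is not needed here; a minimal sketch with made-up numbers, using logical indexing instead:

```r
# Hypothetical values standing in for the avgflow and EMU columns.
trade <- data.frame(avgflow = c(100, 250, 400, 50),
                    EMU     = c(1, 0, 1, 0))

# Mean of avgflow over the rows where EMU is 1.
mean_emu <- mean(trade$avgflow[trade$EMU == 1])

# Means for both groups at once, for comparing members and non-members.
means_by_group <- tapply(trade$avgflow, trade$EMU, mean)

mean_emu
means_by_group
```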


--
View this message in context: 
http://r.789695.n4.nabble.com/Assignment-problems-tp4578672p4578672.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Solve an ordinary or generalized eigenvalue problem in R?

2012-04-22 Thread Jonathan Greenberg
Thanks all (particularly to you, Berend) -- I'll push forward with these
solutions and integrate them into my code.  I did come across geigen while
rooting around in the CCA code, but it's not formally documented (it just
says "for internal use" or something along those lines) and, as you found
out above, it does not produce the same solution as dggev.  It would be
nice to have a more complete set of formal packages for doing LA in R
(rather than having to hand-write .Fortran calls) but I'll leave that to
someone with more expertise in linear algebra than me.  Something that
perhaps matches the SciPy set of functions (both in terms of input and
output):

http://docs.scipy.org/doc/scipy/reference/linalg.html

Some of these are already implemented, but clearly not all of them.

--j

On Sat, Apr 21, 2012 at 1:31 PM, Berend Hasselman b...@xs4all.nl wrote:


 On 21-04-2012, at 20:20, peter dalgaard wrote:

 
 
  The eigenvalues are identical upto the printed 9 digits but the
 eigenvectors appear to be quite different.
  Maybe this is what Luke meant.
 
  Berend
 
 
 
  They look quite similar to me:
 
  ev <- eigen(solve(B,A) )$vectors
  ge <- geigen(A, B, TRUE , TRUE)
  ev / ge$vl
            [,1]     [,2]       [,3]
  [1,] 0.9324603 0.813422 -0.7423694
  [2,] 0.9324603 0.813422 -0.7423694
  [3,] 0.9324603 0.813422 -0.7423694
  ev / ge$vr
            [,1]     [,2]       [,3]
  [1,] 0.9324603 0.813422 -0.7423694
  [2,] 0.9324603 0.813422 -0.7423694
  [3,] 0.9324603 0.813422 -0.7423694
 
  (and of course, eigenvectors of any sort are only defined up to a
 constant multiplier)

 Correct. I should have checked  your way and not optically.

 Berend
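
The point about constant multipliers can be checked with base R alone. The sketch below uses two small made-up matrices (not the A and B from the thread, which were not reposted here) and the same eigen(solve(B, A)) approach quoted above; it is an illustration, not a replacement for dggev.

```r
# Hypothetical 2 x 2 matrices; B is invertible.
A <- matrix(c(2, 1, 1, 3), nrow = 2)
B <- diag(c(1, 2))

# Generalized problem A v = lambda B v, solved as an ordinary
# eigenproblem of solve(B) %*% A.
e <- eigen(solve(B, A))
lambda <- e$values
V <- e$vectors

# Residual of A V = B V diag(lambda); should be numerically zero.
resid <- A %*% V - B %*% V %*% diag(lambda)

# Any rescaled eigenvector satisfies the same equation.
v_scaled <- -3 * V[, 1]
resid_scaled <- A %*% v_scaled - lambda[1] * (B %*% v_scaled)

lambda
max(abs(resid))
max(abs(resid_scaled))
```

Two routines can therefore return different-looking eigenvector matrices for the same problem; dividing one by the other elementwise, as done above, shows whether each column differs only by a constant.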





-- 
Jonathan A. Greenberg, PhD
Assistant Professor
Department of Geography and Geographic Information Science
University of Illinois at Urbana-Champaign
607 South Mathews Avenue, MC 150
Urbana, IL 61801
Phone: 415-763-5476
AIM: jgrn307, MSN: jgrn...@hotmail.com, Gchat: jgrn307, Skype: jgrn3007
http://www.geog.illinois.edu/people/JonathanGreenberg.html


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Solve an ordinary or generalized eigenvalue problem in R?

2012-04-22 Thread Berend Hasselman

On 22-04-2012, at 21:08, Jonathan Greenberg wrote:

 Thanks all (particularly to you, Berend) -- I'll push forward with these 
 solutions and integrate them into my code.  I did come across geigen while 
 rooting around in the CCA code but it's not formally documented (it just says 
 "for internal use" or something along those lines) and as you found out 
 above, it does not produce the same solution as the dggev.  It would be nice 
 to have a more complete set of formal packages for doing LA in R (rather than 
 having to hand-write .Fortran calls) but I'll leave that to someone with more 
 expertise in linear algebra than me.  Something that perhaps matches the 
 SciPy set of functions (both in terms of input and output):
 
 http://docs.scipy.org/doc/scipy/reference/linalg.html
 
 Some of these are already implemented, but clearly not all of them.  

Package CCA has package fda as dependency.
And package fda defines a function geigen.
The first 14 lines of this function are

geigen <- function(Amat, Bmat, Cmat)
{
  #  solve the generalized eigenanalysis problem
  #
  #max {tr L'AM / sqrt[tr L'BL tr M'CM] w.r.t. L and M
  #
  #  Arguments:
  #  AMAT ... p by q matrix
  #  BMAT ... order p symmetric positive definite matrix
  #  CMAT ... order q symmetric positive definite matrix
  #  Returns:
  #  VALUES ... vector of length s = min(p,q) of eigenvalues
  #  LMAT   ... p by s matrix L
  #  MMAT   ... q by s matrix M

It's not clear to me how it is used and exactly what it is doing and how that 
compares with Lapack.

Berend



Re: [R] Assignment problems

2012-04-22 Thread R. Michael Weylandt michael.weyla...@gmail.com
Look at ?ifelse, a combination of logical subscripting and mean(), or even 
better ?ave -- I can't say too much more; there's a no homework policy on this 
list and  I recognize that first solution as mine already... (I should have 
noted that the first time)

Michael

On Apr 22, 2012, at 2:54 PM, phillip03 phillipbrig...@hotmail.com wrote:

 The text below is a part of, some work I have to do, which is due in 2 days
 and I am strung up with a lot of other stuff, so I was hoping someone would
 take 5 mins and help me ??
 
 Here is a part of my data.frame:
 
    year country1 country2 contig comlang       pop1       gdp1        pop2        gdp2 rta    dist      avgflow
  1 1992      AUS      AUT      0       0 17.4950008 321708.281   7.7825189  194684.078   0 15608.4 1.075999e+02
  2 1992      AUS      BEL      0       0 17.4950008 321708.281  10.0450001  231762.094   0 16319.2 4.767162e+02
  3 1992      AUS      CAN      0       1 17.4950008 321708.281  28.5195980  570291.188   0 15391.1 7.456945e+02
  4 1992      AUS      CHE      0       0 17.4950008 321708.281       6.875  249471.422   0 16170.1 4.625214e+02
  5 1992      AUS      DEU      0       0 17.4950008 321708.281  80.6240005 2062141.500   0 15935.1 2.047573e+03
  6 1992      AUS      DNK      0       0 17.4950008 321708.281       5.171  150195.484   0 15725.5 1.453406e+02
  7 1992      AUS      ESP      0       0 17.4950008 321708.281  39.0677490  612585.250   0 17072.9 2.106880e+02
  8 1992      AUS      FIN      0       0 17.4950008 321708.281   5.0419998  109859.438   0 14849.5 2.025125e+02
  9 1992      AUS      FRA      0       0 17.4950008 321708.281  57.2422981 1371706.000   0 16513.0 1.070802e+03
 10 1992      AUS      GBR      0       1 17.4950008 321708.281  57.9023476 1071537.375   0 16602.3 2.279130e+03
 11 1992      AUS      GRC      0       0 17.4950008 321708.281      10.369  102022.352   0 14845.6 4.164985e+01
 12 1992      AUS      IRL      0       1 17.4950008 321708.281   3.5490999   54272.410   0 16895.0 1.076323e+02
 13 1992      AUS      ISL      0       0 17.4950008 321708.281   0.2611000    6976.168   0 16443.6 2.190602e+01
 14 1992      AUS      ITA      0       0 17.4950008 321708.281  56.7976494 1265800.125   0 15855.4 9.683720e+02
 15 1992      AUS      JPN      0       0 17.4950008 321708.281 124.2289963 3766884.000   0  7827.1 1.026065e+04
 16 1992      AUS      NLD      0       0 17.4950008 321708.281  15.1780005  348224.562   0 16227.5 6.510009e+02
 17 1992      AUS      NOR      0       0 17.4950008 321708.281   4.2863998  127170.328   0 15646.2 9.357240e+01
 18 1992      AUS      NZL      0       1 17.4950008 321708.281   3.5316999   40706.199   1  2736.4 2.267670e+03
 19 1992      AUS      PRT      0       0 17.4950008 321708.281   9.9630003  102890.258   0 17625.3 2.611476e+02
 20 1992      AUS      SWE      0       0 17.4950008 321708.281   8.6680002  264822.875   0 15385.4 4.653388e+02
 
 
 there is 3400 observations.
 
 3.1.1. Construct a dummy variable, EMU, that in any given year takes the
 value 1 if both countries are members of the EMU and 0 otherwise. How big a
 proportion of the observations are among EMU member countries?
 
 This problem is solved with:
 euro<-c("AUT","BEL","DEU","ESP","FIN","FRA","GRC","IRL","ITA","NLD","PRT")
 countries<-data.frame(country1,country2,stringsAsFactors=FALSE)
 data1<-cbind(data,EMU=Reduce(`&`, lapply(countries, function(x) x %in%
 euro)))
 
 data1[EMU==TRUE,13]
 
 a<-table(EMU)
 
 
 3.1.2. Are the member and non-member country-pairs alike? 
 
 What I need here is:
 I want to find the mean of avgflow, but only for the data where 2 countries
 are in the euro vector/if EMU=TRUE ?
 I have tried with:
 avgflowONLY<-cbind(avgflow,EMU)
 
 NEWavgflow<-rep(0,nrow(avgflowONLY))
 
 for (i in 1:nrow(avgflowONLY)){if
 (EMU==1){NEWavgflow[i]<-mean(avgflow[i])}}
 
 BUT it gives me: 
 Warning messages:
 1: In if (EMU == 1) { ... :
  the condition has length > 1 and only the first element will be used
 etc. ???
 
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Assignment-problems-tp4578672p4578672.html
 Sent from the R help mailing list archive at Nabble.com.
 


Re: [R] Assignment problems

2012-04-22 Thread Rui Barradas
Hello,


phillip03 wrote
 
 The text below is a part of, some work I have to do, which is due in 2
 days and I am strung up with a lot of other stuff, so I was hoping someone
 would take 5 mins and help me ??
 
 Here is a part of my data.frame:
 
    year country1 country2 contig comlang       pop1       gdp1        pop2        gdp2 rta    dist      avgflow
  1 1992      AUS      AUT      0       0 17.4950008 321708.281   7.7825189  194684.078   0 15608.4 1.075999e+02
  2 1992      AUS      BEL      0       0 17.4950008 321708.281  10.0450001  231762.094   0 16319.2 4.767162e+02
  3 1992      AUS      CAN      0       1 17.4950008 321708.281  28.5195980  570291.188   0 15391.1 7.456945e+02
  4 1992      AUS      CHE      0       0 17.4950008 321708.281       6.875  249471.422   0 16170.1 4.625214e+02
  5 1992      AUS      DEU      0       0 17.4950008 321708.281  80.6240005 2062141.500   0 15935.1 2.047573e+03
  6 1992      AUS      DNK      0       0 17.4950008 321708.281       5.171  150195.484   0 15725.5 1.453406e+02
  7 1992      AUS      ESP      0       0 17.4950008 321708.281  39.0677490  612585.250   0 17072.9 2.106880e+02
  8 1992      AUS      FIN      0       0 17.4950008 321708.281   5.0419998  109859.438   0 14849.5 2.025125e+02
  9 1992      AUS      FRA      0       0 17.4950008 321708.281  57.2422981 1371706.000   0 16513.0 1.070802e+03
 10 1992      AUS      GBR      0       1 17.4950008 321708.281  57.9023476 1071537.375   0 16602.3 2.279130e+03
 11 1992      AUS      GRC      0       0 17.4950008 321708.281      10.369  102022.352   0 14845.6 4.164985e+01
 12 1992      AUS      IRL      0       1 17.4950008 321708.281   3.5490999   54272.410   0 16895.0 1.076323e+02
 13 1992      AUS      ISL      0       0 17.4950008 321708.281   0.2611000    6976.168   0 16443.6 2.190602e+01
 14 1992      AUS      ITA      0       0 17.4950008 321708.281  56.7976494 1265800.125   0 15855.4 9.683720e+02
 15 1992      AUS      JPN      0       0 17.4950008 321708.281 124.2289963 3766884.000   0  7827.1 1.026065e+04
 16 1992      AUS      NLD      0       0 17.4950008 321708.281  15.1780005  348224.562   0 16227.5 6.510009e+02
 17 1992      AUS      NOR      0       0 17.4950008 321708.281   4.2863998  127170.328   0 15646.2 9.357240e+01
 18 1992      AUS      NZL      0       1 17.4950008 321708.281   3.5316999   40706.199   1  2736.4 2.267670e+03
 19 1992      AUS      PRT      0       0 17.4950008 321708.281   9.9630003  102890.258   0 17625.3 2.611476e+02
 20 1992      AUS      SWE      0       0 17.4950008 321708.281   8.6680002  264822.875   0 15385.4 4.653388e+02
 
 
 there is 3400 observations.
 
 3.1.1. Construct a dummy variable, EMU, that in any given year takes the
 value 1 if both countries are members of the EMU and 0 otherwise. How big
 a proportion of the observations are among EMU member countries?
 
 This problem is solved with:
 
 euro<-c("AUT","BEL","DEU","ESP","FIN","FRA","GRC","IRL","ITA","NLD","PRT")
  countries<-data.frame(country1,country2,stringsAsFactors=FALSE)
  data1<-cbind(data,EMU=Reduce(`&`, lapply(countries, function(x) x %in%
 euro)))
  
  data1[EMU==TRUE,13]
 
  a<-table(EMU)
 
 
 3.1.2. Are the member and non-member country-pairs alike? 
 
 What I need here is:
 I want to find the mean of avgflow, but only for the data where 2
 countries are in the euro vector/if EMU=TRUE ?
 I have tried with:
 avgflowONLY<-cbind(avgflow,EMU)
 
 NEWavgflow<-rep(0,nrow(avgflowONLY))
 
 for (i in 1:nrow(avgflowONLY)){if
 (EMU==1){NEWavgflow[i]<-mean(avgflow[i])}}
 
 BUT it gives me: 
 Warning messages:
 1: In if (EMU == 1) { ... :
   the condition has length > 1 and only the first element will be used
 etc. ???
 

You're forgetting the index in the condition: EMU[i] == 1.
Note that since EMU is a logical vector, you don't need the explicit
comparison.

If you just want the mean of avgflow where EMU == TRUE, this is much
simpler, but it returns one value, not a vector.

mean(avgflow[ EMU ])
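
A small sketch of both points, using invented toy values rather than the poster's data (the vectors avgflow and EMU below are made up for illustration):

```r
# Toy data, not the original data.frame
avgflow <- c(107.6, 476.7, 745.7, 462.5, 2047.6)
EMU     <- c(TRUE, TRUE, FALSE, FALSE, TRUE)

# Loop version with the index restored in the condition
NEWavgflow <- rep(NA_real_, length(avgflow))
for (i in seq_along(avgflow)) {
  if (EMU[i]) NEWavgflow[i] <- avgflow[i]
}
NEWavgflow

# Vectorised version: one mean over the rows where EMU is TRUE
emu_mean <- mean(avgflow[EMU])
emu_mean
```

The loop leaves NA in the non-EMU positions; the last line gives the single number the question asks for.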


Hope this helps,

Rui Barradas


--
View this message in context: 
http://r.789695.n4.nabble.com/Assignment-problems-tp4578672p4578739.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] need advice on using excel to check data for import into R

2012-04-22 Thread Richard M. Heiberger
This looks like a perfect case for an RExcel solution.
RExcel is an add-in that allows you, among other things, to place an
arbitrary R function inside the Excel automatic recalculation mode.
For details see
rcom.univie.ac.at
There are many reference items listed on the wiki page in the left panel.
For further follow-up, please sign up for the rcom mailing list, again with
the details on the web site.

Rich

On Sun, Apr 22, 2012 at 2:34 PM, Markus Weisner r...@themarkus.com wrote:

 I have created an S4 object type for conducting fire department data
 analysis.  The object includes validity check that ensures certain fields
 are present and that duplicate records don't exist for certain combinations
 of columns (e.g. no duplicate incident number / incident data / unit ID
 ensures that the data does not show the same fire engine responding twice
 on the same call).

 I am finding that I spend a lot of time taking client data, converting it
 to my S4 object, and then sending it back to the client to correct data
 validity issues.

 I am trying to figure out a clever way to have excel (typically the program
 used by my clients) check client data prior to them submitting it to me.  I
 have been working with somebody on trying to develop an excel toolbar
 add-in with limited success.

 My question is whether anybody can think of clever alternatives for clients
 to validate their data … for example, is there an R Excel plugin (that would
 be easily installed by a client) where I might be able to write some lines of
 R to check the data and output messages … or maybe some sort of server
 where they could upload their data and I could have some lines of R code
 that would check the data and send back potential error messages?

 I realize this is a fairly open ended question … just looking for some
 general ideas and directions to go. Getting a little frustrated with
 spending most of my work time dealing with data cleaning issues … guessing
 this is a problem shared by many of us that use R!

 Thanks,
 Markus



Re: [R] How to take ID of number 7.

2012-04-22 Thread Steve Lianoglou
On Sun, Apr 22, 2012 at 7:03 AM, Yellow s1010...@student.hsleiden.nl wrote:
 I figured out something new that I would like to see if I can do more
 easily with R than Excel.

 I have these huge files with data.
 For example:

 DataFile.csv
 ID Name log2
 1 Fantasy 5.651
 2 New 7.60518
 3 Finding 8.9532
 4 Looeka -0.248652
 5 Vani 0.3548

 With like header1: ID, header 2: Name, header 3: log2

 Now I need to get the $ID values of the rows that have a log2 value of 7 or higher.

 I know how to grab the $log2 values of 7 or more:

 Log2HigherSeven = DataFile$log2 [ DataFile$log2 >= 7]

 But how can I take those ID numbers as well?

Seems like there were already a few suggestions in this thread, but
I'm surprised no one has suggested the use of `subset` yet, see
?subset:

R> interesting <- subset(DataFile, log2 >= 7)$ID

Now play with the `interesting` data.frame to get the data you need
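
For completeness, a self-contained sketch of that call. The data frame is typed in from the five sample rows in the question; with a real file one would presumably start from read.csv() instead, and the name ids is just a placeholder.

```r
# The five sample rows from the question, typed in by hand
DataFile <- data.frame(
  ID   = 1:5,
  Name = c("Fantasy", "New", "Finding", "Looeka", "Vani"),
  log2 = c(5.651, 7.60518, 8.9532, -0.248652, 0.3548)
)

# IDs of the rows whose log2 value is at least 7
ids <- subset(DataFile, log2 >= 7)$ID
ids

# Equivalent logical indexing, without subset()
ids2 <- DataFile$ID[DataFile$log2 >= 7]
ids2
```

Inside subset() the name log2 refers to the column, not the base function of the same name. Dropping the $ID keeps the whole matching rows as a data.frame.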

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



[R] using a loop with an integration

2012-04-22 Thread piltdownpunk
Hi, all. 

I've written a function that returns the survival function for a Gompertz
mortality model.  I've specified the two model parameters.  Using a simple
integration, I can calculate the life expectancy at any age.  Is there a way
I can use a loop with the integration that will quickly return life
expectancy over a range of ages, say 0 to 80, so that I don't have to
manually type in the age in which I'm interested?  Please see the code
below.  Thanks so much.

--Trey

hk.bothsex_Gompsurv <- function (t)
{
x <- c(0.02342671, 0.05837508)   # the two Gompertz parameters
a3 <- x[1]
b3 <- x[2]
shift <- 15

S.t <- exp(a3/b3*(1-exp(b3*(t-shift))))   # survival function at age t
return(S.t)
}
# life expectancy at birth (change the lower limit of the integral and the
# corresponding t in the denominator to calculate life expectancy at any age)
integrate(hk.bothsex_Gompsurv,0,Inf)$value/hk.bothsex_Gompsurv(0)

-
Trey Batey---Anthropology Instructor
Division of Social Sciences
Mt. Hood Community College
Gresham, OR  97030
Alt. Email:  trey.batey[at]mhcc[dot]edu
--
View this message in context: 
http://r.789695.n4.nabble.com/using-a-loop-with-an-integration-tp4578752p4578752.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Assignment problems

2012-04-22 Thread phillip03
I have tried ifelse:

 trade<-data.frame(avgflow,EMU,stringsAsFactors=FALSE)

 avgflowEURO<-rep(0,nrow(trade))

 trade1<-(for (i in
 1:nrow(trade)){ifelse(EMU[i]==1,avgflowEURO[i]<-avgflow[i],NA)}) 

--
View this message in context: 
http://r.789695.n4.nabble.com/Assignment-problems-tp4578672p4578754.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Assignment problems

2012-04-22 Thread phillip03
Does mean(avgflow[EMU]) sum the avgflows for all countrypairs where
EMU[i]==TRUE and take the mean ? Practical question: is mean(avgflow[EMU]) =
mean(avgflow[EMU==TRUE]) ??? 

--
View this message in context: 
http://r.789695.n4.nabble.com/Assignment-problems-tp4578672p4578761.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Assignment problems

2012-04-22 Thread Rui Barradas

phillip03 wrote
 
 Does mean(avgflow[EMU]) sum the avgflows for all countrypairs where
 EMU[i]==TRUE and take the mean ? Practical question: is mean(avgflow[EMU])
 = mean(avgflow[EMU==TRUE]) ???
 

Answer: yes.
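
A quick way to convince oneself, with made-up numbers (avgflow and EMU here are toy vectors, not the poster's data):

```r
avgflow <- c(10, 20, 30, 40)
EMU     <- c(TRUE, FALSE, TRUE, FALSE)

# Comparing a logical vector with TRUE gives back the same logical vector
same_index <- identical(EMU, EMU == TRUE)
same_index

m1 <- mean(avgflow[EMU])           # mean of 10 and 30
m2 <- mean(avgflow[EMU == TRUE])
c(m1, m2)
```

This relies on EMU being a logical vector. If EMU were stored as numeric 0/1, avgflow[EMU] would index by position instead, and the explicit comparison EMU == 1 would be needed.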

Rui Barradas


--
View this message in context: 
http://r.789695.n4.nabble.com/Assignment-problems-tp4578672p4578772.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] need advice on using excel to check data for import into R

2012-04-22 Thread Markus Weisner
If I go to "wiki -> how to install", it looks like a rather complicated
installation that involves installing R followed by several command line
prompts.

It looks like it might be too much of an installation process to make sense
for a client to conduct a one-time data check.

Looks like a great tool though.  Is there a simpler way of deploying Rexcel
that I am not seeing?

Thanks,
Markus


On Sun, Apr 22, 2012 at 3:43 PM, Richard M. Heiberger r...@temple.edu wrote:

 This looks like a perfect case for an RExcel solution.
 RExcel is an addin that allows you, among other things, to place an
 arbitrary R function inside the
 Excel automatic recalculation mode.  For details see
 rcom.univie.ac.at
 There are many references item listed on the wiki page in the left panel.
 For further followup, please sign up for the rcom mailing list, again with
 the
 details on the web site.

 Rich

 On Sun, Apr 22, 2012 at 2:34 PM, Markus Weisner r...@themarkus.com wrote:

 I have created an S4 object type for conducting fire department data
 analysis.  The object includes validity check that ensures certain fields
 are present and that duplicate records don't exist for certain
 combinations
 of columns (e.g. no duplicate incident number / incident data / unit ID
 ensures that the data does not show the same fire engine responding twice
 on the same call).

 I am finding that I spend a lot of time taking client data, converting it
 to my S4 object, and then sending it back to the client to correct data
 validity issues.

 I am trying to figure out a clever way to have excel (typically the
 program
 used by my clients) check client data prior to them submitting it to me.
  I
 have been working with somebody on trying to develop an excel toolbar
 add-in with limited success.

 My question is whether anybody can think of clever alternatives for
 clients
 to validate their data … for example, is their a R excel plugin (that
 would
 be easily installed by a client) where I might be able write some lines of
 R to check the data and output messages … or maybe some sort of server
 where they could upload their data and I could have some lines of R code
 that would check the code and send back potential error messages?

 I realize this is a fairly open ended question … just looking for some
 general ideas and directions to go. Getting a little frustrated with
 spending most of my work time dealing with data cleaning issues … guessing
 this is a problem shared by many of us that use R!

 Thanks,
 Markus



Re: [R] using a loop with an integration

2012-04-22 Thread David Winsemius


On Apr 22, 2012, at 3:41 PM, piltdownpunk wrote:


Hi, all.

I've written a function that returns the survival function for a  
Gompertz
mortality model.  I've specified the two model parameters.  Using a  
simple
integration, I can calculate the life expectancy at any age.  Is  
there a way

I can use a loop with the integration that will quickly return life
expectancy over a range of ages, say 0 to 80, so that I don't have to
manually type in the age in which I'm interested?  Please see the code
below.


?Vectorize

(Essentially a wrapper to mapply. No data example so no tested code.)
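
An untested-against-real-data sketch of how that might look. The names gomp_surv, life_exp_one and life_exp are placeholders; the parameter values are the ones from the original post. As a variation on the original code, the division by S(age) is done inside the integral, on the assumption that keeping the integrand of order one suits integrate()'s default tolerances better at old ages where S(t) is tiny.

```r
# Gompertz survival function with the parameter values from the original post
gomp_surv <- function(t, a = 0.02342671, b = 0.05837508, shift = 15) {
  exp(a / b * (1 - exp(b * (t - shift))))
}

# Life expectancy at one age: integrate survival conditional on reaching
# that age, S(t) / S(age), from age to infinity
life_exp_one <- function(age) {
  integrate(function(t) gomp_surv(t) / gomp_surv(age),
            lower = age, upper = Inf)$value
}

# Vectorize() wraps the scalar function so it accepts a vector of ages
life_exp <- Vectorize(life_exp_one)

ages <- 0:80
ex   <- life_exp(ages)
head(data.frame(age = ages, life_expectancy = ex))
```

sapply(ages, life_exp_one) would do the same job; Vectorize() just hides the loop behind a function that can be called like any vectorised one.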

--
David.


--Trey

hk.bothsex_Gompsurv <- function (t)
{
x <- c(0.02342671, 0.05837508)   # the two Gompertz parameters
a3 <- x[1]
b3 <- x[2]
shift <- 15

S.t <- exp(a3/b3*(1-exp(b3*(t-shift))))   # survival function at age t
return(S.t)
}
# life expectancy at birth (change the lower limit of the integral and the
# corresponding t in the denominator to calculate life expectancy at any age)
integrate(hk.bothsex_Gompsurv,0,Inf)$value/hk.bothsex_Gompsurv(0)

-
Trey Batey---Anthropology Instructor
Division of Social Sciences
Mt. Hood Community College
Gresham, OR  97030
Alt. Email:  trey.batey[at]mhcc[dot]edu
--
View this message in context: 
http://r.789695.n4.nabble.com/using-a-loop-with-an-integration-tp4578752p4578752.html
Sent from the R help mailing list archive at Nabble.com.



David Winsemius, MD
West Hartford, CT



[R] Issue with message()

2012-04-22 Thread Axel Urbiz
Dear List,

I built a package under both Mac and Win 7 (both on R 2.12.0) . One of the
functions in the package is set up to print a status message using the code
below:

 if (verbose)
  if ((i %% 10) == 0 && i < ntree) message(" ", i, " out of ", ntree,
" trees so far...")

This works perfectly on the Mac. However, on Win 7 the messages are not
printed while the function is executing, but all at once when it has finished
running. Any hint what might be the issue?

Thanks,
Axel.



Re: [R] Issue with message()

2012-04-22 Thread Rolf Turner

On 23/04/12 09:36, Axel Urbiz wrote:

Dear List,

I built a package under both Mac and Win 7 (both on R 2.12.0) . One of the
functions in the package is set up to print a status message using the code
below:

  if (verbose)
   if ((i %% 10) == 0 && i < ntree) message(" ", i, " out of ", ntree,
" trees so far...")

This works perfectly on the Mac. However, on Win 7 the message is not
printed while the function is executing, but all when it finished running.
Any hint what might be the issue?


This has something to do with the Windoze system not (by default)
flushing the buffer.  This behaviour can be changed, but I forget
the details.  I don't use Windoze.  A bit of searching/googling should
lead you fairly quickly to the appropriate procedure for re-setting
the behaviour.

HTH

cheers,

Rolf Turner



Re: [R] Assignment problems

2012-04-22 Thread phillip03
Thank you Rui

Can you help me with my ifelse problem - I would like to add a column to my
data.frame that holds avgflow ONLY in those rows where both countries of the
pair are in euro
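
One possible vectorised reading of this request, sketched on invented rows (the trade data frame and its values below are made up; only the euro vector is taken from the thread). ifelse() works on whole vectors, so no for loop is needed:

```r
# Toy rows for illustration only
trade <- data.frame(
  country1 = c("AUT", "AUS", "DEU", "FRA"),
  country2 = c("BEL", "AUT", "ESP", "GBR"),
  avgflow  = c(120, 107.6, 300, 2279.1),
  stringsAsFactors = FALSE
)
euro <- c("AUT", "BEL", "DEU", "ESP", "FIN", "FRA",
          "GRC", "IRL", "ITA", "NLD", "PRT")

# TRUE only when both countries of the pair are in the euro vector
trade$EMU <- trade$country1 %in% euro & trade$country2 %in% euro

# avgflow for EMU pairs, NA for all other rows
trade$avgflowEURO <- ifelse(trade$EMU, trade$avgflow, NA)
trade
```

mean(trade$avgflowEURO, na.rm = TRUE) then gives the same number as mean(trade$avgflow[trade$EMU]).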

--
View this message in context: 
http://r.789695.n4.nabble.com/Assignment-problems-tp4578672p4578806.html
Sent from the R help mailing list archive at Nabble.com.



[R] CRAN (and crantastic) updates this week

2012-04-22 Thread Crantastic
CRAN (and crantastic) updates this week

New packages


* appell (0.0-3)
  Maintainer: Daniel Sabanes Bove
  Author(s): Daniel Sabanes Bove daniel.sabanesb...@ifspm.uzh.ch with
 contributions by F. D. Colavecchia, R. C. Forrey, G.
 Gasaneo, N. L. J. Michel, L. F.  Shampine, M. V. Stoitsov
 and H. A. Watts.
  License: GPL (>= 3)
  http://crantastic.org/packages/appell

  This package wraps Fortran code by F. D. Colavecchia and G. Gasaneo
  for computing the Appell's F1 hypergeometric function. Their program
  uses Fortran code by L. F. Shampine and H. A. Watts. Moreover, the
  hypergeometric function with complex arguments is computed with
  Fortran code by N. L. J. Michel and M. V. Stoitsov or with Fortran
  code by R. C.  Forrey. See the function documentations for the
  references and please cite them accordingly.

* bayesPop (0.2-2)
  Maintainer: Hana Sevcikova
  Author(s): Hana Sevcikova, Adrian Raftery
  License: GPL (>= 2)
  http://crantastic.org/packages/bayesPop

  The package generates population projections for all
  countries of the world using several probabilistic components, such
  as total fertility rate (TFR) and life expectancy.

* cec2005benchmark (1.0.0)
  Maintainer: Yasser González-Fernández
  Author(s): Yasser González-Fernández ygonzalezfernan...@gmail.com and Marta
 Soto mr...@icimaf.cu
  License: GPL (>= 3)
  http://crantastic.org/packages/cec2005benchmark

  This package is a wrapper for the C implementation of the 25 benchmark
  functions for the CEC 2005 Special Session on Real-Parameter
  Optimization. The original C code by Santosh Tiwari and related
  documentation are available at http://www.ntu.edu.sg/home/EPNSugan/.

* compound.Cox (1.0)
  Maintainer: Takeshi Emura, Graduate Institute of Statistics, National Central 
University, Taiwan
  Author(s): Takeshi Emura & Yi-Hau Chen
  License: GPL-2
  http://crantastic.org/packages/compound-Cox

  Calculate regression coefficients and their standard errors under the
  Cox proportional hazards model with a large number of covariates.

* dgmb (1.0)
  Maintainer: Alba Martinez-Ruiz
  Author(s): Alba Martinez-Ruiz amart...@ucsc.cl and Claudia Martinez-Araneda
 cmarti...@ucsc.cl
  License: GPL (>= 2)
  http://crantastic.org/packages/dgmb

  Random data generation for PLS structural models.

* diffEq (1.0)
  Maintainer: Karline Soetaert
  Author(s): Karline Soetaert karline.soeta...@nioz.nl
  License: GPL
  http://crantastic.org/packages/diffEq

  Functions and examples from the book Solving Differential Equations in
  R by Karline Soetaert, Jeff R Cash and Francesca Mazzia.  Springer,
  2012.

* dkDNA (0.1.0)
  Maintainer: Gota Morota
  Author(s): Gota Morota and Masanori Koyama
  License: GPL-2
  http://crantastic.org/packages/dkDNA

  Compute diffusion kernels on DNA polymorphisms, including SNP and
  bi-allelic genotypes.

* frmqa (0.1-0)
  Maintainer: Thanh T. Tran
  Author(s): Thanh T. Tran
  License: GPL (>= 2)
  http://crantastic.org/packages/frmqa

  R and C++ functions for financial risk management and quantitative
  analysis, using the generalized hyperbolic and its related
  distributions.

* geospt (0.4-9)
  Maintainer: Alí Santacruz
  Author(s): Carlos Melo cm...@udistrital.edu.co, Alí Santacruz, Oscar Melo
 oome...@unal.edu.co and others
  License: GPL (>= 2)
  http://crantastic.org/packages/geospt

  This package contains functions for: estimation of the variogram
  through trimmed mean, radial basis functions (optimization,
  prediction and cross-validation), summary statistics from
  cross-validation, pocket plot, and design of optimal sampling
  networks through sequential and simultaneous points methods

* JohnsonDistribution (0.24)
  Maintainer: A.I. McLeod
  Author(s): A.I. McLeod and Leanna King
  License: GPL (>= 2)
  http://crantastic.org/packages/JohnsonDistribution

  Johnson curve distributions.  Implementation of AS100 and AS99.

* labeledLoop (0.1)
  Maintainer: Kohske Takahashi
  Author(s): Kohske Takahashi
  License: MIT
  http://crantastic.org/packages/labeledLoop

  Support labeled loop and escape from nested loop

* logitnorm (0.8.26)
  Maintainer: Thomas Wutzler
  Author(s): Thomas Wutzler
  License: GPL-2
  http://crantastic.org/packages/logitnorm

  Density, distribution, quantile and random generation function for the
  logitnormal distribution. Estimation of the mode and the first two
  moments. Estimation of distribution parameters.

* LOST (1.0)
  Maintainer: Jessica Arbour
  Author(s): J. Arbour and C. Brown
  License: GPL (>= 2)
  http://crantastic.org/packages/LOST

  LOST includes functions for simulating missing morphometric data
  randomly, with taxonomic bias and with anatomical bias. This package
  also includes functions for estimating missing morphometric data
  based on regression.

* NHPoisson (1.0)
  Maintainer: Ana C. Cebrian
  Author(s): Ana C. Cebrian
  License: GPL (>= 2)
  

Re: [R] Issue with message()

2012-04-22 Thread Duncan Murdoch

On 12-04-22 5:36 PM, Axel Urbiz wrote:

Dear List,

I built a package under both Mac and Win 7 (both on R 2.12.0) . One of the
functions in the package is set up to print a status message using the code
below:

  if (verbose)
   if ((i %% 10) == 0 && i < ntree) message(" ", i, " out of ", ntree,
" trees so far...")

This works perfectly on the Mac. However, on Win 7 the message is not
printed while the function is executing, but all when it finished running.
Any hint what might be the issue?



Buffered output.  Use Ctrl-W or menu item Misc|Buffered output to change it.
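
If the package should not depend on each user's Rgui setting, calling flush.console() right after the message is a commonly used alternative. A minimal sketch, in which the function name grow_trees and its loop body are hypothetical stand-ins for the package code:

```r
grow_trees <- function(ntree, verbose = TRUE) {
  for (i in seq_len(ntree)) {
    # ... fit tree i here (placeholder for the real work) ...
    if (verbose && i %% 10 == 0 && i < ntree) {
      message(i, " out of ", ntree, " trees so far...")
      utils::flush.console()   # ask the console to show pending output now
    }
  }
  invisible(ntree)
}

grow_trees(35)
```

On platforms where the console is not buffered, the flush.console() call should simply have no visible effect.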

Duncan Murdoch



Re: [R] PCA sensitive to outliers?

2012-04-22 Thread Michael
I actually tried robustPca in pcaMethods on Bioconductor.

It keeps giving me the warning "Input data is not complete."

Reading the function source:

it gives this warning when there are no NAs...

It seems that there is a bug in this code...

Is it reliable at all?


> robustPca
function (Matrix, nPcs = 2, verbose = interactive(), ...)
{
    nas <- is.na(Matrix)
    if (!any(nas) & verbose) {
        cat("Input data is not complete.\n")
        cat("Scores, R2 and R2cum may be inaccurate, handle with care\n")
    }





On Fri, Apr 20, 2012 at 9:58 AM, Kevin Wright kw.s...@gmail.com wrote:

 You can also have a look at the pcaMethods package on Bioconductor.

 Kevin


  On Thu, Apr 19, 2012 at 11:20 PM, Michael comtech@gmail.com wrote:

  Hi all,

 I found that the PCA gave chaotic results when there are big changes in a
 few data points.

 Are there improved versions of PCA in R that can help with this problem?

 Please give me some pointers...

 Thank you!



Re: [R] PCA sensitive to outliers?

2012-04-22 Thread Michael
Any thoughts on this error in robustSvd?

Thanks a lot!

Error in if (!all(tmp)) { : missing value where TRUE/FALSE needed

Enter a frame number, or 0 to exit

1: #73: pca(dTmp, method = "robustPca", nPcs = nNumFactors, center = FALSE)
2: robustPca(prepres$data, nPcs = nPcs, ...)
3: robustSvd(Matrix)
4: apply(x, 1, L1RegCoef, bk)
5: FUN(newX[, i], ...)
6: weightedMedian(x[keep]/a, abs(a), interpolate = FALSE)
7: weightedMedian.default(x[keep]/a, abs(a), interpolate = FALSE)
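
For reference, "missing value where TRUE/FALSE needed" is what if() reports
when its condition evaluates to NA, so a first diagnostic step could be to
look for non-finite values in the input. A sketch on a made-up matrix (dTmp
below is a hypothetical stand-in, not the real data), with no claim that this
is the actual cause inside robustSvd:

```r
# Hypothetical stand-in for the matrix passed to pca(); one cell is NaN.
dTmp <- matrix(c(1, 2, 3, 4, NaN, 6), nrow = 3)

# Count non-finite entries (NA, NaN, Inf, -Inf) in each column.
bad_per_column <- colSums(!is.finite(dTmp))

# The same class of error reproduced directly: if() cannot branch on NA.
res <- tryCatch(
  if (NA) "yes" else "no",
  error = function(e) conditionMessage(e)
)
```

Columns with a non-zero count in bad_per_column are the first place to look
before rerunning the robust fit.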




Re: [R] PCA sensitive to outliers?

2012-04-22 Thread Joshua Wiley
On Sun, Apr 22, 2012 at 4:43 PM, Michael comtech@gmail.com wrote:
 > robustPca
 function (Matrix, nPcs = 2, verbose = interactive(), ...)
 {
     nas <- is.na(Matrix)
     if (!any(nas) & verbose) {
         cat("Input data is not complete.\n")
         cat("Scores, R2 and R2cum may be inaccurate, handle with care\n")
     }

That seems to issue the notes when there are *not any missing* values and
verbose is TRUE.  I would submit a bug report to the author.
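
A sketch of what the check presumably intends, written as a hypothetical
standalone helper (check_complete is not part of pcaMethods, and this is not
the package author's fix): the notes are printed only when missing values are
actually present.

```r
# Hypothetical helper: report incomplete input only when NAs exist.
check_complete <- function(Matrix, verbose = interactive()) {
  nas <- is.na(Matrix)
  incomplete <- any(nas)          # TRUE when at least one value is missing
  if (incomplete && verbose) {
    cat("Input data is not complete.\n")
    cat("Scores, R2 and R2cum may be inaccurate, handle with care\n")
  }
  invisible(incomplete)
}

full    <- matrix(c(1, 2, 3, 4), nrow = 2)
missing <- matrix(c(1, NA, 3, 4), nrow = 2)

flag_full    <- check_complete(full, verbose = TRUE)     # prints nothing
flag_missing <- check_complete(missing, verbose = TRUE)  # prints both notes
```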









-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/



Re: [R] PCA sensitive to outliers?

2012-04-22 Thread Michael
Even in R, there are so many robust PCA implementations... Is there any
survey or review of all these different methods?



Re: [R] PCA sensitive to outliers?

2012-04-22 Thread Bert Gunter
As I believe I already told you, look at the CRAN Robust task view.

-- Bert




-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm



[R] linear model benchmarking

2012-04-22 Thread ivo welch
I cleaned up my old benchmarking code and added checks for missing
data to compare various ways of finding OLS regression coefficients.
I thought I would share this for others.  the long and short of it is
that I would recommend

   ols.crossprod = function (y, x) {
     x <- as.matrix(x)
     ok <- (!is.na(y)) & (!is.na(rowSums(x)))
     y <- y[ok]; x <- subset(x, ok)
     x <- cbind(1, x)

     XtX <- crossprod(x)
     Xty <- crossprod(x, y)
     solve(XtX, Xty)
   }

for fast and stable coefficients.  (yes, stable using double
precision, even though not as stable as lm().  it works just fine with
X variables that have 99.99% correlation with as few as 100
observations.  if your situation is worse than this, you probably have
an error in your data---or you are looking for the Higgs Boson.)
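
As a quick check of the claim, the sketch below compares the crossprod
solution with lm() on simulated data with two nearly collinear regressors and
one missing response. The data, seed, and variable names are assumptions made
for illustration, and the function is a reconstruction that uses plain row
indexing in place of subset(). The last lines show where precision is lost:
forming X'X squares the 2-norm condition number of X.

```r
# Reconstruction of the crossprod-based estimator described above.
ols.crossprod <- function(y, x) {
  x <- as.matrix(x)
  ok <- (!is.na(y)) & (!is.na(rowSums(x)))  # keep complete rows only
  y <- y[ok]
  x <- cbind(1, x[ok, , drop = FALSE])      # add the intercept column
  solve(crossprod(x), crossprod(x, y))      # solve the normal equations
}

set.seed(42)                                # simulated data (assumption)
n  <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.01)              # almost collinear with x1
y  <- 1 + 2 * x1 - x2 + rnorm(n)
y[5] <- NA                                  # one missing observation

b_fast   <- as.vector(ols.crossprod(y, cbind(x1, x2)))
b_lm     <- unname(coef(lm(y ~ x1 + x2)))   # lm() drops the NA row by default
max_diff <- max(abs(b_fast - b_lm))

# Condition numbers of the design matrix and of its cross-product.
X     <- cbind(1, x1, x2)
k_X   <- kappa(X, exact = TRUE)
k_XtX <- kappa(crossprod(X), exact = TRUE)
```

With double precision, the squared condition number still leaves several
digits to spare here, which is consistent with the coefficients agreeing
closely with lm().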

I added the code below.  feel free to ignore.

/iaw



###
### code to test alternatives how fast OLS coefficients can be obtained.
### including tests to exclude missing observations where necessary.
###
### for a more informed article (and person), see Bates, 'Least Squares
### Calculations in R', Rnews 2004-1. the code here does not test his sparse
### matrix examples, or his geMatrix/poMatrix examples.
###
### Basic Results: for the examples that I tried, typical relative time
### factors of the algorithms were about
###
###   lm   lmfit  solve  crossprod  cholesky  (special-case 2vars)
###   1.0  0.5    0.3    0.15       0.17      0.1
###
### there was no advantage to cholesky, so you may as well use the simpler
### crossprod.
###
### I was also interested in algorithm scaling N and K.  yes, there were
### some changes in the factors across algorithms, but the general pattern
### wasn't too different.  for the cholesky decomposition,
###
###N=1000   N=1   N=10   N=20
###K=1   1.0   780   160
###K=10  2.5  26
###K=50 16
###K=1004370
###K=200   140
###
### some of this may well be swap/memory access, etc.  roughly speaking, we
### scale ten-times N takes twice as long. 10 and ten-times K takes 25 times
### as long.
###
### of course, ols.crossprod and ols.cholesky are not as stable as ols.lm,
### but they are still amazingly stable, given the default double precision.
### even with a correlation of 0.99(!) between the two final columns,
### they still produce exactly the same result as ols.lm with 1000
### observations.  frankly, the ill-conditioning worry is overblown with
### most real-world data.  if you really have data THIS bad, you should
### already know it; and you probably just have some measurement errors in
### your observations, and your regression is giving you garbage either way.
###
### if I made the R core decision, I would switch away from lm()'s default
### method, and make it a special option.  my guess is that it is status-quo
### bias that keeps the current method.  or, at least I would say loudly in
### the R docs that for common use, here is a much faster method...
###


MC <- 100
N <- 1000
K <- 10
SD <- 1e-3

ols <- list(
  ols.lm = function (y, x) { coef(lm(y ~ x)) },

  ols.lmfit = function (y, x) {
    x <- as.matrix(x)
    ok <- (!is.na(y)) & (!is.na(rowSums(x)))
    y <- y[ok]; x <- subset(x, ok)
    x <- as.matrix(cbind(1, x))
    lm.fit(x, y)$coefficients
  },

  ols.solve = function (y, x) {
    x <- as.matrix(x)
    ok <- (!is.na(y)) & (!is.na(rowSums(x)))
    y <- y[ok]; x <- subset(x, ok)
    x <- cbind(1, x)
    xy <- t(x) %*% y
    xxi <- solve(t(x) %*% x)
    b <- as.vector(xxi %*% xy)
    b
  },

  ols.crossprod = function (y, x) {
    x <- as.matrix(x)
    ok <- (!is.na(y)) & (!is.na(rowSums(x)))
    y <- y[ok]; x <- subset(x, ok)
    x <- cbind(1, x)

    XtX <- crossprod(x)
    Xty <- crossprod(x, y)
    solve(XtX, Xty)
  },

  ols.cholesky = function (y, x) {
    x <- as.matrix(x)
    ok <- (!is.na(y)) & (!is.na(rowSums(x)))
    y <- y[ok]; x <- subset(x, ok)
    x <- cbind(1, x)

    ch <- chol(crossprod(x))
    backsolve(ch, forwardsolve(ch, crossprod(x, y),
                               upper=TRUE, trans=TRUE))
  }

)

set.seed(0)
y <- matrix(rnorm(N*MC), N, MC)
x <- array(rnorm(MC*K*N), c(N, K, MC))

cat("N=", N, "K=", K, "  (MC=", MC, ")")
if (K > 1) {
  sum.cor <- 0
  for (mc in 1:MC) {
    x[,K,mc] <- x[,K-1,mc] + rnorm(N, sd=SD)
    sum.cor <- sum.cor + cor(x[,K,mc], x[,K-1,mc], use="pair")
  }
  options(digits=10)
  cat("  sd=", SD, "The bad corr= 1+", sum.cor/MC-1 )
} else {
  ols$ols.xy.Kis1 <- function(y, x) {

[R] Scrape data from Scopus: login through R?

2012-04-22 Thread mdvaan
Hello,

The Scopus bibliographic database allows one to manually download batches of
2000 publications. The data are rich but do not include a field containing
the author ID. However, author IDs can be retrieved through the hyperlinks
on the Scopus website. I have two questions:

1. My institution has a Scopus license, so I need to log in. How do I do that
in R (through RCurl, XML?)?
2. How do I scrape hyperlinks?

Your help is appreciated.

Thanks

Math



Re: [R] ROCR for combination of markers

2012-04-22 Thread suo
Hi Eik, or anyone else who might help:

I got this error:
Error in roc.formula(form = y1 ~ x + z, plot = "ROC") : 
  Invalid formula: exactly 1 predictor is required in a formula of type
response~predictor.

when I ran out = ROC(form = y1 ~ x + z, plot = "ROC") from your code.

How can I fix it?
Thanks.



Re: [R] more boa plots questions

2012-04-22 Thread Chihuahuin
boa.plot('trace')



Re: [R] PCA sensitive to outliers?

2012-04-22 Thread Michael
yes, but that is not a good Review or Survey... thx



Re: [R] PCA sensitive to outliers?

2012-04-22 Thread Steve Lianoglou
On Mon, Apr 23, 2012 at 12:01 AM, Michael comtech@gmail.com wrote:
 yes, but that is not a good Review or Survey... thx

But the packages listed there do have their own documentation and
vignettes. For instance the rrcov package seems to have a nice
vignette about its design as well as methods it implements, and
references to these methods for further reading:

http://cran.r-project.org/web/packages/rrcov/vignettes/rrcov.pdf

You'll see at least a few mentions of PCA, which will lead you to
other package/papers/etc.
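
To see the kind of sensitivity this thread is about before digging into the
packages, here is a small base-R sketch on simulated data (the data, seed,
and names are assumptions). It shows one gross outlier taking over the first
principal component of prcomp(), and then uses a PCA on the Spearman rank
correlation matrix as a simple stand-in for a robust method. That stand-in is
not one of the estimators implemented in rrcov.

```r
set.seed(1)
n <- 200
z <- rnorm(n)
# Columns a and b share a common factor; column c is independent noise.
x <- cbind(a = z + rnorm(n, sd = 0.3),
           b = z + rnorm(n, sd = 0.3),
           c = rnorm(n))

x_out <- x
x_out[1, "c"] <- 1000            # a single gross outlier in column c

# Absolute loadings of the first principal component.
pc1 <- function(m) abs(prcomp(m)$rotation[, 1])
clean <- pc1(x)                  # dominated by a and b
dirty <- pc1(x_out)              # dominated by the outlying column c

# Rank-based alternative: eigenvectors of the Spearman correlation matrix.
rank_pc1 <- abs(eigen(cor(x_out, method = "spearman"))$vectors[, 1])
names(rank_pc1) <- colnames(x_out)
```

Comparing clean, dirty and rank_pc1 shows the first component swinging to
column c under classical PCA while the rank-based version keeps pointing
along a and b.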

Enjoy,

-steve





-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



Re: [R] slanted stacked bar graphs?

2012-04-22 Thread Susanna Makela
Hi Barry,

Thanks so much for the Junk Charts link. Maybe it'll help me make my case
for why we shouldn't present our data like this.

Susanna

On Mon, Apr 9, 2012 at 1:07 PM, Barry Rowlingson 
b.rowling...@lancaster.ac.uk wrote:

 On Mon, Apr 9, 2012 at 7:29 AM, Susanna Makela
 susanna.m.mak...@gmail.com wrote:
  Hello R users,
 
  I would like to generate slanted stacked bar graphs like those on
  the bottom of pages 1 and 2 in this document:
 
 http://www.wssinfo.org/fileadmin/user_upload/resources/JMP-Snapshot-SWA-HLM.pdf
  . I've also attached the file to this email (pdf). Does anyone know if
  this is possible in R? I have tried googling and searching the R help
  archives, and it seems like ggplot2 might be able to make such graphs,
  but I'm not familiar enough with graphics in R to know for sure.
 
  (I personally don't feel that these slanted bar graphs - not sure if
  they have an actual name - convey the intended information very well,
  but I have to try and make them all the same. However, I am open to
  alternative suggestions for visualizing similar data if anyone has
  ideas.)
 

 These exact charts have been critiqued on the Junk Charts blog:

 http://junkcharts.typepad.com/junk_charts/2010/02/cousin-misfit.html

  and you'll even find some ggplot code in the comments for doing them.
 If you still want to...

  I just did a google image search for 'ggplot stacked' and there they were.

 Barry

 --
 blog: http://geospaced.blogspot.com/
 web: http://www.maths.lancs.ac.uk/~rowlings
 web: http://www.rowlingson.com/
 twitter: http://twitter.com/geospacedman
 pics: http://www.flickr.com/photos/spacedman

