from:"James Reilly"

Re: [R] R survey package again

2007-09-08 Thread James Reilly


On 7/9/07 11:42 PM, eugen pircalabelu wrote:
I have a sample from a survey where households were interviewed. The 
sample has 4 criteria on which the stratification was based: REGION, 
SIZE OF HOUSEHOLD, SIZE OF LOCALITY, AGE OF HEAD OF HOUSEHOLD. Since I 
don't have the full information for each cell of the cross 
region*sizehh*sizeloc*age, I can't use the postStratify function from 
the survey package. Is that correct? (I think so, but I need a competent answer.)
 
The only additional info that I have is the size of each cell from a 
two-way crossing (e.g. I know the population size for all the strata defined 
by region*sizehh, region*sizeloc, sizeloc*age), so I know the 
distribution of the population, but only for two-way crosses of these 
criteria.


You're right, poststratification can't work from two-way marginal 
distributions, but raking or calibration can.
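
As a rough illustration of the raking idea, here is a base-R sketch. The sample, the variable names and the population counts are all made up, and only two of the two-way margins are used. In practice the survey package's rake() or calibrate() functions would do this on a design object. The weights are rescaled to each known margin in turn until they agree with both:

```r
# Iterative proportional fitting (raking) to two known two-way margins.
# All data and population counts below are hypothetical.
set.seed(1)
n <- 200
samp <- data.frame(
  region  = sample(c("N", "S"), n, replace = TRUE),
  sizehh  = sample(c("small", "large"), n, replace = TRUE),
  sizeloc = sample(c("urban", "rural"), n, replace = TRUE),
  stringsAsFactors = FALSE
)

# Assumed population counts for region*sizehh and region*sizeloc
pop_region_hh  <- c(N.small = 300, N.large = 200, S.small = 250, S.large = 250)
pop_region_loc <- c(N.urban = 350, N.rural = 150, S.urban = 200, S.rural = 300)

# Label each sampled household with its cell in each margin
cell_hh  <- paste(samp$region, samp$sizehh,  sep = ".")
cell_loc <- paste(samp$region, samp$sizeloc, sep = ".")

# Rescale weights so the weighted count in each cell matches its target
rake_to <- function(w, cell, target) {
  current <- tapply(w, cell, sum)
  w * as.vector(target[cell] / current[cell])
}

w <- rep(1, n)  # starting weights
for (iter in 1:100) {
  w <- rake_to(w, cell_hh,  pop_region_hh)
  w <- rake_to(w, cell_loc, pop_region_loc)
}

# Weighted sample counts should now reproduce both margins
print(round(tapply(w, cell_hh, sum), 3))
print(round(tapply(w, cell_loc, sum), 3))
```

The two sets of population counts have to be mutually consistent (here both give 500 households per region), otherwise the iteration cannot satisfy both margins at once.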

However it seems odd that you only have this much information, since the 
full joint distribution would have been needed for stratification. 
Usually these details would be documented as part of the sample design. 
Can you get this information from those responsible for the sample 
design? It would also be good to check your understanding of the design. 
A sampling frame listing details of household size and age of household 
head would have been needed to do the four-way stratification you 
mention, but in my experience such frames aren't very common.

James
-- 
James Reilly
Department of Statistics, University of Auckland
Private Bag 92019, Auckland, New Zealand

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Survey package

2007-09-07 Thread James Reilly


On 7/9/07 12:36 AM, eugen pircalabelu wrote:
I'm trying to use the survey package for a stratified sample which 
has 4 criteria on which the stratification is based. I would like to get 
the corrected weights, but for every element I get a weight of 1.
 
E.g. typing
 
 design <- svydesign(id=~1, strata= ~regiune + size_loc + 
age_rec_hhh + size_hh, data= tabel)
 and then  weights(design)
 
gives me:  1,1,1,1,1,1,1,1,1,1,1,... for each element

The weights are all 1 because you haven't told R how they should be 
calculated. If the sampling weights should be constant within strata, 
you can simply specify the population figures for each stratum in 
svydesign's fpc argument and it will calculate the weights for you. 
Various techniques for adjusting the weights are also supported; see 
http://faculty.washington.edu/tlumley/survey/example-poststrat.html
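
A small sketch of the fpc approach, using an invented data frame and invented population counts (the names stratum, N and pop_size are assumptions for illustration). The base-R part shows the arithmetic, where each unit's weight is the stratum population size divided by the stratum sample size. The svydesign call is only run if the survey package is installed:

```r
# Hypothetical sample: 3 units from stratum A, 2 from stratum B
tabel <- data.frame(
  stratum = rep(c("A", "B"), times = c(3, 2)),
  y       = c(10, 12, 11, 20, 22),
  stringsAsFactors = FALSE
)

# Hypothetical population counts per stratum, attached to every row
pop_size <- c(A = 300, B = 100)
tabel$N  <- unname(pop_size[tabel$stratum])

# Weight constant within stratum: N_h / n_h
n_h     <- table(tabel$stratum)
tabel$w <- tabel$N / as.numeric(n_h[tabel$stratum])
print(tabel)

# With the survey package, supplying fpc lets svydesign derive the weights
if (requireNamespace("survey", quietly = TRUE)) {
  design <- survey::svydesign(id = ~1, strata = ~stratum, fpc = ~N,
                              data = tabel)
  print(weights(design))
}
```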

James
-- 
James Reilly
Department of Statistics, University of Auckland
Private Bag 92019, Auckland, New Zealand


Re: [R] Excel

2007-08-30 Thread James Reilly


On 30/8/07 6:42 AM, Erich Neuwirth wrote:
 There is one feature in Excel which is extremely convenient, Pivot
 tables. Anybody doing any work as statistical consultant really ought to
 know about Pivot tables, and I am still surprised how many statisticians
 do not know about it. Neither Gnumeric nor OpenOffice Calc offer
 comparably convenient ways working with multidimensional tables.

I'm not familiar with Gnumeric, but OpenOffice has provided the similar 
DataPilot feature for at least three years; see for example 
http://www.openofficetips.com/blog/archives/2004/09/datapilot_101.html

James
-- 
James Reilly
Department of Statistics, University of Auckland
Private Bag 92019, Auckland, New Zealand


Re: [R] getting strata/cluster level values with survey package?

2006-02-09 Thread James Reilly


Try the examples here:
?ftable.svystat
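
For stratum-level estimates specifically, a sketch along the following lines may help. It uses only the first three rows of each stratum from the data quoted below, and the base-R part treats each stratum as a simple random sample (no clustering, no finite population correction), which is an assumption made here for illustration. The svyby() call is run only if the survey package is installed:

```r
# A few rows taken from the posted data set (first three per stratum)
stand.data <- data.frame(
  strata  = rep(c("A", "B", "C"), each = 3),
  cluster = rep(c(1, 6, 14), each = 3),
  vol     = c(18.58556192, 12.55175443, 21.65882438,
              13.40892677, 14.3956207, 13.82113391,
              15.51581, 15.4836358, 13.35882363),
  stringsAsFactors = FALSE
)

# Per-stratum mean and its standard error under simple random sampling
stratum_mean <- tapply(stand.data$vol, stand.data$strata, mean)
stratum_se   <- tapply(stand.data$vol, stand.data$strata,
                       function(v) sd(v) / sqrt(length(v)))
print(cbind(mean = stratum_mean, SE = stratum_se))

# With the survey package, svyby() gives design-based estimates by stratum
if (requireNamespace("survey", quietly = TRUE)) {
  srv1 <- suppressWarnings(
    survey::svydesign(ids = ~1, strata = ~strata, data = stand.data))
  print(survey::svyby(~vol, ~strata, srv1, survey::svymean))
}
```

Replacing ids = ~1 with ids = ~cluster in the design would give the clustered version for comparison.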

On 8/02/2006 12:23 p.m., Jeff D. Hamann wrote:
 First, I apologise for the rookie question, but...
 
 I'm trying to obtain standard errors, confidence intervals, etc. from a
 sample design and have had trouble getting results for anything other
 than the basic total or mean for the overall survey from the survey
 package.
 
 For example, using the following dataset,
 
 strata,cluster,vol
 A,1,18.58556192
 A,1,12.55175443
 A,1,21.65882438
 A,1,17.11172946
 A,1,15.41713348
 A,2,13.9344623
 A,2,17.13104821
 A,2,14.6806479
 A,2,14.68357291
 A,2,18.86017714
 A,2,20.67642515
 A,2,15.15295351
 A,2,13.82121102
 A,2,12.9110477
 A,2,14.83153677
 A,2,21.90772687
 A,3,18.69795427
 A,3,18.45636428
 A,3,15.77175793
 A,3,15.54715217
 A,3,20.31948393
 A,3,19.26391445
 A,3,15.54750775
 A,3,19.18724018
 A,4,12.89572151
 A,4,12.92047701
 A,4,12.64958757
 A,4,19.85888418
 A,4,19.64057669
 A,4,19.19188964
 A,4,18.81619298
 A,4,21.73670878
 A,5,15.99430802
 A,5,18.6517
 A,5,21.80441654
 A,5,14.22081904
 A,5,16.01576433
 A,5,14.92497202
 A,5,17.95123218
 A,5,19.82027165
 A,5,19.35698273
 A,5,19.10826519
 B,6,13.40892677
 B,6,14.3956207
 B,6,13.82113391
 B,6,16.37338569
 B,6,19.70159575
 B,7,14.74334178
 B,7,16.55125245
 B,7,12.38329798
 B,7,18.16472408
 B,7,16.32938475
 B,7,16.06465494
 B,7,12.63086062
 B,7,14.46114813
 B,7,21.90134013
 B,7,13.81025827
 B,7,15.85805494
 B,7,20.18195326
 B,8,19.05120792
 B,8,12.83856639
 B,8,12.61360139
 B,8,21.30434314
 B,8,14.19960469
 B,8,17.38397826
 B,8,15.66477339
 B,8,22.07182834
 B,8,12.07487394
 B,8,20.36357359
 B,8,20.2543677
 B,9,14.44499362
 B,9,17.77235228
 B,9,13.01620902
 B,9,18.10976359
 B,10,18.22350661
 B,10,18.41504728
 B,10,17.94735486
 B,10,18.39173938
 B,10,14.21729704
 B,10,16.95753684
 B,10,21.11643087
 B,10,16.09688752
 B,10,19.54707452
 B,10,22.00450065
 B,10,15.15308873
 B,10,14.72488972
 B,10,17.65280737
 B,10,14.61615255
 B,10,12.89525607
 B,11,22.35831089
 B,11,18.0853187
 B,11,22.12815791
 B,11,17.74562214
 B,11,21.45724242
 B,11,20.57933779
 B,11,19.97397415
 B,11,16.34967424
 B,12,22.14385376
 B,12,17.82816113
 B,12,18.37056381
 B,12,16.13152759
 B,12,22.06764318
 B,12,12.80924472
 B,12,18.95522175
 B,13,20.40554286
 B,13,19.72951878
 C,14,15.51581
 C,14,15.4836358
 C,14,13.35882363
 C,14,13.16072916
 C,14,21.69168971
 C,14,19.09686303
 C,14,14.47450457
 C,14,12.04870424
 C,14,13.33096141
 C,14,17.38388981
 C,14,16.29015289
 C,14,16.32707754
 C,14,16.2784054
 C,15,15.0170597
 C,15,14.95767365
 C,15,15.20739614
 C,15,22.10458509
 C,15,12.3362457
 C,15,19.87895753
 C,15,18.8363682
 C,15,16.43738666
 C,15,12.84570744
 C,15,15.99869357
 C,15,14.42551321
 C,15,13.63489872
 C,15,15.67179885
 C,16,14.61700901
 C,16,14.64864676
 C,16,14.13014582
 C,16,21.7637441
 C,16,20.66825543
 C,16,17.05977818
 C,16,17.80118916
 C,16,15.16641698
 
 where this is read into stand.data. When I use the following survey designs,
 
 srv1 <- svydesign(ids=~1, strata=~strata, data=stand.data )
 
 or,
 
 srv1 <- svydesign(ids=~cluster, strata=~strata, data=stand.data )
 
 with,
 
 print( svytotal( ~vol, srv1 ) )
 
 I only obtain the total,
 
 print( svytotal( ~vol, srv1 ) )
 total SE
 vol  2377 34.464
 
 or worse,
 
 print( svytotal( ~vol + strata, srv1 ) )
  total SE
 vol 2377.0 34.464
 strataA   42.0  0.000
 strataB   64.0  0.000
 strataC   34.0  0.000
 
 which reports the number of observations in each of the strata. I'm sure
 this is a RTFM question, but I just need a start. The size of each plot
 is 0.04 units (hectares) and I want to be able to quickly examine working
 up each sample with and without clusters (this is going to be part of a
 larger simulation study).
 
 I'm trying to not use SAS for this and hate to admit defeat.
 
 Thanks,
 Jeff.
 

-- 
James Reilly
Department of Statistics, University of Auckland
Private Bag 92019, Auckland, New Zealand


Re: [R] handling NA by mean replacement

2006-01-30 Thread James Reilly


Here are a couple of documents that make much the same point (i.e. that mean
value imputation is not recommended) and discuss several alternatives.

http://nces.ed.gov/statprog/2002/appendixb3.asp
http://www2.chass.ncsu.edu/garson/pa765/missing.htm
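
To make the concern concrete, here is a small sketch with simulated data (the data frame and its column names are invented for illustration). It does the column-mean replacement that was asked about, then compares the spread before and after. The imputed values all sit exactly at the mean, so the standard deviation is understated:

```r
set.seed(42)
dat <- data.frame(a = rnorm(100, mean = 50, sd = 10),
                  b = rnorm(100, mean = 5,  sd = 2))
dat$a[sample(100, 30)] <- NA   # knock out 30 values at random

# Replace each column's NAs by the mean of that column's observed values
imputed <- as.data.frame(lapply(dat, function(col) {
  col[is.na(col)] <- mean(col, na.rm = TRUE)
  col
}))

# The mean is unchanged, but the spread shrinks after imputation
print(c(mean_observed = mean(dat$a, na.rm = TRUE),
        mean_imputed  = mean(imputed$a)))
print(c(sd_observed = sd(dat$a, na.rm = TRUE),
        sd_imputed  = sd(imputed$a)))
```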

I think we'd need more information on the context to provide any real
advice. Another possible source of help is the Impute mailing list:
http://lists.utsouthwestern.edu/mailman/listinfo/impute

Cheers,
James
-- 
James Reilly
Department of Statistics, University of Auckland
Private Bag 92019, Auckland, New Zealand

On 31/01/2006 6:20 a.m., Berton Gunter wrote:
 Lots of other folks will give you the simple answer (hint: ?'['  ?is.na)
 
 Yours is one of those iceberg questions  -- 2/3 hidden underwater.
 
 Two points:
 
 Point 1: Generally you **don't have to do such replacement** as most of R's
 functions have a na.rm or na.action argument (unfortunately, for historical
 reasons, the argument names and meanings aren't consistent) that does
 basically what you want anyway.
 
 Point 2: Doing what you ask is probably a bad idea, as it creates mythical
 degrees of freedom and biases results -- gives wrong statistical answers.
 
 As a general matter, handling missing values correctly is a difficult
 statistical issue that you may want to avoid if you can (R has plenty of
 packages that can deal with it, but it requires background expertise).
 Honestly, I'm not sure if you can makes any sense here (how do you know?),
 but let's just say that I think your potential for mischief is reduced if
 you use R's inbuilt arguments for ignoring missings rather than imputing
 them naively.
 
 Having said that, I believe that clustering procedures, for example, may not
 permit this (but they have builtin missing imputation capabilities of their
 own, do they not?), so you may have to impute. In this case, try to do so
 wisely (e.g. via multiple imputation?). 
 
 Perhaps this will stimulate real experts to offer you some advice. Good
 luck.
 
 Cheers,
 Bert
  
 Bert Gunter
 Genentech
 
 -----Original Message-----
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Julie Bernauer
 Sent: Monday, January 30, 2006 8:50 AM
 To: r-help@stat.math.ethz.ch
 Subject: [R] handling NA by mean replacement

 Hello

 I am sorry for such a stupid question. Suppose I have a table of data
 with a lot of NAs, and I want to replace those NAs by the mean of the
 column before NA replacement. How can I do that efficiently?

 Thanks in advance,

 Julie

 -- 
 Julie Bernauer
 Yeast Structural Genomics
 http://www.genomics.eu.org


Re: [R] solving a complicated equation

2005-11-06 Thread James Reilly

Try ?uniroot and ?pnorm.
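
A minimal sketch of how the two fit together. The inequality sign in the quoted equation did not survive in the archive, so this assumes the term is P(Z < x), which is what pnorm(x) returns. The values of a, b and p are made up for illustration:

```r
# Assumed values for the known constants (illustration only)
a <- 0.5
b <- 1
p <- 0.9

# f(x) is zero at the solution; pnorm(x) is P(Z < x) for standard normal Z
f <- function(x) a * exp(-x^2 / 2) + b * pnorm(x) - p

# uniroot needs an interval whose end points give f values of opposite sign
sol <- uniroot(f, interval = c(-5, 0), tol = 1e-10)
print(sol$root)
print(f(sol$root))  # should be very close to zero
```

The left-hand side need not be monotone in x, so there may be more than one solution or none. Plotting f over a plausible range first, e.g. with curve(f, -5, 5), helps in choosing the interval.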

On 7/11/2005 11:47 a.m., Cunningham Kerry wrote:
 I want to solve the following equation for x
 
 p=a*exp(-x^2/2)+b*P(Zx)
 
 where p,a,b are known, Z is a standard normal
 variable. Clearly there is no analytic form for
 P(Zx). 
 
 I am wondering if any expert could direct one easy way
 on this. Thank you.
 

-- 
James Reilly
Department of Statistics, University of Auckland
Private Bag 92019, Auckland, New Zealand


Re: [R] Anything like associative arrays in R?

2005-11-02 Thread James Reilly


Also, if you want to read in various data files as in your Perl example,
you could replace the line inside both loops with something like this:
x[[i]][[j]] <- read.table(paste('/path/to/data/',i,'_',j,'.dat',sep=''))

See http://cran.r-project.org/doc/manuals/R-intro.html#Character-vectors
for another paste() function example.
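
Putting the pieces together, here is a sketch of the nested-list approach using the keys from the Perl example. To keep it runnable without any data files, it stores the file name that would be passed to read.table() rather than reading anything. The path is the placeholder from the original question:

```r
x <- list()
for (i in c("foo", "bar", "baz")) {
  x[[i]] <- list()   # inner list for this first-level key
  for (j in c("eenie", "meenie", "minie", "moe")) {
    # build the file name; swap in read.table(...) to load the data instead
    x[[i]][[j]] <- paste("/path/to/data/", i, "_", j, ".dat", sep = "")
  }
}

# Elements are looked up by name, much like a two-level Perl hash
print(x[["bar"]][["minie"]])
print(names(x))
print(names(x[["foo"]]))
```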


On 3/11/2005 2:30 a.m., jim holtman wrote:
 Is this what you want?
  x <- list()
 for (i in c('test', 'some', 'more')){
   for(j in c('lv1', 'lv2', 'lv3')){
     x[[i]][[j]] <- runif(10)
   }
 }
 x
 x[['some']][['lv2']]
 
 
  On 11/2/05, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
 




Let me preface my question by stressing that I am much less interested
in the answer than in learning a way I could have *found the answer
myself*. (As helpful as the participants in this list are, I have far
too many R-related questions to resolve by posting here, and as I've
written before, in my experience the R documentation has not been very
helpful, but I remain hopeful that I may have managed to miss some
crucial document.)

The task I want to accomplish is very simple: to define and
sequentially initialize M x N variables *programmatically*, according
to two different categories, containing N and M values, respectively.

In languages with associative arrays, the typical way to do this is to
define a 2-d associative array; e.g. in Perl one could do

for $i ( 'foo', 'bar', 'baz' ) {
  for $j ( 'eenie', 'meenie', 'minie', 'moe' ) {
    $table{ $i }{ $j } = read_table( "path/to/data/${i}_${j}.dat" );
  }
}

How does one do this in R? In particular, what's the equivalent of
the above in R?

Most importantly, how could I have found out this answer from the
R docs?

Many thanks in advance,

kj


 
 
 
 
 --
 Jim Holtman
 Cincinnati, OH
 +1 513 247 0281
 
 What is the problem you are trying to solve?
 

-- 
James Reilly
Department of Statistics, University of Auckland
Private Bag 92019, Auckland, New Zealand


Re: [R] enter a survey design in survey2.9

2005-10-16 Thread James Reilly

svydesign needs to be told what data frame to use, via the data= argument.

I get a slightly different error message from yours:
Error in eval(expr, envir, enclos) : object subdiv not found
but this may be because I'm using survey version 3.3.1, not 2.9. There
are several new features (listed at
http://faculty.washington.edu/tlumley/survey/NEWS) so you may want to
upgrade.
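
On the data= point, the following base-R sketch shows the same kind of lookup failure using model.frame() as a stand-in for svydesign (an analogy only, not the survey package's own code). The data frame hh and its contents are hypothetical. Variables named in a formula are searched for in the data frame when one is given, and otherwise in the calling environment, where they do not exist:

```r
# Hypothetical household data; the column names mirror those in the question
hh <- data.frame(subdiv = c(1, 1, 2), EA = c(10, 11, 20))

# With data=, the variable in the formula is found inside the data frame
ok <- model.frame(~subdiv, data = hh)
print(nrow(ok))

# Without data=, R looks in the calling environment and fails
msg <- tryCatch(
  {
    model.frame(~subdiv)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```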

On 10/10/2005 9:15 a.m., justin bem wrote:
Hi all,

I expect that Mr Thomas Lumley will read this message.
I have data from a complex stratified survey. The population is divided into 12
regions, and each region consists of an urban area and a rural one. There are
two regions with only an urban area.

The stratification variable is a combination of region and area type (urban/rural).

In rural areas, subdivisions are sampled with probability proportional to
population size, then enumeration areas (EAs) are sampled in the selected
subdivisions, and finally households are selected in those EAs.

In urban areas, EAs are selected directly, and then households are selected.
To summarise, we have:

(12 regions)
Each region is divided into two areas, urban and rural. These are the strata.
In rural areas: PSUs are subdivisions, SSUs are EAs and TSUs are households.
In urban areas: PSUs are EAs, SSUs are households.

I use the svydesign function as follows:
esi<-svydesign(id=~subdiv+EA+HHID,strata=~REGION+AREATYP,fpc=~FPC1+FPC2+FPC3,weig=~pw,nest=T)
FPC1: number of subdivisions in each stratum
FPC2: number of EAs in each subdivision
FPC3: number of HHs in each EA
pw: sampling weights

But I get this error message: Error in data.frame(strata, 1:i,...). I don't
understand why!
Can someone help me?

Sincerely.


--
James Reilly
Department of Statistics, University of Auckland
Private Bag 92019, Auckland, New Zealand


Re: [R] Conjoint in R

2005-06-07 Thread James Reilly


Hi,

The conf.design package should help you handle the experimental design
side of your problem. Depending on your application, it may be unwise to
assume that main effects will be enough, as interactions can often turn
out to be important (at least in my experience with discrete conjoint).
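
To give a feel for what adding interactions costs, here is a base-R sketch that does not use conf.design. The attribute names and levels are invented for illustration. It builds the full factorial with expand.grid() and counts the parameters needed for a main-effects model versus one with all two-way interactions. A fractional design has to contain at least as many profiles as there are parameters to estimate:

```r
# Hypothetical attributes for a conjoint study
profiles <- expand.grid(
  price   = c("low", "mid", "high"),
  brand   = c("A", "B"),
  service = c("basic", "premium"),
  stringsAsFactors = TRUE
)
n_profiles <- nrow(profiles)   # size of the full factorial

# Number of model parameters (columns of the design matrix)
n_main  <- ncol(model.matrix(~ .,   data = profiles))  # main effects only
n_inter <- ncol(model.matrix(~ .^2, data = profiles))  # plus two-way interactions

print(c(profiles = n_profiles, main = n_main, with_interactions = n_inter))
```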

Hope this helps,
James

On 7/06/2005 10:18 p.m., Cela, Jimmy (IHG) wrote:
 Hello all,
 
 
 I am trying to apply a conjoint analysis in order to determine the best
 profile that captures the most preferred combination of levels of given
 categorical factors.
 
 For this a set of factors is given and initially a fractional factorial
 design has to be produced as a subset of all possible factor levels
 combinations, sufficient to estimate the main effects utilities. 
 
 Then the preference for each chosen combination is assessed via surveys on
 subjects (clients). Preferences are given by ranking profiles ordinally.
 
 Conjoint analysis then is applied on the preference data to estimate the
 utility values - or the part worth for each factor level.
 
 SPSS 13.0 has a module called CONJOINT that handles such problem from
 designing the fractional factorial design to the conjoint analysis.
 
 What would be the best way to implement such analysis in R?
 
 Thank you.
 
 Jimmy Cela
 
 Decision Sciences
 InterContinental Hotels Group Inc
 Atlanta, GA 
 USA
 

-- 
James Reilly
Department of Statistics, University of Auckland
Private Bag 92019, Auckland, New Zealand


Re: [R] Help with possible bug (assigning NA value to data.frame) ?

2005-06-07 Thread James Reilly

  5.000  5.00
5  6  1.323529  1.1082482  1.547222
6  7  1.00  7.000  7.00
7  8  1.10  0.9021282  1.287672
8 10  1.142857  0.8766731  1.403327
9 11  1.00 11.000 11.00


Which is again incorrect and unpredicted (as above). 


Please let me know what to do to report this problem better, 
or if I just
missed something silly.

I am RH9, R-2.1.0 (compiled from source), latest boot from 
CRAN (if that
makes a difference).

Cheers,
Dan.


-- 
James Reilly
Department of Statistics, University of Auckland
Private Bag 92019, Auckland, New Zealand


Re: [R] help with kolmogorov smirnov test

2005-04-04 Thread James Reilly

Agnes Gault wrote:
Hello!
I am an 'R beginner'. I am trying to check whether my data follow a negative 
binomial distribution.
The commands I've typed in are:

   nbdo=rnegbin(58,mu=27.82759,theta=0.7349851)
  ks.test(do$DO,nbdo)
Each time I do that, the p-value given is different.

The p-values are different each time because you are using a two-sample 
test, where one of the samples is randomly generated (and thus will be 
different each time). ks.test offers a one-sample test against a 
specified distribution, but this will still have problems with the ties, 
since the test assumes a continuous distribution and count data are discrete.
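
The difference can be seen in a short sketch. The real do$DO data are not available here, so simulated counts stand in for them, and base R's rnbinom() with size = theta is used as the counterpart of MASS's rnegbin(). The warnings about ties are suppressed only to keep the output short:

```r
set.seed(123)
mu    <- 27.82759
theta <- 0.7349851

# Stand-in for the observed counts (simulated, not the original data)
obs <- rnbinom(58, size = theta, mu = mu)

# Two-sample test against a fresh random sample: the p-value varies per run
p_two <- replicate(3, suppressWarnings(
  ks.test(obs, rnbinom(58, size = theta, mu = mu))$p.value))
print(p_two)

# One-sample test against the specified distribution: repeatable, but the
# ties in count data still make the p-value unreliable
one_sample_p <- function() {
  suppressWarnings(ks.test(obs, "pnbinom", size = theta, mu = mu)$p.value)
}
p_one       <- one_sample_p()
p_one_again <- one_sample_p()
print(c(p_one, p_one_again))
```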

---
James Reilly
Department of Statistics, University of Auckland
Private Bag 92019, Auckland, New Zealand