Re: [R] R survey package again
On 7/9/07 11:42 PM, eugen pircalabelu wrote:
> I have a sample from a survey where households were interviewed. The sample has 4 criteria on which the stratification was based: REGION, SIZE OF HOUSEHOLD, SIZE OF LOCALITY, AGE OF HEAD OF HOUSEHOLD. Since I don't have the full information for each cell of the cross region*sizehh*sizeloc*age, I can't use the postStratify function from the survey package. Is that correct? (I think so, but I need a competent answer.) The only additional information I have is the population size of each cell in some two-way crossings (e.g. I know the population size for all the strata defined by region*sizehh, region*sizeloc and sizeloc*age), so I know the population distribution only for pairs of these criteria.

You're right, poststratification can't work from two-way marginal distributions, but raking or calibration can. However, it seems odd that you only have this much information, since the full joint distribution would have been needed for stratification. Usually these details would be documented as part of the sample design. Can you get this information from those responsible for the sample design?

It would also be good to check your understanding of the design. A sampling frame listing household size and age of household head would have been needed to do the four-way stratification you mention, but in my experience such frames aren't very common.

James
--
James Reilly
Department of Statistics, University of Auckland
Private Bag 92019, Auckland, New Zealand

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
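For a concrete picture of what raking does with two-way margins, here is a minimal base-R sketch of iterative proportional fitting. In practice the survey package's rake() (or calibrate()) applied to a svydesign object is the tool to use; this stand-alone version only illustrates the idea. The sample, the variable levels and the population totals are all hypothetical, and the two margins are chosen so that they agree on the household-size totals, which raking needs in order to converge.

```r
# Hypothetical sample: all 8 combinations of three stratifiers, unequal cell counts
cells <- expand.grid(region = c("N", "S"),
                     sizehh = c("small", "large"),
                     age    = c("young", "old"),
                     stringsAsFactors = FALSE)
counts <- c(3, 2, 4, 1, 2, 3, 5, 2)
sample_df <- cells[rep(seq_len(nrow(cells)), counts), ]

# Hypothetical two-way population margins; both imply sizehh totals of 550 / 450
pop_region_size <- c(N.small = 300, S.small = 250, N.large = 200, S.large = 250)
pop_size_age    <- c(small.young = 300, large.young = 200,
                     small.old   = 250, large.old   = 250)

# Raking (iterative proportional fitting): cycle through the margins, scaling the
# weights so that each margin's weighted cell totals match the population totals
rake_weights <- function(w, groups, targets, tol = 1e-10, max_iter = 1000) {
  for (iter in seq_len(max_iter)) {
    max_change <- 0
    for (k in seq_along(groups)) {
      g <- groups[[k]]
      current <- ave(w, g, FUN = sum)              # weighted total of each unit's cell
      ratio <- unname(targets[[k]][g]) / current   # adjustment factor for each unit
      w <- w * ratio
      max_change <- max(max_change, abs(ratio - 1))
    }
    if (max_change < tol) return(w)
  }
  warning("raking did not converge in ", max_iter, " iterations")
  w
}

# Cell labels such as "N.small" and "small.young", matching the target names
groups <- list(paste(sample_df$region, sample_df$sizehh, sep = "."),
               paste(sample_df$sizehh, sample_df$age, sep = "."))

n  <- nrow(sample_df)
w0 <- rep(1000 / n, n)   # equal starting weights summing to the population size
w  <- rake_weights(w0, groups, list(pop_region_size, pop_size_age))

tapply(w, groups[[1]], sum)   # matches pop_region_size (listed alphabetically)
tapply(w, groups[[2]], sum)   # matches pop_size_age
```

With real data the margins would be region*sizehh, region*sizeloc and sizeloc*age, and the starting weights would be the design weights rather than equal ones.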
Re: [R] Survey package
On 7/9/07 12:36 AM, eugen pircalabelu wrote:
> I'm trying to use the survey package for a stratified sample which has 4 criteria on which the stratification is based. I would like to get the corrected weights, but for every element I get a weight of 1. E.g. typing
>
> design <- svydesign(id=~1, strata= ~regiune + size_loc + age_rec_hhh + size_hh, data= tabel)
>
> and then weights(design) gives me 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... for each element.

The weights are all 1 because you haven't told R how they should be calculated. If the sampling weights should be constant within strata, you can simply supply the population size of each stratum in svydesign's fpc argument and it will calculate the weights for you. Various techniques for adjusting the weights are also supported; see http://faculty.washington.edu/tlumley/survey/example-poststrat.html

James
--
James Reilly
Department of Statistics, University of Auckland
Private Bag 92019, Auckland, New Zealand
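When fpc holds each stratum's population size N_h and no weights are given, the sampling probability in stratum h is n_h/N_h, so every sampled unit gets weight N_h/n_h. The base-R sketch below just does that arithmetic, so the numbers can be checked against weights(design). The data frame and the stratum population sizes are hypothetical.

```r
# Hypothetical sample: the stratum of each sampled household
tabel <- data.frame(stratum = c("A", "A", "A", "B", "B", "C", "C", "C", "C"),
                    stringsAsFactors = FALSE)

# Hypothetical population size of each stratum
pop_size <- c(A = 300, B = 100, C = 800)

# n_h (sample size of the unit's stratum) and N_h (its population size), per row
n_h <- ave(rep(1, nrow(tabel)), tabel$stratum, FUN = sum)
N_h <- unname(pop_size[tabel$stratum])

# Stratified simple random sampling: probability n_h / N_h, so weight N_h / n_h
tabel$fpc    <- N_h
tabel$weight <- N_h / n_h
tabel
```

With the survey package loaded, svydesign(ids = ~1, strata = ~stratum, fpc = ~fpc, data = tabel) should give the same weights. For four crossed criteria it is probably simplest to build one combined stratum identifier first (e.g. with interaction()), since, as I understand it, several terms in strata= are read as strata for successive sampling stages.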
Re: [R] Excel
On 30/8/07 6:42 AM, Erich Neuwirth wrote:
> There is one feature in Excel which is extremely convenient: pivot tables. Anybody doing any work as a statistical consultant really ought to know about pivot tables, and I am still surprised how many statisticians do not know about them. Neither Gnumeric nor OpenOffice Calc offers comparably convenient ways of working with multidimensional tables.

I'm not familiar with Gnumeric, but OpenOffice has provided a similar feature, DataPilot, for at least three years; see for example http://www.openofficetips.com/blog/archives/2004/09/datapilot_101.html

James
--
James Reilly
Department of Statistics, University of Auckland
Private Bag 92019, Auckland, New Zealand
Re: [R] getting strata/cluster level values with survey package?
Try the examples here: ?ftable.svystat

On 8/02/2006 12:23 p.m., Jeff D. Hamann wrote:
> First, I apologise for the rookie question, but... I'm trying to obtain standard errors, confidence intervals, etc. from a sample design and have been having trouble getting results for anything other than the basic total or mean for the overall survey from the survey package. For example, using the following dataset,
>
> strata,cluster,vol
> A,1,18.58556192
> A,1,12.55175443
> A,1,21.65882438
> A,1,17.11172946
> A,1,15.41713348
> A,2,13.9344623
> A,2,17.13104821
> A,2,14.6806479
> A,2,14.68357291
> A,2,18.86017714
> A,2,20.67642515
> A,2,15.15295351
> A,2,13.82121102
> A,2,12.9110477
> A,2,14.83153677
> A,2,21.90772687
> A,3,18.69795427
> A,3,18.45636428
> A,3,15.77175793
> A,3,15.54715217
> A,3,20.31948393
> A,3,19.26391445
> A,3,15.54750775
> A,3,19.18724018
> A,4,12.89572151
> A,4,12.92047701
> A,4,12.64958757
> A,4,19.85888418
> A,4,19.64057669
> A,4,19.19188964
> A,4,18.81619298
> A,4,21.73670878
> A,5,15.99430802
> A,5,18.6517
> A,5,21.80441654
> A,5,14.22081904
> A,5,16.01576433
> A,5,14.92497202
> A,5,17.95123218
> A,5,19.82027165
> A,5,19.35698273
> A,5,19.10826519
> B,6,13.40892677
> B,6,14.3956207
> B,6,13.82113391
> B,6,16.37338569
> B,6,19.70159575
> B,7,14.74334178
> B,7,16.55125245
> B,7,12.38329798
> B,7,18.16472408
> B,7,16.32938475
> B,7,16.06465494
> B,7,12.63086062
> B,7,14.46114813
> B,7,21.90134013
> B,7,13.81025827
> B,7,15.85805494
> B,7,20.18195326
> B,8,19.05120792
> B,8,12.83856639
> B,8,12.61360139
> B,8,21.30434314
> B,8,14.19960469
> B,8,17.38397826
> B,8,15.66477339
> B,8,22.07182834
> B,8,12.07487394
> B,8,20.36357359
> B,8,20.2543677
> B,9,14.44499362
> B,9,17.77235228
> B,9,13.01620902
> B,9,18.10976359
> B,10,18.22350661
> B,10,18.41504728
> B,10,17.94735486
> B,10,18.39173938
> B,10,14.21729704
> B,10,16.95753684
> B,10,21.11643087
> B,10,16.09688752
> B,10,19.54707452
> B,10,22.00450065
> B,10,15.15308873
> B,10,14.72488972
> B,10,17.65280737
> B,10,14.61615255
> B,10,12.89525607
> B,11,22.35831089
> B,11,18.0853187
> B,11,22.12815791
> B,11,17.74562214
> B,11,21.45724242
> B,11,20.57933779
> B,11,19.97397415
> B,11,16.34967424
> B,12,22.14385376
> B,12,17.82816113
> B,12,18.37056381
> B,12,16.13152759
> B,12,22.06764318
> B,12,12.80924472
> B,12,18.95522175
> B,13,20.40554286
> B,13,19.72951878
> C,14,15.51581
> C,14,15.4836358
> C,14,13.35882363
> C,14,13.16072916
> C,14,21.69168971
> C,14,19.09686303
> C,14,14.47450457
> C,14,12.04870424
> C,14,13.33096141
> C,14,17.38388981
> C,14,16.29015289
> C,14,16.32707754
> C,14,16.2784054
> C,15,15.0170597
> C,15,14.95767365
> C,15,15.20739614
> C,15,22.10458509
> C,15,12.3362457
> C,15,19.87895753
> C,15,18.8363682
> C,15,16.43738666
> C,15,12.84570744
> C,15,15.99869357
> C,15,14.42551321
> C,15,13.63489872
> C,15,15.67179885
> C,16,14.61700901
> C,16,14.64864676
> C,16,14.13014582
> C,16,21.7637441
> C,16,20.66825543
> C,16,17.05977818
> C,16,17.80118916
> C,16,15.16641698
>
> where this is read into stand.data. When I use the following survey designs,
>
> srv1 <- svydesign(ids=~1, strata=~strata, data=stand.data)
>
> or,
>
> srv1 <- svydesign(ids=~cluster, strata=~strata, data=stand.data)
>
> with
>
> print( svytotal( ~vol, srv1 ) )
>
> I only obtain the total,
>
> print( svytotal( ~vol, srv1 ) )
>     total     SE
> vol  2377 34.464
>
> or worse,
>
> print( svytotal( ~vol + strata, srv1 ) )
>          total     SE
> vol     2377.0 34.464
> strataA   42.0  0.000
> strataB   64.0  0.000
> strataC   34.0  0.000
>
> which reports the number of observations in each of the strata. I'm sure this is an RTFM question, but I just need a start. The size of each plot is 0.04 hectares, and I want to be able to quickly work up each sample with and without clusters (this is going to be part of a larger simulation study). I'm trying not to use SAS for this and hate to admit defeat.
>
> Thanks,
> Jeff.

--
James Reilly
Department of Statistics, University of Auckland
Private Bag 92019, Auckland, New Zealand
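In the survey package itself, svyby(~vol, ~strata, srv1, svytotal) (or svymean) is the usual way to get per-stratum estimates with standard errors. As a hedged, dependency-free cross-check, the sketch below computes per-stratum mean volume per plot and its standard error, treating plots within a stratum as a simple random sample with no finite population correction and ignoring clusters, then scales to per-hectare values using the 0.04 ha plot size. To keep it short it uses only the first three plots of each stratum from the data above.

```r
# First three plots of each stratum from the posted data (a subset, for brevity)
stand.data <- data.frame(
  strata  = rep(c("A", "B", "C"), each = 3),
  cluster = c(1, 1, 1, 6, 6, 6, 14, 14, 14),
  vol     = c(18.58556192, 12.55175443, 21.65882438,
              13.40892677, 14.3956207, 13.82113391,
              15.51581, 15.4836358, 13.35882363),
  stringsAsFactors = FALSE
)

plot_area_ha <- 0.04   # plot size given in the post

# Per-stratum mean volume per plot and its SE (SRS within stratum, no fpc)
est_mean <- tapply(stand.data$vol, stand.data$strata, mean)
est_se   <- tapply(stand.data$vol, stand.data$strata,
                   function(v) sd(v) / sqrt(length(v)))

# Scale from per-plot to per-hectare values
per_ha <- data.frame(stratum     = names(est_mean),
                     mean_per_ha = as.numeric(est_mean) / plot_area_ha,
                     se_per_ha   = as.numeric(est_se) / plot_area_ha,
                     stringsAsFactors = FALSE)
per_ha
```

Accounting for the clusters would need a between-cluster variance instead, which is what svydesign(ids = ~cluster, ...) does for you.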
Re: [R] handling NA by mean replacement
Here are a couple of documents that make much the same point (i.e. that mean value imputation is not recommended) and discuss several alternatives:
http://nces.ed.gov/statprog/2002/appendixb3.asp
http://www2.chass.ncsu.edu/garson/pa765/missing.htm

I think we'd need more information on the context to provide any real advice. Another possible source of help is the Impute mailing list: http://lists.utsouthwestern.edu/mailman/listinfo/impute

Cheers,
James
--
James Reilly
Department of Statistics, University of Auckland
Private Bag 92019, Auckland, New Zealand

On 31/01/2006 6:20 a.m., Berton Gunter wrote:
> Lots of other folks will give you the simple answer (hint: ?'[' and ?is.na).
>
> Yours is one of those iceberg questions -- 2/3 hidden underwater. Two points:
>
> Point 1: Generally you **don't have to do such replacement**, as most of R's functions have an na.rm or na.action argument (unfortunately, for historical reasons, the argument names and meanings aren't consistent) that does basically what you want anyway.
>
> Point 2: Doing what you ask is probably a bad idea, as it creates mythical degrees of freedom and biases results -- gives wrong statistical answers. As a general matter, handling missing values correctly is a difficult statistical issue that you may want to avoid if you can (R has plenty of packages that can deal with it, but it requires background expertise). Honestly, I'm not sure it makes any sense here (how do you know?), but let's just say that I think your potential for mischief is reduced if you use R's built-in arguments for ignoring missing values rather than imputing them naively. Having said that, I believe that clustering procedures, for example, may not permit this (but they have built-in imputation capabilities of their own, do they not?), so you may have to impute. In this case, try to do so wisely (e.g. via multiple imputation?).
>
> Perhaps this will stimulate real experts to offer you some advice.
>
> Good luck.
>
> Cheers,
> Bert
>
> Bert Gunter
> Genentech
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Julie Bernauer
> Sent: Monday, January 30, 2006 8:50 AM
> To: r-help@stat.math.ethz.ch
> Subject: [R] handling NA by mean replacement
>
>> Hello,
>> I am sorry for such a stupid question. Suppose I have a table of data with a lot of NAs, and I want to replace those NAs by the mean of the column (computed before the replacement). How can I do that efficiently?
>> Thanks in advance,
>> Julie
>> --
>> Julie Bernauer
>> Yeast Structural Genomics
>> http://www.genomics.eu.org
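To make Bert's two points concrete, here is a small base-R sketch on a hypothetical vector. It shows the na.rm route first, then naive mean replacement done with is.na() and [<- as he hints. The replacement leaves the mean unchanged but shrinks the standard deviation, because the filled-in values count as extra observations with zero deviation from the mean: the "mythical degrees of freedom".

```r
# Hypothetical column with missing values
x <- c(4, 8, NA, 6, 10, NA, 2)

# Point 1: most summaries can simply skip the NAs
mean(x, na.rm = TRUE)   # 6
sd(x, na.rm = TRUE)     # sqrt(10), about 3.16

# Naive mean replacement via is.na() and [<-
x_imp <- x
x_imp[is.na(x_imp)] <- mean(x, na.rm = TRUE)

# Same mean, but a smaller standard deviation: the sum of squares is unchanged
# while the number of "observations" has grown from 5 to 7
mean(x_imp)                       # still 6
sd(x_imp) < sd(x, na.rm = TRUE)   # TRUE

# If you do decide to do it anyway, column by column over a data frame:
tab <- data.frame(a = c(1, NA, 3), b = c(NA, 5, 7))
tab[] <- lapply(tab, function(col) {
  col[is.na(col)] <- mean(col, na.rm = TRUE)
  col
})
tab
```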
Re: [R] solving a complicated equation
Try ?uniroot and ?pnorm.

On 7/11/2005 11:47 a.m., Cunningham Kerry wrote:
> I want to solve the following equation for x:
>
> p = a*exp(-x^2/2) + b*P(Zx)
>
> where p, a and b are known and Z is a standard normal variable. Clearly there is no analytic form for P(Zx). I am wondering if any expert could suggest an easy way to do this. Thank you.

--
James Reilly
Department of Statistics, University of Auckland
Private Bag 92019, Auckland, New Zealand
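Following that hint, a minimal sketch: write the equation as f(x) = a*exp(-x^2/2) + b*P(Z ... x) - p and let uniroot() search an interval where f changes sign. The comparison sign inside P(Zx) was lost from the archived message, so the helper takes a lower_tail flag: TRUE uses pnorm(x) = P(Z < x), FALSE uses P(Z > x). The values of a, b and p are hypothetical, and if the equation has several roots uniroot() returns only one of them.

```r
# f(x) = a * exp(-x^2 / 2) + b * P(Z vs x) - p, with Z ~ N(0, 1)
make_f <- function(a, b, p, lower_tail = TRUE) {
  function(x) a * exp(-x^2 / 2) + b * pnorm(x, lower.tail = lower_tail) - p
}

# Hypothetical known constants
a <- 0.5
b <- 1
p <- 0.7

f <- make_f(a, b, p, lower_tail = TRUE)

# uniroot() needs a sign change over the interval, so check the end points first
f(-5)   # negative
f(0)    # positive: 0.5 + 0.5 - 0.7

sol <- uniroot(f, interval = c(-5, 0), tol = 1e-10)
sol$root
f(sol$root)   # essentially zero
```

For these particular constants f is increasing on x < 0, so the root in (-5, 0) is unique; for other values it is worth plotting f first with curve(f, -5, 5).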
Re: [R] Anything like associative arrays in R?
Also, if you want to read in various data files as in your Perl example, you could replace the line inside both loops with something like this:

x[[i]][[j]] <- read.table(paste('/path/to/data/', i, '_', j, '.dat', sep=''))

See http://cran.r-project.org/doc/manuals/R-intro.html#Character-vectors for another paste() function example.

On 3/11/2005 2:30 a.m., jim holtman wrote:
> Is this what you want?
>
> x <- list()
> for (i in c('test', 'some', 'more')){
>     for(j in c('lv1', 'lv2', 'lv3')){
>         x[[i]][[j]] <- runif(10)
>     }
> }
> x
> x[['some']][['lv2']]
>
> On 11/2/05, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
>> Let me preface my question by stressing that I am much less interested in the answer than in learning a way I could have *found the answer myself*. (As helpful as the participants in this list are, I have far too many R-related questions to resolve by posting here, and, as I've written before, in my experience the R documentation has not been very helpful; but I remain hopeful that I may have managed to miss some crucial document.)
>>
>> The task I want to accomplish is very simple: to define and sequentially initialize M x N variables *programmatically*, according to two different categories, containing N and M values, respectively. In languages with associative arrays, the typical way to do this is to define a 2-d associative array; e.g. in Perl one could do
>>
>> for $i ( 'foo', 'bar', 'baz' ) {
>>     for $j ( 'eenie', 'meenie', 'minie', 'moe' ) {
>>         $table{ $i }{ $j } = read_table( "path/to/data/${i}_${j}.dat" );
>>     }
>> }
>>
>> How does one do this in R? In particular, what's the equivalent of the above in R? Most importantly, how could I have found out this answer from the R docs?
>>
>> Many thanks in advance,
>> kj
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 247 0281
>
> What is the problem you are trying to solve?

--
James Reilly
Department of Statistics, University of Auckland
Private Bag 92019, Auckland, New Zealand
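Putting Jim's loop and the read.table() line together, here is a runnable sketch. It first writes a few small dummy files named <i>_<j>.dat to a temporary directory, since the real /path/to/data/ files are not available, then reads them into a two-level named list, where x[[i]][[j]] plays the role of Perl's $table{$i}{$j}. The key names and file contents are hypothetical, and file.path()/paste0() are used here in place of paste(..., sep='').

```r
# Hypothetical category labels (stand-ins for 'foo', 'bar', ... in the Perl example)
outer_keys <- c("test", "some", "more")
inner_keys <- c("lv1", "lv2", "lv3")

# Create small dummy data files named <i>_<j>.dat in a temporary directory
data_dir <- tempdir()
for (i in outer_keys) {
  for (j in inner_keys) {
    dummy <- data.frame(id = 1:3, value = runif(3))
    write.table(dummy, file.path(data_dir, paste0(i, "_", j, ".dat")),
                row.names = FALSE)
  }
}

# Read each file into a nested named list: x[[i]][[j]] acts like $table{$i}{$j}
x <- list()
for (i in outer_keys) {
  x[[i]] <- list()   # start each inner "associative array" as an empty list
  for (j in inner_keys) {
    x[[i]][[j]] <- read.table(file.path(data_dir, paste0(i, "_", j, ".dat")),
                              header = TRUE)
  }
}

x[["some"]][["lv2"]]   # one data frame, looked up by name
names(x)               # "test" "some" "more"
```

Initialising x[[i]] as an empty list makes sure each inner element is a list, whatever type of value is stored in it.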
Re: [R] enter a survey design in survey2.9
svydesign needs to be told what data frame to use, via the data= argument. I get a slightly different error message from you:

Error in eval(expr, envir, enclos) : object subdiv not found

but this may be because I'm using survey version 3.3.1, not 2.9. There are several new features (listed at http://faculty.washington.edu/tlumley/survey/NEWS), so you may want to upgrade.

On 10/10/2005 9:15 a.m., justin bem wrote:
> Hi dears,
> I hope that Mr Thomas Lumley will read this message. I have data from a complex stratified survey. The population is divided into 12 regions, and each region consists of an urban area and a rural one; two regions have only an urban area. The stratification variable is a combination of region and area type (urban/rural).
>
> In rural areas, subdivisions are sampled with probabilities proportional to population size, then enumeration areas (EAs) are sampled within the selected subdivisions, and finally households are selected within those EAs. In urban areas, EAs are selected directly and then households are selected.
>
> To summarise: each of the 12 regions is divided into urban and rural parts, and these are the strata.
> In rural areas: PSUs are subdivisions, SSUs are EAs and TSUs are households.
> In urban areas: PSUs are EAs and SSUs are households.
>
> I use the svydesign function as follows:
>
> esi <- svydesign(id=~subdiv+EA+HHID, strata=~REGION+AREATYP, fpc=~FPC1+FPC2FPC3, weig=~pw, nest=T)
>
> FPC1: number of subdivisions in each stratum
> FPC2: number of EAs in each subdivision
> FPC3: number of households in each EA
> pw: sampling weights
>
> but I get this error message:
>
> Error in data.frame(strata, 1:i, ...)
>
> I don't understand why! Can someone help me?
> Sincerely.

--
James Reilly
Department of Statistics, University of Auckland
Private Bag 92019, Auckland, New Zealand
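The "object not found" error comes from how formula arguments are resolved: the variables in ~subdiv+EA+HHID are looked up in the data frame passed as data=, and without it R searches the environment where the formula was created. Base R's model.frame() resolves formulas in an analogous way and makes a dependency-free illustration; the two-row data frame below is hypothetical.

```r
# Hypothetical survey data: these variable names exist only inside the data frame
survey_df <- data.frame(subdiv = c(1, 2), EA = c(10, 20), HHID = c(101, 201))

# Without data =, the formula's variables cannot be found
err <- tryCatch(model.frame(~ subdiv + EA + HHID),
                error = function(e) conditionMessage(e))
err   # "object 'subdiv' not found"

# With data =, they are found as columns of survey_df
mf <- model.frame(~ subdiv + EA + HHID, data = survey_df)
dim(mf)   # 2 3
```

In the original call that means adding data= with the name of the survey data frame to the svydesign() arguments.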
Re: [R] Conjoint in R
Hi,

The conf.design package should help you handle the experimental design side of your problem. Depending on your application, it may be unwise to assume that main effects will be enough, as interactions can often turn out to be important (at least in my experience with discrete conjoint).

Hope this helps,
James

On 7/06/2005 10:18 p.m., Cela, Jimmy (IHG) wrote:
> Hello all,
> I am trying to apply a conjoint analysis in order to determine the best profile, i.e. the most preferred combination of levels of given categorical factors. For this, a set of factors is given, and first a fractional factorial design has to be produced as a subset of all possible combinations of factor levels, sufficient to estimate the main-effect utilities. Then the preference for each chosen combination is assessed via surveys of subjects (clients); preferences are given by ranking the profiles ordinally. Conjoint analysis is then applied to the preference data to estimate the utility values, or part-worths, for each factor level.
>
> SPSS 13.0 has a module called CONJOINT that handles this problem from designing the fractional factorial design to the conjoint analysis. What would be the best way to implement such an analysis in R?
>
> Thank you.
> Jimmy Cela
> Decision Sciences
> InterContinental Hotels Group Inc
> Atlanta, GA USA

--
James Reilly
Department of Statistics, University of Auckland
Private Bag 92019, Auckland, New Zealand
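To illustrate both the design step and the caution about interactions, here is a hedged base-R sketch (not the conf.design interface) of the simplest fractional factorial: a half fraction of a 2^3 design defined by I = ABC. It keeps 4 of the 8 runs and still allows every main effect to be estimated, but each main effect is aliased with a two-factor interaction (A with BC, B with AC, C with AB), so a main-effects-only conjoint model cannot tell them apart. The factors A, B, C and their -1/+1 coding are hypothetical.

```r
# Full 2^3 factorial in -1/+1 coding (A, B, C are hypothetical attributes)
full <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))

# Half fraction defined by I = ABC: keep the runs where A*B*C == +1
half <- subset(full, A * B * C == 1)
rownames(half) <- NULL
half

# The main-effect columns are balanced and mutually orthogonal ...
colSums(half)                # all 0
crossprod(as.matrix(half))   # 4 on the diagonal, 0 elsewhere

# ... but each main effect is aliased with a two-factor interaction
all(half$A == half$B * half$C)   # TRUE: A cannot be separated from BC
```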
Re: [R] Help with possible bug (assigning NA value to data.frame) ?
5.000 5.00
5  6 1.323529 1.1082482 1.547222
6  7 1.00     7.000     7.00
7  8 1.10     0.9021282 1.287672
8 10 1.142857 0.8766731 1.403327
9 11 1.00    11.000    11.00

Which is again incorrect and unpredicted (as above). Please let me know what to do to report this problem better, or if I just missed something silly. I am using RH9, R-2.1.0 (compiled from source) and the latest boot from CRAN (if that makes a difference).

Cheers,
Dan.

--
James Reilly
Department of Statistics, University of Auckland
Private Bag 92019, Auckland, New Zealand
Re: [R] help with kolmogorov smirnov test
Agnes Gault wrote:
> Hello! I am an 'R beginner'. I am trying to check whether my data follow a negative binomial distribution. The commands I've typed in are:
>
> nbdo=rnegbin(58,mu=27.82759,theta=0.7349851)
> ks.test(do$DO,nbdo)
>
> Each time I do that, the p-value given is different.

The p-values are different each time because you are using a two-sample test, where one of the samples is randomly generated (and thus will be different each time). ks.test offers a one-sample test against a specified distribution, but this will still have problems with the ties.

--
James Reilly
Department of Statistics, University of Auckland
Private Bag 92019, Auckland, New Zealand
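The contrast can be seen directly in a short sketch. Since do$DO is not available, the observed counts are simulated with base R's rnbinom(), whose size argument corresponds to theta in MASS's rnegbin(); that stand-in data is hypothetical. The two-sample test against a fresh random sample gives a different p-value on each call, while the one-sample test against pnbinom gives the same p-value every time. With discrete data and ties, though, that p-value is only approximate (typically conservative), which is the remaining problem mentioned above. The tie warnings are suppressed to keep the output short.

```r
set.seed(1)
mu_hat    <- 27.82759
theta_hat <- 0.7349851

# Hypothetical observed counts standing in for do$DO (58 values)
obs <- rnbinom(58, mu = mu_hat, size = theta_hat)

# Two-sample test against a new random sample each time: p varies between calls
p_two <- replicate(5, suppressWarnings(
  ks.test(obs, rnbinom(58, mu = mu_hat, size = theta_hat))$p.value))
p_two

# One-sample test against the fitted distribution function: p is reproducible,
# but only approximate for discrete data with ties
p_one <- suppressWarnings(
  ks.test(obs, "pnbinom", mu = mu_hat, size = theta_hat)$p.value)
p_one
```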