Re: [R] unique/subset problem

Sarah Goslee Fri, 26 Jan 2007 08:22:25 -0800

Without knowing more about your data, it is hard to say for certain,
but might you be confusing unique _values_ with _factor levels_?


> mydata <- as.factor(sort(rep(1:5, 2)))
# mydata has 10 values, 5 unique values, and 5 factor levels
> mydata
 [1] 1 1 2 2 3 3 4 4 5 5
Levels: 1 2 3 4 5
> unique(mydata)
[1] 1 2 3 4 5
Levels: 1 2 3 4 5
> mydata.subset <- mydata[1:4]
# the subset now has only 2 unique values, but the output
# still lists all five factor levels
> unique(mydata.subset)
[1] 1 2
Levels: 1 2 3 4 5

# try drop=TRUE when indexing the factor; it discards the unused levels
> mydata.subset <- mydata[1:4, drop=TRUE]
> unique(mydata.subset)
[1] 1 2
Levels: 1 2
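The question used subset() on a data frame rather than indexing a single factor, so here is a minimal sketch of the same effect at the data-frame level. Only the column names come from the question; the values are invented for illustration. It assumes the genome columns are factors, which was the default when reading text data at the time. Since R 4.0.0, strings are no longer converted to factors by default, so stringsAsFactors = TRUE is set explicitly here.

```r
# Hypothetical stand-in for the poster's 'dataset'; values are invented.
dataset <- data.frame(
  genome1 = c("gA", "gA", "gB", "gC", "gD"),
  genome2 = c("gB", "gC", "gC", "gD", "gA"),
  dist    = c(0.1, 0.2, 0.3, 0.4, 0.5),
  score   = c(-8, -6, -2, -1, -7),
  stringsAsFactors = TRUE  # R >= 4.0.0 no longer converts strings by default
)

prunedrelatives <- subset(dataset, score < -5)

# unique() returns only the values still present (gA and gD) ...
n_present <- length(unique(prunedrelatives$genome1))
# ... but the factor still carries every level of the full data (gA to gD)
n_levels_before <- nlevels(prunedrelatives$genome1)

# Re-creating the factor discards the levels that no longer occur
prunedrelatives$genome1 <- factor(prunedrelatives$genome1)
n_levels_after <- nlevels(prunedrelatives$genome1)

print(c(present = n_present, before = n_levels_before, after = n_levels_after))
```

Note that the drop argument of `[` on a data frame controls dropping dimensions, not factor levels, which is why factor() is applied to the column itself. Later versions of R also provide droplevels(), which removes unused levels from every factor column of a data frame in one call.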

Alternatively, if this is the problem and you don't need those
data to be factors, you could convert them to a more appropriate
form, such as a character vector via as.character(), which has no
levels to carry over after subsetting.
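A minimal sketch of that conversion, reusing the toy factor from the example above (redefined so the snippet runs on its own). as.character() is one reasonable choice; which form is "more appropriate" depends on what the data actually represent.

```r
# Same toy factor as in the example above
mydata <- as.factor(sort(rep(1:5, 2)))
mydata.subset <- mydata[1:4]

# A character vector has no levels to carry along,
# so only the values actually present are reported
mychars <- as.character(mydata.subset)
unique(mychars)          # "1" "2"
length(unique(mychars))  # 2

# If the values are really numbers, convert via character first;
# as.numeric() applied directly to a factor returns the internal level codes
mynums <- as.numeric(as.character(mydata.subset))
```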

Sarah

> > On 1/25/07, lalitha viswanath <[EMAIL PROTECTED]> wrote:
> > > Hi
> > > I am new to R programming and am using subset to
> > > extract part of a data as follows
> > >
> > > names(dataset) = c("genome1","genome2","dist","score");
> > > prunedrelatives <- subset(dataset, score < -5);
> > >
> > > However when I use unique to find the number of unique
> > > genomes now present in prunedrelatives I get results
> > > identical to calling unique(dataset$genome1) although
> > > subset has eliminated many genomes and records.
> > >
> > > I would greatly appreciate your input about using
> > > "unique" correctly in this regard.
> > >
> > > Thanks
> > > Lalitha
> > >

-- 
Sarah Goslee
http://www.functionaldiversity.org

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
