On Jul 14, 2011, at 6:15 PM, Tyler Rinker wrote:
Good Afternoon R Community,
I often work with very large databases and want to search for
specific cases by a particular word or numeric value. I created the
following simple function to do just that. It searches a particular
column for the phrase and returns a data frame with the rows whose
value in that column (approximately, via agrep()) contains the phrase.
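Search <- function(term, dataframe, column.name, variation = .02, ...) {
  # substitute() captures what was typed, so term and column.name
  # can be given unquoted
  te <- as.character(substitute(term))
  cn <- as.character(substitute(column.name))
  # row numbers whose value in column cn approximately matches te
  HUNT <- agrep(te, dataframe[, cn], ignore.case = TRUE,
                max.distance = variation, ...)
  ### dataframe[c(HUNT),]
  # one TRUE/FALSE per row of dataframe: TRUE where a match was found
  HUNTL <- (1:NROW(dataframe) %in% HUNT)
  HUNTL
}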
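Because Search() relies on substitute(), it is awkward to call from sapply() or a loop: substitute() captures the expression that was typed, so passing a variable that holds the term searches for the variable's name instead. A minimal sketch of a variant that takes the term and column name as quoted strings and returns one TRUE/FALSE per row, assuming character columns; search_col() and the pets data frame are hypothetical names used only for illustration:

```r
# Hypothetical variant of Search() that takes the term and the column
# name as ordinary character strings instead of using substitute(),
# so it can be called from inside sapply(), lapply() or a loop.
search_col <- function(term, dataframe, column.name, variation = .02, ...) {
  hits <- agrep(term, dataframe[[column.name]], ignore.case = TRUE,
                max.distance = variation, ...)
  # TRUE for every row whose value approximately contains term
  seq_len(NROW(dataframe)) %in% hits
}

# Made-up example data, for illustration only
pets <- data.frame(name  = c("Rex", "Tom", "Kitty"),
                   notes = c("likes APPLE pie", "dislikes cats", "apple thief"),
                   stringsAsFactors = FALSE)

hitL <- search_col("apple", pets, "notes")
hitL          # logical vector, one element per row
pets[hitL, ]  # the matching rows
```

Keeping the logical vector (rather than the subset) means results from several columns can later be combined with `|` or rowSums().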
You would make life simpler by keeping your results as logical vectors
with one element per row of your dataframe.
Then:
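# logHunt: one logical column per column of dfrmname (term is a string)
logHunt <- sapply(dfrmname, function(col)
  1:NROW(dfrmname) %in% agrep(term, col, ignore.case = TRUE, max.distance = .02))
indexL <- rowSums(logHunt) >= 1   # rows matching in at least one column
dfrmname[indexL, ]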
Untested in absence of test data.
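Since no test data were posted, here is a small made-up data frame for trying the sapply()/rowSums() idea above. The data are hypothetical, and agrep() is called directly on each column rather than through Search(), whose substitute() calls get in the way inside sapply(). A sketch, not a tested production routine:

```r
# Made-up data for illustration only
dfrmname <- data.frame(
  title  = c("Apple harvest", "Pear season", "Plum jam", "Cherry pie"),
  author = c("Smith", "Jones", "Apple Co.", "Brown"),
  notes  = c("fruit", "apple orchard", "jam", "dessert"),
  stringsAsFactors = FALSE)
term <- "apple"

# One logical column per column of dfrmname: TRUE where that cell matches
logHunt <- sapply(dfrmname, function(col)
  seq_len(NROW(dfrmname)) %in% agrep(term, col, ignore.case = TRUE,
                                     max.distance = .02))
# Keep a row if the term matched in at least one of its columns
indexL <- rowSums(logHunt) >= 1
result <- dfrmname[indexL, ]
result
```

Because indexL has exactly one entry per row, each matching row is returned once, even when several of its columns match.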
--
David.
I would like to modify this to search all columns for the phrase,
keep only the unique rows, and return one data frame of the rows
(without repeats) in which any column contains the phrase.
I assumed this would be an easy task using sapply() and unique() or
union(). Because this function takes more than one argument (the
column vector is not the only argument), I don't know how to set it
up. Could someone tell me how to apply this function to multiple
columns and return one data frame with all the agrep matches? (I'll
figure out how to deal with duplicates after that; that's the easy
part.)
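One way to do this with the functions already mentioned, sketched on made-up data (books is a hypothetical name): lapply() runs agrep() on every column and collects the matching row numbers, Reduce(union, ...) merges them so that a row matched in several columns is kept only once, and sort() restores the original row order.

```r
# Made-up data for illustration only
books <- data.frame(
  title  = c("Apple harvest", "Pear season", "Plum jam", "Cherry pie"),
  author = c("Apple Co.", "Jones", "Appleby", "Brown"),
  notes  = c("fruit", "apple orchard", "jam", "dessert"),
  stringsAsFactors = FALSE)

# Row numbers matched in each column (a list with one element per column);
# each column is passed to agrep() as its x argument
hits <- lapply(books, agrep, pattern = "apple", ignore.case = TRUE,
               max.distance = .02)
# union() merges the row numbers and drops repeats (row 1 matches twice);
# sort() puts them back in the original row order
rows <- sort(Reduce(union, hits))
books[rows, ]
```

Passing `pattern = "apple"` by name is what lets lapply() hand each column to agrep() as its second argument, which answers the "more than one argument" problem.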
Thank you in advance for your help,
Tyler Rinker
PS: If your idea is a for loop, please explain it well or provide the
code, because I do not have a programming background and for loops
are very difficult to wrap my head around.
Running Windows 7
R version 2.14.0 (beta)
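For the for-loop route, a sketch with each step commented, again on hypothetical data: the loop starts with FALSE for every row and flips a row to TRUE as soon as any column matches, so no duplicate rows can arise.

```r
# Made-up data for illustration only
books <- data.frame(
  title  = c("Apple harvest", "Pear season", "Plum jam", "Cherry pie"),
  author = c("Apple Co.", "Jones", "Appleby", "Brown"),
  notes  = c("fruit", "apple orchard", "jam", "dessert"),
  stringsAsFactors = FALSE)
term <- "apple"

# Start with FALSE for every row: "no match found yet"
keep <- rep(FALSE, nrow(books))

# The loop body runs once per column; on each pass, cn holds the
# name of the current column ("title", then "author", then "notes")
for (cn in names(books)) {
  # Row numbers in this column that approximately match the term
  hits <- agrep(term, books[[cn]], ignore.case = TRUE, max.distance = .02)
  # Mark those rows as TRUE; rows already TRUE stay TRUE
  keep[hits] <- TRUE
}

# Each row appears at most once, so there are no duplicates to remove
books[keep, ]
```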
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
West Hartford, CT