It may be easy or difficult depending on what your data are like, e.g. "GALAXY ACE S 5830" vs "S 5830 GALAXY ACE".
One easy and reasonably general way would be to split each name into its "words" (4 in this example) and then check whether the two names consist of exactly the same words, possibly in a different order.

x1 <- "GALAXY ACE S 5830"
x2 <- "S 5830 GALAXY ACE"
x3 <- "S 5830 GALAXY ZOMBIE"

# split one name into its space-separated words
divide <- function(x) strsplit(x, " ")[[1]]

# TRUE if both names contain the same words, in any order
check <- function(x, y) setequal(divide(x), divide(y))

check(x1, x2)
# [1] TRUE
check(x1, x3)
# [1] FALSE

Or you could try reading in your data in a different way so that "S", "GALAXY", "ACE", and "5830" end up in different variables (if all product names have an identical structure, i.e. 4 elements; or is S 5830 supposed to be the price?).

Or build a catalogue of all possible product names and then compare each name to it, etc.

htmh

On 9/26/12, Tammy Ma <metal_lical...@live.com> wrote:
>
> Dear R user:
>
> I have got the following problem:
>
> I have imported two data sets into R: one set includes price information,
> another one includes volume information. But I noticed a wrong word order
> problem in the product names,
>
> for instance,
>
> in one data set,
>
> "GALAXY ACE S 5830"
>
> in another one,
>
> it is "S 5830 GALAXY ACE"
>
> Both represent the same product. How do I map the two names into one in R?
>
> There are so many product names having this problem. I hope there is some
> mechanism which can automatically map those. Thanks for your help.
>
> Kind regards,
> Tammy
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
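Applied to the original question (matching names between a price data set and a volume data set), the same word-splitting idea can be turned into a merge key instead of a pairwise comparison. This is a minimal sketch, not a definitive solution: it assumes both data sets are data frames with a character column of product names, and the data frames `prices` and `volumes`, their columns, the numbers in them, and the helper `name_key()` are all hypothetical. The swapped-in technique is to sort each name's words into a canonical string, so that names differing only in word order get the same key.

```r
# Hypothetical example data: data frame names, column names and the
# price/volume numbers are made up purely for illustration.
prices  <- data.frame(name  = c("GALAXY ACE S 5830", "S 5830 GALAXY ZOMBIE"),
                      price = c(100, 200),
                      stringsAsFactors = FALSE)
volumes <- data.frame(name   = c("S 5830 GALAXY ACE"),
                      volume = c(10),
                      stringsAsFactors = FALSE)

# Hypothetical helper: split each name on runs of whitespace, sort the
# words and paste them back together, so word order no longer matters.
name_key <- function(x) {
  vapply(strsplit(trimws(x), "[[:space:]]+"),
         function(w) paste(sort(w), collapse = " "),
         character(1))
}

prices$key  <- name_key(prices$name)
volumes$key <- name_key(volumes$name)

# Inner join on the canonical key; the suffixes keep both original
# spellings of the name visible in the result.
merged <- merge(prices, volumes, by = "key", suffixes = c(".price", ".volume"))
merged
```

Here only the GALAXY ACE rows match, so `merged` has a single row. Building a key lets `merge()` do the matching in one pass rather than calling a comparison function on every pair of names; note that, unlike `setequal()`, a sorted key also distinguishes names that repeat a word.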