Thanks a lot Jim. I am struggling to solve the problem.I do appreciate for your helps and advice.
Greg On Tue, Jun 14, 2016 at 3:39 AM, Jim Lemon <drjimle...@gmail.com> wrote: > Hi Greg, > This is obviously a problem with the data. The first error indicates > that the sequence of integers regrange[1]:regrange[2] is too long to > be allocated. Most likely this is because one or both of the endpoints > are infinite. Maybe if you can find where the NAs are you can fix it. > > Jim > > On Tue, Jun 14, 2016 at 12:29 AM, greg holly <mak.hho...@gmail.com> wrote: > > Hi Jim; > > > > I do apologize if bothering. I have run on the real data and here is the > > error message I got: > > > > Thanks, > > > > Greg > > > > start end > > Error in regrange[1]:regrange[2] : result would be too long a vector > > In addition: Warning messages: > > 1: In min(x) : no non-missing arguments to min; returning Inf > > 2: In max(x) : no non-missing arguments to max; returning -Inf > > > > On Mon, Jun 13, 2016 at 10:28 AM, greg holly <mak.hho...@gmail.com> > wrote: > >> > >> Hi Jim; > >> > >> I do apologize if bothering. I have run on the real data and here is the > >> error message I got: > >> > >> Thanks, > >> > >> Greg > >> > >> start end > >> Error in regrange[1]:regrange[2] : result would be too long a vector > >> In addition: Warning messages: > >> 1: In min(x) : no non-missing arguments to min; returning Inf > >> 2: In max(x) : no non-missing arguments to max; returning -Inf > >> > >> > >> On Mon, Jun 13, 2016 at 3:19 AM, Jim Lemon <drjimle...@gmail.com> > wrote: > >>> > >>> Hi Greg, > >>> Okay, I have a better idea now of what you want. The problem of > >>> multiple matches is still there, but here is a start: > >>> > >>> # this data frame actually contains all the values in ref in the "reg" > >>> field > >>> map<-read.table(text="reg p rate > >>> 10276 0.700 3.867e-18 > >>> 71608 0.830 4.542e-16 > >>> 29220 0.430 1.948e-15 > >>> 99542 0.220 1.084e-15 > >>> 26441 0.880 9.675e-14 > >>> 95082 0.090 7.349e-13 > >>> 36169 0.480 9.715e-13 > >>> 55572 0.500 9.071e-12 > >>> 65255 0.300 1.688e-11 > >>> 51960 0.970 1.163e-10 > >>> 55652 0.388 3.750e-10 > >>> 63933 0.250 9.128e-10 > >>> 35170 0.720 7.355e-09 > >>> 06491 0.370 1.634e-08 > >>> 85508 0.470 1.057e-07 > >>> 86666 0.580 7.862e-07 > >>> 04758 0.810 9.501e-07 > >>> 06169 0.440 1.104e-06 > >>> 63933 0.750 2.624e-06 > >>> 41838 0.960 8.119e-06 > >>> 74806 0.810 9.501e-07 > >>> 92643 0.470 1.057e-07 > >>> 73732 0.090 7.349e-13 > >>> 82451 0.960 8.119e-06 > >>> 86042 0.480 9.715e-13 > >>> 93502 0.500 9.071e-12 > >>> 85508 0.370 1.634e-08 > >>> 95082 0.830 4.542e-16", > >>> header=TRUE) > >>> # same as in your example > >>> ref<-read.table(text="reg1 reg2 > >>> 29220 63933 > >>> 26441 41838 > >>> 06169 10276 > >>> 74806 92643 > >>> 73732 82451 > >>> 86042 93502 > >>> 85508 95082", > >>> header=TRUE) > >>> # sort the "map" data frame > >>> map2<-map[order(map$reg),] > >>> # get a field for the counts > >>> ref$n<-NA > >>> # and a field for the minimum p values > >>> ref$min_p<-NA > >>> # get the number of rows in "ref" > >>> nref<-dim(ref)[1] > >>> for(i in 1:nref) { > >>> start<-which(map2$reg==ref$reg1[i]) > >>> end<-which(map2$reg==ref$reg2[i]) > >>> cat("start",start,"end",end,"\n") > >>> # get the range of matches > >>> regrange<-range(c(start,end)) > >>> # convert this to a sequence spanning all matches > >>> allreg<-regrange[1]:regrange[2] > >>> ref$n[i]<-sum(map2$p[allreg] > 0.85) > >>> ref$min_p[i]<-min(map2$p[allreg]) > >>> } > >>> > >>> This example uses the span from the first match of "reg1" to the last > >>> match of "reg2". This may not be what you want, so let me know if > >>> there are further constraints. > >>> > >>> Jim > >>> > >>> On Mon, Jun 13, 2016 at 12:35 PM, greg holly <mak.hho...@gmail.com> > >>> wrote: > >>> > Hi Bert; > >>> > > >>> > I do appreciate for this. I need check your codes on task2 tomorrow > at > >>> > my > >>> > office on the real data as I have difficulty (because a technical > >>> > issue) to > >>> > remote connection. I am sure it will work well. > >>> > > >>> > I am sorry that I was not able to explain my first question. > Basically > >>> > > >>> > Values in ref data represent the region of chromosome. I need choose > >>> > these > >>> > regions in map (all regions values in ref data are exist in map data > in > >>> > the > >>> > first column -column map$reg). And then summing up the column > "map$rate > >>> > and > >>> > count the numbers that gives >0.85. For example, consider the first > >>> > row in > >>> > data ref. They are 29220 and 63933. After sorting the first column > >>> > in > >>> > map then summing column "map$rate" only between 29220 to 63933 in > >>> > sorted > >>> > map and cut off at >0.85. Then count how many rows in sorted map > gives > >>> >>0.85. For example consider there are 38 rows between 29220 in > 63933 > >>> >> in sorted > >>> > map$reg and only summing first 12 of them gives>0.85. Then my answer > >>> > is > >>> > going to be 12 for 29220 - 63933 in ref. > >>> > > >>> > Thanks I lot for your patience. > >>> > > >>> > Cheers, > >>> > Greg > >>> > > >>> > On Sun, Jun 12, 2016 at 10:35 PM, greg holly <mak.hho...@gmail.com> > >>> > wrote: > >>> > > >>> >> Hi Bert; > >>> >> > >>> >> I do appreciate for this. I need check your codes on task2 tomorrow > at > >>> >> my > >>> >> office on the real data as I have difficulty (because a technical > >>> >> issue) to > >>> >> remote connection. I am sure it will work well. > >>> >> > >>> >> I am sorry that I was not able to explain my first question. > Basically > >>> >> > >>> >> Values in ref data represent the region of chromosome. I need choose > >>> >> these > >>> >> regions in map (all regions values in ref data are exist in map data > >>> >> in the > >>> >> first column -column map$reg). And then summing up the column > >>> >> "map$rate and > >>> >> count the numbers that gives >0.85. For example, consider the first > >>> >> row in > >>> >> data ref. They are 29220 and 63933. After sorting the first > column > >>> >> in > >>> >> map then summing column "map$rate" only between 29220 to 63933 in > >>> >> sorted map and cut off at >0.85. Then count how many rows in sorted > >>> >> map > >>> >> gives >0.85. For example consider there are 38 rows between 29220 > in > >>> >> 63933 in sorted map$reg and only summing first 12 of them > >>> >> gives>0.85. > >>> >> Then my answer is going to be 12 for 29220 - 63933 in ref. > >>> >> > >>> >> Thanks I lot for your patience. > >>> >> > >>> >> Cheers, > >>> >> Greg > >>> >> > >>> >> On Sun, Jun 12, 2016 at 6:36 PM, Bert Gunter < > bgunter.4...@gmail.com> > >>> >> wrote: > >>> >> > >>> >>> Greg: > >>> >>> > >>> >>> I was not able to understand your task 1. Perhaps others can. > >>> >>> > >>> >>> My understanding of your task 2 is that for each row of ref, you > wish > >>> >>> to find all rows,of map such that the reg values in those rows fall > >>> >>> between the reg1 and reg2 values in ref (inclusive change <= to < > if > >>> >>> you don't want the endpoints), and then you want the minimum map$p > >>> >>> values of all those rows. If that is correct, I believe this will > do > >>> >>> it (but caution, untested, as you failed to provide data in a > >>> >>> convenient form, e.g. using dput() ) > >>> >>> > >>> >>> task2 <- with(map,vapply(seq_len(nrow(ref)),function(i) > >>> >>> min(p[ref[i,1]<=reg & reg <= ref[i,2] ]),0)) > >>> >>> > >>> >>> > >>> >>> If my understanding is incorrect, please ignore both the above and > >>> >>> the > >>> >>> following: > >>> >>> > >>> >>> > >>> >>> The "solution" I have given above seems inefficient, so others may > be > >>> >>> able to significantly improve it if you find that it takes too > long. > >>> >>> OTOH, my understanding of your specification is that you need to > >>> >>> search for all rows in map data frame that meet the criterion for > >>> >>> each > >>> >>> row of ref, and without further information, I don't know how to do > >>> >>> this without just repeating the search 560 times. > >>> >>> > >>> >>> > >>> >>> Cheers, > >>> >>> Bert > >>> >>> > >>> >>> > >>> >>> Bert Gunter > >>> >>> > >>> >>> "The trouble with having an open mind is that people keep coming > >>> >>> along > >>> >>> and sticking things into it." > >>> >>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) > >>> >>> > >>> >>> > >>> >>> On Sun, Jun 12, 2016 at 1:14 PM, greg holly <mak.hho...@gmail.com> > >>> >>> wrote: > >>> >>> > Dear all; > >>> >>> > > >>> >>> > > >>> >>> > > >>> >>> > I have two data sets, data=map and data=ref). A small part of > each > >>> >>> > data > >>> >>> set > >>> >>> > are given below. Data map has more than 27 million and data ref > has > >>> >>> about > >>> >>> > 560 rows. Basically I need run two different task. My R codes for > >>> >>> > these > >>> >>> > task are given below but they do not work properly. > >>> >>> > > >>> >>> > I sincerely do appreciate your helps. > >>> >>> > > >>> >>> > > >>> >>> > Regards, > >>> >>> > > >>> >>> > Greg > >>> >>> > > >>> >>> > > >>> >>> > > >>> >>> > Task 1) > >>> >>> > > >>> >>> > For example, the first and second columns for row 1 in data ref > are > >>> >>> 29220 > >>> >>> > 63933. So I need write an R code normally first look the first > row > >>> >>> > in > >>> >>> ref > >>> >>> > (which they are 29220 and 63933) than summing the column of > >>> >>> > "map$rate" > >>> >>> and > >>> >>> > give the number of rows that >0.85. Then do the same for the > >>> >>> > second, > >>> >>> > third....in ref. At the end I would like a table gave below (the > >>> >>> results I > >>> >>> > need). Please notice the all value specified in ref data file are > >>> >>> > exist > >>> >>> in > >>> >>> > map$reg column. > >>> >>> > > >>> >>> > > >>> >>> > > >>> >>> > Task2) > >>> >>> > > >>> >>> > Again example, the first and second columns for row 1 in data ref > >>> >>> > are > >>> >>> 29220 > >>> >>> > 63933. So I need write an R code give the minimum map$p for the > >>> >>> > 29220 > >>> >>> > -63933 intervals in map file. Than > >>> >>> > > >>> >>> > do the same for the second, third....in ref. > >>> >>> > > >>> >>> > > >>> >>> > > >>> >>> > > >>> >>> > #my attempt for the first question > >>> >>> > > >>> >>> > temp<-map[order(map$reg, map$p),] > >>> >>> > > >>> >>> > count<-1 > >>> >>> > > >>> >>> > temp<-unique(temp$reg > >>> >>> > > >>> >>> > for(i in 1:length(ref) { > >>> >>> > > >>> >>> > for(j in 1:length(ref) > >>> >>> > > >>> >>> > { > >>> >>> > > >>> >>> > temp1<-if (temp[pos[i]==ref[ref$reg1,] & > >>> >>> > (temp[pos[j]==ref[ref$reg2,] > >>> >>> > & temp[cumsum(temp$rate) > >>> >>> >>0.70,]) > >>> >>> > > >>> >>> > count=count+1 > >>> >>> > > >>> >>> > } > >>> >>> > > >>> >>> > } > >>> >>> > > >>> >>> > #my attempt for the second question > >>> >>> > > >>> >>> > > >>> >>> > > >>> >>> > temp<-map[order(map$reg, map$p),] > >>> >>> > > >>> >>> > count<-1 > >>> >>> > > >>> >>> > temp<-unique(temp$reg > >>> >>> > > >>> >>> > for(i in 1:length(ref) { > >>> >>> > > >>> >>> > for(j in 1:length(ref) > >>> >>> > > >>> >>> > { > >>> >>> > > >>> >>> > temp2<-if (temp[pos[i]==ref[ref$reg1,] & > >>> >>> > (temp[pos[j]==ref[ref$reg2,]) > >>> >>> > > >>> >>> > output<-temp2[temp2$p==min(temp2$p),] > >>> >>> > > >>> >>> > } > >>> >>> > > >>> >>> > } > >>> >>> > > >>> >>> > > >>> >>> > > >>> >>> > Data sets > >>> >>> > > >>> >>> > > >>> >>> > Data= map > >>> >>> > > >>> >>> > reg p rate > >>> >>> > > >>> >>> > 10276 0.700 3.867e-18 > >>> >>> > > >>> >>> > 71608 0.830 4.542e-16 > >>> >>> > > >>> >>> > 29220 0.430 1.948e-15 > >>> >>> > > >>> >>> > 99542 0.220 1.084e-15 > >>> >>> > > >>> >>> > 26441 0.880 9.675e-14 > >>> >>> > > >>> >>> > 95082 0.090 7.349e-13 > >>> >>> > > >>> >>> > 36169 0.480 9.715e-13 > >>> >>> > > >>> >>> > 55572 0.500 9.071e-12 > >>> >>> > > >>> >>> > 65255 0.300 1.688e-11 > >>> >>> > > >>> >>> > 51960 0.970 1.163e-10 > >>> >>> > > >>> >>> > 55652 0.388 3.750e-10 > >>> >>> > > >>> >>> > 63933 0.250 9.128e-10 > >>> >>> > > >>> >>> > 35170 0.720 7.355e-09 > >>> >>> > > >>> >>> > 06491 0.370 1.634e-08 > >>> >>> > > >>> >>> > 85508 0.470 1.057e-07 > >>> >>> > > >>> >>> > 86666 0.580 7.862e-07 > >>> >>> > > >>> >>> > 04758 0.810 9.501e-07 > >>> >>> > > >>> >>> > 06169 0.440 1.104e-06 > >>> >>> > > >>> >>> > 63933 0.750 2.624e-06 > >>> >>> > > >>> >>> > 41838 0.960 8.119e-06 > >>> >>> > > >>> >>> > > >>> >>> > data=ref > >>> >>> > > >>> >>> > reg1 reg2 > >>> >>> > > >>> >>> > 29220 63933 > >>> >>> > > >>> >>> > 26441 41838 > >>> >>> > > >>> >>> > 06169 10276 > >>> >>> > > >>> >>> > 74806 92643 > >>> >>> > > >>> >>> > 73732 82451 > >>> >>> > > >>> >>> > 86042 93502 > >>> >>> > > >>> >>> > 85508 95082 > >>> >>> > > >>> >>> > > >>> >>> > > >>> >>> > the results I need > >>> >>> > > >>> >>> > reg1 reg2 n > >>> >>> > > >>> >>> > 29220 63933 12 > >>> >>> > > >>> >>> > 26441 41838 78 > >>> >>> > > >>> >>> > 06169 10276 125 > >>> >>> > > >>> >>> > 74806 92643 11 > >>> >>> > > >>> >>> > 73732 82451 47 > >>> >>> > > >>> >>> > 86042 93502 98 > >>> >>> > > >>> >>> > 85508 95082 219 > >>> >>> > > >>> >>> > [[alternative HTML version deleted]] > >>> >>> > > >>> >>> > ______________________________________________ > >>> >>> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, > see > >>> >>> > https://stat.ethz.ch/mailman/listinfo/r-help > >>> >>> > PLEASE do read the posting guide > >>> >>> http://www.R-project.org/posting-guide.html > >>> >>> > and provide commented, minimal, self-contained, reproducible > code. > >>> >>> > >>> >> > >>> >> > >>> > > >>> > [[alternative HTML version deleted]] > >>> > > >>> > ______________________________________________ > >>> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > >>> > https://stat.ethz.ch/mailman/listinfo/r-help > >>> > PLEASE do read the posting guide > >>> > http://www.R-project.org/posting-guide.html > >>> > and provide commented, minimal, self-contained, reproducible code. > >> > >> > > > [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.