Well, you can draw a sample with replacement by constructing the permutation (selection) matrix slightly differently (optionally, you could also sort the sample if required):

P = table(seq(1,N), sample(nrow(X),N,TRUE), N, nrow(X));
Xsample = P %*% X;

Btw, your script didn't work because removeEmpty with a selection vector expects a non-zero indicator by position, not by value (e.g., a non-zero in the 7th cell indicates that you want to select the 7th row, regardless of the actual value you feed in).

Regards,
Matthias

On Sun, Apr 30, 2017 at 1:47 AM, arijit chakraborty <ak...@hotmail.com> wrote:

> Hi,
>
> The solution Matthias gave works perfectly when we are doing a random
> sample of the dataframe without replacement. But it's not working with
> replacement. E.g., if I have the original dataframe of the form
> matrix(seq(1,100), 100, 1) and want to randomly select 20 rows. With
> Matthias' example, we can randomly sample that and the new matrix might
> look like this:
>
> matrix("1 2 3 21 29 36 37 40 45 53 55 56 71 72 79 82 90 96 97 99", 20,1).
>
> But if I want a matrix of this form (which is possible with random
> sampling with replacement):
>
> matrix("1 2 3 21 21 21 37 40 45 53 53 56 71 79 79 82 90 96 97 99", 20,1).
>
> I'm not getting it.
>
> I tried the following code:
>
> data_ind = matrix(seq(1,nrow(actual_data), 1), nrow(bdframe_bt_subset_1), 1)
> data_sample = sample(nrow(data_ind), 100, TRUE)
> data_sample_matrix= matrix(data_sample, 100, 1)
> a = matrix(0, (nrow(data_ind)- nrow(data_sample_matrix)), 1)
> data_sample1 = rbind(data_sample, a)
> b = removeEmpty(target=actual_data, margin="rows", select = data_sample1);
>
> But this is not giving me the repeated rows, even though I can see in
> "data_sample_matrix" that I have repeated positions in the data.
>
> I also tried to follow "sample.dlm" in the "utils" folder, but that is
> also not giving me the answer I'm looking for.
>
> We could use a for-loop in this case with the "data_sample_matrix"
> matrix, but I want to avoid looping.
>
> Can anyone please help?
>
> Thank you!
>
> Arijit
>
> ________________________________
> From: arijit chakraborty <ak...@hotmail.com>
> Sent: Saturday, April 22, 2017 12:45 PM
> To: dev@systemml.incubator.apache.org
> Subject: Re: Randomly Selecting rows from a dataframe
>
> Thank you Matthias! You are most helpful!
>
> Thanks again!
>
> Arijit
>
> ________________________________
> From: Matthias Boehm <mboe...@googlemail.com>
> Sent: Saturday, April 22, 2017 2:20:48 AM
> To: dev@systemml.incubator.apache.org
> Subject: Re: Randomly Selecting rows from a dataframe
>
> you can take for example a 1% sample of rows via a permutation matrix
> (specifically selection matrix) as follows
>
> I = (rand(rows=nrow(X), cols=1, min=0, max=1) <= 0.01);
> P = removeEmpty(target=diag(I), margin="rows");
> Xsample = P %*% X;
>
> or via removeEmpty and selection vector
>
> I = (rand(rows=nrow(X), cols=1, min=0, max=1) <= 0.01);
> Xsample = removeEmpty(target=X, margin="rows", select=I);
>
> Both should be compiled internally to very similar plans.
>
> Regards,
> Matthias
>
> On Fri, Apr 21, 2017 at 1:42 PM, arijit chakraborty <ak...@hotmail.com>
> wrote:
>
> > Hi,
> >
> > Suppose I have a dataframe of 10 variables (X1-X10) and 1000 rows. Now
> > I want to randomly select rows so that I have a subset of the dataset.
> >
> > Can anyone please help me to solve this problem?
> >
> > I tried the following code:
> >
> > randSample = sample(nrow(dataframe), 200);
> >
> > This gives me a column matrix with the positions of the randomly
> > selected rows. But I was not able to work out how to use this matrix
> > to subset the data from the original dataframe.
> >
> > Thank you!
> >
> > Arijit
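
For readers without a SystemML installation, the two ideas in the reply at the top of this thread can be sketched in plain Python. This is only an illustration of the semantics, not DML and not how SystemML implements table() or removeEmpty. The helper names (selection_matrix, matmul, sample_rows_with_replacement, select_by_position) are made up for this sketch, and indices are 0-based here, whereas DML is 1-based.

```python
import random


def selection_matrix(indices, n_rows):
    """Build a len(indices) x n_rows 0/1 matrix with one 1 per row.

    Row i has its 1 in column indices[i], so repeated indices simply
    produce repeated rows in the product with the data matrix.
    """
    return [[1 if j == idx else 0 for j in range(n_rows)] for idx in indices]


def matmul(A, B):
    """Naive matrix product of two lists-of-lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def sample_rows_with_replacement(X, n, rng):
    """Draw n row indices with replacement, sort them, and select via P * X."""
    idx = sorted(rng.randrange(len(X)) for _ in range(n))
    P = selection_matrix(idx, len(X))
    return idx, matmul(P, X)


def select_by_position(X, select):
    """Assumed by-position semantics of a selection vector:
    keep row i whenever select[i] is non-zero; the value itself is ignored."""
    return [row for row, s in zip(X, select) if s != 0]


# Small 5 x 2 data matrix used for illustration only.
X = [[10 * i, 10 * i + 1] for i in range(5)]
idx, Xsample = sample_rows_with_replacement(X, 4, random.Random(0))
```

With fixed indices such as [0, 2, 2, 4], the product of the selection matrix and X repeats row 2 twice. A by-position selection vector such as [3, 3, 0, 0, 0] keeps rows 0 and 1 once each, whatever the non-zero values are, which is why feeding sampled row numbers into a selection vector cannot yield duplicates.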