Absolutely, if you want to count or aggregate values of the two groups, you should definitely go with the aggregate() call instead. The snippets I provided are just for the case where you want to run some other analysis over the subsets (e.g., running an algorithm over a sample or fold).
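To make the distinction concrete, here is a rough DML sketch of both cases side by side. The random test matrix and the variable names (I, Ineg, Yhigh, Ylow) are made up for illustration, and I am assuming that margin="rows" is passed explicitly to removeEmpty(); treat it as a sketch rather than a tuned script.

```
# Hypothetical test data: 1000 rows, 2 columns, values in [0, 20].
X = rand(rows=1000, cols=2, min=0, max=20);

# Case 1: only per-group aggregates are needed.
# (X[,1] > 10) is a 0/1 column; adding 1 turns it into group ids 1 and 2.
ind = (X[,1] > 10) + 1;
F = aggregate(target=X[,2], groups=ind, fn="sum");
# F has one row per group id:
#   row 1: sum of X[,2] where X[,1] <= 10
#   row 2: sum of X[,2] where X[,1] > 10

# Case 2: the row subsets themselves are needed for further analysis.
I = (X[,1] > 10);
Ineg = 1 - I;
Yhigh = removeEmpty(target=X, margin="rows", select=I);
Ylow = removeEmpty(target=X, margin="rows", select=Ineg);

print("rows with X[,1] > 10: " + nrow(Yhigh));
print("rows with X[,1] <= 10: " + nrow(Ylow));
```

Case 1 never materializes the subsets, which should matter at n ~ 35 million; case 2 is only worth the extra copies if Yhigh and Ylow are fed into something else afterwards.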
Regards,
Matthias

From: Ethan Xu <[email protected]>
To: [email protected]
Date: 03/31/2016 11:31 AM
Subject: Re: Logical indexing?

Ah, I missed the 'removeEmpty()' function. That's a smart way to trim a matrix. Thanks Matthias!

Also, from your answer I realized 'ind = (X[,1] > 10);' is acceptable, so aggregation would work with

ind = (X[,1] > 10) + 1;
F = aggregate(target = X[,2], groups = ind, fn = "sum");

Ethan

On Thu, Mar 31, 2016 at 1:22 PM, Matthias Boehm <[email protected]> wrote:

> just a quick correction of option 2:
>
> Ind = (X[,1]>10);
> Y = removeEmpty(target=X, select=Ind);
>
> Regards,
> Matthias
>
> From: Matthias Boehm/Almaden/IBM@IBMUS
> To: [email protected]
> Date: 03/31/2016 10:14 AM
> Subject: Re: Logical indexing?
> ------------------------------
>
> that's a good question - no, SystemML does not support set indexing yet, but
> you can emulate it via permutation matrices or similar transformations.
> Here are some examples:
>
> # option 1: via permutation (aka selection) matrices
> P = removeEmpty(target=diag(X[,1]>10), margin="rows");
> Y = P %*% X;
>
> # option 2: via removeEmpty
> Ind = diag(X[,1]>10);
> Y = removeEmpty(target=X, select=Ind);
>
> Regards,
> Matthias
>
> From: Ethan Xu <[email protected]>
> To: [email protected]
> Date: 03/31/2016 08:47 AM
> Subject: Logical indexing?
> ------------------------------
>
> Does SystemML support logical indexing?
>
> For example, say X is a numerical matrix with 2 columns and n rows (in my
> case n ~ 35 million). I'd like to split the matrix row-wise according to
> values of the first column. This is useful when I need to find
> distributions of subgroups of a population. In R I can do
>
> Y = X[ X[ ,1] > 10, ]
>
> OR
>
> ind = which(X[ ,1] > 10)
> Y = X[ind, ]
>
> It seems neither syntax works in SystemML.
>
> I noticed there's an aggregate() function for SystemML, but it supports
> coded categorical variables.
>
> Perhaps one way to do that is creating an indicator n by 1 matrix Z that
> takes values 1 and 2, where 1 corresponds to X[,1] <= 10 and 2 corresponds
> to X[,1] > 10. Then aggregate() X[,2] with respect to Z.
>
> It seems transform() with the 'bin' option is one obvious way to create such a
> Z; however, the 'bin' method only supports 'equi-width' currently.
>
> Is looping through X[,1] the best option? Maybe I missed some other
> convenient functions.
>
> Any suggestions are greatly appreciated!
>
> Best,
>
> Ethan
