Thanks a lot Matthias. Following up with your suggestions, I tried option 2:
# option 2: via removeEmpty
Ind = (X[,1]>10);
Y = removeEmpty(target=X, select=Ind);

SystemML throws a complaint (maybe I'm not using the correct version?):

Named parameter 'margin' missing. Please specify 'rows' or 'cols'.

It works correctly after adding the 'margin' argument:

# option 2: via removeEmpty
Ind = (X[,1]>10);
Y = removeEmpty(target=X, margin = "rows", select=Ind);

I know the documentation is being updated continuously; I just want to point
out that the help entry for 'removeEmpty()' (link below) does not contain an
explanation of the 'select' argument yet :)

http://apache.github.io/incubator-systemml/dml-language-reference.html#matrix-construction-manipulation-and-aggregation-built-in-functions

Thanks again for your help.

Best,

Ethan

On Sun, Apr 3, 2016 at 12:45 AM, Matthias Boehm <[email protected]> wrote:

> absolutely, if you want to count or aggregate values of the two groups,
> you should definitely go with the aggregate() call instead. The snippets I
> provided are just for the case where you want to run some other analysis
> over the subsets (e.g., running an algorithm over a sample or fold).
>
> Regards,
> Matthias
>
> From: Ethan Xu <[email protected]>
> To: [email protected]
> Date: 03/31/2016 11:31 AM
> Subject: Re: Logical indexing?
> ------------------------------
>
> Ah I missed the 'removeEmpty()' function. That's a smart way to trim
> matrix. Thanks Matthias!
>
> Also from your answer I realized 'ind = (X[,1] > 10);' is acceptable, so
> aggregation would work with
>
> ind = (X[,1] > 10) + 1;
> F = aggregate(target = X[,2], groups = ind, fn = "sum");
>
> Ethan
>
>
> On Thu, Mar 31, 2016 at 1:22 PM, Matthias Boehm <[email protected]> wrote:
>
> > just a quick correction of option 2:
> >
> > Ind = (X[,1]>10);
> > Y = removeEmpty(target=X, select=Ind);
> >
> > Regards,
> > Matthias
> >
> > From: Matthias Boehm/Almaden/IBM@IBMUS
> > To: [email protected]
> > Date: 03/31/2016 10:14 AM
> > Subject: Re: Logical indexing?
> > ------------------------------
> >
> > that's a good question - no SystemML does not support set indexing yet but
> > you can emulate it via permutation matrices or similar transformations.
> > Here are some examples:
> >
> > # option 1: via permutation (aka selection) matrices
> > P = removeEmpty(target=diag(X[,1]>10), margin="rows");
> > Y = P %*% X;
> >
> > # option 2: via removeEmpty
> > Ind = diag(X[,1]>10);
> > Y = removeEmpty(target=X, select=Ind);
> >
> > Regards,
> > Matthias
> >
> > From: Ethan Xu <[email protected]>
> > To: [email protected]
> > Date: 03/31/2016 08:47 AM
> > Subject: Logical indexing?
> > ------------------------------
> >
> > Does SystemML support logical indexing?
> >
> > For example if X is a numerical matrix with 2 columns and n rows (in my
> > case n ~ 35 million). I'd like to split the matrix row-wise according to
> > values of the first column.
> > This is useful when I need to find
> > distributions of subgroups of population. In R I can do
> >
> > Y = X[ X[ ,1] > 10, ]
> >
> > OR
> >
> > ind = which(X[ ,1] > 10)
> > Y = X[ind, ]
> >
> > It seems neither syntax works in SystemML.
> >
> > I noticed there's an aggregate() function for SystemML, but it only
> > supports coded categorical variables.
> >
> > Perhaps one way to do that is creating an indicator n by 1 matrix Z that
> > takes values 1 and 2 where 1 corresponds to X[, 1] <= 10 and 2 corresponds
> > to X[,1] > 10. Then aggregate() X[,2] with respect to Z.
> >
> > It seems transform() with 'bin' option is one obvious way to create such a
> > Z, however the 'bin' method only supports 'equi-width' currently.
> >
> > Is looping through X[,1] the best option? Maybe I missed some other
> > convenient functions.
> >
> > Any suggestions are greatly appreciated!
> >
> > Best,
> >
> > Ethan
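
For anyone who wants to sanity-check the two ideas in this thread on a tiny example without a SystemML install, here is a rough sketch in plain Python. It is not DML and not SystemML's implementation; it only mimics, as I understand the thread, what row selection with an indicator vector and a grouped sum with group codes (indicator + 1) are expected to return. The helper names select_rows and grouped_sum are made up for illustration.

```python
def select_rows(X, ind):
    """Keep rows of X whose indicator is nonzero.

    Assumed to mirror removeEmpty(target=X, margin="rows", select=Ind)
    as used in the thread; this is an illustration, not SystemML code.
    """
    return [row for row, keep in zip(X, ind) if keep != 0]


def grouped_sum(values, groups):
    """Sum values per group code.

    Assumed to mirror aggregate(target=..., groups=..., fn="sum").
    Returns a dict mapping group code -> sum.
    """
    totals = {}
    for v, g in zip(values, groups):
        totals[g] = totals.get(g, 0) + v
    return totals


# Toy matrix with 2 columns: column 1 drives the split, column 2 is summed.
X = [[5, 1.0], [12, 2.0], [10, 3.0], [40, 4.0]]

# Indicator: 1 where the first column is > 10, else 0.
ind = [1 if row[0] > 10 else 0 for row in X]

# Row subset where the first column is > 10.
Y = select_rows(X, ind)

# Group codes 1 (<= 10) and 2 (> 10), i.e. the indicator plus one.
groups = [i + 1 for i in ind]
F = grouped_sum([row[1] for row in X], groups)

print(Y)
print(F)
```

The "+ 1" matters because the group codes are expected to start at 1, so the 0/1 indicator is shifted to 1/2 before aggregating.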
