BACtaki commented on PR #1740:
URL: https://github.com/apache/systemds/pull/1740#issuecomment-1330084042

   I have added the following API design notes to the JIRA:
   
   1. Row aggregation: remove duplicate rows
   
       * R: unique() removes duplicate rows, e.g.:
   
       ```R
           > df <- matrix(rep(1:6,length.out=9),nrow = 3,ncol=3,byrow = T)
           > df
               [,1] [,2] [,3]
           [1,]    1    2    3
           [2,]    4    5    6
           [3,]    1    2    3
           > unique(df)
               [,1] [,2] [,3]
           [1,]    1    2    3
           [2,]    4    5    6
       ```
   
       * SystemDS: the same can be achieved using the unique() sketch like so:
   
           unique(X, dir="r")
   
   2. Col aggregation: remove duplicate cols
   
       * R: unique() removes duplicate rows by default, so we can obtain the desired result by transposing before and after, like so:
   
       ```R
           > df <- matrix(rep(1:6,length.out=9),nrow = 3,ncol=3)
           > df
               [,1] [,2] [,3]
           [1,]    1    4    1
           [2,]    2    5    2
           [3,]    3    6    3
           > df_t <- t(df)
           > df_t
               [,1] [,2] [,3]
           [1,]    1    2    3
           [2,]    4    5    6
           [3,]    1    2    3
           > df_t_ = unique(df_t)
           > df_t_
               [,1] [,2] [,3]
           [1,]    1    2    3
           [2,]    4    5    6
           > df_t_t = t(df_t_)
           > df_t_t
               [,1] [,2]
           [1,]    1    4
           [2,]    2    5
           [3,]    3    6
       ```
   
       * SystemDS: the same can be achieved using the unique() sketch like so:
   
           unique(X, dir="c")
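
       For comparison, R's matrix method for unique() also accepts a MARGIN argument, so the transpose round trip is not strictly required on the R side. A minimal sketch checking that both routes agree (the names `via_transpose` and `via_margin` are illustrative only and do not come from the PR):

       ```R
       df <- matrix(rep(1:6, length.out = 9), nrow = 3, ncol = 3)

       # Route shown in the transcript: transpose, drop duplicate rows, transpose back.
       via_transpose <- t(unique(t(df)))

       # Direct route: MARGIN = 2 makes unique() compare columns instead of rows.
       via_margin <- unique(df, MARGIN = 2)

       # Both routes should keep the same columns in the same order.
       stopifnot(isTRUE(all.equal(via_transpose, via_margin)))
       print(via_margin)
       ```

       Either way, the column direction in R is a variant of the row-wise call, which is what dir="c" is meant to express directly.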
   
   3. RowCol aggregation: return only the unique values in the given matrix
   
       * SystemDS
   
           X = [[1, 1], [2, 2], [3, 3]]
           unique(X) will return X' = [[1], [2], [3]]
       
       * R
   
       This is similar to how unique() operates on vectors in R:
   
       ```R
           > df <- c(1, 1, 2, 2, 3, 3)
           > df
           [1] 1 1 2 2 3 3
           > unique(df)
           [1] 1 2 3
       ```
   
       The difference is that SystemDS' unique() will support this not only for vectors but also for matrices.
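
       To get a comparable result for a matrix in R, the matrix has to be flattened first, since unique() on a matrix compares whole rows. The sketch below only mimics the proposed semantics on the R side; it assumes the result is shaped as a single column, as in the X' example, and makes no claim about the order in which SystemDS returns the values (`vals` and `X_unique` are illustrative names):

       ```R
       X <- matrix(c(1, 1, 2, 2, 3, 3), nrow = 3, ncol = 2, byrow = TRUE)

       # as.vector() flattens the matrix so unique() sees individual cell values.
       vals <- unique(as.vector(X))

       # Reshape into one column to mirror X' = [[1], [2], [3]].
       X_unique <- matrix(vals, ncol = 1)
       print(X_unique)
       ```

       Calling unique(X) directly in R would return X unchanged here, because all three rows are distinct.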
   

