[jira] [Comment Edited] (SYSTEMML-2336) Data partition

Matthias Boehm (JIRA) Sun, 20 May 2018 16:55:15 -0700

    [ 
https://issues.apache.org/jira/browse/SYSTEMML-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482086#comment-16482086
 ]


Matthias Boehm edited comment on SYSTEMML-2336 at 5/20/18 11:54 PM:
--------------------------------------------------------------------

I would recommend simply leveraging existing operations. Similar to our 
approach for constant folding, you can temporarily construct hops and execute 
the generated instructions to perform the data partitioning. In detail, here is 
how I would map the different schemes:
* Disjoint_Contiguous: for each worker, use a right indexing operation 
{{X[beg:end,]}} to obtain contiguous, non-overlapping partitions of rows.
* Disjoint_Round_Robin: for each worker, use a permutation multiply or, more 
simply, a removeEmpty such as {{removeEmpty(target=X, margin="rows", 
select=(seq(1,nrow(X))%%k)==id)}}, where k is the number of workers and id 
ranges over 0 to k-1.
* Disjoint_Random: for each worker, use a permutation multiply {{P[beg:end,] 
%*% X}}, where P is constructed for example with 
{{P=table(seq(1,nrow(X)), sample(nrow(X), nrow(X)))}}, i.e., sampling without 
replacement to ensure disjointness.
* Overlap_Reshuffle: similar to the above, except that you create a new 
permutation matrix for each worker and skip the indexing on P.
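
To make the row selection of each scheme concrete, here is a minimal sketch in 
plain Python rather than DML. It is not the paramserv implementation: the 
function names, the list-of-rows representation of X, and the ceil(n/k) block 
size are assumptions for illustration only. A permutation matrix P is 
represented by the index list it encodes, so row i of P %*% X becomes 
X[perm[i]].

```python
import random


def disjoint_contiguous(X, k):
    """Worker i gets a contiguous block of rows, like X[beg:end,] in DML."""
    n = len(X)
    size = -(-n // k)  # ceil(n / k) rows per worker (assumed block size)
    return [X[i * size:(i + 1) * size] for i in range(k)]


def disjoint_round_robin(X, k):
    """Worker wid keeps the rows whose 1-based position p has p % k == wid,
    mirroring select=(seq(1,nrow(X))%%k)==id."""
    return [[row for p, row in enumerate(X, start=1) if p % k == wid]
            for wid in range(k)]


def permutation(n, rng):
    """Sample 0..n-1 without replacement; stands in for the matrix P."""
    return rng.sample(range(n), n)


def disjoint_random(X, k, rng):
    """One shared permutation, then contiguous slices of the permuted rows,
    i.e. the effect of P[beg:end,] %*% X."""
    perm = permutation(len(X), rng)
    shuffled = [X[j] for j in perm]
    return disjoint_contiguous(shuffled, k)


def overlap_reshuffle(X, k, rng):
    """Every worker sees all rows, each under its own fresh permutation."""
    return [[X[j] for j in permutation(len(X), rng)] for _ in range(k)]


if __name__ == "__main__":
    X = [[i, 10 * i] for i in range(1, 11)]  # 10 rows, 2 columns
    rng = random.Random(42)
    for name, parts in [
        ("disjoint_contiguous", disjoint_contiguous(X, 3)),
        ("disjoint_round_robin", disjoint_round_robin(X, 3)),
        ("disjoint_random", disjoint_random(X, 3, rng)),
        ("overlap_reshuffle", overlap_reshuffle(X, 3, rng)),
    ]:
        print(name, [len(p) for p in parts])
```

Note that with this block size the last worker can receive fewer rows than the 
others (or none when k is close to the number of rows); how to balance the 
remainder is a design choice the sketch leaves open.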

It's probably a good idea to start simple. Hence, I would recommend 
implementing disjoint_contiguous first and getting a basic local parameter 
server running. Once this is done, we can come back to the other data 
partitioning schemes.


> Data partition
> --------------
>
>                 Key: SYSTEMML-2336
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-2336
>             Project: SystemML
>          Issue Type: Sub-task
>            Reporter: LI Guobao
>            Assignee: LI Guobao
>            Priority: Major
>
> This sub-task aims to implement the four data partitioning schemes (i.e., 
> disjoint_contiguous, disjoint_round_robin, disjoint_random, 
> overlap_reshuffle) for the paramserv builtin function.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
