[
https://issues.apache.org/jira/browse/SYSTEMML-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482086#comment-16482086
]
Matthias Boehm edited comment on SYSTEMML-2336 at 5/20/18 11:54 PM:
--------------------------------------------------------------------
I would recommend simply leveraging existing operations. Similar to our
approach for constant folding, you can temporarily construct hops and execute
the generated instructions to perform the data partitioning. In detail, here is
how I would map the different schemes:
* Disjoint_Contiguous: for each worker, use a right indexing operation
{{X[beg:end,]}} to obtain contiguous, non-overlapping partitions of rows.
* Disjoint_Round_Robin: for each worker, use a permutation multiply or, more
simply, a removeEmpty such as {{removeEmpty(target=X, margin="rows",
select=(seq(1,nrow(X))%%k)==id)}}.
* Disjoint_Random: for each worker, use a permutation multiply {{P[beg:end,]
%*% X}}, where P is constructed for example with
{{P=table(seq(1,nrow(X)), sample(nrow(X), nrow(X)))}}, i.e., sampling without
replacement to ensure disjointness.
* Overlap_Reshuffle: similar to the above, except that you create a new
permutation matrix for each worker and skip the indexing on P.
It's probably a good idea to start simple. Hence, I would recommend
implementing disjoint_contiguous first and getting a basic local parameter
server running. Once this is done, we can come back to the other data
partitioning schemes.
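
To make the row assignments of the four schemes concrete, here is a minimal
sketch in plain Python. It is not SystemML code and does not construct hops; it
only models which row indices each worker would receive. The function names,
the 0-based indexing, the ceiling-based block size, and the {{seed}} parameter
are assumptions made for illustration.

```python
import random


def disjoint_contiguous(n_rows, k):
    """Split row indices 0..n_rows-1 into k contiguous, non-overlapping blocks."""
    size = -(-n_rows // k)  # ceiling division; this block size is an assumption
    return [list(range(w * size, min((w + 1) * size, n_rows))) for w in range(k)]


def disjoint_round_robin(n_rows, k):
    """Worker w receives every row whose (0-based) index modulo k equals w."""
    return [[i for i in range(n_rows) if i % k == w] for w in range(k)]


def disjoint_random(n_rows, k, seed=0):
    """Permute all rows once (sampling without replacement), then split
    the permuted order into contiguous blocks, so partitions stay disjoint."""
    perm = list(range(n_rows))
    random.Random(seed).shuffle(perm)
    return [[perm[i] for i in block] for block in disjoint_contiguous(n_rows, k)]


def overlap_reshuffle(n_rows, k, seed=0):
    """Each worker receives its own full permutation of all rows,
    so the workers' partitions overlap completely."""
    rng = random.Random(seed)
    return [rng.sample(range(n_rows), n_rows) for _ in range(k)]
```

For example, {{disjoint_round_robin(10, 3)}} returns
{{[[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]}}, which mirrors the
{{seq(1,nrow(X))%%k == id}} selection above, shifted to 0-based indices.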
> Data partition
> --------------
>
> Key: SYSTEMML-2336
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2336
> Project: SystemML
> Issue Type: Sub-task
> Reporter: LI Guobao
> Assignee: LI Guobao
> Priority: Major
>
> This sub-task aims to implement the four data partitioning schemes (i.e.,
> disjoint_contiguous, disjoint_round_robin, disjoint_random,
> overlap_reshuffle) for the paramserv builtin function.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)