[
https://issues.apache.org/jira/browse/SYSTEMML-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matthias Boehm updated SYSTEMML-2169:
-------------------------------------
Description:
The introduction of nary cbind and rbind in SYSTEMML-1986 added support for
operations like {{E = cbind(A,B,C,D)}}, which concatenates the matrices A, B, C,
and D column-wise without the intermediates required by traditional
binary cbind operations ({{cbind(cbind(cbind(A,B),C),D)}}). SystemML also
provides rewrites to automatically collapse chains of cbind or rbind operations
into their nary counterparts.
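To illustrate the collapsing rewrite mentioned above, here is a minimal sketch on a toy expression tree. The classes {{CbindFlattener}} and {{Node}} are hypothetical and are not SystemML's actual HOP or rewrite classes; the sketch only shows how a chain of nested binary cbinds maps to a single nary input list in left-to-right order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical toy expression tree, NOT SystemML's compiler classes.
final class CbindFlattener {

    /** A node is either a leaf (a named matrix) or a cbind over child nodes. */
    static final class Node {
        final String name;        // non-null only for leaves
        final List<Node> inputs;  // non-empty only for cbind nodes

        Node(String name) {
            this.name = name;
            this.inputs = Collections.emptyList();
        }

        Node(List<Node> inputs) {
            this.name = null;
            this.inputs = inputs;
        }

        boolean isLeaf() {
            return name != null;
        }
    }

    static Node leaf(String name) {
        return new Node(name);
    }

    static Node cbind(Node... inputs) {
        return new Node(Arrays.asList(inputs));
    }

    /** Returns the leaf names in left-to-right order, collapsing nested cbinds. */
    static List<String> flatten(Node root) {
        List<String> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    private static void collect(Node node, List<String> out) {
        if (node.isLeaf()) {
            out.add(node.name);
            return;
        }
        // Depth-first traversal preserves the column order of the original chain.
        for (Node in : node.inputs) {
            collect(in, out);
        }
    }
}
```

Under these assumptions, flattening the tree for {{cbind(cbind(cbind(A,B),C),D)}} yields the input list A, B, C, D, which is exactly the argument list of the nary {{cbind(A,B,C,D)}}.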
However, for distributed Spark operations, the binary cbind is still much
better optimized than the nary operation, which only provides a general-case
operation based on repartition joins.
This task aims to address this by extending {{BuiltinNarySPInstruction}} at
runtime level (i.e., within {{processInstruction}}). Given the unlimited number
of inputs, this runtime approach seems more appropriate than dedicated physical
operations at compiler level. In detail, we need to evaluate if a subset of
inputs fits into the broadcast budget, and if so, provide an alternative code
path for nary cbind/rbind operations with broadcast joins.
Note that distributed codegen operations have similar characteristics of
unlimited inputs and already leverage broadcast variables when possible. Hence,
we can probably use a similar approach as done in {{SpoofSPInstruction}}.
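One possible way to evaluate which subset of inputs fits into the broadcast budget is sketched below. This is only an illustration under stated assumptions: the class {{BroadcastPlanner}}, the method {{selectBroadcastInputs}}, and the greedy smallest-first policy are hypothetical and do not reflect the actual logic of {{SpoofSPInstruction}} or {{BuiltinNarySPInstruction}}. Input sizes and the budget are assumed to be known estimates in bytes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, NOT part of SystemML's runtime.
final class BroadcastPlanner {

    /**
     * Greedily picks the smallest inputs whose total estimated size stays
     * within the budget. The largest input is never selected, so at least
     * one input remains as the distributed (non-broadcast) side.
     *
     * @param sizes  estimated size of each input in bytes (assumed known)
     * @param budget broadcast budget in bytes (assumed known)
     * @return indices of the inputs to broadcast, in ascending order
     */
    static List<Integer> selectBroadcastInputs(long[] sizes, long budget) {
        Integer[] order = new Integer[sizes.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Sort input positions by estimated size, smallest first.
        Arrays.sort(order, (a, b) -> Long.compare(sizes[a], sizes[b]));

        List<Integer> chosen = new ArrayList<>();
        long used = 0;
        // Skip the last (largest) entry so one input always stays distributed.
        for (int k = 0; k < order.length - 1; k++) {
            long size = sizes[order[k]];
            if (used + size > budget) {
                break; // all remaining inputs are at least as large
            }
            used += size;
            chosen.add(order[k]);
        }
        Collections.sort(chosen);
        return chosen;
    }
}
```

If the returned list is empty, the instruction would fall back to the existing repartition-join code path; otherwise the selected inputs could be broadcast and appended to the blocks of the remaining inputs. Smallest-first maximizes the number of broadcast inputs, but other policies are equally plausible.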
was:
The introduction of nary cbind and rbind in SYSTEMML-1986 added support for
operations like {{E = cbind(A,B,C,D)}}, which concatenates the matrices A, B, C,
and D column-wise without the intermediates required by traditional
binary cbind operations ({{cbind(cbind(cbind(A,B),C),D)}}). SystemML also
provides rewrites to automatically collapse chains of cbind or rbind operations
into their nary counterparts.
However, for distributed Spark operations, the binary cbind is still much
better optimized than the nary operation, which only provides a general-case
operation based on repartition joins.
This task aims to address this by extending {{BuiltinNarySPInstruction}} at
runtime level. Given the unlimited number of inputs, this runtime approach
seems more appropriate than dedicated physical operations at compiler level. In
detail, we need to evaluate if a subset of inputs fits into the broadcast
budget, and if so, provide an alternative code path for nary cbind/rbind
operations with broadcast joins.
Note that distributed codegen operations have similar characteristics of
unlimited inputs and already leverage broadcast variables when possible. Hence,
we can probably use a similar approach as done in {{SpoofSPInstruction}}.
> Spark nary cbind/rbind with broadcasts
> --------------------------------------
>
> Key: SYSTEMML-2169
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2169
> Project: SystemML
> Issue Type: Task
> Reporter: Matthias Boehm
> Priority: Major
> Labels: beginner
>
> The introduction of nary cbind and rbind in SYSTEMML-1986 added support for
> operations like {{E = cbind(A,B,C,D)}}, which concatenates the matrices A, B,
> C, and D column-wise without the intermediates required by
> traditional binary cbind operations ({{cbind(cbind(cbind(A,B),C),D)}}).
> SystemML also provides rewrites to automatically collapse chains of cbind or
> rbind operations into their nary counterparts.
> However, for distributed Spark operations, the binary cbind is still much
> better optimized than the nary operation, which only provides a general-case
> operation based on repartition joins.
> This task aims to address this by extending {{BuiltinNarySPInstruction}} at
> runtime level (i.e., within {{processInstruction}}). Given the unlimited
> number of inputs, this runtime approach seems more appropriate than dedicated
> physical operations at compiler level. In detail, we need to evaluate if a
> subset of inputs fits into the broadcast budget, and if so, provide an
> alternative code path for nary cbind/rbind operations with broadcast joins.
> Note that distributed codegen operations have similar characteristics of
> unlimited inputs and already leverage broadcast variables when possible.
> Hence, we can probably use a similar approach as done in
> {{SpoofSPInstruction}}.