Matthias Boehm updated SYSTEMML-2169:
    Labels: beginner  (was: )

> Spark nary cbind/rbind with broadcasts
> --------------------------------------
>                 Key: SYSTEMML-2169
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-2169
>             Project: SystemML
>          Issue Type: Task
>            Reporter: Matthias Boehm
>            Priority: Major
>              Labels: beginner
> The introduction of nary cbind and rbind in SYSTEMML-1986 added support for 
> operations like {{E = cbind(A,B,C,D)}}, which concatenates the matrices A, B, 
> C, D column-wise without the need for intermediates as required by 
> traditional binary cbind operations ({{cbind(cbind(cbind(A,B),C),D)}}). 
> SystemML also provides rewrites to automatically collapse chains of cbind or 
> rbind operations into their nary counterparts. 
> However, for distributed Spark operations, the binary cbind is still much 
> better optimized than the nary operation, which only provides a general case 
> operation based on repartition joins. 
> This task aims to address this by extending {{BuiltinNarySPInstruction}} at 
> runtime level. Given the unlimited number of inputs, this runtime approach 
> seems more appropriate than dedicated physical operations at compiler level. 
> In detail, we need to evaluate if a subset of inputs fits into the broadcast 
> budget, and if so provide an alternative code path for nary cbind/rbind 
> operations with broadcast joins.
> Note that distributed codegen operations have the similar characteristic of 
> unlimited inputs and already leverage broadcast variables when possible. 
> Hence, we can probably use a similar approach as done in 
> {{SpoofSPInstruction}}.
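
The check of whether a subset of inputs fits into the broadcast budget could look roughly like the sketch below. It is only an illustration under stated assumptions: the class BroadcastSelection, the method selectBroadcastInputs, and the size and budget parameters are hypothetical names, not SystemML APIs, and the greedy smallest-first heuristic is one possible policy rather than the approach used in {{SpoofSPInstruction}}. The sketch assumes that estimated in-memory sizes are available per input and that at least one input has to stay distributed to drive the join.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper, not part of SystemML: decides which inputs of an
// nary cbind/rbind could be broadcast, given estimated in-memory sizes.
class BroadcastSelection {

    // Returns one flag per input: true = broadcast, false = keep as RDD.
    // Greedy heuristic (an assumption, not the SystemML policy): visit the
    // inputs from smallest to largest and broadcast each one that still
    // fits into the remaining budget. The largest input is never broadcast,
    // so at least one distributed input remains to drive the operation.
    static boolean[] selectBroadcastInputs(long[] sizesInBytes, long budgetInBytes) {
        int n = sizesInBytes.length;
        boolean[] broadcast = new boolean[n];
        if (n == 0) {
            return broadcast;
        }

        // Sort input positions by estimated size, ascending (stable sort).
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingLong(i -> sizesInBytes[i]));

        long remaining = budgetInBytes;
        // Stop before the last sorted position, which is the largest input.
        for (int k = 0; k < n - 1; k++) {
            int idx = order[k];
            if (sizesInBytes[idx] > remaining) {
                break; // all later inputs are at least as large
            }
            broadcast[idx] = true;
            remaining -= sizesInBytes[idx];
        }
        return broadcast;
    }

    public static void main(String[] args) {
        // Made-up size estimates for the inputs A, B, C, D and a made-up budget.
        long[] sizes = {4_000_000_000L, 50_000_000L, 8_000_000L, 900_000_000L};
        boolean[] flags = selectBroadcastInputs(sizes, 100_000_000L);
        System.out.println(Arrays.toString(flags)); // [false, true, true, false]
    }
}
```

With such flags, the inputs marked true would be joined via broadcast variables, while the remaining inputs would still go through the existing repartition-join path.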

This message was sent by Atlassian JIRA
