[ 
https://issues.apache.org/jira/browse/SYSTEMML-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm updated SYSTEMML-1977:
-------------------------------------
    Description: 
On Kmeans, the fusion heuristic fnr is failing with index-out-of-bounds on 
distributed (i.e., spark) codegen row operations. The root cause is misplaced 
metadata management that implicitly assumes that the first side input is 
broadcast, which fails if this side input is also large and taken as an 
additional rdd input. Specifically, it's failing when executing the following 
operator:

{code}
public final class TMP64 extends SpoofRowwise {
  public TMP64() {
    super(RowType.COL_AGG_B1_T, -1, false, 1);
  }
  protected void genexec(double[] a, int ai, SideInput[] b, double[] scalars,
      double[] c, int len, int rix) {
    LibSpoofPrimitives.vectOuterMultAdd(a, b[0].values(rix), c, ai,
        b[0].pos(rix), 0, len, b[0].clen);
  }
  protected void genexec(double[] avals, int[] aix, int ai, SideInput[] b,
      double[] scalars, double[] c, int alen, int len, int rix) {
    LibSpoofPrimitives.vectOuterMultAdd(avals, b[0].values(rix), c, aix, ai,
        b[0].pos(rix), 0, alen, len, b[0].clen);
  }
}
{code}

  was: On Kmeans, the fusion heuristic fnr is failing with index-out-of-bounds on 
distributed (i.e., spark) codegen row operations. The root cause is misplaced 
metadata management that implicitly assumes that the first side input is 
broadcast, which fails if this side input is also large and taken as an 
additional rdd input.


> Codegen spark row ops failing w/ index-out-of-bounds
> ----------------------------------------------------
>
>                 Key: SYSTEMML-1977
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1977
>             Project: SystemML
>          Issue Type: Bug
>            Reporter: Matthias Boehm
>
> On Kmeans, the fusion heuristic fnr is failing with index-out-of-bounds on 
> distributed (i.e., spark) codegen row operations. The root cause is misplaced 
> metadata management that implicitly assumes that the first side input is 
> broadcast, which fails if this side input is also large and taken as an 
> additional rdd input. Specifically, it's failing when executing the following 
> operator:
> {code}
> public final class TMP64 extends SpoofRowwise {
>   public TMP64() {
>     super(RowType.COL_AGG_B1_T, -1, false, 1);
>   }
>   protected void genexec(double[] a, int ai, SideInput[] b, double[] scalars,
>       double[] c, int len, int rix) {
>     LibSpoofPrimitives.vectOuterMultAdd(a, b[0].values(rix), c, ai,
>         b[0].pos(rix), 0, len, b[0].clen);
>   }
>   protected void genexec(double[] avals, int[] aix, int ai, SideInput[] b,
>       double[] scalars, double[] c, int alen, int len, int rix) {
>     LibSpoofPrimitives.vectOuterMultAdd(avals, b[0].values(rix), c, aix, ai,
>         b[0].pos(rix), 0, alen, len, b[0].clen);
>   }
> }
> {code}
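
For readers unfamiliar with the generated operator, the following is a minimal, self-contained sketch of how an outer-product accumulation of this shape can run out of bounds when the side-input metadata it is given (here the column count passed as the last argument) does not describe the row that was actually supplied. It is not the SystemML implementation and not a reproduction of this bug: the local vectOuterMultAdd is a hypothetical stand-in whose argument order is assumed from the dense call in TMP64 above, and the array sizes are made up for illustration.

```java
class OuterMultAddSketch {

    // Hypothetical stand-in for the dense primitive called in TMP64.
    // Assumed semantics: c[ci + i*len2 + j] += a[ai + i] * b[bi + j].
    static void vectOuterMultAdd(double[] a, double[] b, double[] c,
                                 int ai, int bi, int ci, int len1, int len2) {
        for (int i = 0; i < len1; i++) {
            for (int j = 0; j < len2; j++) {
                c[ci + i * len2 + j] += a[ai + i] * b[bi + j];
            }
        }
    }

    // Runs one row update and reports whether it hit an out-of-bounds access.
    // assumedClen plays the role of b[0].clen in the generated code.
    static boolean failsWithClen(int assumedClen) {
        double[] a = {1, 2, 3};   // one row of the main input (len = 3)
        double[] bRow = {10, 20}; // matching side-input row (true clen = 2)
        double[] c = new double[a.length * bRow.length];
        try {
            vectOuterMultAdd(a, bRow, c, 0, 0, 0, a.length, assumedClen);
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsWithClen(2)); // prints false: metadata matches the row
        System.out.println(failsWithClen(4)); // prints true: stale column count overruns bRow
    }
}
```

The point of the sketch is only that the primitive trusts the position and column count handed to it, so metadata taken from the wrong side input surfaces as an index-out-of-bounds inside the row operation rather than at the point where the inputs were wired up.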



