[
https://issues.apache.org/jira/browse/SYSTEMML-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15755685#comment-15755685
]
Matthias Boehm commented on SYSTEMML-1156:
------------------------------------------
Thanks for reporting this, [~iyounus]. I had a look, and this is an issue in our
checkpoint injection rewrite (the preparation step, not the actual checkpoint
compilation), which is only applied in Spark execution modes. This bug
affects scripts where an indexed identifier (such as H[,i]) is the only consumer of a
matrix that is amenable to checkpointing (Spark caching).
> problem with MLContext and QR
> -----------------------------
>
> Key: SYSTEMML-1156
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1156
> Project: SystemML
> Issue Type: Bug
> Components: Runtime
> Environment: spark 1.6.2
> centOS7
> Reporter: Imran Younus
>
> I'm trying to run this simple script to compute a QR decomposition:
> {code}
> X = rand(rows=4, cols=2)
> [H, R] = qr(X)
> print(toString(H))
> print ("X is of size : " + nrow(X) + "," + ncol(X))
> print ("H is of size : " + nrow(H) + "," + ncol(H))
> print ("R is of size : " + nrow(R) + "," + ncol(R))
> n = ncol(H)
> for( j in n:1 ) {
> print(j);
> V = H[,j];
> print ("V is of size : " + nrow(V) + "," + ncol(V))
> VTV = t(V) %*% V
> print(toString(VTV))
> }
> {code}
> I ran this in CP mode and in hybrid Spark mode.
> In CP mode it works fine, but with Spark the behavior is strange.
> Inside the for loop, when I assign {{H\[,j\]}} to {{V}}, {{V}} becomes
> all of {{H}} instead of a single column of {{H}}. As a result, {{VTV}} is
> a matrix instead of the single value I expect. This only happens inside the
> for loop; without the loop there is no problem. It also occurs only for
> matrix {{H}}: if I use {{X}} instead of {{H}}, there is no problem.
> Here is the output when I run it with Spark:
> {code}
> 16/12/16 11:53:27 INFO api.DMLScript: BEGIN DML run 12/16/2016 11:53:27
> 16/12/16 11:53:27 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> X is of size : 4,2
> H is of size : 4,2
> R is of size : 4,2
> 1.526 0.000
> 0.459 1.905
> 0.280 -0.202
> 0.659 0.373
> 2
> V is of size : 4,1
> 3.051 1.064
> 1.064 3.811
> 1
> V is of size : 4,1
> 3.051 1.064
> 1.064 3.811
> 16/12/16 11:53:27 INFO api.DMLScript: SystemML Statistics:
> Total execution time: 0.624 sec.
> Number of executed Spark inst: 0.
> 16/12/16 11:53:27 INFO api.DMLScript: END DML run 12/16/2016 11:53:27
> {code}
> As the output shows, the reported size of {{V}} is correct: it is supposed to
> be a 4x1 column vector. But {{VTV}} is a 2x2 matrix instead of a single value,
> because {{V}} is actually all of {{H}}. Printing {{V}} confirms this.
> Here is the correct output from CP mode:
> {code}
> ================================================================================
> ================================================================================
> 16/12/16 11:54:56 INFO api.DMLScript: BEGIN DML run 12/16/2016 11:54:56
> 16/12/16 11:54:57 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> X is of size : 4,2
> H is of size : 4,2
> R is of size : 4,2
> 1.575 0.000
> 0.476 1.591
> 0.296 -0.772
> 0.596 0.233
> 2
> V is of size : 4,1
> 3.182
> 1
> V is of size : 4,1
> 3.151
> 16/12/16 11:54:57 INFO api.DMLScript: SystemML Statistics:
> Total execution time: 0.199 sec.
> Number of executed MR Jobs: 0.
> 16/12/16 11:54:57 INFO api.DMLScript: END DML run 12/16/2016 11:54:57
> {code}
> [~mboehm7] [~niketanpansare]
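A quick arithmetic check on the two quoted outputs supports the symptom described in the report. For a 4x1 column {{V = H[,j]}}, {{t(V) %*% V}} is 1x1 and equals the sum of squares of that column. If {{V}} is accidentally all of {{H}}, the result is instead the 2x2 Gram matrix {{t(H) %*% H}}. The sketch below is plain Python, not DML. It uses only the rounded 3-decimal values printed in the two logs. The helper names `gram`, `column`, and `close` are illustrative and not part of SystemML. The 0.01 tolerance is an assumption chosen to absorb the display rounding.

```python
# Arithmetic check on the printed Spark and CP outputs (values rounded to 3 decimals).

def gram(m):
    """Return t(m) %*% m for a row-major list-of-lists matrix."""
    k = len(m[0])
    return [[sum(row[a] * row[b] for row in m) for b in range(k)] for a in range(k)]

def column(m, j):
    """Return m[,j] (1-based j, as in DML) as a 4x1 matrix."""
    return [[row[j - 1]] for row in m]

def close(x, y, tol=0.01):
    # tolerance absorbs rounding in the 3-decimal printouts
    return abs(x - y) < tol

# Spark mode: printed H, and the 2x2 VTV printed for both j = 2 and j = 1.
H_spark = [[1.526, 0.000], [0.459, 1.905], [0.280, -0.202], [0.659, 0.373]]
VTV_spark = [[3.051, 1.064], [1.064, 3.811]]
spark_gram = gram(H_spark)
spark_matches_full_H = all(close(spark_gram[a][b], VTV_spark[a][b])
                           for a in range(2) for b in range(2))

# CP mode: printed H, and the 1x1 VTV printed for j = 2 and j = 1.
H_cp = [[1.575, 0.000], [0.476, 1.591], [0.296, -0.772], [0.596, 0.233]]
VTV_cp = {2: 3.182, 1: 3.151}
cp_matches_columns = all(close(gram(column(H_cp, j))[0][0], VTV_cp[j]) for j in (2, 1))

print("Spark VTV equals t(H) %*% H:", spark_matches_full_H)
print("CP VTV equals t(H[,j]) %*% H[,j]:", cp_matches_columns)
```

Both checks hold on the printed numbers. The Spark-mode {{VTV}} matches {{t(H) %*% H}}, which is consistent with {{V}} being bound to the whole matrix. The CP-mode values match the per-column sums of squares.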
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)