[ 
https://issues.apache.org/jira/browse/SYSTEMML-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15868468#comment-15868468
 ] 

Imran Younus edited comment on SYSTEMML-1238 at 2/15/17 8:04 PM:
-----------------------------------------------------------------

I tested the LinearRegCG.dml script with the same data set that is used in 
this test and got the correct results from the DML script. Here is how I ran it:

{code}
$SPARK_HOME/bin/spark-submit --master=local --driver-memory=6g 
$SYSTEMML_HOME/target/SystemML.jar -f 
$SYSTEMML_HOME/scripts/algorithms/LinearRegCG.dml -nvargs 
X=/user/iyounus/data/diabetes_X_train.txt 
Y=/user/iyounus/data/diabetes_y_train.txt B="beta.txt" icpt=1
{code}

Here are the stats:

{code}
Running the CG algorithm...
||r|| initial value = 64725.64237405237,  target value = 0.06472564237405237
Iteration 1:  ||r|| / ||r init|| = 0.013822097249150999
Iteration 2:  ||r|| / ||r init|| = 7.063617429825055E-14
Warning: the maximum number of iterations has been reached.
The CG algorithm is done.
Computing the statistics...
AVG_TOT_Y,153.36255924170615
STDEV_TOT_Y,77.21853383600028
AVG_RES_Y,-1.081722178918495E-11
STDEV_RES_Y,63.03850633761024
DISPERSION,3973.8532812769263
PLAIN_R2,0.3351312506863876
ADJUSTED_R2,0.33354822985468857
PLAIN_R2_NOBIAS,0.3351312506863876
ADJUSTED_R2_NOBIAS,0.33354822985468857
Writing the output matrix...
END LINEAR REGRESSION SCRIPT
17/02/15 11:45:20 INFO api.DMLScript: SystemML Statistics:
Total execution time:           0.374 sec.
Number of executed Spark inst:  2.
{code}

The beta values I get from this script are:
{code}
1 1 938.2368795072023
2 1 152.91886229044422
{code}
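
A minimal sketch of how such betas could be cross-checked against the closed-form least-squares solution (the normal equation mentioned in the issue description). The function name {{normal_equation_betas}} and the synthetic data are hypothetical stand-ins, not part of the test suite; the intercept is placed last, which appears to match the script output above, where row 2 is close to AVG_TOT_Y.

```python
import numpy as np


def normal_equation_betas(X, y):
    """Return least-squares coefficients, with the intercept as the last entry.

    Hypothetical helper for illustration only; not part of SystemML.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    if X.ndim == 1:
        X = X.reshape(-1, 1)
    # Append a column of ones so the last coefficient is the intercept.
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    # lstsq minimizes ||A b - y||, which gives the normal-equation solution
    # without forming (A^T A)^-1 explicitly.
    betas, *_ = np.linalg.lstsq(A, y, rcond=None)
    return betas


# Synthetic stand-in data (assumption: one feature, true slope 3.0, intercept 1.5).
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 3.0 * x + 1.5 + rng.normal(scale=0.01, size=200)

betas = normal_equation_betas(x, y)
print(betas)  # slope first, intercept last
```

Swapping the synthetic arrays for the actual training matrices would give reference betas to compare with the values above.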

But when I run the Python test, I get incorrect results. For completeness, 
here is how I'm running the test:

{code}
$SPARK_HOME/bin/spark-submit --master=local --driver-memory=6g  
--driver-class-path $SYSTEMML_HOME/target/SystemML.jar test_mllearn_df.py
{code}

and here are the stats:

{code}
||r|| initial value = 64725.64237405237,  target value = 0.06472564237405237
Iteration 1:  ||r|| / ||r init|| = 0.01378813951373333
Iteration 2:  ||r|| / ||r init|| = 4.3730800595678527E-14
The CG algorithm is done.
Computing the statistics...
AVG_TOT_Y,153.36255924170615
STDEV_TOT_Y,77.21853383600028
AVG_RES_Y,-6.688193969161777E-12
STDEV_RES_Y,67.06389890324985
DISPERSION,4497.566536105316
PLAIN_R2,0.24750834362605834
ADJUSTED_R2,0.24571669682516795
PLAIN_R2_NOBIAS,0.24750834362605834
ADJUSTED_R2_NOBIAS,0.24571669682516795
Writing the output matrix...
END LINEAR REGRESSION SCRIPT
{code}

and the values of betas are {{458.489, 153.146}}.
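
As a sanity check on the two sets of statistics above, the following sketch tests whether each run is internally consistent. It assumes that DISPERSION equals STDEV_RES_Y squared and that ADJUSTED_R2 equals 1 - DISPERSION / STDEV_TOT_Y^2; these relations are inferred from the reported numbers, not taken from the script source. The dictionary keys and the helper name {{check_consistency}} are hypothetical.

```python
import math

# Values copied from the two logs above.
runs = {
    "dml_script": {
        "STDEV_TOT_Y": 77.21853383600028,
        "STDEV_RES_Y": 63.03850633761024,
        "DISPERSION": 3973.8532812769263,
        "ADJUSTED_R2": 0.33354822985468857,
    },
    "python_test": {
        "STDEV_TOT_Y": 77.21853383600028,
        "STDEV_RES_Y": 67.06389890324985,
        "DISPERSION": 4497.566536105316,
        "ADJUSTED_R2": 0.24571669682516795,
    },
}


def check_consistency(stats, tol=1e-6):
    """Return (dispersion_ok, adjusted_r2_ok) under the assumed relations."""
    # Assumed relation 1: dispersion is the residual variance.
    dispersion_ok = math.isclose(
        stats["STDEV_RES_Y"] ** 2, stats["DISPERSION"], rel_tol=tol
    )
    # Assumed relation 2: adjusted R2 = 1 - residual variance / total variance.
    adj_r2 = 1.0 - stats["DISPERSION"] / stats["STDEV_TOT_Y"] ** 2
    adjusted_r2_ok = math.isclose(adj_r2, stats["ADJUSTED_R2"], abs_tol=tol)
    return dispersion_ok, adjusted_r2_ok


results = {name: check_consistency(stats) for name, stats in runs.items()}
print(results)
```

If both runs pass, the statistics are computed consistently in each case, and the lower score in the Python test follows from the different betas rather than from the statistics step.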

I hope this helps.


> Python test failing for LinearRegCG
> -----------------------------------
>
>                 Key: SYSTEMML-1238
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1238
>             Project: SystemML
>          Issue Type: Bug
>          Components: Algorithms, APIs
>    Affects Versions: SystemML 0.13
>            Reporter: Imran Younus
>            Assignee: Niketan Pansare
>         Attachments: python_LinearReg_test_spark.1.6.log, 
> python_LinearReg_test_spark.2.1.log
>
>
> [~deron] discovered that one of the Python tests ({{test_mllearn_df.py}}) 
> was failing with Spark 2.1.0 because the test score from linear regression 
> was very low ({{~ 0.24}}). I did some investigation, and it turns out that 
> the model parameters computed by the DML script are incorrect. In 
> SystemML 0.12, the values of betas from the linear regression model are 
> {{\[152.919, 938.237\]}}. This is what we expect from the normal equation. (I 
> also tested this with sklearn.) But the values of betas from SystemML 0.13 
> (with Spark 2.1.0) come out to be {{\[153.146, 458.489\]}}. These are not 
> correct, and therefore the test score is much lower than expected. The data 
> going into the DML script is correct. I printed out the values of {{X}} and {{Y}} 
> in DML and didn't see any issue there.
> Attached are the log files for two different tests (SystemML 0.12 and 0.13) 
> with the explain flag.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
