juliale-15 opened a new pull request #1152:
URL: https://github.com/apache/systemds/pull/1152


   @A-Postl and I have created this first version of the Python script 
generator.
   We've divided the generator into two parts:
   
   1. A `parser.py`, which takes the *.dml scripts and uses each function 
definition to create a Python script. Additionally, the parser parses the DML 
header. The information in the header is used to generate documentation for the 
Python script and to check whether the header information matches the DML 
function definition.
   2. A `generator.py`, which takes the parsed data from `parser.py` and 
generates a Python file. Additionally, the generator creates an `__init__.py` 
file to which all generated scripts are added.
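To make the parser/generator split more concrete, here is a minimal sketch of the two steps. It assumes a simplified signature of the form `m_<name> = function(...) return (...)` (the `m_` prefix is assumed to be the builtin naming convention and is dropped), ignores the header, and emits a wrapper modelled on the hand-written ones. The `OperationNode(...)` call in the emitted code and the omitted import header of the generated file are assumptions, and `parse_signature`/`generate_wrapper` are hypothetical names, not the actual contents of `parser.py` or `generator.py`.

```python
import re
from typing import List, Tuple

# Hypothetical, trimmed DML source; real builtin scripts also carry a long header.
DML_SOURCE = """
m_kmeans = function(Matrix[Double] X, Integer k = 10, Integer runs = 10)
  return (Matrix[Double] C, Matrix[Double] Y)
{
  # body omitted
}
"""

# "<name> = function(<params>) return (<returns>)", possibly spanning several lines.
SIGNATURE_RE = re.compile(
    r"(?P<name>\w+)\s*=\s*function\s*\((?P<params>.*?)\)\s*return\s*\((?P<returns>.*?)\)",
    re.DOTALL,
)

Param = Tuple[str, str, str]  # (DML type, name, default value or "")


def split_params(text: str) -> List[Param]:
    """Split 'Matrix[Double] X, Integer k = 10' into (type, name, default) triples.

    Simplification: assumes defaults contain no commas or '=' characters.
    """
    params: List[Param] = []
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        decl, _, default = part.partition("=")
        dml_type, name = decl.split()
        params.append((dml_type, name, default.strip()))
    return params


def parse_signature(source: str) -> Tuple[str, List[Param], List[Param]]:
    """Parser step: extract function name, parameters and return values."""
    match = SIGNATURE_RE.search(source)
    if match is None:
        raise ValueError("no DML function definition found")
    name = match.group("name")
    if name.startswith("m_"):
        name = name[len("m_"):]  # assumed builtin prefix, dropped for the Python name
    return name, split_params(match.group("params")), split_params(match.group("returns"))


def generate_wrapper(name: str, params: List[Param]) -> str:
    """Generator step: parameters without a default become positional
    arguments; parameters with defaults are passed through **kwargs."""
    required = [p for p in params if not p[2]]
    if not required:
        raise ValueError(f"{name}: need at least one parameter without a default")
    args = ", ".join(f"{p[1]}: OperationNode" for p in required)
    entries = ", ".join(f"'{p[1]}': {p[1]}" for p in required)
    first = required[0][1]
    lines = [
        f"def {name}({args}, **kwargs: Dict[str, VALID_INPUT_TYPES]) -> OperationNode:",
        f"    params_dict = {{{entries}}}",
        "    params_dict.update(kwargs)",
        f"    return OperationNode({first}.sds_context, {name!r}, named_input_nodes=params_dict)",
    ]
    return "\n".join(lines) + "\n"


name, params, returns = parse_signature(DML_SOURCE)
print(generate_wrapper(name, params))
```

The generated wrapper is only a string here; a real generator would also render the parsed header into a docstring and prepend the imports the wrapper needs.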
   
   The scripts are placed under the folder structure 
`systemds/operator/algorithm/builtin`.
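A minimal sketch of how the generated modules and the `__init__.py` could be written into that folder, assuming one module per builtin named after its function; the helper name `write_package` is hypothetical.

```python
from pathlib import Path
from typing import Dict


def write_package(out_dir: Path, modules: Dict[str, str]) -> str:
    """Write <name>.py for every generated wrapper and an __init__.py that
    re-exports each function, so callers can import it from the package."""
    out_dir.mkdir(parents=True, exist_ok=True)
    names = sorted(modules)  # deterministic order keeps generated diffs stable
    for name in names:
        (out_dir / f"{name}.py").write_text(modules[name])
    init_lines = [f"from .{name} import {name}" for name in names]
    init_lines += ["", f"__all__ = {names!r}"]
    init_source = "\n".join(init_lines) + "\n"
    (out_dir / "__init__.py").write_text(init_source)
    return init_source
```

Sorting the names means regenerating the package only changes `__init__.py` when a builtin is actually added or removed.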
   We have changed a set of DML files (kmeans, kmeanspredict, lm, l2svm, pca, 
multiLogReg, multiLogRegpredict) so that our generator accepts them. We chose 
builtin functions that already existed in the Python API so that we could 
reuse their existing test cases.
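What "accepts" means is defined by the parser. As an illustration, here is a minimal sketch of the header/signature consistency check mentioned in point 1. It assumes a simplified, hypothetical header layout with `INPUT PARAMETERS:` and `RETURN VALUES:` sections and one `# <name> <type> <default> <description>` row per entry; `parse_header` and `check_consistency` are hypothetical names.

```python
import re
from typing import Dict, List

# Hypothetical, simplified header layout used only for this sketch.
HEADER = """\
# INPUT PARAMETERS:
# NAME  TYPE            DEFAULT  MEANING
# X     Matrix[Double]  ---      The input matrix
# k     Integer         10       Number of centroids
# RETURN VALUES:
# NAME  TYPE            DEFAULT  MEANING
# C     Matrix[Double]  ---      The cluster centroids
# Y     Matrix[Double]  ---      The cluster assignment of each row
"""

# "# <name> <type> <default> <description...>"
ROW_RE = re.compile(r"#\s+(\w+)\s+(\S+)\s+(\S+)\s+.*")
SECTIONS = ("INPUT PARAMETERS", "RETURN VALUES")


def parse_header(header: str) -> Dict[str, List[str]]:
    """Collect the documented names per header section."""
    names: Dict[str, List[str]] = {s: [] for s in SECTIONS}
    current = None
    for line in header.splitlines():
        title = line.lstrip("# ").rstrip(":")
        if title in names:
            current = title
            continue
        row = ROW_RE.match(line)
        # Skip the column-title row ("NAME TYPE DEFAULT MEANING").
        if current is not None and row and row.group(1) != "NAME":
            names[current].append(row.group(1))
    return names


def check_consistency(documented: List[str], declared: List[str], kind: str) -> List[str]:
    """Report names that appear in only one of header and signature."""
    problems = [f"{kind} '{n}' is not documented in the header" for n in declared if n not in documented]
    problems += [f"{kind} '{n}' is documented but not declared" for n in documented if n not in declared]
    return problems
```

A file whose check returns a non-empty list would be rejected, which is one way of reading "in a way that our generator accepts them".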
   
   Our generated scripts can be imported in the same way as when the functions 
were placed in `algorithm.py`. Therefore, we think that the file `algorithm.py` 
can be removed.
   
   However, we have two concerns regarding the *.dml scripts.
   1. Some builtin functions, such as `multiLogReg`, require additional shape 
checks:
   ````
       if -1 in x.shape:
           output_shape = (-1,)
       else:
           output_shape = (x.shape[1],)
   ````
   We were wondering how our parser could derive those checks from the DML file, 
since they are specific to each individual function.
   
   2. The `pca.dml` file defines four return values. However, in the Python 
script we could only configure the `OperationNode` as 
`number_of_outputs=2, output_types=[OutputType.MATRIX, OutputType.MATRIX]`.
   Why does `pca.dml` define four return values while the `OperationNode` 
only allows `number_of_outputs=2`?
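For the first concern, one possible option is to keep such function-specific logic out of the DML files altogether: the generator consults a small hand-maintained registry and splices the snippet into the generated wrapper. A minimal sketch, where `SHAPE_CHECKS` and `generate_body` are hypothetical names and passing the result to `OperationNode` as `shape=output_shape` is an assumption about the wrapper's constructor call:

```python
from typing import Dict, List

# Hypothetical registry of hand-written shape logic, keyed by builtin name.
# Nothing here is derived from the DML file; the generator only splices it in.
SHAPE_CHECKS: Dict[str, List[str]] = {
    "multiLogReg": [
        "if -1 in x.shape:",
        "    output_shape = (-1,)",
        "else:",
        "    output_shape = (x.shape[1],)",
    ],
}


def generate_body(name: str, first_arg: str) -> str:
    """Emit the tail of a generated wrapper, adding a registered shape check
    (and an assumed shape= argument) only for functions that have one."""
    body: List[str] = []
    shape_kwarg = ""
    if name in SHAPE_CHECKS:
        body.extend("    " + line for line in SHAPE_CHECKS[name])
        shape_kwarg = ", shape=output_shape"  # assumed OperationNode keyword
    body.append(
        f"    return OperationNode({first_arg}.sds_context, {name!r}, "
        f"named_input_nodes=params_dict{shape_kwarg})"
    )
    return "\n".join(body) + "\n"


print(generate_body("multiLogReg", "x"))
```

The trade-off is that the special cases then live in Python rather than next to the DML code, so they have to be kept in sync by hand.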
   
   We would really appreciate some feedback and maybe some suggestions on how 
we could handle the two problems mentioned above.

