juliale-15 opened a new pull request #1152:
URL: https://github.com/apache/systemds/pull/1152
@A-Postl and I have created a first version of the Python script
generator.
We've divided the generator into two parts:
1. A `parser.py`, which takes the `*.dml` scripts and uses the function
definition to create a Python script. Additionally, the parser parses the DML
header. The information in the header is used to generate documentation for the
Python script and to check whether the header information matches the DML
function definition.
2. A `generator.py`, which takes the parsed data from `parser.py` and
generates a Python file. Additionally, the generator creates an `__init__.py`
file to which all generated scripts are added.
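To illustrate the parsing step, here is a minimal sketch, assuming a DML signature of the form `name = function(<type> <name> [= default], ...) return (<type> <name>, ...)`. The names `Param`, `DmlFunction`, `parse_function`, and `check_header` are hypothetical and are not the code in `parser.py`. The header check is reduced to comparing parameter names, and the regex does not handle parentheses, commas, or comments inside the parameter list.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Param:
    """One parameter or return value of a DML function (hypothetical model)."""
    dml_type: str            # e.g. "Matrix[Double]" or "Integer"
    name: str
    default: Optional[str]   # None if the parameter has no default


@dataclass
class DmlFunction:
    name: str
    params: List[Param] = field(default_factory=list)
    returns: List[Param] = field(default_factory=list)


# "<name> = function(<params>) return (<returns>)"; re.S lets the lists span
# several lines. Parentheses inside default values are not supported.
_FUNC_RE = re.compile(
    r"(?P<name>\w+)\s*=\s*function\s*\((?P<params>.*?)\)\s*"
    r"return\s*\((?P<returns>.*?)\)",
    re.S,
)


def _split_params(text: str) -> List[Param]:
    """Split 'Type name [= default], ...' into Param objects."""
    params = []
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        decl, _, default = part.partition("=")
        dml_type, name = decl.strip().rsplit(None, 1)
        params.append(Param(dml_type, name, default.strip() or None))
    return params


def parse_function(source: str) -> DmlFunction:
    """Parse the first DML function definition found in `source`."""
    match = _FUNC_RE.search(source)
    if match is None:
        raise ValueError("no DML function definition found")
    return DmlFunction(
        name=match.group("name"),
        params=_split_params(match.group("params")),
        returns=_split_params(match.group("returns")),
    )


def check_header(func: DmlFunction, documented: List[str]) -> List[str]:
    """Return signature parameters that the header does not document."""
    documented_set = set(documented)
    return [p.name for p in func.params if p.name not in documented_set]
```

For example, `m_example = function(Matrix[Double] X, Integer k = 10) return (Matrix[Double] C)` yields the parameters `X` (no default) and `k` (default `"10"`) and the single return value `C`.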
The scripts are placed under the folder structure
`systemds/operator/algorithm/builtin`.
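A minimal sketch of the generator step, assuming the parser yields a function name and the DML names of its required parameters: it renders one module per builtin and an `__init__.py` that re-exports each function. The template, including the `OperationNode(...)` call and its `named_input_nodes` argument, is an assumption about what a generated wrapper could contain, not the actual output of `generator.py`; `render_function` and `write_package` are illustrative names, and lowercasing DML parameter names (e.g. `X` becomes `x`) is an assumed convention.

```python
import os
import tempfile
from typing import Dict, List

# Hypothetical module template; the OperationNode call and its arguments are
# an assumption about what a generated wrapper could contain.
_TEMPLATE = '''\
from systemds.operator import OperationNode


def {py_name}({args}, **kwargs) -> OperationNode:
    """Generated from {dml_file}. Do not edit by hand."""
    params_dict = {{{named}}}
    params_dict.update(kwargs)
    return OperationNode({first}.sds_context, '{dml_name}',
                         named_input_nodes=params_dict)
'''


def render_function(dml_name: str, required: List[str], dml_file: str) -> str:
    """Render the source of one wrapper module.

    `required` holds the DML names of the required parameters; they become
    lowercased positional Python arguments, and optional DML parameters are
    passed through **kwargs.
    """
    if not required:
        raise ValueError("expected at least one required parameter")
    py_args = [name.lower() for name in required]
    named = ", ".join(f"'{dml}': {py}" for dml, py in zip(required, py_args))
    return _TEMPLATE.format(py_name=dml_name, args=", ".join(py_args),
                            dml_file=dml_file, named=named,
                            first=py_args[0], dml_name=dml_name)


def write_package(out_dir: str, functions: Dict[str, str]) -> None:
    """Write one module per function and an __init__.py re-exporting each."""
    os.makedirs(out_dir, exist_ok=True)
    init_lines = []
    for name, source in sorted(functions.items()):
        with open(os.path.join(out_dir, f"{name}.py"), "w") as fh:
            fh.write(source)
        init_lines.append(f"from .{name} import {name}")
    with open(os.path.join(out_dir, "__init__.py"), "w") as fh:
        fh.write("\n".join(init_lines) + "\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out_dir:
        source = render_function("kmeans", ["X"], "kmeans.dml")
        write_package(out_dir, {"kmeans": source})
        with open(os.path.join(out_dir, "__init__.py")) as fh:
            print(fh.read(), end="")  # prints: from .kmeans import kmeans
```

Because `__init__.py` re-exports every function under its own name, callers import the function from the package rather than from an individual generated module.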
We have changed a set of DML files so that our generator accepts them
(kmeans, kmeanspredict, lm, l2svm, pca, multiLogReg, multiLogRegpredict). These
are builtin functions that already existed in the Python API, so we could reuse
the existing test cases for them.
Our generated scripts can be imported in the same way as before, when the
functions were placed in `algorithm.py`. Therefore, we think that the file
`algorithm.py` can be removed.
However, we have two concerns regarding the `*.dml` scripts.
1. Some builtin functions, such as `multiLogReg`, require
additional shape checks:
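````python
# multiLogReg: if any dimension of x is unknown (-1), the output shape is
# unknown; otherwise it has one entry per column of x
if -1 in x.shape:
    output_shape = (-1,)
else:
    output_shape = (x.shape[1],)
````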
We were wondering how our parser could derive such checks from the DML file,
since they depend on the semantics of each individual function.
2. The `pca.dml` file defines four return values. However, for the Python
script we could only set the `OperationNode` like this:
`number_of_outputs=2, output_types=[OutputType.MATRIX, OutputType.MATRIX]`.
Why does `pca.dml` define four return values while the `OperationNode`
only allows `number_of_outputs=2`?
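Both concerns come down to information the generator either has to derive from the signature or has to get from somewhere other than the DML file. A minimal sketch of one way to handle both, under stated assumptions: `output_spec`, `output_shape`, `multilogreg_shape`, and `SHAPE_RULES` are hypothetical names, the DML-type mapping covers only a few assumed type names, and `OutputType` here is a local stand-in for the SystemDS enum (only `MATRIX` is confirmed above; `SCALAR` is assumed). Function-specific shape rules live in a hand-maintained table rather than in the DML files.

```python
from enum import Enum, auto
from typing import Callable, Dict, List, Optional, Tuple

Shape = Tuple[int, ...]
UNKNOWN = -1  # dimension not known before execution, as in the snippet above


class OutputType(Enum):
    """Local stand-in for the SystemDS OutputType enum (SCALAR is assumed)."""
    MATRIX = auto()
    SCALAR = auto()


# Output count and types derived mechanically from the DML return list.
_TYPE_MAP: Dict[str, OutputType] = {
    "Matrix[Double]": OutputType.MATRIX,
    "Double": OutputType.SCALAR,
    "Integer": OutputType.SCALAR,
    "Boolean": OutputType.SCALAR,
}


def output_spec(return_types: List[str]) -> Tuple[int, List[OutputType]]:
    """Return (number_of_outputs, output_types) for a parsed return list."""
    try:
        types = [_TYPE_MAP[t] for t in return_types]
    except KeyError as err:
        raise ValueError(f"unsupported DML return type: {err.args[0]}") from None
    return len(types), types


# Shape rules depend on each function's semantics, so this sketch keeps them
# in a hypothetical hand-maintained table keyed by builtin name.
def multilogreg_shape(x_shape: Shape) -> Shape:
    """The multiLogReg rule from the snippet above."""
    if UNKNOWN in x_shape:
        return (UNKNOWN,)
    return (x_shape[1],)


SHAPE_RULES: Dict[str, Callable[[Shape], Shape]] = {
    "multiLogReg": multilogreg_shape,
}


def output_shape(builtin: str, x_shape: Shape) -> Optional[Shape]:
    """Apply the registered rule, or return None if no rule exists."""
    rule = SHAPE_RULES.get(builtin)
    return rule(x_shape) if rule is not None else None
```

Under this sketch, a generated `pca` wrapper would request four outputs, one per DML return value, which is exactly where it diverges from the `number_of_outputs=2` version that currently works.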
We would really appreciate feedback and suggestions on how we could handle
the two problems mentioned above.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.