[ https://issues.apache.org/jira/browse/SYSTEMML-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15826951#comment-15826951 ]

Mike Dusenberry commented on SYSTEMML-1185:
-------------------------------------------

[PR 347 | https://github.com/apache/incubator-systemml/pull/347] created.

{quote}
This is the initial commit of the SystemML breast cancer project!

Please refer to the attached `README.md` for an overview, background 
information, goals, our approach, etc.

At a high level, this PR introduces the following new files/folders:
* `README.md`: Project information, etc.
* `Preprocessing.ipynb`: PySpark notebook for preprocessing our histopathology 
slides into an appropriate `DataFrame` for consumption by SystemML.
* `MachineLearning.ipynb`: PySpark/SystemML notebook for our machine learning 
approach thus far.  We started simple, and currently need engine 
improvements in order to proceed.
* `softmax_clf.dml`: Basic softmax model (multiclass logistic regression with 
normalized probabilities) as a sanity check.
* `convnet.dml`: Our current deep convnet model.  We are starting simple with a 
slightly extended "LeNet"-like network architecture.  The goal will be to 
improve engine performance so that this model can be efficiently trained, and 
then move on to larger, more recent types of model architectures.
* `hyperparam_tuning.dml`: A separate script for performing a hyperparameter 
search for our current convnet model.  This has been extracted from the 
notebook because the current `parfor` engine implementation is not yet 
sufficient for this type of job, which the project requires.
* `data`: A placeholder folder into which the data could be downloaded.
* `nn`: A softlink that will point to the SystemML-NN library.
* `approach.svg`: Image of our overall pipeline used in `README.md`.

Overall, I want this project to serve as a large-scale, end-to-end machine 
learning project that can drive necessary core improvements for SystemML.
{quote}
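
For readers unfamiliar with the term, the "softmax model (multiclass logistic regression with normalized probabilities)" mentioned above can be sketched in a few lines. This is a minimal illustration in plain Python, not code taken from `softmax_clf.dml`; the function names `softmax` and `predict_proba` and the parameters `W` and `b` are assumptions chosen for this example.

```python
import math


def softmax(scores):
    """Turn a list of raw class scores into probabilities that sum to 1."""
    # Subtracting the maximum score avoids overflow in exp() and does not
    # change the result, because softmax is invariant to a constant shift.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(x, W, b):
    """Class probabilities for one example.

    x: list of feature values
    W: one list of weights per class (hypothetical parameter layout)
    b: one bias per class
    """
    # Linear score per class, then normalization via softmax.
    scores = [
        sum(w_i * x_i for w_i, x_i in zip(w_k, x)) + b_k
        for w_k, b_k in zip(W, b)
    ]
    return softmax(scores)
```

Calling `predict_proba(x, W, b)` returns one probability per class; training such a model would then minimize the cross-entropy between these probabilities and the true labels.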

> SystemML Breast Cancer Project
> ------------------------------
>
>                 Key: SYSTEMML-1185
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1185
>             Project: SystemML
>          Issue Type: New Feature
>            Reporter: Mike Dusenberry
>            Assignee: Mike Dusenberry
>
> This issue tracks the new SystemML breast cancer project!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
