[
https://issues.apache.org/jira/browse/SYSTEMML-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15826951#comment-15826951
]
Mike Dusenberry commented on SYSTEMML-1185:
-------------------------------------------
[PR 347 | https://github.com/apache/incubator-systemml/pull/347] created.
{quote}
This is the initial commit of the SystemML breast cancer project!
Please reference the attached `README.md` for an overview, background
information, goals, our approach, etc.
At a high level, this PR introduces the following new files/folders:
* `README.md`: Project information, etc.
* `Preprocessing.ipynb`: PySpark notebook for preprocessing our histopathology
slides into an appropriate `DataFrame` for consumption by SystemML.
* `MachineLearning.ipynb`: PySpark/SystemML notebook for our machine learning
approach thus far. We started simple and currently need engine improvements
in order to proceed.
* `softmax_clf.dml`: Basic softmax model (multiclass logistic regression with
normalized probabilities) as a sanity check.
* `convnet.dml`: Our current deep convnet model. We are starting simple with a
slightly extended "LeNet"-like network architecture. The goal will be to
improve engine performance so that this model can be efficiently trained, and
then move on to larger, more recent types of model architectures.
* `hyperparam_tuning.dml`: A separate script for performing a hyperparameter
search for our current convnet model. This has been extracted from the
notebook because the current `parfor` engine implementation is not yet
sufficient for this type of job.
* `data`: A placeholder folder into which the data could be downloaded.
* `nn`: A softlink that will point to the SystemML-NN library.
* `approach.svg`: Image of our overall pipeline used in `README.md`.
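
For readers unfamiliar with the term, the "softmax model" mentioned for
`softmax_clf.dml` (multiclass logistic regression with normalized
probabilities) can be illustrated with a minimal plain-Python sketch. This is
not the DML script from the PR; the function names and the `W` and `b`
parameters are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Turn a list of raw class scores into probabilities that sum to 1."""
    # Subtracting the max score avoids overflow in exp() and leaves the
    # resulting probabilities unchanged.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(x, W, b):
    """Class probabilities for one example.

    x: list of feature values
    W: hypothetical weight matrix, one row per feature, one column per class
    b: hypothetical bias vector, one entry per class
    """
    num_classes = len(b)
    # Linear score per class: x . W[:, k] + b[k]
    scores = [
        sum(x[i] * W[i][k] for i in range(len(x))) + b[k]
        for k in range(num_classes)
    ]
    return softmax(scores)
```

Calling `predict_proba(x, W, b)` returns one probability per class; the
predicted class is the index of the largest entry.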
Overall, I want this project to serve as a large-scale, end-to-end machine
learning project that can drive necessary core improvements for SystemML.
{quote}
> SystemML Breast Cancer Project
> ------------------------------
>
> Key: SYSTEMML-1185
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1185
> Project: SystemML
> Issue Type: New Feature
> Reporter: Mike Dusenberry
> Assignee: Mike Dusenberry
>
> This issue tracks the new SystemML breast cancer project!