[SYSTEMML-1185] SystemML Breast Cancer Project

This is the initial commit of the SystemML breast cancer project!

Please reference the attached `README.md` for an overview, background
information, goals, our approach, etc.

At a high level, this PR introduces the following new files/folders:
* `README.md`: Project information, etc.
* `Preprocessing.ipynb`: PySpark notebook for preprocessing our
histopathology slides into an appropriate `DataFrame` for consumption by
SystemML.
* `MachineLearning.ipynb`: PySpark/SystemML notebook for our machine
learning approach thus far.  We started simple and currently need
engine improvements in order to proceed.
* `softmax_clf.dml`: Basic softmax model (multiclass logistic regression
with normalized probabilities) as a sanity check.
* `convnet.dml`: Our current deep convnet model.  We are starting simple
with a slightly extended "LeNet"-like network architecture.  The goal
will be to improve engine performance so that this model can be
efficiently trained, and then move on to larger, more recent types of
model architectures.
* `hyperparam_tuning.dml`: A separate script for performing a
hyperparameter search for our current convnet model.  This has been
extracted from the notebook because the current `parfor` engine
implementation is not yet sufficient for this type of job.
* `data`: A placeholder folder into which the data could be downloaded.
* `nn`: A softlink that will point to the SystemML-NN library.
* `approach.svg`: Image of our overall pipeline used in `README.md`.
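
For readers unfamiliar with the sanity-check model, the "normalized
probabilities" computed by a softmax classifier can be sketched as follows.
This is a minimal plain-Python illustration, not the contents of
`softmax_clf.dml`; the function name `softmax` and the example scores are
assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Map a list of raw class scores to probabilities that sum to 1.

    Hypothetical helper for illustration; not taken from softmax_clf.dml.
    """
    # Subtract the largest score before exponentiating. This avoids overflow
    # for large scores and does not change the resulting probabilities.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Example: three made-up class scores for a single input.
probs = softmax([2.0, 1.0, 0.1])
# The predicted class is the index with the highest probability.
predicted_class = probs.index(max(probs))
```

A multiclass logistic regression model would produce the scores as a linear
function of the input features and then apply this normalization; the class
with the largest probability is taken as the prediction.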

Overall, this project aims to serve as a large-scale, end-to-end
machine learning project that can drive necessary core improvements for
SystemML.

Closes #347


Project: http://git-wip-us.apache.org/repos/asf/incubator-systemml/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-systemml/commit/cc6f3c7e
Tree: http://git-wip-us.apache.org/repos/asf/incubator-systemml/tree/cc6f3c7e
Diff: http://git-wip-us.apache.org/repos/asf/incubator-systemml/diff/cc6f3c7e

Branch: refs/heads/gh-pages
Commit: cc6f3c7ea934e19bca13f0359cfe3fa63398dbe0
Parents: 94cf7c1
Author: Mike Dusenberry <[email protected]>
Authored: Fri Jan 20 12:02:55 2017 -0800
Committer: Mike Dusenberry <[email protected]>
Committed: Fri Jan 20 12:02:55 2017 -0800

----------------------------------------------------------------------
 img/projects/breast_cancer/approach.svg | 4 ++++
 1 file changed, 4 insertions(+)
----------------------------------------------------------------------

