[SYSTEMML-1185] SystemML Breast Cancer Project

This is the initial commit of the SystemML breast cancer project!
Please refer to the attached `README.md` for an overview, background information, goals, and our approach.

At a high level, this PR introduces the following new files/folders:

* `README.md`: Project information.
* `Preprocessing.ipynb`: PySpark notebook for preprocessing our histopathology slides into an appropriate `DataFrame` for consumption by SystemML.
* `MachineLearning.ipynb`: PySpark/SystemML notebook for our machine learning approach thus far. We started simple, and currently need engine improvements in order to proceed.
* `softmax_clf.dml`: Basic softmax model (multiclass logistic regression with normalized probabilities) as a sanity check.
* `convnet.dml`: Our current deep convnet model. We are starting simple with a slightly extended "LeNet"-like network architecture. The goal is to improve engine performance so that this model can be trained efficiently, and then to move on to larger, more recent model architectures.
* `hyperparam_tuning.dml`: A separate script for performing a hyperparameter search for our current convnet model. This has been extracted from the notebook because the current `parfor` engine implementation is not yet sufficient for this type of job.
* `data`: A placeholder folder into which the data can be downloaded.
* `nn`: A softlink that will point to the SystemML-NN library.
* `approach.svg`: Image of our overall pipeline, used in `README.md`.

Overall, this project aims to serve as a large-scale, end-to-end machine learning project that can drive necessary core improvements for SystemML.
Closes #347

Project: http://git-wip-us.apache.org/repos/asf/incubator-systemml/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-systemml/commit/cc6f3c7e
Tree: http://git-wip-us.apache.org/repos/asf/incubator-systemml/tree/cc6f3c7e
Diff: http://git-wip-us.apache.org/repos/asf/incubator-systemml/diff/cc6f3c7e

Branch: refs/heads/gh-pages
Commit: cc6f3c7ea934e19bca13f0359cfe3fa63398dbe0
Parents: 94cf7c1
Author: Mike Dusenberry <[email protected]>
Authored: Fri Jan 20 12:02:55 2017 -0800
Committer: Mike Dusenberry <[email protected]>
Committed: Fri Jan 20 12:02:55 2017 -0800

----------------------------------------------------------------------
 img/projects/breast_cancer/approach.svg | 4 ++++
 1 file changed, 4 insertions(+)
----------------------------------------------------------------------
