[ https://issues.apache.org/jira/browse/SYSTEMML-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mike Dusenberry updated SYSTEMML-1185:
--------------------------------------
    Description:
This issue tracks the new SystemML breast cancer project!

From a systems perspective, we aim to support multi-node, multi-GPU distributed SGD training to support large-scale experiments for the specific breast cancer use case. To achieve this goal, the following steps are necessary:

# Single-node, CPU mini-batch SGD training (1 mini-batch at a time).
# Single-node, single-GPU mini-batch SGD training (1 mini-batch at a time).
# Single-node, multi-GPU data-parallel mini-batch SGD training (`n` parallel mini-batches for `n` GPUs at a time).
# Multi-node, CPU data-parallel mini-batch SGD training (`n` parallel mini-batches for `n` parallel tasks at a time).
# Multi-node, single-GPU data-parallel mini-batch SGD training (`n` parallel mini-batches for `n` total GPUs across the cluster at a time).
# Multi-node, multi-GPU data-parallel mini-batch SGD training (`n` parallel mini-batches for `n` total GPUs across the cluster at a time).

----

Here is a list of past and present JIRA epics and issues that have blocked, or are currently blocking, progress on the breast cancer project.

Overall Deep Learning Epic
* https://issues.apache.org/jira/browse/SYSTEMML-540
** This is the overall "Deep Learning" JIRA epic, with all issues either within or related to the epic.

Past
* https://issues.apache.org/jira/browse/SYSTEMML-633
* https://issues.apache.org/jira/browse/SYSTEMML-951
** Issue that completely blocked mini-batch training approaches.
* https://issues.apache.org/jira/browse/SYSTEMML-914
** Epic containing issues related to input DataFrame conversions that blocked getting data into the system entirely. Most of the issues specifically refer to existing, internal converters. 993 was a particularly large issue, and triggered a large body of work related to internal memory estimates that were incorrect. Also see 919, 946, & 994.
* https://issues.apache.org/jira/browse/SYSTEMML-1076
* https://issues.apache.org/jira/browse/SYSTEMML-1077
* https://issues.apache.org/jira/browse/SYSTEMML-948

Present
* https://issues.apache.org/jira/browse/SYSTEMML-1160
** Current open blocker to efficiently using a stochastic gradient descent approach.
* https://issues.apache.org/jira/browse/SYSTEMML-1078
** Current open blocker to training even an initial deep learning model for the project. This is another example of an internal compiler bug.
* https://issues.apache.org/jira/browse/SYSTEMML-686
** We need distributed convolution and max pooling operators.
* https://issues.apache.org/jira/browse/SYSTEMML-1159
** This is the main issue that discusses the need for the `parfor` construct to support efficient, parallel hyperparameter tuning on a cluster with large datasets. The broken remote parfor in 1129 blocked this issue, which in turn blocked any meaningful work on training a deep neural net for the project.
* https://issues.apache.org/jira/browse/SYSTEMML-1142
** This was one of the blockers to doing hyperparameter tuning.
* https://issues.apache.org/jira/browse/SYSTEMML-1129
** This is an epic for the issue in which the `parfor` construct was broken for remote Spark cases, and was one of the blockers for doing hyperparameter tuning.
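The data-parallel steps in the roadmap above can be sketched as follows. This is a minimal, single-process NumPy illustration (not SystemML DML, and not the project's actual implementation): each of the `n` "workers" computes a gradient on its own mini-batch, and the gradients are averaged before a single model update. The least-squares loss and the names `minibatch_grad` / `data_parallel_sgd_step` are illustrative assumptions, chosen only to make the update rule concrete.

```python
import numpy as np

def minibatch_grad(W, X, y):
    # Least-squares gradient for one mini-batch: d/dW of (1/2m)*||XW - y||^2.
    m = X.shape[0]
    return X.T @ (X @ W - y) / m

def data_parallel_sgd_step(W, batches, lr=0.1):
    # One data-parallel SGD step: `n` parallel mini-batches (simulated
    # sequentially here; in the real system each would run on its own
    # GPU or task), gradients averaged, then one shared model update.
    grads = [minibatch_grad(W, X, y) for X, y in batches]
    return W - lr * np.mean(grads, axis=0)

# Toy usage: recover W_true from noiseless synthetic data with
# n = 4 parallel mini-batches of 8 examples per step.
rng = np.random.default_rng(0)
W_true = np.array([[2.0], [-3.0]])
W = np.zeros((2, 1))
for _ in range(200):
    batches = []
    for _ in range(4):
        X = rng.standard_normal((8, 2))
        batches.append((X, X @ W_true))
    W = data_parallel_sgd_step(W, batches)
```

The multi-node variants in the roadmap follow the same averaging pattern; the difference is only where the `n` gradients are computed (local GPUs vs. tasks spread across the cluster) and how they are aggregated.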
> SystemML Breast Cancer Project
> ------------------------------
>
>                 Key: SYSTEMML-1185
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1185
>             Project: SystemML
>          Issue Type: New Feature
>            Reporter: Mike Dusenberry
>            Assignee: Mike Dusenberry

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)