[ https://issues.apache.org/jira/browse/SYSTEMML-1160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Niketan Pansare updated SYSTEMML-1160:
--------------------------------------
    Affects Version/s: SystemML 1.0

> Enable Prefetching of Mini-Batches
> ----------------------------------
>
>                 Key: SYSTEMML-1160
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1160
>             Project: SystemML
>          Issue Type: New Feature
>    Affects Versions: SystemML 1.0
>            Reporter: Mike Dusenberry
>            Priority: Blocker
>
> For efficient training of large deep learning models, a mini-batch training 
> approach is preferred.  In SystemML with the Spark backend, this currently 
> amounts to fetching a mini-batch from an RDD (via a PartitionPruning RDD -- 
> see SYSTEMML-951) and then using entirely single-node instructions for each 
> mini-batch.  While the fetching of partitions has been made efficient, the 
> training loop still has to pause after each step to grab the next partition.  
> For large models, training time is already an issue even on GPUs with 
> saturated input pipelines, so these per-step stalls are costly.  Thus, we 
> need to enable prefetching of mini-batches that runs in parallel with the 
> training loop.  One possibility would be to create an input queue that is 
> fed by a prefetch thread and that in turn feeds the training loop. 
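>
> A minimal sketch of that queue-based approach, using a bounded 
> java.util.concurrent BlockingQueue (names here are hypothetical, and 
> fetchBatch is a stand-in for the PartitionPruning-RDD lookup from 
> SYSTEMML-951, which is not shown):
>
> {code:java}
> import java.util.concurrent.ArrayBlockingQueue;
> import java.util.concurrent.BlockingQueue;
>
> public class MiniBatchPrefetcher {
>   // Bounded queue: the prefetch thread runs at most 'capacity' batches ahead.
>   private final BlockingQueue<double[][]> queue;
>   private final Thread prefetchThread;
>
>   public MiniBatchPrefetcher(int capacity, int numBatches) {
>     this.queue = new ArrayBlockingQueue<>(capacity);
>     this.prefetchThread = new Thread(() -> {
>       try {
>         for (int i = 0; i < numBatches; i++) {
>           queue.put(fetchBatch(i));  // blocks while the queue is full
>         }
>       } catch (InterruptedException e) {
>         Thread.currentThread().interrupt();
>       }
>     }, "minibatch-prefetch");
>   }
>
>   public void start() { prefetchThread.start(); }
>
>   // Called by the training loop; blocks only if prefetching falls behind.
>   public double[][] nextBatch() throws InterruptedException {
>     return queue.take();
>   }
>
>   private double[][] fetchBatch(int i) {
>     // Placeholder: would pull the partition for batch i from the pruned RDD.
>     return new double[0][0];
>   }
> }
> {code}
>
> The bounded capacity also gives natural backpressure: if training is the 
> bottleneck, the prefetch thread simply blocks instead of buffering the 
> whole dataset in memory.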



