[
https://issues.apache.org/jira/browse/SYSTEMML-1160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Niketan Pansare updated SYSTEMML-1160:
--------------------------------------
Affects Version/s: SystemML 1.0
> Enable Prefetching of Mini-Batches
> ----------------------------------
>
> Key: SYSTEMML-1160
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1160
> Project: SystemML
> Issue Type: New Feature
> Affects Versions: SystemML 1.0
> Reporter: Mike Dusenberry
> Priority: Blocker
>
> For efficient training of large deep learning models, a mini-batch training
> approach is preferred. On SystemML with the Spark backend, this currently
> amounts to grabbing a mini-batch from an RDD (via a PartitionPruningRDD --
> see SYSTEMML-951) and then using entirely single-node instructions for each
> mini-batch. While the fetching of partitions has been made efficient, we
> currently have to pause after each training step to grab the next partition.
> For large models, training time is already an issue even for GPUs with
> saturated input pipelines. Thus, we need to enable prefetching of
> mini-batches that runs in parallel with the training loop. One possibility
> would be an input queue that is fed by a prefetch thread and that in turn
> feeds the training loop, as sketched below.
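>
> As a minimal sketch of that shape (the {{MiniBatch}} and {{BatchSource}}
> types and the {{fetchBatch}} hook are hypothetical stand-ins, not existing
> SystemML APIs), a bounded {{java.util.concurrent.BlockingQueue}} fed by a
> daemon prefetch thread would provide the desired overlap:
> {code:java}
> import java.util.concurrent.ArrayBlockingQueue;
> import java.util.concurrent.BlockingQueue;
>
> public class MiniBatchPrefetcher {
>
>     /** Hypothetical stand-in for one mini-batch of rows. */
>     public interface MiniBatch { }
>
>     /** Hypothetical hook that fetches batch i, e.g. from a pruned partition. */
>     public interface BatchSource {
>         MiniBatch fetchBatch(int index);
>         int numBatches();
>     }
>
>     private final BlockingQueue<MiniBatch> queue;
>
>     public MiniBatchPrefetcher(BatchSource source, int capacity) {
>         // Bounded queue: the prefetch thread blocks once `capacity` batches
>         // are buffered, so memory use stays constant.
>         this.queue = new ArrayBlockingQueue<>(capacity);
>         Thread prefetcher = new Thread(() -> {
>             try {
>                 for (int i = 0; i < source.numBatches(); i++) {
>                     queue.put(source.fetchBatch(i)); // blocks while the queue is full
>                 }
>             } catch (InterruptedException e) {
>                 Thread.currentThread().interrupt(); // allow clean shutdown
>             }
>         }, "minibatch-prefetch");
>         prefetcher.setDaemon(true);
>         prefetcher.start();
>     }
>
>     /** Called by the training loop; blocks only when no batch is buffered yet. */
>     public MiniBatch nextBatch() throws InterruptedException {
>         return queue.take();
>     }
> }
> {code}
> With this shape, the fetch of batch i+1 overlaps with the compute on batch i,
> so the training loop stalls only when prefetching cannot keep up.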
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)