Mike Dusenberry created SYSTEMML-1160:
-----------------------------------------
Summary: Enable Prefetching of Mini-Batches
Key: SYSTEMML-1160
URL: https://issues.apache.org/jira/browse/SYSTEMML-1160
Project: SystemML
Issue Type: New Feature
Reporter: Mike Dusenberry
Priority: Critical
For efficient training of large deep learning models, a mini-batch training
approach is preferred. In SystemML with the Spark backend, this currently
equates to grabbing a mini-batch from an RDD (via a PartitionPruning RDD -- see
SYSTEMML-951) and then using entirely single-node instructions for each
mini-batch. While the fetching of partitions has been made efficient, we
currently have to pause after each training step to grab the next partition.
For large models, training time is already an issue even for GPUs with
saturated input pipelines. Thus, we need to enable prefetching of mini-batches
that runs in parallel with the training loop. One possibility would be to
create an input queue that is fed by a prefetch thread and that in turn feeds
the training loop, as sketched below.
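
A minimal sketch of such a queue-based producer-consumer prefetcher in Java
follows. This is illustrative only: {{BatchSource}}, {{fetch}}, and the
{{double[][]}} batch representation are hypothetical placeholders rather than
existing SystemML APIs; in practice the source would wrap the pruned-partition
RDD fetch from SYSTEMML-951.

{code:java}
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class MiniBatchPrefetcher {
  // Bounded queue, so the prefetch thread stays at most `capacity`
  // batches ahead of the training loop.
  private final BlockingQueue<double[][]> queue;
  private final Thread prefetchThread;

  public MiniBatchPrefetcher(int capacity, int numBatches, BatchSource source) {
    this.queue = new ArrayBlockingQueue<>(capacity);
    this.prefetchThread = new Thread(() -> {
      try {
        for (int i = 0; i < numBatches; i++) {
          // Fetching the next mini-batch overlaps with training;
          // put() blocks if the queue is already full.
          queue.put(source.fetch(i));
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }, "minibatch-prefetch");
    this.prefetchThread.setDaemon(true);
    this.prefetchThread.start();
  }

  /** Called from the training loop; blocks only if prefetching fell behind. */
  public double[][] nextBatch() throws InterruptedException {
    return queue.take();
  }

  /** Hypothetical batch source, e.g. a pruned-partition RDD fetch. */
  public interface BatchSource {
    double[][] fetch(int batchIndex);
  }
}
{code}

The bounded queue is deliberate: {{put}} blocks when the queue is full, so the
prefetch thread runs at most {{capacity}} batches ahead of the training loop
rather than buffering the whole dataset in memory.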