[
https://issues.apache.org/jira/browse/SYSTEMML-2396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16513357#comment-16513357
]
Matthias Boehm commented on SYSTEMML-2396:
------------------------------------------
In principle yes, but it seems that the description currently intermixes two
things: (1) the order of batch slicing, and (2) interleaving of compute and
slicing.
* Order of batch slicing: Currently we perform pull (blocking), slice, and
compute. A simple approach to reduce the waiting time is to instead perform
slice, pull (blocking), compute. If we have to wait a while on the pull anyway,
this hides the slicing overhead.
* Interleaving: Additionally, we could interleave computation with slicing of
the next batch by using double buffering or, more generally, a blocking queue
of n batches (and yes, with a dedicated prefetch thread).
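For (2), a minimal sketch of the blocking-queue variant could look like the following (hypothetical class and method names, not the actual SystemML worker API): a dedicated prefetch thread slices batches into a bounded queue of capacity n, while the worker thread overlaps gradient computation with the slicing of upcoming batches.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch, not the SystemML API: a prefetch thread fills a
// bounded queue with pre-sliced mini-batches; the worker thread consumes
// them, overlapping compute with the slicing of subsequent batches.
public class BatchPrefetcher {
    private final BlockingQueue<double[]> queue;

    public BatchPrefetcher(double[][] batches, int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        Thread prefetcher = new Thread(() -> {
            try {
                for (double[] batch : batches) {
                    queue.put(batch); // blocks once n batches are buffered
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        prefetcher.setDaemon(true);
        prefetcher.start();
    }

    public double[] nextBatch() throws InterruptedException {
        return queue.take(); // blocks only if the prefetcher falls behind
    }
}
```

With capacity 2 this degenerates to double buffering; larger capacities trade memory for more tolerance of slicing-time variance (which is exactly what the dense vs. sparse experiments should quantify).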
While (1) is generally a good idea and does not introduce complexity, for (2)
we need to see experimental results because it would add complexity to the
design. Please run a couple of local experiments with your new stats output and
investigate the slicing of dense and sparse data.
> Batch pre-fetching per workers
> ------------------------------
>
> Key: SYSTEMML-2396
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2396
> Project: SystemML
> Issue Type: Sub-task
> Reporter: LI Guobao
> Assignee: LI Guobao
> Priority: Major
>
> This task aims to improve worker performance. Currently, in each
> iteration over a mini-batch, we need to slice the matrix, execute the
> gradient computation, and then send the gradients to the parameter server
> (ps) for updating the model. While the ps does the aggregation work, the
> worker pauses, waiting for the new model. Hence, the idea is to fully use
> this idle slot to pre-fetch the next mini-batch in order to accelerate
> future iterations.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)