[ https://issues.apache.org/jira/browse/SYSTEMML-2396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16513357#comment-16513357 ]

Matthias Boehm commented on SYSTEMML-2396:
------------------------------------------

In principle yes, but the description currently conflates two things: (1) the 
order of batch slicing, and (2) the interleaving of compute and slicing.
* Order of batch slicing: Currently we perform pull (blocking), slice, and 
compute. A simple way to reduce the waiting time is to reorder this to slice, 
pull (blocking), compute. If the pull blocks for a while anyway, this hides the 
slicing overhead.
* Interleaving: Additionally, we could interleave computation with slicing of 
the next batch via double buffering or, more generally, a blocking queue of n 
batches (and yes, with a dedicated prefetch thread).
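The interleaving in (2) could be sketched roughly as follows; this is only an illustration, not the actual SystemML worker code, and the `sliceBatch` helper is a hypothetical stand-in for the real matrix slicing:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch: a dedicated prefetch thread slices upcoming batches into a bounded
// blocking queue while the worker thread computes gradients on the current one.
// A capacity of 1 corresponds to classic double buffering.
public class BatchPrefetcher {
    private final BlockingQueue<double[]> queue;
    private final Thread prefetchThread;

    public BatchPrefetcher(int numBatches, int capacity) {
        queue = new ArrayBlockingQueue<>(capacity);
        prefetchThread = new Thread(() -> {
            try {
                for (int b = 0; b < numBatches; b++) {
                    // blocks once 'capacity' batches are buffered ahead of compute
                    queue.put(sliceBatch(b));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }

    public void start() {
        prefetchThread.start();
    }

    // Worker side: blocks only if prefetching lags behind compute.
    public double[] nextBatch() {
        try {
            return queue.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    // Hypothetical placeholder for slicing batch b out of the feature matrix
    // (where the dense vs. sparse slicing cost would actually matter).
    private static double[] sliceBatch(int b) {
        return new double[] { b };
    }
}
```

The bounded queue also gives natural back-pressure: the prefetch thread cannot run arbitrarily far ahead and pin more than n batches in memory.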

While (1) is generally a good idea and does not introduce complexity, for (2) 
we need to see experimental results because it would add complexity to the 
design. Please run a couple of local experiments with your new stats output and 
investigate the slicing cost for both dense and sparse data.

> Batch pre-fetching per workers
> ------------------------------
>
>                 Key: SYSTEMML-2396
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-2396
>             Project: SystemML
>          Issue Type: Sub-task
>            Reporter: LI Guobao
>            Assignee: LI Guobao
>            Priority: Major
>
> This task aims to improve the performance of workers. Currently, in each 
> mini-batch iteration, we need to slice the matrix, execute the gradient 
> computation, and then send the gradients to the PS for updating the model. 
> While the PS is doing the aggregation work, the worker pauses, waiting for 
> the new model. Hence the idea is to use this idle slot to pre-fetch the next 
> mini-batch in order to accelerate future iterations.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
