[ https://issues.apache.org/jira/browse/SINGA-130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15231526#comment-15231526 ]
ASF subversion and git services commented on SINGA-130:
-------------------------------------------------------
Commit a0bdd0b85ddba7d670ab04c5de04a29c8366e868 in incubator-singa's branch
refs/heads/master from [~ug93tad]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-singa.git;h=a0bdd0b ]
SINGA-130 Data prefetching layer
Extended StoreInputLayer to support data prefetching. The layer maintains a
buffer of (key, value) pairs read from the storage layer. In Setup(), it
launches a new thread that reads data into the buffer. The ComputeFeature()
method waits for that thread to finish (join) before parsing the buffered
pairs into the data_ and aux_ fields. Finally, it launches another thread to
prefetch the next batch.
In terms of memory consumption, prefetching uses an extra
(batchsize*recordsize) bytes for the buffer. However, we observe
no visible runtime improvement, as I/O time is very small (on the order of
milliseconds without prefetching, and tens of microseconds
with prefetching) compared to CPU time.
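
The join-then-relaunch pattern described above can be sketched roughly as
follows. This is a minimal illustration, not the code from the commit:
PrefetchingInput, ReadFn, Record and NextBatch() are hypothetical names, and
the storage layer is replaced by a caller-supplied callback. The single
buffer_ member holds the one extra batch (batchsize*recordsize bytes)
mentioned above.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// A (key, value) pair as read from storage.
using Record = std::pair<std::string, std::string>;

// Hypothetical stand-in for the storage layer: appends up to n records to *out.
using ReadFn = std::function<void(std::size_t n, std::vector<Record>* out)>;

class PrefetchingInput {
 public:
  PrefetchingInput(ReadFn read, std::size_t batchsize)
      : read_(std::move(read)), batchsize_(batchsize) {}

  // The worker thread captures `this`, so copying is not safe.
  PrefetchingInput(const PrefetchingInput&) = delete;
  PrefetchingInput& operator=(const PrefetchingInput&) = delete;

  ~PrefetchingInput() {
    if (worker_.joinable()) worker_.join();
  }

  // Plays the role of Setup(): start reading the first batch in the background.
  void Setup() { Launch(); }

  // Plays the role of ComputeFeature(): wait for the reader thread (join),
  // take the batch it produced, then launch a thread for the next batch.
  std::vector<Record> NextBatch() {
    if (worker_.joinable()) worker_.join();
    std::vector<Record> batch = std::move(buffer_);
    Launch();
    return batch;
  }

 private:
  void Launch() {
    // Never overwrite a running thread; wait for it first.
    if (worker_.joinable()) worker_.join();
    buffer_.clear();
    buffer_.reserve(batchsize_);
    worker_ = std::thread([this] { read_(batchsize_, &buffer_); });
  }

  ReadFn read_;
  std::size_t batchsize_;
  std::vector<Record> buffer_;  // the extra prefetch buffer
  std::thread worker_;
};
```

Because the reader thread is always joined before buffer_ is touched, no
mutex is needed in this sketch. The caller would parse the returned records
into its own fields while the next read proceeds in the background.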
> Implement a layer subclass for data prefetching
> -----------------------------------------------
>
> Key: SINGA-130
> URL: https://issues.apache.org/jira/browse/SINGA-130
> Project: Singa
> Issue Type: New Feature
> Reporter: wangwei
> Assignee: Anh Dinh
> Labels: data, multi-threading, prefetch
>
> Data prefetching is important for training with GPU, because the IO would
> become the bottleneck when the computation is very fast.
> One idea is to create a general prefetch layer which embeds the application
> specific data loading layers.
> {code}
> PrefetchLayer::ComputeFeature() {
>   wait until the prefetch thread finishes.
>   swap the prefetch_data_ and data_ blobs.
>   if (first time)
>     load data into data_ blobs
>   spawn a new thread to call functions from data loading layers for loading
>   data into prefetch_data_.
> }
> {code}
>
> If the prefetch layer has multiple loading layers and is connected to
> multiple destination layers, then different destination layers may want data
> loaded by different loading layers. This case should be handled properly.