[
https://issues.apache.org/jira/browse/FLINK-5782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Stephan Ewen closed FLINK-5782.
-------------------------------
Resolution: Unresolved
Assignee: (was: Kate Eri)
> Support GPU calculations
> ------------------------
>
> Key: FLINK-5782
> URL: https://issues.apache.org/jira/browse/FLINK-5782
> Project: Flink
> Issue Type: Improvement
> Components: Core
> Affects Versions: 1.3.0
> Reporter: Kate Eri
> Priority: Minor
>
> This ticket was initiated as a continuation of the dev discussion thread: [New
> Flink team member - Kate Eri (Integration with DL4J
> topic)|http://mail-archives.apache.org/mod_mbox/flink-dev/201702.mbox/browser]
>
> Recently we proposed the idea of integrating
> [Deeplearning4J|https://deeplearning4j.org/index.html] with Apache Flink.
> Training DL models is a resource-demanding process, so training on CPUs can
> take much longer to converge than on GPUs.
> GPUs are useful not only for DL training, but also for accelerating graph
> analytics and other typical data manipulations; a nice overview of GPU-related
> problems is given in [Accelerating Spark workloads using
> GPUs|https://www.oreilly.com/learning/accelerating-spark-workloads-using-gpus].
> So far the community has pointed out the following issues to consider:
> 1) Flink would like to avoid writing yet another homegrown GPU support layer,
> to reduce the engineering burden. That is why libraries such as
> [ND4J|http://nd4j.org/userguide] should be considered.
> 2) Flink currently uses [Breeze|https://github.com/scalanlp/breeze] to
> optimize linear algebra calculations. ND4J cannot be integrated as is, because
> it still does not support [sparse arrays|http://nd4j.org/userguide#faq]. Maybe
> this support should simply be contributed to ND4J to enable its usage? (See
> the ND4J sketch after this list.)
> 3) The calculations would have to work both with and without available GPUs.
> If the system detects that GPUs are available, then ideally it would exploit
> them (see the detection sketch after this list). Thus GPU resource management
> could be incorporated into
> [FLINK-5131|https://issues.apache.org/jira/browse/FLINK-5131] (only a
> suggestion).
> 4) It was mentioned that since Flink takes care of shipping data around the
> cluster, it would also handle dumping data out to the GPU for calculation and
> loading the results back. In practice, the lack of a persist method for
> intermediate results makes this troublesome (not because of GPUs, but because
> calculating any sort of complex algorithm requires being able to cache
> intermediate results); see the caching sketch after this list. That is why
> [FLINK-1730|https://issues.apache.org/jira/browse/FLINK-1730] must be
> implemented to solve this problem.
> 5) It was also recommended to take a look at Apache Mahout, at least to gain
> experience with GPU integration, and to check its ViennaCL modules:
> https://github.com/apache/mahout/tree/master/viennacl-omp
> https://github.com/apache/mahout/tree/master/viennacl
> 6) For now, GPUs are proposed only for optimizing batch calculations; GPU
> support for streaming should be addressed in a separate ticket, because
> optimizing streaming with GPUs requires additional research.
> 7) Netflix's experience with this question could also be considered:
> [Distributed Neural Networks with GPUs in the AWS
> Cloud|http://techblog.netflix.com/search/label/CUDA]
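>
> Regarding point 2, below is a minimal ND4J sketch of a dense matrix multiply.
> The class name Nd4jSketch and the matrix sizes are invented for illustration.
> The notable aspects are that whether it runs on CPU or GPU is decided by the
> ND4J backend artifact on the classpath (nd4j-native vs. an nd4j-cuda build),
> not by the code itself, and that it only covers dense arrays, which is exactly
> the sparse-array gap mentioned above.
> {code:java}
> import org.nd4j.linalg.api.ndarray.INDArray;
> import org.nd4j.linalg.factory.Nd4j;
>
> public class Nd4jSketch {
>     public static void main(String[] args) {
>         // Two dense, randomly initialized matrices, stored off-heap by ND4J.
>         INDArray a = Nd4j.rand(512, 256);
>         INDArray b = Nd4j.rand(256, 128);
>
>         // Dense matrix multiply, executed by whichever backend is on the
>         // classpath (nd4j-native for CPU, an nd4j-cuda build for GPU).
>         INDArray c = a.mmul(b);
>
>         System.out.println("result shape: " + java.util.Arrays.toString(c.shape()));
>     }
> }
> {code}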
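>
> Regarding point 3, one possible shape of the "use a GPU if present, otherwise
> fall back to the CPU" check, sketched as a best-effort probe. The class
> GpuProbe, the method gpuAvailable() and probing via the nvidia-smi binary are
> assumptions made for this example; actual resource management would belong in
> FLINK-5131.
> {code:java}
> import java.io.IOException;
>
> public class GpuProbe {
>     // Best-effort check for an NVIDIA GPU: "nvidia-smi -L" lists the devices
>     // and exits with 0 when a driver and at least one GPU are present.
>     public static boolean gpuAvailable() {
>         try {
>             Process p = new ProcessBuilder("nvidia-smi", "-L").start();
>             return p.waitFor() == 0;
>         } catch (IOException | InterruptedException e) {
>             return false;
>         }
>     }
>
>     public static void main(String[] args) {
>         if (gpuAvailable()) {
>             System.out.println("GPU detected: use GPU-backed kernels (e.g. an ND4J CUDA backend).");
>         } else {
>             System.out.println("No GPU detected: fall back to CPU (e.g. Breeze or nd4j-native).");
>         }
>     }
> }
> {code}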
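>
> Regarding point 4, a small DataSet program that makes the caching gap
> concrete. The helper expensivePreprocess and the toy data are made up; the
> point is that count() and collect() each trigger a separate job execution, so
> the shared preprocessing is recomputed instead of being materialized once,
> which is what FLINK-1730 is meant to address.
> {code:java}
> import org.apache.flink.api.java.DataSet;
> import org.apache.flink.api.java.ExecutionEnvironment;
>
> public class CachingGapSketch {
>     public static void main(String[] args) throws Exception {
>         ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
>
>         // Stand-in for an expensive transformation (e.g. feature extraction)
>         // whose result we would like to compute once and reuse, ideally on a GPU.
>         DataSet<Double> features = env.fromElements(1.0, 2.0, 3.0)
>                 .map(x -> expensivePreprocess(x))
>                 .returns(Double.class);
>
>         // Each of these triggers its own job execution, so without a
>         // persist()/cache() primitive the map above runs twice.
>         long count = features.count();
>         double max = features.reduce((a, b) -> Math.max(a, b)).collect().get(0);
>
>         System.out.println("count=" + count + ", max=" + max);
>     }
>
>     private static double expensivePreprocess(double x) {
>         return Math.sqrt(x) * 42.0; // placeholder for a costly computation
>     }
> }
> {code}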
> This is considered the master ticket for GPU-related tickets.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)