[ 
https://issues.apache.org/jira/browse/IGNITE-8670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Dmitriev updated IGNITE-8670:
-----------------------------------
    Description: 
 

*What is the goal?*

TensorFlow on Apache Ignite should consist of three major components: _Ignite 
Dataset_, which provides the ability to feed training data from Apache Ignite; 
_IGFS Plugin_, which allows the Apache Ignite File System to be used for 
checkpointing and for communication with TensorBoard; and _Distributed 
Training_, which makes it possible to run model training directly inside an 
Apache Ignite cluster to minimize data transfers and provide so-called zero ETL.

 

*Ignite Dataset*

Ignite Dataset is an integration between Apache Ignite and TensorFlow that 
allows Apache Ignite to be used as a data source for neural network training, 
inference and any other computation supported by TensorFlow. Using Ignite 
Dataset has many advantages, to name just a few: TensorFlow gets fast access 
to a distributed database that can hold both training data and data for 
inference; objects fed by Ignite Dataset can have any structure, so all 
preprocessing can be done in the TensorFlow pipeline; SSL, Windows and 
distributed training are also supported.

Ignite Dataset is now part of TensorFlow, so you don’t need to install any 
third-party packages and can use it out of the box. The integration is based 
on [tf.data|https://www.tensorflow.org/api_docs/python/tf/data] on the 
TensorFlow side and the [Binary Client 
Protocol|https://apacheignite.readme.io/v2.6/docs/binary-client-protocol] on 
the Apache Ignite side.
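
The snippet below is a minimal sketch of how Ignite Dataset can be plugged into 
a regular tf.data pipeline. It assumes TensorFlow 1.x with the contrib Ignite 
integration available; the cache name "IMAGES", host and port are illustrative 
placeholders.

{code:python}
# A sketch of reading training data from an Apache Ignite cache.
# Assumes TensorFlow 1.x with the contrib Ignite integration;
# the cache name, host and port below are illustrative placeholders.
import tensorflow as tf
from tensorflow.contrib.ignite import IgniteDataset

# Connect to an Ignite node via the Binary Client Protocol and stream
# the objects stored in the "IMAGES" cache.
dataset = IgniteDataset(cache_name="IMAGES", host="localhost", port=10800)

# Regular tf.data transformations work on top of the Ignite source.
dataset = dataset.repeat().shuffle(buffer_size=1000).batch(32)

iterator = dataset.make_one_shot_iterator()
next_batch = iterator.get_next()

with tf.Session() as sess:
    print(sess.run(next_batch))
{code}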

 

*IGFS Plugin*

In addition to its database functionality, Apache Ignite provides a distributed 
file system called [IGFS|https://ignite.apache.org/features/igfs.html]. IGFS 
delivers functionality similar to Hadoop HDFS, but in-memory only. The IGFS 
Plugin for TensorFlow makes it possible to use IGFS for checkpointing (for 
reliability and fault tolerance) and for communication with TensorBoard (even 
when TensorBoard runs in a different process or on a different machine).

The IGFS Plugin is now part of TensorFlow, so you don’t need to install any 
third-party packages and can use it out of the box. The integration is based 
on the [custom filesystem 
plugin|https://www.tensorflow.org/extend/add_filesys] mechanism on the 
TensorFlow side and the [IGFS Native 
API|https://ignite.apache.org/features/igfs.html] on the Apache Ignite side.
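
The sketch below shows how an igfs:// path could be used once the plugin is 
registered. It assumes TensorFlow 1.x, where importing the contrib Ignite 
package is expected to register the IGFS filesystem; the file path is an 
illustrative placeholder.

{code:python}
# A sketch of writing and reading a file through the IGFS plugin.
# Assumes TensorFlow 1.x; importing tensorflow.contrib.ignite is expected
# to register the "igfs://" filesystem scheme. The path is a placeholder.
import tensorflow as tf
import tensorflow.contrib.ignite  # registers the IGFS filesystem plugin

# Write a file into IGFS just like with any other filesystem in TensorFlow.
with tf.gfile.Open("igfs:///tmp/hello.txt", mode="w") as f:
    f.write("Hello, IGFS!")

# Read it back; checkpoint directories and TensorBoard logdirs can point
# to the same kind of igfs:// paths.
with tf.gfile.Open("igfs:///tmp/hello.txt", mode="r") as f:
    print(f.read())
{code}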

 

*Distributed Training*

Distributed training makes it possible to utilize the computational resources 
of the whole cluster and thus speed up training of deep learning models. 
TensorFlow is a machine learning framework that [natively 
supports|https://www.tensorflow.org/deploy/distributed] distributed neural 
network training, inference and other computations.

Distributed Training in TensorFlow on Apache Ignite is based on the [standalone 
client 
mode|https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/distribute#standalone-client-mode]
 of distributed multi-worker training. Standalone client mode assumes that we 
have a cluster of workers with running TensorFlow servers and a client that 
contains the actual model code. When the client calls 
tf.estimator.train_and_evaluate, TensorFlow uses the specified distribution 
strategy to distribute computations across the workers, so that the most 
computationally intensive part is performed on the workers.
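
The sketch below shows the general shape of such a standalone client: an 
Estimator whose input comes from Ignite Dataset, whose checkpoints go to IGFS, 
and whose training is dispatched via tf.estimator.train_and_evaluate. It 
assumes TensorFlow 1.x; the cache name, element structure and IGFS model 
directory are illustrative placeholders, and the distribution-strategy / 
cluster configuration is omitted.

{code:python}
# A sketch of a standalone client driving distributed training.
# Assumes TensorFlow 1.x; the cache name, the element structure and the IGFS
# model_dir are illustrative placeholders, and the multi-worker
# distribution-strategy configuration is omitted for brevity.
import tensorflow as tf
from tensorflow.contrib.ignite import IgniteDataset

def train_input_fn():
    # Feed training data directly from an Apache Ignite cache. The element
    # structure depends on the objects stored in the cache; here we assume
    # each element has "features" and "label" fields.
    dataset = IgniteDataset(cache_name="TRAIN_DATA")
    dataset = dataset.map(
        lambda obj: ({"features": obj["features"]}, obj["label"]))
    return dataset.repeat().batch(128)

# Checkpoints are written to IGFS, so all workers (and TensorBoard)
# can read them.
config = tf.estimator.RunConfig(model_dir="igfs:///model")

estimator = tf.estimator.LinearClassifier(
    feature_columns=[tf.feature_column.numeric_column("features", shape=[10])],
    config=config)

# The client only describes the computation; with a multi-worker distribution
# strategy configured, the heavy lifting runs on the workers.
# (Evaluation reuses the training input here purely for brevity.)
tf.estimator.train_and_evaluate(
    estimator,
    tf.estimator.TrainSpec(input_fn=train_input_fn, max_steps=10000),
    tf.estimator.EvalSpec(input_fn=train_input_fn))
{code}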

> Umbrella: TensorFlow integration
> --------------------------------
>
>                 Key: IGNITE-8670
>                 URL: https://issues.apache.org/jira/browse/IGNITE-8670
>             Project: Ignite
>          Issue Type: New Feature
>          Components: ml
>            Reporter: Yury Babak
>            Assignee: Yury Babak
>            Priority: Major
>             Fix For: 2.7
>



