[
https://issues.apache.org/jira/browse/IGNITE-8670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Anton Dmitriev updated IGNITE-8670:
-----------------------------------
Description:
*What is the goal?*
TensorFlow on Apache Ignite should consist of three major components: _Ignite
Dataset_, which feeds training data from Apache Ignite; the _IGFS Plugin_,
which allows the Apache Ignite File System to be used for checkpointing and
communication with TensorBoard; and _Distributed Training_, which makes it
possible to run model training directly inside an Apache Ignite cluster to
minimize data transfers and provide so-called Zero ETL.
*Ignite Dataset*
Ignite Dataset is an integration between Apache Ignite and TensorFlow that
allows Apache Ignite to be used as a data source for neural network training,
inference, and all other computations supported by TensorFlow. Ignite Dataset
has several advantages, to name a few: TensorFlow gets fast access to a
distributed database that can hold both training data and data for inference;
objects fed by Ignite Dataset can have any structure, so all preprocessing can
be done in the TensorFlow pipeline; and SSL, Windows, and distributed training
are supported.
Ignite Dataset is currently part of TensorFlow, so you don’t need to install
any third-party packages and can use it out of the box. The integration is
based on [tf.data|https://www.tensorflow.org/api_docs/python/tf/data] on the
TensorFlow side and the [Binary Client
Protocol|https://apacheignite.readme.io/v2.6/docs/binary-client-protocol] on
the Apache Ignite side.
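To make the Apache Ignite side of this more concrete, here is a minimal sketch of the handshake request that a thin client sends when it opens a Binary Client Protocol connection. The field layout (little-endian payload length, handshake code 1, three version shorts, client code 2) follows my reading of the protocol documentation linked above and should be treated as an assumption to verify against it. The helper names build_handshake and parse_handshake are hypothetical and are not part of Ignite or TensorFlow.

```python
import struct

# Assumed layout of the thin-client handshake request (little-endian):
#   int   payload length
#   byte  handshake code (assumed to be 1)
#   short protocol version major, minor, patch
#   byte  client code (assumed to be 2 for a thin client)
HANDSHAKE_CODE = 1
THIN_CLIENT_CODE = 2
PAYLOAD_FORMAT = "<bhhhb"


def build_handshake(major=1, minor=0, patch=0):
    """Return the handshake request bytes for the given protocol version."""
    payload = struct.pack(
        PAYLOAD_FORMAT, HANDSHAKE_CODE, major, minor, patch, THIN_CLIENT_CODE
    )
    # The payload is prefixed with its own length as a 4-byte integer.
    return struct.pack("<i", len(payload)) + payload


def parse_handshake(message):
    """Decode a handshake request back into its fields."""
    (length,) = struct.unpack_from("<i", message, 0)
    code, major, minor, patch, client = struct.unpack_from(PAYLOAD_FORMAT, message, 4)
    return {
        "length": length,
        "code": code,
        "version": (major, minor, patch),
        "client": client,
    }


if __name__ == "__main__":
    request = build_handshake()
    print(request.hex())
    print(parse_handshake(request))
```

In practice Ignite Dataset hides this exchange from the user; the sketch only illustrates the kind of framing the protocol uses.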
*IGFS Plugin*
In addition to database functionality, Apache Ignite provides a distributed
file system called [IGFS|https://ignite.apache.org/features/igfs.html]. IGFS
delivers functionality similar to Hadoop HDFS, but only in memory. The IGFS
Plugin for TensorFlow allows IGFS to be used for checkpointing (for
reliability and fault tolerance) and for communication with TensorBoard (even
when TensorBoard runs in a different process or on a different machine).
The IGFS Plugin is currently part of TensorFlow, so you don’t need to install
any third-party packages and can use it out of the box. The integration is
based on a [custom filesystem
plugin|https://www.tensorflow.org/extend/add_filesys] on the TensorFlow side
and the [IGFS Native API|https://ignite.apache.org/features/igfs.html] on the
Apache Ignite side.
*Distributed Training*
Distributed training makes it possible to use the computational resources of
the whole cluster and thus speed up the training of deep learning models.
TensorFlow is a machine learning framework that [natively
supports|https://www.tensorflow.org/deploy/distributed] distributed neural
network training, inference, and other computations.
Distributed Training in TensorFlow on Apache Ignite is based on the [standalone
client
mode|https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/distribute#standalone-client-mode]
of distributed multi-worker training. Standalone client mode assumes a cluster
of workers with running TensorFlow servers and a client that contains the
actual model code. When the client calls tf.estimator.train_and_evaluate,
TensorFlow uses the specified distribution strategy to distribute computations
across the workers, so that the most computationally intensive part runs on
the workers.
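As a rough illustration of what such a cluster of workers looks like from the configuration side, the sketch below builds a cluster description (a mapping from job name to a list of "host:port" addresses, the shape that TensorFlow cluster specifications use) and the per-worker TF_CONFIG JSON that a worker could read before starting its server. The host names, the port 2222, and the helper names build_cluster and tf_config_for are hypothetical; how TensorFlow on Apache Ignite actually assigns workers to Ignite nodes is not described here.

```python
import json


def build_cluster(worker_hosts, port=2222):
    """Map the 'worker' job to one 'host:port' address per worker.

    The hosts and the default port are placeholders for illustration.
    """
    return {"worker": ["%s:%d" % (host, port) for host in worker_hosts]}


def tf_config_for(cluster, index):
    """Return the TF_CONFIG JSON string for the worker at the given index."""
    if not 0 <= index < len(cluster["worker"]):
        raise ValueError("worker index out of range: %d" % index)
    return json.dumps(
        {"cluster": cluster, "task": {"type": "worker", "index": index}},
        sort_keys=True,
    )


if __name__ == "__main__":
    # Hypothetical host names standing in for nodes of an Ignite cluster.
    cluster = build_cluster(["ignite-node-1", "ignite-node-2"])
    for i in range(len(cluster["worker"])):
        print(tf_config_for(cluster, i))
```

The client then only needs the same cluster description to reach the workers, while the model code itself stays on the client.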
> Umbrella: TensorFlow integration
> --------------------------------
>
> Key: IGNITE-8670
> URL: https://issues.apache.org/jira/browse/IGNITE-8670
> Project: Ignite
> Issue Type: New Feature
> Components: ml
> Reporter: Yury Babak
> Assignee: Yury Babak
> Priority: Major
> Fix For: 2.7
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)