[
https://issues.apache.org/jira/browse/IGNITE-8670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vladimir Ozerov updated IGNITE-8670:
------------------------------------
Fix Version/s: 2.7
> Umbrella: TensorFlow integration
> --------------------------------
>
> Key: IGNITE-8670
> URL: https://issues.apache.org/jira/browse/IGNITE-8670
> Project: Ignite
> Issue Type: New Feature
> Components: ml
> Reporter: Yury Babak
> Assignee: Yury Babak
> Priority: Major
> Fix For: 2.7
>
>
>
> *What is the goal?*
> TensorFlow on Apache Ignite should consist of three major components:
> _Ignite Dataset_, which feeds training data from Apache Ignite;
> _IGFS Plugin_, which allows using the Apache Ignite File System for
> checkpointing and communication with TensorBoard; and
> _Distributed Training_, which makes it possible to run model training
> directly inside an Apache Ignite cluster to minimize data transfers and
> provide so-called Zero ETL.
>
> *Ignite Dataset*
> Ignite Dataset is an integration between Apache Ignite and TensorFlow
> that allows using Apache Ignite as a data source for neural network
> training, inference, and all other computations supported by TensorFlow.
> Ignite Dataset has several advantages, including: TensorFlow gets fast
> access to a distributed database that can contain training data and
> data for inference; objects fed by Ignite Dataset can have any structure,
> so all preprocessing can be done in the TensorFlow pipeline; SSL, Windows,
> and distributed training are also supported.
> Ignite Dataset is currently part of TensorFlow, so you don’t need to install
> any third-party packages and can use it out of the box. The integration
> is based on [tf.data|https://www.tensorflow.org/api_docs/python/tf/data] on
> the TensorFlow side and the [Binary Client
> Protocol|https://apacheignite.readme.io/v2.6/docs/binary-client-protocol]
> on the Apache Ignite side.
>
> *IGFS Plugin*
> In addition to database functionality, Apache Ignite provides a distributed
> file system called [IGFS|https://ignite.apache.org/features/igfs.html]. IGFS
> delivers functionality similar to Hadoop HDFS, but only in memory. The IGFS
> Plugin for TensorFlow allows using IGFS for checkpointing (for reliability
> and fault tolerance) and for communication with TensorBoard (even when
> TensorBoard runs in a different process or on a different machine).
> The IGFS Plugin is currently part of TensorFlow, so you don’t need to
> install any third-party packages and can use it out of the box. The
> integration is based on the [custom filesystem
> plugin|https://www.tensorflow.org/extend/add_filesys] on the TensorFlow side
> and the [IGFS Native API|https://ignite.apache.org/features/igfs.html] on
> the Apache Ignite side.
>
> *Distributed Training*
> Distributed training makes it possible to use the computational resources
> of the whole cluster and thus speed up training of deep learning models.
> TensorFlow is a machine learning framework that [natively
> supports|https://www.tensorflow.org/deploy/distributed] distributed neural
> network training, inference, and other computations.
> Distributed Training in TensorFlow on Apache Ignite is based on the
> [standalone client
> mode|https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/distribute#standalone-client-mode]
> of distributed multi-worker training. Standalone client mode assumes that
> we have a cluster of workers with running TensorFlow servers and a client
> that actually contains the model code. When the client calls
> tf.estimator.train_and_evaluate, TensorFlow uses the specified distribution
> strategy to distribute computations across the workers, so that the most
> computationally intensive part is performed on the workers.
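
As a rough illustration of the worker-cluster layout that standalone client mode relies on, the sketch below builds a TensorFlow-style cluster spec and the per-worker TF_CONFIG JSON in plain Python. It does not call TensorFlow or Apache Ignite; the helper names, the host names and port 2222 are hypothetical, and the issue does not describe how Ignite itself starts the TensorFlow servers.

```python
import json


def build_cluster_spec(worker_hosts, port=2222):
    """Return a cluster spec dict mapping the "worker" job to host:port addresses.

    Hypothetical helper: the host names and the default port are illustrative.
    """
    if not worker_hosts:
        raise ValueError("at least one worker host is required")
    return {"worker": ["%s:%d" % (host, port) for host in worker_hosts]}


def build_tf_config(cluster_spec, task_type, task_index):
    """Return the TF_CONFIG JSON string for one task of the cluster."""
    tasks = cluster_spec.get(task_type, [])
    if not 0 <= task_index < len(tasks):
        raise ValueError("no task %s:%d in cluster spec" % (task_type, task_index))
    return json.dumps(
        {"cluster": cluster_spec, "task": {"type": task_type, "index": task_index}},
        sort_keys=True,
    )


if __name__ == "__main__":
    # Two hypothetical Ignite nodes, each assumed to run one TensorFlow server.
    spec = build_cluster_spec(["ignite-node-1", "ignite-node-2"])
    for index in range(len(spec["worker"])):
        print(build_tf_config(spec, "worker", index))
```

Each worker would receive its own TF_CONFIG value, while the client only needs the cluster spec to know where the workers listen.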
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)