I would like to start a discussion for a new Streaming design in Ignite. In
this email I would like to gather all  opinions from the community, and
then create a Jira ticket outlining main concepts.

Currently Ignite Streaming works the following way:

   - Users deploy multiple Streamers
   - Data is added to Streamers via API calls
   - Once data is added to Streamers, it passes through multiple
   pre-configured Sliding Windows.
   - Indexes for sliding windows are manually maintained through listener
   callbacks.
   - Users can then use Java-based predicate queries to query into Sliding
   Windows.

More information on Ignite Streaming can be found here:
http://doc.gridgain.org/latest/Data+Streaming

Here are the disadvantages of this approach:

   - The biggest disadvantage is that the data does not end up in the
   Ignite Cache and, instead, ends up in another construct, called Sliding
   Windows.
   - Another disadvantage is that there is no integration with well
   established products, like Kafka or Storm.
   - Also, usability is not ideal, as Indexes are manually maintained and
   must be explicitly queried via API (as opposed to SQL, where indexes get
   automatically utilized in the background).

The new design I am proposing would support streaming of the data directly
into Ignite caches. Caches already support data ingest, in many ways
analogous to steaming, so we can reuse it to minimize the effort. The main
advantage here is that we automatically get SQL querying and indexing
capabilities into the streamed data, in addition to many other standard
cache features available in Ignite.

Here are the main design concepts:

   - We already have IgniteDataLoader API which allows to ingest streaming
   data into Ignite Caches. We can create wrappers for it to ingest data from
   different systems, like Kafka, Storm, etc... Basically we will have
   KafkaDataLoader, StormDataLoader, or even plain SocketDataLoader.

   - Sliding Windows would be handled via eviction policies in caches. We
   already have FIFO based eviction policy (equivalent to the Size-based
   sliding window), and we can add batch-based FIFO, Time-based FIFO, and
   Time-batch-based FIFO  eviction policies in order to support all the
   sliding windows we support today.

In my view this new approach is more natural and is simpler to use, since
it will utilize standard Ignite Cache APIs for direct data access, and
standard SQL for data querying.

Thoughts?

D.

Reply via email to