I would like to start a discussion for a new Streaming design in Ignite. In this email I would like to gather all opinions from the community, and then create a Jira ticket outlining main concepts.
Currently Ignite Streaming works the following way: - Users deploy multiple Streamers - Data is added to Streamers via API calls - Once data is added to Streamers, it passes through multiple pre-configured Sliding Windows. - Indexes for sliding windows are manually maintained through listener callbacks. - Users can then use Java-based predicate queries to query into Sliding Windows. More information on Ignite Streaming can be found here: http://doc.gridgain.org/latest/Data+Streaming Here are the disadvantages of this approach: - The biggest disadvantage is that the data does not end up in the Ignite Cache and, instead, ends up in another construct, called Sliding Windows. - Another disadvantage is that there is no integration with well established products, like Kafka or Storm. - Also, usability is not ideal, as Indexes are manually maintained and must be explicitly queried via API (as opposed to SQL, where indexes get automatically utilized in the background). The new design I am proposing would support streaming of the data directly into Ignite caches. Caches already support data ingest, in many ways analogous to steaming, so we can reuse it to minimize the effort. The main advantage here is that we automatically get SQL querying and indexing capabilities into the streamed data, in addition to many other standard cache features available in Ignite. Here are the main design concepts: - We already have IgniteDataLoader API which allows to ingest streaming data into Ignite Caches. We can create wrappers for it to ingest data from different systems, like Kafka, Storm, etc... Basically we will have KafkaDataLoader, StormDataLoader, or even plain SocketDataLoader. - Sliding Windows would be handled via eviction policies in caches. We already have FIFO based eviction policy (equivalent to the Size-based sliding window), and we can add batch-based FIFO, Time-based FIFO, and Time-batch-based FIFO eviction policies in order to support all the sliding windows we support today. In my view this new approach is more natural and is simpler to use, since it will utilize standard Ignite Cache APIs for direct data access, and standard SQL for data querying. Thoughts? D.
