I agree that this functionality should be deeply revised on migration to Ignite, because of the issues you already mentioned.
After reading your email I think that your proposal seems to be based on assumption that streamer should always be used with windows or data have to be persisted. I think it may not be so sometimes. It seems that better concept would be - stages that need windowing, querying or persistence may be backed by cache and we will provide all the necessary facilities. Event routing, in my understanding, should stay pretty much the same with what we have now. As far as integration with other products, give me a couple of days to think it over. Thanks! -- Yakov Zhdanov 2015-01-08 20:50 GMT+03:00 Dmitriy Setrakyan <[email protected]>: > I would like to start a discussion for a new Streaming design in Ignite. In > this email I would like to gather all opinions from the community, and > then create a Jira ticket outlining main concepts. > > Currently Ignite Streaming works the following way: > > - Users deploy multiple Streamers > - Data is added to Streamers via API calls > - Once data is added to Streamers, it passes through multiple > pre-configured Sliding Windows. > - Indexes for sliding windows are manually maintained through listener > callbacks. > - Users can then use Java-based predicate queries to query into Sliding > Windows. > > More information on Ignite Streaming can be found here: > http://doc.gridgain.org/latest/Data+Streaming > > Here are the disadvantages of this approach: > > - The biggest disadvantage is that the data does not end up in the > Ignite Cache and, instead, ends up in another construct, called Sliding > Windows. > - Another disadvantage is that there is no integration with well > established products, like Kafka or Storm. > - Also, usability is not ideal, as Indexes are manually maintained and > must be explicitly queried via API (as opposed to SQL, where indexes get > automatically utilized in the background). > > The new design I am proposing would support streaming of the data directly > into Ignite caches. Caches already support data ingest, in many ways > analogous to steaming, so we can reuse it to minimize the effort. The main > advantage here is that we automatically get SQL querying and indexing > capabilities into the streamed data, in addition to many other standard > cache features available in Ignite. > > Here are the main design concepts: > > - We already have IgniteDataLoader API which allows to ingest streaming > data into Ignite Caches. We can create wrappers for it to ingest data > from > different systems, like Kafka, Storm, etc... Basically we will have > KafkaDataLoader, StormDataLoader, or even plain SocketDataLoader. > > - Sliding Windows would be handled via eviction policies in caches. We > already have FIFO based eviction policy (equivalent to the Size-based > sliding window), and we can add batch-based FIFO, Time-based FIFO, and > Time-batch-based FIFO eviction policies in order to support all the > sliding windows we support today. > > In my view this new approach is more natural and is simpler to use, since > it will utilize standard Ignite Cache APIs for direct data access, and > standard SQL for data querying. > > Thoughts? > > D. >
