On Fri, Jan 9, 2015 at 6:46 AM, Yakov Zhdanov <[email protected]> wrote:
> I agree that this functionality should be deeply revised on migration to > Ignite, because of the issues you already mentioned. > > After reading your email I think that your proposal seems to be based on > assumption that streamer should always be used with windows or data have to > be persisted. I think it may not be so sometimes. It seems that better > concept would be - stages that need windowing, querying or persistence may > be backed by cache and we will provide all the necessary facilities. Event > routing, in my understanding, should stay pretty much the same with what we > have now. > I actually believe that in the spirit of ease-of-use and simplification, we should remove explicit stage declaration altogether from our streaming design. The way to support stages in Ignite would be to launch a new compute task or query every time new processing needs to be done. As far as event routing is concerned, we will be able to use stranded key-affinity-based routing. > > As far as integration with other products, give me a couple of days to > think it over. > Sounds good. > > Thanks! > -- > Yakov Zhdanov > > 2015-01-08 20:50 GMT+03:00 Dmitriy Setrakyan <[email protected]>: > > > I would like to start a discussion for a new Streaming design in Ignite. > In > > this email I would like to gather all opinions from the community, and > > then create a Jira ticket outlining main concepts. > > > > Currently Ignite Streaming works the following way: > > > > - Users deploy multiple Streamers > > - Data is added to Streamers via API calls > > - Once data is added to Streamers, it passes through multiple > > pre-configured Sliding Windows. > > - Indexes for sliding windows are manually maintained through listener > > callbacks. > > - Users can then use Java-based predicate queries to query into > Sliding > > Windows. > > > > More information on Ignite Streaming can be found here: > > http://doc.gridgain.org/latest/Data+Streaming > > > > Here are the disadvantages of this approach: > > > > - The biggest disadvantage is that the data does not end up in the > > Ignite Cache and, instead, ends up in another construct, called > Sliding > > Windows. > > - Another disadvantage is that there is no integration with well > > established products, like Kafka or Storm. > > - Also, usability is not ideal, as Indexes are manually maintained and > > must be explicitly queried via API (as opposed to SQL, where indexes > get > > automatically utilized in the background). > > > > The new design I am proposing would support streaming of the data > directly > > into Ignite caches. Caches already support data ingest, in many ways > > analogous to steaming, so we can reuse it to minimize the effort. The > main > > advantage here is that we automatically get SQL querying and indexing > > capabilities into the streamed data, in addition to many other standard > > cache features available in Ignite. > > > > Here are the main design concepts: > > > > - We already have IgniteDataLoader API which allows to ingest > streaming > > data into Ignite Caches. We can create wrappers for it to ingest data > > from > > different systems, like Kafka, Storm, etc... Basically we will have > > KafkaDataLoader, StormDataLoader, or even plain SocketDataLoader. > > > > - Sliding Windows would be handled via eviction policies in caches. We > > already have FIFO based eviction policy (equivalent to the Size-based > > sliding window), and we can add batch-based FIFO, Time-based FIFO, and > > Time-batch-based FIFO eviction policies in order to support all the > > sliding windows we support today. > > > > In my view this new approach is more natural and is simpler to use, since > > it will utilize standard Ignite Cache APIs for direct data access, and > > standard SQL for data querying. > > > > Thoughts? > > > > D. > > >
