I agree that this functionality should be deeply revised on migration to
Ignite, because of the issues you already mentioned.

After reading your email, I think your proposal is based on the assumption
that a streamer is always used with windows, or that data always has to be
persisted. I think that is not always the case. A better concept might be
this: stages that need windowing, querying, or persistence may be backed by
a cache, and we provide all the necessary facilities for that. Event
routing, in my understanding, should stay pretty much the same as what we
have now.

As for integration with other products, give me a couple of days to
think it over.

Thanks!
--
Yakov Zhdanov

2015-01-08 20:50 GMT+03:00 Dmitriy Setrakyan <[email protected]>:

> I would like to start a discussion for a new Streaming design in Ignite. In
> this email I would like to gather all opinions from the community, and
> then create a Jira ticket outlining the main concepts.
>
> Currently Ignite Streaming works the following way:
>
>    - Users deploy multiple Streamers
>    - Data is added to Streamers via API calls
>    - Once data is added to Streamers, it passes through multiple
>    pre-configured Sliding Windows.
>    - Indexes for sliding windows are manually maintained through listener
>    callbacks.
>    - Users can then use Java-based predicate queries to query into Sliding
>    Windows.
>
> More information on Ignite Streaming can be found here:
> http://doc.gridgain.org/latest/Data+Streaming
>
> Here are the disadvantages of this approach:
>
>    - The biggest disadvantage is that the data does not end up in the
>    Ignite Cache and, instead, ends up in another construct, called Sliding
>    Windows.
>    - Another disadvantage is that there is no integration with well
>    established products, like Kafka or Storm.
>    - Also, usability is not ideal, as Indexes are manually maintained and
>    must be explicitly queried via API (as opposed to SQL, where indexes get
>    automatically utilized in the background).
>
> The new design I am proposing would support streaming of the data directly
> into Ignite caches. Caches already support data ingest, in many ways
> analogous to streaming, so we can reuse it to minimize the effort. The main
> advantage here is that we automatically get SQL querying and indexing
> capabilities into the streamed data, in addition to many other standard
> cache features available in Ignite.
>
> Here are the main design concepts:
>
>    - We already have the IgniteDataLoader API, which ingests streaming data
>    into Ignite Caches. We can create wrappers for it to ingest data from
>    different systems, like Kafka, Storm, etc. Basically, we will have
>    KafkaDataLoader, StormDataLoader, or even a plain SocketDataLoader.
>
>    - Sliding Windows would be handled via eviction policies in caches. We
>    already have a FIFO-based eviction policy (equivalent to the size-based
>    sliding window), and we can add batch-based FIFO, time-based FIFO, and
>    time-batch-based FIFO eviction policies in order to support all the
>    sliding windows we support today.
>
> In my view this new approach is more natural and is simpler to use, since
> it will utilize standard Ignite Cache APIs for direct data access, and
> standard SQL for data querying.
>
> Thoughts?
>
> D.
>
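
To make the equivalence between a size-based sliding window and FIFO
eviction in the quoted proposal concrete, here is a minimal sketch in plain
Java. It does not use any Ignite API: FifoWindowSketch, FifoWindow, and
stream() are hypothetical names invented for this illustration, and a
LinkedHashMap stands in for the cache. It only shows the idea that a bounded
store which evicts its oldest entry behaves like a sliding window of the
last N events.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; this is not an Ignite class. */
public class FifoWindowSketch {
    /** Bounded map that evicts its oldest entry once maxSize is exceeded. */
    static final class FifoWindow<K, V> extends LinkedHashMap<K, V> {
        private final int maxSize;

        FifoWindow(int maxSize) {
            // accessOrder = false keeps insertion order, which gives FIFO eviction.
            super(16, 0.75f, false);
            this.maxSize = maxSize;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            // Called after each put; returning true drops the oldest entry.
            return size() > maxSize;
        }
    }

    /** Streams the keys 1..count into a window and returns the surviving keys. */
    static List<Integer> stream(int count, int windowSize) {
        FifoWindow<Integer, String> window = new FifoWindow<>(windowSize);
        for (int i = 1; i <= count; i++)
            window.put(i, "event-" + i);
        return new ArrayList<>(window.keySet());
    }

    public static void main(String[] args) {
        // Five events into a window of size three: only the last three remain.
        System.out.println(stream(5, 3)); // prints [3, 4, 5]
    }
}
```

A time-based window would follow the same pattern, with the eviction test
comparing an entry timestamp against the current time instead of checking
the size.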
