Hi Stephan, Gyula, Paris,

How does streaming currently differ in terms of memory management? As far as I know, we only have one MemoryManager, which is used by both modes.

- Henry

On Thu, May 21, 2015 at 12:34 PM, Stephan Ewen <se...@apache.org> wrote:

> I discussed a bit via Skype with Gyula and Paris.
>
> We thought about the following way to do it:
>
> - We add a dedicated streaming mode for now. The streaming mode supersedes
>   the batch mode, so it can run both types of programs.
>
> - The streaming mode sets the memory manager to "lazy allocation".
>   -> So long as it runs pure streaming jobs, the full heap will be
>      available to window buffers and UDFs.
>   -> Batch programs can still run, so mixed workloads are not prevented.
>      Batch programs are a bit less robust there, because the memory manager
>      does not pre-allocate memory. UDFs can eat into Flink's memory portion.
>
> - The streaming mode starts the necessary configured components/services
>   for state backups.
>
> Over the next versions, we want to bring these things together:
> - use the managed memory for window buffers
> - on-demand starting of the state backend
>
> Then we deprecate the streaming mode and let both modes start the cluster
> in the same way.
>
> On Thu, May 21, 2015 at 4:01 PM, Aljoscha Krettek <aljos...@apache.org>
> wrote:
>
>> Would it not be possible to start the snapshot service once the user
>> starts the first streaming job? About 2): with checkpointing coming up,
>> would it not make sense to shift to managed memory sooner rather than
>> later? Then this point would become moot.
>>
>> On Thu, May 21, 2015 at 3:47 PM, Matthias J. Sax
>> <mj...@informatik.hu-berlin.de> wrote:
>> > What would be the consequences for "mixed" programs (if there is any
>> > plan to support those)?
>> >
>> > Would it be necessary to have a third mode? Or would those programs
>> > simply run in streaming mode?
>> >
>> > -Matthias
>> >
>> > On 05/21/2015 03:12 PM, Stephan Ewen wrote:
>> >> Hi all!
>> >>
>> >> We discussed a while back introducing a dedicated streaming mode for
>> >> Flink. I would like to have a go at this and implement the changes,
>> >> but discuss them first.
>> >>
>> >> Here is a brief summary of why we wanted to introduce the dedicated
>> >> streaming mode:
>> >> Even though both batch and streaming are executed by the same
>> >> execution engine, a streaming setup of Flink varies a bit from a
>> >> batch setup:
>> >>
>> >> 1) The streaming cluster starts an additional service to store the
>> >>    distributed state snapshots.
>> >>
>> >> 2) Streaming mode uses memory a bit differently, so we should
>> >>    configure the memory manager differently. This difference may
>> >>    eventually go away.
>> >>
>> >> Concretely, to implement this, I was thinking about introducing the
>> >> following externally visible changes:
>> >>
>> >> - Additional scripts "start-streaming-cluster.sh" and
>> >>   "start-streaming-local.sh"
>> >>
>> >> - An execution mode parameter for the TaskManager ("batch / streaming")
>> >>
>> >> - An execution mode parameter for the JobManager ("batch / streaming")
>> >>
>> >> - All local executors and mini clusters need a flag that specifies
>> >>   whether they will start a streaming cluster or a pure batch cluster.
>> >>
>> >> Anything else that comes to mind?
>> >>
>> >> Greetings,
>> >> Stephan
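
For readers following the thread, the difference between a pre-allocating memory manager and the "lazy allocation" setting proposed above can be illustrated with a small sketch. This is not Flink's actual MemoryManager: the class `SketchMemoryManager`, its methods, and the `preAllocate` flag are hypothetical names chosen for illustration, and the real implementation may differ considerably. The sketch only shows the trade-off described in the thread: with pre-allocation the segments are reserved up front, while with lazy allocation the heap stays available to window buffers and UDFs until a batch operator actually asks for memory.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical illustration only; not Flink's MemoryManager API. */
final class SketchMemoryManager {
    private final int segmentSize;
    private final int maxSegments;
    private final boolean preAllocate;
    private final Deque<byte[]> pool = new ArrayDeque<>();
    private int handedOut = 0;

    SketchMemoryManager(int segmentSize, int maxSegments, boolean preAllocate) {
        this.segmentSize = segmentSize;
        this.maxSegments = maxSegments;
        this.preAllocate = preAllocate;
        if (preAllocate) {
            // Batch-style setup: reserve the whole budget at startup.
            for (int i = 0; i < maxSegments; i++) {
                pool.add(new byte[segmentSize]);
            }
        }
        // Lazy setup: nothing is reserved, so the heap remains free for user code.
    }

    /** Hands out one segment, allocating it on demand if the pool is empty. */
    synchronized byte[] allocate() {
        if (handedOut >= maxSegments) {
            throw new IllegalStateException("managed memory budget exhausted");
        }
        handedOut++;
        byte[] pooled = pool.poll();
        return pooled != null ? pooled : new byte[segmentSize];
    }

    /** Returns a segment; only the pre-allocating variant keeps it pooled. */
    synchronized void release(byte[] segment) {
        if (handedOut == 0) {
            throw new IllegalStateException("no segment is outstanding");
        }
        handedOut--;
        if (preAllocate) {
            pool.add(segment);
        }
        // In lazy mode the segment is dropped and left to the garbage collector.
    }

    /** Number of segments currently held in reserve by the manager. */
    synchronized int pooledSegments() {
        return pool.size();
    }
}
```

In this sketch, `new SketchMemoryManager(32 * 1024, 1024, false)` holds no memory until `allocate()` is called, which mirrors the point that batch programs are less robust in that configuration: nothing guarantees the heap is still free when they request their segments.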