Hi,

streaming currently does not use any memory manager. All state is kept in Java objects on the JVM heap, for example an ArrayList<> for the window buffer.
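
To make that concrete, here is a rough sketch of what "state on the heap" means for a window buffer. This is not the actual Flink streaming code: the class name HeapWindowBuffer, its methods, and the count-based window policy are all made up for illustration. The point is only that the elements are plain Java objects held in an ArrayList, so their footprint is governed by the JVM heap and the garbage collector, not by a memory manager.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical count-based tumbling window buffer (illustration only,
 * not a Flink class). All elements are ordinary objects on the JVM heap.
 */
class HeapWindowBuffer<T> {
    private final int windowSize;
    // Plain ArrayList: no managed memory, no pre-allocated segments.
    private List<T> buffer = new ArrayList<>();

    HeapWindowBuffer(int windowSize) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        this.windowSize = windowSize;
    }

    /** Adds an element; returns the completed window, or null if it is not full yet. */
    List<T> add(T element) {
        buffer.add(element);
        if (buffer.size() < windowSize) {
            return null;
        }
        List<T> window = buffer;
        buffer = new ArrayList<>(); // start the next window with a fresh heap list
        return window;
    }

    /** Number of elements currently held for the unfinished window. */
    int buffered() {
        return buffer.size();
    }
}
```

With a buffer like this, a large window or a slow consumer simply grows the heap until the JVM runs out of memory, which is why the memory manager setting matters for how much heap is left over for such buffers.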

On Thu, May 21, 2015 at 11:56 PM, Henry Saputra <henry.sapu...@gmail.com> wrote:
> Hi Stephan, Gyula, Paris,
>
> How does streaming currently different in term of memory management?
> Currently we only have one MemoryManager which is used by both modes I
> believe.
>
> - Henry
>
> On Thu, May 21, 2015 at 12:34 PM, Stephan Ewen <se...@apache.org> wrote:
>> I discussed a bit via Skype with Gyula and Paris.
>>
>> We thought about the following way to do it:
>>
>> - We add a dedicated streaming mode for now. The streaming mode supersedes
>>   the batch mode, so it can run both type of programs.
>>
>> - The streaming mode sets the memory manager to "lazy allocation".
>>   -> So long as it runs pure streaming jobs, the full heap will be
>>      available to window buffers and UDFs.
>>   -> Batch programs can still run, so mixed workloads are not prevented.
>>      Batch programs are a bit less robust there, because the memory manager
>>      does not pre-allocate memory. UDFs can eat into Flink's memory portion.
>>
>> - The streaming mode starts the necessary configured components/services
>>   for state backups
>>
>> Over the next versions, we want to bring these things together:
>> - use the managed memory for window buffers
>> - on-demand starting of the state backend
>>
>> Then, we deprecate the streaming mode, let both modes start the cluster in
>> the same way.
>>
>> On Thu, May 21, 2015 at 4:01 PM, Aljoscha Krettek <aljos...@apache.org>
>> wrote:
>>
>>> Would it not be possible to start the snapshot service once the user
>>> starts the first streaming job? About 2) with checkpointing coming up,
>>> would it not make sense to shift to managed memory rather sooner than
>>> later. Then this point would become moot.
>>>
>>> On Thu, May 21, 2015 at 3:47 PM, Matthias J. Sax
>>> <mj...@informatik.hu-berlin.de> wrote:
>>> > What would be the consequences on "mixed" programs? (If there is any
>>> > plan to support those?)
>>> >
>>> > Would it be necessary to have a third mode? Or would those programs
>>> > simple run in streaming mode?
>>> >
>>> > -Matthias
>>> >
>>> > On 05/21/2015 03:12 PM, Stephan Ewen wrote:
>>> >> Hi all!
>>> >>
>>> >> We discussed a while back about introducing a dedicated streaming mode
>>> >> for Flink. I would like to take a go at this and implement the changes,
>>> >> but discuss them before.
>>> >>
>>> >> Here is a brief summary why we wanted to introduce the dedicated
>>> >> streaming mode:
>>> >> Even though both batch and streaming are executed by the same execution
>>> >> engine, a streaming setup of Flink varies a bit from a batch setup:
>>> >>
>>> >> 1) The streaming cluster starts an additional service to store the
>>> >>    distributed state snapshots.
>>> >>
>>> >> 2) Streaming mode uses memory a bit different, so we should configure
>>> >>    the memory manager differently. This difference may eventually go away.
>>> >>
>>> >> Concretely, to implement this, I was thinking about introducing the
>>> >> following externally visible changes
>>> >>
>>> >> - Additional scripts "start-streaming-cluster.sh" and
>>> >>   "start-streaming-local.sh"
>>> >>
>>> >> - An execution mode parameter for the TaskManager ("batch / streaming")
>>> >>
>>> >> - An execution mode parameter for the JobManager TaskManager ("batch /
>>> >>   streaming")
>>> >>
>>> >> - All local executors and mini clusters need a flag that specifies
>>> >>   whether they will start a streaming cluster, or a pure batch cluster.
>>> >>
>>> >> Anything else that comes to your minds?
>>> >>
>>> >> Greetings,
>>> >> Stephan