Hi,
Streaming currently does not use any memory manager. All state is kept
in Java objects on the Java heap, for example an ArrayList<> for the
window buffer.
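To make "state in Java objects on the heap" concrete, here is a minimal sketch of such a window buffer. The class HeapWindowBuffer is hypothetical and is not the actual Flink streaming window code. It assumes a simple count-based tumbling window.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical illustration: a count-based tumbling window whose buffer is an
// ordinary ArrayList on the Java heap, not memory obtained from a MemoryManager.
class HeapWindowBuffer<T> {
    private final int windowSize;
    // Buffered elements are plain Java objects; the garbage collector manages them.
    private final List<T> buffer = new ArrayList<>();

    HeapWindowBuffer(int windowSize) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        this.windowSize = windowSize;
    }

    /** Adds an element and returns the completed window once windowSize elements are buffered. */
    Optional<List<T>> add(T element) {
        buffer.add(element);
        if (buffer.size() < windowSize) {
            return Optional.empty();
        }
        // Emit a copy and start the next window with an empty buffer.
        List<T> window = new ArrayList<>(buffer);
        buffer.clear();
        return Optional.of(window);
    }

    int bufferedCount() {
        return buffer.size();
    }
}
```

No memory manager accounts for anything in this sketch. A large window simply grows the heap, and an oversized one ends in an OutOfMemoryError.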

On Thu, May 21, 2015 at 11:56 PM, Henry Saputra <henry.sapu...@gmail.com> wrote:
> Hi Stephan, Gyula, Paris,
>
> How is streaming currently different in terms of memory management?
> Currently we only have one MemoryManager, which is used by both modes, I
> believe.
>
> - Henry
>
> On Thu, May 21, 2015 at 12:34 PM, Stephan Ewen <se...@apache.org> wrote:
>> I discussed a bit via Skype with Gyula and Paris.
>>
>>
>> We thought about the following way to do it:
>>
>>  - We add a dedicated streaming mode for now. The streaming mode supersedes
>> the batch mode, so it can run both types of programs.
>>
>>  - The streaming mode sets the memory manager to "lazy allocation".
>>     -> So long as it runs pure streaming jobs, the full heap will be
>> available to window buffers and UDFs.
>>     -> Batch programs can still run, so mixed workloads are not prevented.
>> Batch programs are a bit less robust there, because the memory manager does
>> not pre-allocate memory. UDFs can eat into Flink's memory portion.
>>
>>  - The streaming mode starts the necessary configured components/services
>> for state backups.
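A rough sketch of the difference between a pre-allocating memory manager and the proposed "lazy allocation" setting follows. SketchMemoryManager, its AllocationMode enum, and the page-based allocate/release methods are assumptions for illustration. They are not Flink's MemoryManager interface.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch (not Flink's actual MemoryManager API) contrasting
// eager pre-allocation with the "lazy allocation" proposed for streaming mode.
class SketchMemoryManager {
    enum AllocationMode { EAGER, LAZY }

    private final AllocationMode mode;
    private final int pageSize;
    private final int totalPages;
    // Free pre-allocated pages; only populated in EAGER mode.
    private final Deque<byte[]> pool = new ArrayDeque<>();
    // Number of pages currently handed out to operators.
    private int pagesInUse = 0;

    SketchMemoryManager(AllocationMode mode, int pageSize, int totalPages) {
        this.mode = mode;
        this.pageSize = pageSize;
        this.totalPages = totalPages;
        if (mode == AllocationMode.EAGER) {
            // Reserve the whole managed budget up front, so UDFs cannot eat into it.
            for (int i = 0; i < totalPages; i++) {
                pool.push(new byte[pageSize]);
            }
        }
    }

    /** Hands out numPages pages, or throws if the managed budget is exhausted. */
    List<byte[]> allocate(int numPages) {
        if (pagesInUse + numPages > totalPages) {
            throw new IllegalStateException("managed memory budget exhausted");
        }
        List<byte[]> pages = new ArrayList<>(numPages);
        for (int i = 0; i < numPages; i++) {
            // LAZY: allocate on demand. Until now the memory was free for window
            // buffers and UDFs, so this can fail with an OutOfMemoryError.
            pages.add(mode == AllocationMode.EAGER ? pool.pop() : new byte[pageSize]);
        }
        pagesInUse += numPages;
        return pages;
    }

    void release(List<byte[]> pages) {
        pagesInUse -= pages.size();
        if (mode == AllocationMode.EAGER) {
            // Keep the memory reserved for the next batch operator.
            for (byte[] page : pages) {
                pool.push(page);
            }
        }
        // LAZY: the pages are simply dropped so the garbage collector can reclaim them.
    }

    int pagesInUse() { return pagesInUse; }

    int pooledPages() { return pool.size(); }
}
```

In EAGER mode the heap is shared with user code only after the managed budget has been reserved. In LAZY mode a pure streaming job never calls allocate, so the full heap stays available to window buffers and UDFs. This matches the trade-off described above.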
>>
>>
>>
>> Over the next versions, we want to bring these things together:
>>   - use the managed memory for window buffers
>>   - on-demand starting of the state backend
>>
>> Then we deprecate the streaming mode and let both modes start the cluster
>> in the same way.
>>
>>
>>
>>
>>
>> On Thu, May 21, 2015 at 4:01 PM, Aljoscha Krettek <aljos...@apache.org> wrote:
>>
>>> Would it not be possible to start the snapshot service once the user
>>> starts the first streaming job? About 2): with checkpointing coming up,
>>> would it not make sense to shift to managed memory sooner rather than
>>> later? Then this point would become moot.
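Aljoscha's first suggestion could look roughly like this. The cluster starts without the snapshot service and brings it up on the first streaming job submission. LazySnapshotService and onJobSubmitted are hypothetical names, and the real JobManager wiring would differ.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: start the state snapshot service only when the first
// streaming job is submitted, instead of at cluster startup.
class LazySnapshotService {
    private final AtomicBoolean started = new AtomicBoolean(false);
    private final Runnable startAction;

    LazySnapshotService(Runnable startAction) {
        this.startAction = startAction;
    }

    /** Called on every job submission; batch jobs never trigger the service. */
    void onJobSubmitted(boolean isStreamingJob) {
        // compareAndSet ensures the service is started at most once, even if several
        // streaming jobs arrive concurrently (other callers do not wait for startup).
        if (isStreamingJob && started.compareAndSet(false, true)) {
            startAction.run();
        }
    }

    boolean isStarted() {
        return started.get();
    }
}
```

With this approach a single cluster type would suffice for the snapshot-service part of the proposal. Only the memory configuration would still differ between the modes.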
>>>
>>> On Thu, May 21, 2015 at 3:47 PM, Matthias J. Sax <mj...@informatik.hu-berlin.de> wrote:
>>> > What would be the consequences for "mixed" programs (if there is any
>>> > plan to support those)?
>>> >
>>> > Would it be necessary to have a third mode? Or would those programs
>>> > simply run in streaming mode?
>>> >
>>> > -Matthias
>>> >
>>> > On 05/21/2015 03:12 PM, Stephan Ewen wrote:
>>> >> Hi all!
>>> >>
>>> >> A while back we discussed introducing a dedicated streaming mode for
>>> >> Flink. I would like to have a go at this and implement the changes, but
>>> >> would like to discuss them first.
>>> >>
>>> >>
>>> >> Here is a brief summary of why we wanted to introduce the dedicated
>>> >> streaming mode: even though both batch and streaming are executed by the
>>> >> same execution engine, a streaming setup of Flink differs a bit from a
>>> >> batch setup:
>>> >>
>>> >> 1) The streaming cluster starts an additional service to store the
>>> >> distributed state snapshots.
>>> >>
>>> >> 2) Streaming mode uses memory a bit differently, so we should configure
>>> >> the memory manager differently. This difference may eventually go away.
>>> >>
>>> >>
>>> >>
>>> >> Concretely, to implement this, I was thinking about introducing the
>>> >> following externally visible changes:
>>> >>
>>> >>  - Additional scripts "start-streaming-cluster.sh" and
>>> >> "start-streaming-local.sh"
>>> >>
>>> >>  - An execution mode parameter for the TaskManager ("batch / streaming")
>>> >>
>>> >>  - An execution mode parameter for the JobManager ("batch / streaming")
>>> >>
>>> >>  - All local executors and mini clusters need a flag that specifies
>>> >>    whether they will start a streaming cluster or a pure batch cluster.
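Here is a minimal sketch of the proposed execution mode parameter and the matching mini cluster flag. The ExecutionMode enum, the config key "execution.mode", the BATCH default, and SketchMiniCluster are all assumptions made for illustration. None of them is existing Flink configuration.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of the proposed "batch / streaming" execution mode parameter.
// The key "execution.mode" and these class names are illustrative, not Flink's API.
enum ExecutionMode {
    BATCH, STREAMING;

    static final String CONFIG_KEY = "execution.mode";

    /** Parses the mode from a config map; defaults to BATCH when the key is unset. */
    static ExecutionMode fromConfig(Map<String, String> config) {
        String value = config.getOrDefault(CONFIG_KEY, "batch");
        try {
            return valueOf(value.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException(
                "Unknown execution mode '" + value + "', expected 'batch' or 'streaming'", e);
        }
    }
}

// A mini cluster that receives the mode flag and derives which services to start.
class SketchMiniCluster {
    private final ExecutionMode mode;

    SketchMiniCluster(ExecutionMode mode) {
        this.mode = mode;
    }

    /** Only streaming mode starts the state snapshot service. */
    boolean startsSnapshotService() {
        return mode == ExecutionMode.STREAMING;
    }

    /** Batch mode pre-allocates managed memory; streaming mode allocates lazily. */
    boolean preallocatesManagedMemory() {
        return mode == ExecutionMode.BATCH;
    }
}
```

Defaulting to BATCH in this sketch keeps existing setups unchanged. Because the streaming mode is meant to supersede the batch mode, a cluster started as STREAMING would still accept batch jobs.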
>>> >>
>>> >>
>>> >> Anything else that comes to mind?
>>> >>
>>> >>
>>> >> Greetings,
>>> >> Stephan
>>> >>
>>> >
>>>
