Hi Stephan, Gyula, Paris,

How does streaming currently differ in terms of memory management?
As far as I know, we currently have only one MemoryManager, which is used
by both modes.

- Henry

On Thu, May 21, 2015 at 12:34 PM, Stephan Ewen <se...@apache.org> wrote:
> I discussed a bit via Skype with Gyula and Paris.
>
>
> We thought about the following way to do it:
>
>  - We add a dedicated streaming mode for now. The streaming mode supersedes
> the batch mode, so it can run both types of programs.
>
>  - The streaming mode sets the memory manager to "lazy allocation".
>     -> So long as it runs pure streaming jobs, the full heap will be
> available to window buffers and UDFs.
>     -> Batch programs can still run, so mixed workloads are not prevented.
> Batch programs are a bit less robust there, because the memory manager does
> not pre-allocate memory. UDFs can eat into Flink's memory portion.
>
>  - The streaming mode starts the necessary configured components/services
> for state backups.
>
>
>
> Over the next versions, we want to bring these things together:
>   - use the managed memory for window buffers
>   - on-demand starting of the state backend
>
> Then we deprecate the streaming mode and let both modes start the cluster
> in the same way.
>
>
>
>
>
> On Thu, May 21, 2015 at 4:01 PM, Aljoscha Krettek <aljos...@apache.org>
> wrote:
>
>> Would it not be possible to start the snapshot service once the user
>> starts the first streaming job? Regarding 2): with checkpointing coming
>> up, would it not make sense to shift to managed memory sooner rather
>> than later? Then this point would become moot.
>>
>> On Thu, May 21, 2015 at 3:47 PM, Matthias J. Sax
>> <mj...@informatik.hu-berlin.de> wrote:
>> > What would be the consequences for "mixed" programs? (If there are any
>> > plans to support those?)
>> >
>> > Would it be necessary to have a third mode? Or would those programs
>> > simply run in streaming mode?
>> >
>> > -Matthias
>> >
>> > On 05/21/2015 03:12 PM, Stephan Ewen wrote:
>> >> Hi all!
>> >>
>> >> A while back we discussed introducing a dedicated streaming mode for
>> >> Flink. I would like to have a go at this and implement the changes,
>> >> but discuss them first.
>> >>
>> >>
>> >> Here is a brief summary of why we wanted to introduce the dedicated
>> >> streaming mode:
>> >> Even though both batch and streaming are executed by the same execution
>> >> engine,
>> >> a streaming setup of Flink varies a bit from a batch setup:
>> >>
>> >> 1) The streaming cluster starts an additional service to store the
>> >> distributed state snapshots.
>> >>
>> >> 2) Streaming mode uses memory a bit differently, so we should
>> >> configure the memory manager differently. This difference may
>> >> eventually go away.
>> >>
>> >>
>> >>
>> >> Concretely, to implement this, I was thinking about introducing the
>> >> following externally visible changes
>> >>
>> >>  - Additional scripts "start-streaming-cluster.sh" and
>> >> "start-streaming-local.sh"
>> >>
>> >>  - An execution mode parameter for the TaskManager ("batch / streaming")
>> >>
>> >>  - An execution mode parameter for the JobManager ("batch /
>> >> streaming")
>> >>
>> >>  - All local executors and mini clusters need a flag that specifies
>> >>    whether they will start a streaming cluster or a pure batch cluster.
>> >>
>> >>
>> >> Anything else that comes to mind?
>> >>
>> >>
>> >> Greetings,
>> >> Stephan
>> >>
>> >
>>
