Hi John, Yun,

Thank you for your feedback!

@John

> It seems like operators would either choose isolation for the cluster’s
> jobs or they would want to share the memory between jobs.
> I’m not sure I see the motivation to reserve only part of the memory for
> sharing and allowing jobs to choose whether they will share or be
> isolated.

I see two related questions here:

1) Whether to allow mixed workloads within the same cluster.
I agree that most likely all the jobs will have the same "sharing"
requirement, so we can drop "state.backend.memory.share-scope" from the
proposal.

2) Whether to allow different memory consumers to use shared or exclusive
memory.
Currently, only RocksDB is proposed to use shared memory. For Python, it's
non-trivial because its memory consumption is job-specific.
So we have to partition managed memory into shared and exclusive parts, and
therefore can NOT replace "taskmanager.memory.managed.shared-fraction" with
some boolean flag.

I think your question was about (1); I just wanted to clarify why the
shared-fraction is needed.
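
For illustration, here is how this could look in flink-conf.yaml (a
hypothetical snippet; "shared-fraction" is the key from the proposal, the
values are made up):

  taskmanager.memory.managed.size: 1024m
  # 0.75 * 1024m = 768m shared by RocksDB instances across all slots;
  # the remaining 256m stays exclusive per slot (e.g. for Python)
  taskmanager.memory.managed.shared-fraction: 0.75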

@Yun

> I am just curious whether this could really bring benefits to our users
> with such complex configuration logic.

I agree; configuration complexity seems to be a common concern.
I hope that removing "state.backend.memory.share-scope" (as proposed above)
reduces the complexity.
Please share any ideas on how to simplify it further.

> Could you share some real experimental results?

I did an experiment to verify that the approach is feasible,
i.e. that multiple jobs can share the same memory/block cache.
But I guess that's not what you mean here? Do you have any experiments in
mind?

> BTW, as talked before, I am not sure whether different lifecycles of
> RocksDB state-backends would affect the memory usage of block cache &
> write buffer manager in RocksDB.
> Currently, all instances would start and destroy nearly simultaneously,
> this would change after we introduce this feature with jobs running at
> different scheduler times.

IIUC, the concern is that closing a RocksDB instance might also close the
shared BlockCache.
I checked that manually, and it works as expected: the cache outlives the
closed instance.
Besides, closing the cache on instance shutdown would contradict the
sharing concept described in the documentation [1].
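
Roughly, this is the scenario I checked (a simplified sketch using the
RocksJava API; paths and sizes are made up, error handling omitted):

  import org.rocksdb.*;

  public class SharedCacheCheck {
    public static void main(String[] args) throws RocksDBException {
      RocksDB.loadLibrary();
      // one block cache shared by two independent RocksDB instances
      Cache cache = new LRUCache(64 * 1024 * 1024);
      BlockBasedTableConfig table =
          new BlockBasedTableConfig().setBlockCache(cache);
      Options options =
          new Options().setCreateIfMissing(true).setTableFormatConfig(table);
      RocksDB db1 = RocksDB.open(options, "/tmp/db1");
      RocksDB db2 = RocksDB.open(options, "/tmp/db2");
      db1.put("k1".getBytes(), "v1".getBytes());
      db2.put("k2".getBytes(), "v2".getBytes());
      db1.close(); // close the first instance...
      // ...the shared cache stays alive and db2 keeps working
      System.out.println(new String(db2.get("k2".getBytes())));
      db2.close();
      options.close();
      cache.close();
    }
  }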

[1]
https://github.com/facebook/rocksdb/wiki/Block-Cache

Regards,
Roman


On Wed, Nov 9, 2022 at 3:50 AM Yanfei Lei <fredia...@gmail.com> wrote:

> Hi Roman,
> Thanks for the proposal, this allows State Backend to make better use of
> memory.
>
> After reading the ticket, I'm curious about some points:
>
> 1. Is shared-memory only for the state backend? If both
> "taskmanager.memory.managed.shared-fraction: >0" and
> "state.backend.rocksdb.memory.managed: false" are set at the same time,
> will the shared-memory be wasted?
> 2. It's said that "Jobs 4 and 5 will use the same 750Mb of unmanaged memory
> and will compete with each other" in the example, how is the memory size of
> unmanaged part calculated?
> 3. For fine-grained-resource-management, the control
> of cpuCores, taskHeapMemory can still work, right?  And I am a little
> worried that too many memory-related configuration options are complicated
> for users to understand.
>
> Regards,
> Yanfei
>
> Roman Khachatryan <ro...@apache.org> wrote on Tue, Nov 8, 2022 at 23:22:
>
> > Hi everyone,
> >
> > I'd like to discuss sharing RocksDB memory across slots as proposed in
> > FLINK-29928 [1].
> >
> > Since 1.10 / FLINK-7289 [2], it is possible to:
> > - share these objects among RocksDB instances of the same slot
> > - bound the total memory usage by all RocksDB instances of a TM
> >
> > However, the memory is divided between the slots equally (unless using
> > fine-grained resource control). This is sub-optimal if some slots contain
> > more memory intensive tasks than the others.
> > Using fine-grained resource control is also often not an option because
> the
> > workload might not be known in advance.
> >
> > The proposal is to widen the scope of sharing memory to TM, so that it
> can
> > be shared across all RocksDB instances of that TM. That would reduce the
> > overall memory consumption in exchange for resource isolation.
> >
> > Please see FLINK-29928 [1] for more details.
> >
> > Looking forward to feedback on that proposal.
> >
> > [1]
> > https://issues.apache.org/jira/browse/FLINK-29928
> > [2]
> > https://issues.apache.org/jira/browse/FLINK-7289
> >
> > Regards,
> > Roman
> >
>