philipportner commented on a change in pull request #1147:
URL: https://github.com/apache/systemds/pull/1147#discussion_r559782106
##########
File path: scripts/staging/unified-memory-manager/umm.md
##########
@@ -0,0 +1,144 @@
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% end comment %}
+-->
+
+# Unified Memory Manager - Design Document
+
+| **Author(s)** | Philipp Ortner |
+:-------------- |:----------------------------------------------------|
+
+## Description
+This document describes the initial design of a Unified Memory Manager proposed
+for SystemDS.
+
+## Design
+The Unified Memory Manager, henceforth UMM, will act as a manager for heap memory
+provided by the JVM.
+
+The UMM has a certain (% of JVM heap) amount of memory which it can distribute for operands
+and a buffer pool.
+
+Operands are the compiled program's variables which extend the base class
+`Data`, e.g., `MatrixObject` and `StringObject`.
+The buffer pool manages
+in-memory representations of dirty objects that don't exist on the HDFS or
+other persistent memory. This is currently done by the `LazyWriteBuffer`. Intermediates
+could be represented in this buffer.
+
+These two memory areas will each have a min and max amount of memory they can
+occupy, meaning that the boundary between the areas can shift dynamically depending
+on the current load.
+
+||min|max|
+| ------------- |:-------------:| -----:|
+| operations | 50% | 70% |
+| buffer pool | 15% | 30% |
+
+The UMM will utilise the existing `CacheBlock` structure to manage the buffer
+pool area while it will use the new `OperandBlock (?)` structure to keep track of
Review comment:
I may be using the terminology incorrectly; I actually meant what you
described.
I looked through the code again and I think I'm missing something.
`MatrixObject`, `FrameObject` and `TensorObject` implement the
`CacheBlock` interface.
The `LazyWriteBuffer` itself seems to manage `ByteBuffer` instances, which
in turn also hold `CacheBlock`s, at least for dense data.
Could you please elaborate on a few things I'm still not 100% clear
about?
1) Which current classes do you mean when you talk about operation memory?
2) Which classes do you mean by the intermediates / dirty
objects that are supposed to go into the buffer pool area?
I'll try to answer them myself; maybe you can correct me.
ad1) As mentioned in the design document, derivatives of `Data` like
`MatrixObject`, `FrameObject` and `TensorObject`.
ad2) Here I'm not sure. Looking at the code, I would say these are also
`CacheableData` objects. For example, `ExecutionContext` and
`EstimatorDensityMap` work with `MatrixBlock`s, which ultimately
are also `CacheBlock`s. Only dense data, e.g., `DenseBlock` instances, do not
hold a `CacheBlock`.
It feels like I'm missing something. We said the second `Block` does
not exist currently, but it looks like the `CacheBlock` would suffice. In that
case the split memory area would just be a logical differentiation.
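To make that last point concrete, here is a rough sketch of what I would understand by a purely logical split: one manager, one accounting structure, and each reservation tagged with the area it counts against. All names (`UmmSketch`, `Area`, `reserve`, `release`, `usedBytes`) are hypothetical and do not exist in SystemDS. The max shares are taken from the table in the design document and interpreted, as an assumption, as percentages of the UMM budget; the min shares are not enforced here.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch only: none of these names exist in SystemDS.
class UmmSketch {
    // Max share per area, in percent of the UMM budget (assumed
    // interpretation of the min/max table in the design document).
    enum Area {
        OPERATIONS(70),
        BUFFER_POOL(30);

        final int maxPercent;

        Area(int maxPercent) {
            this.maxPercent = maxPercent;
        }
    }

    private final long budgetBytes;
    // Single accounting map: the "split" is only a tag on each reservation.
    private final Map<Area, Long> used = new EnumMap<>(Area.class);

    UmmSketch(long budgetBytes) {
        this.budgetBytes = budgetBytes;
        for (Area a : Area.values()) {
            used.put(a, 0L);
        }
    }

    // Grants the reservation only if the area stays within its max share
    // and the overall budget is not exceeded.
    synchronized boolean reserve(Area area, long bytes) {
        if (bytes < 0) {
            return false;
        }
        long areaAfter = used.get(area) + bytes;
        long totalAfter = totalUsed() + bytes;
        long areaMax = budgetBytes * area.maxPercent / 100;
        if (areaAfter > areaMax || totalAfter > budgetBytes) {
            return false;
        }
        used.put(area, areaAfter);
        return true;
    }

    // Gives memory back to the area; never drops below zero.
    synchronized void release(Area area, long bytes) {
        used.put(area, Math.max(0L, used.get(area) - bytes));
    }

    synchronized long usedBytes(Area area) {
        return used.get(area);
    }

    private long totalUsed() {
        long sum = 0;
        for (long v : used.values()) {
            sum += v;
        }
        return sum;
    }
}
```

With a budget of 1000 bytes, `reserve(Area.OPERATIONS, 700)` would succeed and a further `reserve(Area.OPERATIONS, 1)` would be rejected, without the two areas needing different block types.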