johnyangk commented on issue #223: [NEMO-388] Off-heap memory management (reuse ByteBuffer) URL: https://github.com/apache/incubator-nemo/pull/223#issuecomment-515351226 @hy00nc I did some investigations on DirectByteBufferOutputStream#release. Please let me know if this makes sense. My suggestion: - Keep DirectByteBufferOutputStream#close empty - Remove DirectByteBufferOutputStream#release - Change from DirectByteBufferOutputStream#getDirectByteBufferList to DirectByteBufferOutputStream#getMemoryChunkList - In SerializedPartition#release, replace DirectByteBufferOutputStream#release with the direct invocation of memoryPoolAssigner.returnChunksToPool(dataList), and remove the offheap flag if possible. (Also using the "!committed" checker here to ensure that dataList is available would be nice) - In DirectByteBufferOutputStreamTest, remove all invocations of DirectByteBufferOutputStream#release, and create a separate test for MemoryPoolAssigner instead if needed ================================ Here is why. ByteArrayOutputStream, which is part of the Java API, has the following methods - toByteArray(): returns a copy of the inner byte[] - close(): does nothing (empty) I think the common pattern for using this class is: 1. instantiate a new object as a local variable, and write 2. toByteArray(): obtain a copied byte[] 3. close() (explicitly or by try-with-resources) 4. the local variable goes out of scope (the inner byte[] is garbage collected) 5. The copied byte[] is garbage collected independently (perhaps after inner byte[] is garbage collected) I think it is good to use DirectByteBufferOutputStream similarly, although our class shares the same underlying byte[] rather than copying it. This is how we're using this class at the moment. 1. Instantiate a new object as a class-level variable, and write 2. getDirectByteBufferList(): obtain ByteBufferList that points to the inner byte[] 3. close() (explicitly or by try-with-resources) 4. We now have two pointers to the same data: (1) the DirectByteBufferOutputStream variable, and (2) ByteBufferList 5. release() on the stream 6. Both pointers now point to a 'cleared' byte[] My issues with this pattern are: - ByteArrayOutputStream does not provide an API similar to release() - The relationship between close() and release() is a little confusing - There exist two methods that do the same thing: through releasing the stream or through directly releasing ByteBufferList - It may be better to isolate before-commit and after-commit behaviors in SerializedPartition Hence the above suggestion. 😄
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services
