johnyangk commented on issue #223: [NEMO-388] Off-heap memory management (reuse 
ByteBuffer)
URL: https://github.com/apache/incubator-nemo/pull/223#issuecomment-515351226
 
 
   @hy00nc I did some investigations on DirectByteBufferOutputStream#release. 
   
   Please let me know if this makes sense. My suggestion:
   - Keep DirectByteBufferOutputStream#close empty
   - Remove DirectByteBufferOutputStream#release
   - Change from DirectByteBufferOutputStream#getDirectByteBufferList to 
DirectByteBufferOutputStream#getMemoryChunkList
   - In SerializedPartition#release, replace 
DirectByteBufferOutputStream#release with the direct invocation of 
memoryPoolAssigner.returnChunksToPool(dataList), and remove the offheap flag if 
possible. (Also using the "!committed" checker here to ensure that dataList is 
available would be nice)
   - In DirectByteBufferOutputStreamTest, remove all invocations of 
DirectByteBufferOutputStream#release, and create a separate test for 
MemoryPoolAssigner instead if needed
   
   ================================
   Here is why.
   
   ByteArrayOutputStream, which is part of the Java API, has the following 
methods
   - toByteArray(): returns a copy of the inner byte[]
   - close(): does nothing (empty)
   
   I think the common pattern for using this class is:
   1. instantiate a new object as a local variable, and write
   2. toByteArray(): obtain a copied byte[]
   3. close() (explicitly or by try-with-resources)
   4. the local variable goes out of scope (the inner byte[] is garbage 
collected)
   5. The copied byte[] is garbage collected independently (perhaps after inner 
byte[] is garbage collected)
   
   I think it is good to use DirectByteBufferOutputStream similarly, although 
our class shares the same underlying byte[] rather than copying it. This is how 
we're using this class at the moment.
   1. Instantiate a new object as a class-level variable, and write
   2. getDirectByteBufferList(): obtain ByteBufferList that points to the inner 
byte[]
   3. close() (explicitly or by try-with-resources)
   4. We now have two pointers to the same data: (1) the 
DirectByteBufferOutputStream variable, and (2) ByteBufferList 
   5. release() on the stream
   6. Both pointers now point to a 'cleared' byte[]
   
   My issues with this pattern are:
   - ByteArrayOutputStream does not provide an API similar to release() 
   - The relationship between close() and release() is a little confusing
   - There exist two methods that do the same thing: through releasing the 
stream or through directly releasing ByteBufferList
   - It may be better to isolate before-commit and after-commit behaviors in 
SerializedPartition
   
   Hence the above suggestion. 😄 
   
   
   
   
   
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

Reply via email to