saxenapranav opened a new pull request, #6010:
URL: https://github.com/apache/hadoop/pull/6010

   AbfsOutputStream doesn't close the dataBlock object created for the upload.
   
   What is the implication of not doing that:
   DataBlocks has three implementations:
   
   ByteArrayBlock:
   1. This creates an object of DataBlockByteArrayOutputStream (a child of 
ByteArrayOutputStream: a wrapper around a byte array for populating and reading 
the array).
   2. This is heap memory, so it gets GCed even without `close`.
   ByteBufferBlock:
   1. There is a defined DirectBufferPool from which it tries to request the 
directBuffer.
   2. If the pool is empty, a new directBuffer is created.
   3. The `close` method on this object is responsible for returning the 
buffer to the pool so it can be reused.
   4. Since we are not calling `close`:
       1. The pool is of little use, since each request creates a new 
directBuffer from memory.
       2. The objects can be GCed, and the direct memory allocated may be 
returned on GC. But if the process crashes, the memory never goes back and 
can cause memory issues on the machine.
   DiskBlock:
   1. This creates a file on disk on which the data-to-upload is written. This 
file gets deleted in startUpload().close().
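
   The pooling behaviour described for ByteBufferBlock can be illustrated with a 
minimal, self-contained sketch. `SimplePool` and `PooledBlock` are hypothetical 
stand-ins, not the Hadoop DirectBufferPool or DataBlocks classes; the sketch 
only assumes that `close` is the one place where a buffer goes back to the pool.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

public class BufferPoolSketch {

    // Hypothetical fixed-size pool of direct buffers (illustration only).
    public static final class SimplePool {
        private final Deque<ByteBuffer> free = new ArrayDeque<>();
        private final int bufferSize;
        private int allocations = 0;

        public SimplePool(int bufferSize) {
            this.bufferSize = bufferSize;
        }

        // Reuse a pooled buffer if one is available, otherwise allocate.
        public ByteBuffer get() {
            ByteBuffer buffer = free.poll();
            if (buffer == null) {
                allocations++;
                return ByteBuffer.allocateDirect(bufferSize);
            }
            buffer.clear();
            return buffer;
        }

        public void giveBack(ByteBuffer buffer) {
            free.push(buffer);
        }

        public int getAllocations() {
            return allocations;
        }
    }

    // Hypothetical block: close() is the only path that returns the buffer.
    public static final class PooledBlock implements AutoCloseable {
        private final SimplePool pool;
        private ByteBuffer buffer;

        public PooledBlock(SimplePool pool) {
            this.pool = pool;
            this.buffer = pool.get();
        }

        @Override
        public void close() {
            if (buffer != null) {
                pool.giveBack(buffer);
                buffer = null;
            }
        }
    }

    // Counts how many direct buffers are allocated for a run of blocks.
    public static int allocationsFor(int blocks, boolean closeBlocks) {
        SimplePool pool = new SimplePool(1024);
        for (int i = 0; i < blocks; i++) {
            PooledBlock block = new PooledBlock(pool);
            if (closeBlocks) {
                block.close();
            }
        }
        return pool.getAllocations();
    }

    public static void main(String[] args) {
        System.out.println("closed:   " + allocationsFor(5, true));  // closed:   1
        System.out.println("unclosed: " + allocationsFor(5, false)); // unclosed: 5
    }
}
```

   With `close` called, five sequential blocks share one direct buffer; without 
it, every block triggers a fresh allocation.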
    
   
   startUpload() returns a BlockUploadData object, which provides the 
`toByteArray()` method used in AbfsOutputStream to get the byte array in the 
dataBlock.
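
   A rough sketch of the calling pattern this implies: close both the upload 
data and the block once the bytes have been read. `Block` and `UploadData` below 
are hypothetical stand-ins for DataBlock and BlockUploadData, and returning the 
payload length stands in for the remote append; this is not the actual 
AbfsOutputStream code.

```java
import java.nio.charset.StandardCharsets;

public class CloseBlockSketch {

    // Hypothetical stand-in for BlockUploadData.
    public static final class UploadData implements AutoCloseable {
        private final byte[] bytes;

        UploadData(byte[] bytes) {
            this.bytes = bytes;
        }

        public byte[] toByteArray() {
            return bytes;
        }

        @Override
        public void close() {
            // A disk-backed implementation would delete its file here.
        }
    }

    // Hypothetical stand-in for DataBlock.
    public static final class Block implements AutoCloseable {
        private final byte[] data;
        private boolean closed = false;

        public Block(byte[] data) {
            this.data = data;
        }

        public UploadData startUpload() {
            return new UploadData(data);
        }

        public boolean isClosed() {
            return closed;
        }

        @Override
        public void close() {
            // A pooled implementation would return its buffer here.
            closed = true;
        }
    }

    // try-with-resources closes the upload data first, then the block,
    // even if reading or sending the payload throws.
    public static int upload(Block block) {
        try (Block b = block; UploadData uploadData = b.startUpload()) {
            byte[] payload = uploadData.toByteArray();
            return payload.length; // stand-in for the remote append call
        }
    }

    public static void main(String[] args) {
        Block block = new Block("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println("uploaded bytes: " + upload(block)); // uploaded bytes: 5
        System.out.println("block closed:   " + block.isClosed()); // block closed:   true
    }
}
```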
   
    
   
   Method which uses the DataBlock object: 
https://github.com/apache/hadoop/blob/fac7d26c5d7f791565cc3ab45d079e2cca725f95/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsOutputStream.java#L298
   
   jira: https://issues.apache.org/jira/browse/HADOOP-18873


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

