[ https://issues.apache.org/jira/browse/HADOOP-18873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17892736#comment-17892736 ]

ASF GitHub Bot commented on HADOOP-18873:
-----------------------------------------

steveloughran commented on PR #6010:
URL: https://github.com/apache/hadoop/pull/6010#issuecomment-2437373896

   @zzccctv just checked this. Direct memory is off-heap, but it is still in 
process; process exit will reclaim it. It just doesn't move around the way 
objects on the heap do.
   
   What is important is that because direct memory only consumes a small amount 
of on-heap memory (the object containing a pointer to it), leaking direct 
buffers doesn't itself trigger GCs. When other allocations do trigger a GC, it 
will call finalize() on the unreachable references and release the direct 
memory then. 
   
   we have a test for this somewhere
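The distinction above can be sketched in a few lines: a direct buffer lives outside the Java heap but inside the process, and only its small heap-side wrapper is visible to the collector. `DirectMemoryDemo` is an illustrative name, not part of the Hadoop code under discussion:

```java
import java.nio.ByteBuffer;

// Sketch: a direct buffer's storage is off-heap but in-process. Only the
// small on-heap wrapper object is tracked by the GC, so leaked direct
// buffers add little heap pressure and do not themselves trigger a GC;
// their native memory is released when a GC finalizes the wrapper, or
// unconditionally when the process exits.
class DirectMemoryDemo {
    public static void main(String[] args) {
        ByteBuffer direct = ByteBuffer.allocateDirect(8 * 1024 * 1024); // 8 MB off-heap
        ByteBuffer heap = ByteBuffer.allocate(8 * 1024 * 1024);        // 8 MB on-heap
        System.out.println(direct.isDirect()); // true
        System.out.println(heap.isDirect());   // false
    }
}
```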




> ABFS: AbfsOutputStream doesn't close DataBlocks object.
> -------------------------------------------------------
>
>                 Key: HADOOP-18873
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18873
>             Project: Hadoop Common
>          Issue Type: Sub-task
>    Affects Versions: 3.3.4
>            Reporter: Pranav Saxena
>            Assignee: Pranav Saxena
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.3.4
>
>
> AbfsOutputStream doesn't close the dataBlock object created for the upload.
> What are the implications of not doing that?
> DataBlocks has three implementations:
>  # ByteArrayBlock
>  ## This creates an object of DataBlockByteArrayOutputStream (a child of 
> ByteArrayOutputStream: a wrapper around a byte array for populating and 
> reading the array).
>  ## This gets GCed.
>  # ByteBufferBlock:
>  ## There is a defined *DirectBufferPool* from which it tries to request the 
> directBuffer.
>  ## If there is nothing in the pool, a new directBuffer is created.
>  ## The `close` method on this object has the responsibility of returning 
> the buffer to the pool so it can be reused.
>  ## Since we are not calling `close`:
>  ### The pool is rendered less useful, since each request creates a new 
> directBuffer from memory.
>  ### All the objects can be GCed and the allocated direct memory may be 
> returned on the GC. But if the process crashes, the memory never goes back 
> and can cause memory issues on the machine.
>  # DiskBlock:
>  ## This creates a file on disk to which the data-to-upload is written. This 
> file gets deleted in startUpload().close().
>  
> startUpload() gives an object of BlockUploadData, which provides a 
> `toByteArray()` method used in AbfsOutputStream to get the byte array in the 
> dataBlock.
>  
> Method which uses the DataBlock object: 
> https://github.com/apache/hadoop/blob/fac7d26c5d7f791565cc3ab45d079e2cca725f95/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsOutputStream.java#L298
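The ByteBufferBlock problem described above comes down to returning the pooled direct buffer in `close()`. A minimal sketch of that pool-return pattern, where `SimpleBufferPool` and `PooledBlock` are hypothetical stand-ins for Hadoop's `DirectBufferPool` and the `DataBlocks` block objects, not the actual implementation:

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical minimal pool mirroring the behaviour described above:
// close() must hand the direct buffer back, otherwise every request
// allocates fresh direct memory and the pool stays empty.
class SimpleBufferPool {
    private final ConcurrentLinkedQueue<ByteBuffer> pool = new ConcurrentLinkedQueue<>();
    private final int bufferSize;

    SimpleBufferPool(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    ByteBuffer getBuffer() {
        // Reuse a pooled buffer if one was returned; otherwise allocate.
        ByteBuffer b = pool.poll();
        return (b != null) ? b : ByteBuffer.allocateDirect(bufferSize);
    }

    void returnBuffer(ByteBuffer b) {
        b.clear();
        pool.offer(b);
    }

    int size() {
        return pool.size();
    }
}

class PooledBlock implements AutoCloseable {
    private final SimpleBufferPool pool;
    final ByteBuffer buffer;

    PooledBlock(SimpleBufferPool pool) {
        this.pool = pool;
        this.buffer = pool.getBuffer();
    }

    @Override
    public void close() {
        // The step AbfsOutputStream was skipping: without this, the pool
        // is never refilled and each block allocates new direct memory.
        pool.returnBuffer(buffer);
    }
}
```

With try-with-resources, `close()` runs even on exceptions, and a subsequent block reuses the same direct buffer instead of allocating a new one.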



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
