[
https://issues.apache.org/jira/browse/HADOOP-18873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Pranav Saxena updated HADOOP-18873:
-----------------------------------
Description:
AbfsOutputStream doesn't close the dataBlock object created for the upload.
What is the implication of not doing that?
DataBlocks has three implementations:
# ByteArrayBlock
## This creates an object of DataBlockByteArrayOutputStream (a child of
ByteArrayOutputStream: a wrapper around a byte array for populating and
reading the array).
## This gets GCed.
# ByteBufferBlock:
## There is a defined *DirectBufferPool* from which it tries to request the
directBuffer.
## If there is nothing in the pool, a new directBuffer is created.
## The `close` method on this object has the responsibility of returning the
buffer to the pool so it can be reused.
## Since we are not calling `close`:
### The pool is of little use, since each request creates a new directBuffer
from memory.
### The objects can be GCed, and the direct memory allocated may be returned
on GC. If the process crashes, the memory never goes back, which can cause
memory issues on the machine.
# DiskBlock:
## This creates a file on disk to which the data to upload is written. This
file gets deleted in startUpload().close().
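
To illustrate the ByteBufferBlock case, here is a minimal sketch, not the
Hadoop code. `SimplePool` and `PooledBlock` are hypothetical stand-ins for the
direct buffer pool and the block type. The sketch only shows how skipping
`close` forces one fresh direct-buffer allocation per block, while calling it
lets a single buffer be reused.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

public class PoolSketch {
    /** Hypothetical stand-in for a direct buffer pool; not Hadoop's class. */
    static final class SimplePool {
        private final Deque<ByteBuffer> free = new ArrayDeque<>();
        private final int bufferSize;
        int allocations = 0;

        SimplePool(int bufferSize) {
            this.bufferSize = bufferSize;
        }

        synchronized ByteBuffer get() {
            ByteBuffer pooled = free.poll();
            if (pooled != null) {
                pooled.clear();          // reuse a returned buffer
                return pooled;
            }
            allocations++;               // nothing pooled: allocate a new one
            return ByteBuffer.allocateDirect(bufferSize);
        }

        synchronized void giveBack(ByteBuffer buffer) {
            free.push(buffer);
        }
    }

    /** Hypothetical block that borrows a buffer and returns it on close. */
    static final class PooledBlock implements AutoCloseable {
        private final SimplePool pool;
        private ByteBuffer buffer;

        PooledBlock(SimplePool pool) {
            this.pool = pool;
            this.buffer = pool.get();
        }

        void write(byte[] data) {
            buffer.put(data);
        }

        @Override
        public void close() {
            if (buffer != null) {        // idempotent: return the buffer once
                pool.giveBack(buffer);
                buffer = null;
            }
        }
    }

    /** Counts allocations for a run of uploads, with or without close. */
    static int allocationsFor(int uploads, boolean closeBlocks) {
        SimplePool pool = new SimplePool(1024);
        for (int i = 0; i < uploads; i++) {
            PooledBlock block = new PooledBlock(pool);
            block.write(new byte[] {1, 2, 3});
            if (closeBlocks) {
                block.close();
            }
        }
        return pool.allocations;
    }

    public static void main(String[] args) {
        System.out.println("with close:    " + allocationsFor(5, true));
        System.out.println("without close: " + allocationsFor(5, false));
    }
}
```

In this sketch the run that closes its blocks allocates once, and the run that
does not allocates once per block.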
startUpload() returns a BlockUploadData object, which provides a
`toByteArray()` method; AbfsOutputStream uses it to get the byte array from
the dataBlock.
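
One way the close calls could be guaranteed around that read is
try-with-resources. The sketch below rests on assumptions: `Block` and
`UploadData` are hypothetical simplified types, not the real DataBlocks or
BlockUploadData classes, and this is not the actual patch for this issue.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;

public class CloseOnUploadSketch {
    /** Records close calls so the ordering can be inspected. */
    static final List<String> events = new ArrayList<>();

    /** Hypothetical stand-in for the upload-data object. */
    static final class UploadData implements Closeable {
        private final byte[] bytes;

        UploadData(byte[] bytes) {
            this.bytes = bytes;
        }

        byte[] toByteArray() {
            return bytes.clone();
        }

        @Override
        public void close() {
            events.add("uploadData.close");
        }
    }

    /** Hypothetical stand-in for a data block. */
    static final class Block implements Closeable {
        private final byte[] content;

        Block(byte[] content) {
            this.content = content;
        }

        UploadData startUpload() {
            return new UploadData(content);
        }

        @Override
        public void close() {
            events.add("block.close");
        }
    }

    /** Reads the bytes to upload; both resources are closed on every path. */
    static byte[] readForUpload(byte[] payload) {
        try (Block block = new Block(payload);
             UploadData data = block.startUpload()) {
            return data.toByteArray();
        }
    }

    public static void main(String[] args) {
        byte[] out = readForUpload(new byte[] {1, 2, 3});
        System.out.println(out.length + " bytes, events: " + events);
    }
}
```

Resources declared in a try-with-resources statement are closed in reverse
order, so the upload data is closed before the block, including when
`toByteArray()` throws.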
> ABFS: AbfsOutputStream doesn't close DataBlocks object.
> -------------------------------------------------------
>
> Key: HADOOP-18873
> URL: https://issues.apache.org/jira/browse/HADOOP-18873
> Project: Hadoop Common
> Issue Type: Sub-task
> Affects Versions: 3.3.4
> Reporter: Pranav Saxena
> Assignee: Pranav Saxena
> Priority: Major
> Fix For: 3.3.4
>