[jira] [Commented] (HDFS-7276) Limit the number of byte arrays used by DFSOutputStream

Colin Patrick McCabe (JIRA) Mon, 27 Oct 2014 18:35:05 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14186195#comment-14186195
 ]


Colin Patrick McCabe commented on HDFS-7276:
--------------------------------------------

Thanks for doing the benchmarks.  I suppose there may be some benefit to this 
pooling.  I am surprised that it was as noticeable as it was.

{code}
private static volatile ByteArrayManager byteArrayManager;

private static ByteArrayManager getByteArrayManager(Conf conf) {
  if (byteArrayManager == null) {
    synchronized(DFSOutputStream.class) {
      if (byteArrayManager == null) {
        byteArrayManager = new ByteArrayManager(
            conf.writeByteArray_countThreshold,
            conf.writeByteArray_countLimit,
            conf.writeByteArray_countResetTimePeriodMs);
      }
    }
  }
  return byteArrayManager;
}
{code}

I think it might make sense to move this into {{ClientContext.java}}.  That 
would also avoid the need for {{volatile}} and locking here.
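
To illustrate the suggestion, here is a minimal sketch of the idea, not the actual {{ClientContext}} code. All names in it ({{ContextSketch}}, {{Context}}, {{ArrayManager}}) are hypothetical stand-ins. The assumption is that a cached per-client context object creates the manager once in its constructor and stores it in a {{final}} field, so the output stream side needs neither {{volatile}} nor double-checked locking; the only synchronization left is on the context lookup.

```java
import java.util.HashMap;
import java.util.Map;

public class ContextSketch {
  /** Hypothetical stand-in for ByteArrayManager; it only records its settings. */
  static final class ArrayManager {
    final int countThreshold;
    final int countLimit;
    final long countResetTimePeriodMs;

    ArrayManager(int countThreshold, int countLimit, long countResetTimePeriodMs) {
      this.countThreshold = countThreshold;
      this.countLimit = countLimit;
      this.countResetTimePeriodMs = countResetTimePeriodMs;
    }
  }

  /** Hypothetical stand-in for a cached, per-client context object. */
  static final class Context {
    private static final Map<String, Context> CACHE = new HashMap<>();

    // Assigned once in the constructor; no volatile or lazy initialization needed.
    private final ArrayManager arrayManager;

    private Context(int countThreshold, int countLimit, long countResetTimePeriodMs) {
      this.arrayManager =
          new ArrayManager(countThreshold, countLimit, countResetTimePeriodMs);
    }

    /** Returns the cached context for this name, creating it on first use. */
    static synchronized Context get(String name, int countThreshold,
        int countLimit, long countResetTimePeriodMs) {
      return CACHE.computeIfAbsent(name,
          k -> new Context(countThreshold, countLimit, countResetTimePeriodMs));
    }

    ArrayManager getArrayManager() {
      return arrayManager;
    }
  }

  public static void main(String[] args) {
    Context a = Context.get("default", 128, 2048, 10_000L);
    Context b = Context.get("default", 128, 2048, 10_000L);
    // Both lookups share one manager instance.
    System.out.println(a.getArrayManager() == b.getArrayManager()); // prints "true"
  }
}
```

With this shape, streams that share a context share a manager, and the settings of the first caller win for a given context name.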

{{ByteArrayManager#FixedLengthManager}}: I'm a bit concerned that if we have a 
lot of different-sized packets, we may keep around a lot of extra arrays.  This 
is most likely to happen in the context of HFlush and HSync, I guess.  I 
haven't looked carefully enough at the code to see how much change this would 
require, but could we simply have {{ByteArrayManager}} return an array equal to 
*or bigger than* the requested size?  This would avoid having so many different 
allocation pools.  Something like {{TreeMap}} could easily be used here to find 
an allocation by size...
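
A rough sketch of the {{TreeMap}} idea, under stated assumptions: {{SizedArrayPool}} and its methods are hypothetical names, not part of the patch, and the sketch leaves out the count threshold and limit logic that {{ByteArrayManager}} has. It only shows how {{TreeMap#ceilingEntry}} can find the smallest free array whose length is equal to or bigger than the requested size.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.TreeMap;

public class SizedArrayPool {
  // Free arrays keyed by length; each key maps to a stack of arrays of that length.
  // Empty stacks are removed, so every entry holds at least one array.
  private final TreeMap<Integer, Deque<byte[]>> free = new TreeMap<>();

  /** Returns a pooled array with length >= minLength, or a new array if none fits. */
  public synchronized byte[] allocate(int minLength) {
    // Smallest key that is >= minLength, or null if no free array is large enough.
    Map.Entry<Integer, Deque<byte[]>> entry = free.ceilingEntry(minLength);
    if (entry == null) {
      return new byte[minLength];
    }
    Deque<byte[]> arrays = entry.getValue();
    byte[] array = arrays.pop();
    if (arrays.isEmpty()) {
      free.remove(entry.getKey());
    }
    return array;
  }

  /** Puts an array back into the pool for later reuse. */
  public synchronized void release(byte[] array) {
    free.computeIfAbsent(array.length, k -> new ArrayDeque<>()).push(array);
  }

  public static void main(String[] args) {
    SizedArrayPool pool = new SizedArrayPool();
    pool.release(new byte[65536]);

    byte[] first = pool.allocate(4096);   // reuses the larger pooled array
    System.out.println(first.length);     // prints 65536

    byte[] second = pool.allocate(4096);  // pool is empty, so a new array is made
    System.out.println(second.length);    // prints 4096
  }
}
```

One trade-off with this approach: a small request can be served by a much larger array, so callers have to track the length they actually use, and a real implementation might want to cap how oversized a returned array may be.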

> Limit the number of byte arrays used by DFSOutputStream
> -------------------------------------------------------
>
>                 Key: HDFS-7276
>                 URL: https://issues.apache.org/jira/browse/HDFS-7276
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>            Reporter: Tsz Wo Nicholas Sze
>            Assignee: Tsz Wo Nicholas Sze
>         Attachments: h7276_20141021.patch, h7276_20141022.patch, 
> h7276_20141023.patch, h7276_20141024.patch, h7276_20141027.patch, 
> h7276_20141027b.patch
>
>
> When there are a lot of DFSOutputStreams writing concurrently, the number of 
> outstanding packets could be large.  The byte arrays created by those packets 
> could occupy a lot of memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
