[ https://issues.apache.org/jira/browse/HDFS-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14186195#comment-14186195 ]
Colin Patrick McCabe commented on HDFS-7276:
--------------------------------------------
Thanks for doing the benchmarks. I suppose there may be some benefit to this
pooling. I am surprised that it was as noticeable as it was.
{code}
private static volatile ByteArrayManager byteArrayManager;

private static ByteArrayManager getByteArrayManager(Conf conf) {
  if (byteArrayManager == null) {
    synchronized(DFSOutputStream.class) {
      if (byteArrayManager == null) {
        byteArrayManager = new ByteArrayManager(
            conf.writeByteArray_countThreshold,
            conf.writeByteArray_countLimit,
            conf.writeByteArray_countResetTimePeriodMs);
      }
    }
  }
  return byteArrayManager;
}
{code}
I think this might make sense to put into {{ClientContext.java}}. That would
also avoid the need for {{volatile}} and locking here.
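To illustrate the point about {{volatile}} and locking, here is a minimal sketch, assuming the manager is owned by a context object that is built once and then shared. {{ContextSketch}} and {{ArrayManagerSketch}} are hypothetical stand-ins, not the real {{ClientContext}} or {{ByteArrayManager}}; only the three constructor parameters are taken from the snippet above.

```java
/** Hypothetical stand-in for the manager; only the constructor shape matters here. */
class ArrayManagerSketch {
  final int countThreshold;
  final int countLimit;
  final long countResetTimePeriodMs;

  ArrayManagerSketch(int countThreshold, int countLimit,
      long countResetTimePeriodMs) {
    this.countThreshold = countThreshold;
    this.countLimit = countLimit;
    this.countResetTimePeriodMs = countResetTimePeriodMs;
  }
}

/** Hypothetical context object, constructed once and shared by its streams. */
class ContextSketch {
  // A final field assigned in the constructor needs no volatile and no
  // double-checked locking, as long as the context itself is safely shared.
  private final ArrayManagerSketch byteArrayManager;

  ContextSketch(int countThreshold, int countLimit,
      long countResetTimePeriodMs) {
    this.byteArrayManager = new ArrayManagerSketch(
        countThreshold, countLimit, countResetTimePeriodMs);
  }

  ArrayManagerSketch getByteArrayManager() {
    return byteArrayManager;
  }
}
```

With this shape, each stream would ask its context for the manager instead of going through a static lazy initializer.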
{{ByteArrayManager#FixedLengthManager}}: I'm a bit concerned that if we have a
lot of different sized packets, we may keep around a lot of unused arrays, one
pool per size. This is most likely to happen in the context of hflush and
hsync, I guess. I haven't looked carefully enough at the code to see how much
change this would require, but could we simply have {{ByteArrayManager}}
return an array equal *or bigger than* the requested size? This would avoid
having so many different allocation pools. Something like {{TreeMap}} could
easily be used here to find an allocation by size...
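A rough sketch of the size-ordered lookup I have in mind, not a drop-in replacement for {{ByteArrayManager}}. The class name {{SizeOrderedByteArrayPool}} and its methods are made up for illustration, and it leaves out the count threshold/limit handling from the patch entirely. It only shows how {{TreeMap#ceilingEntry}} finds the smallest free array that is at least as large as the request.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical pool; not the ByteArrayManager from the patch. */
class SizeOrderedByteArrayPool {
  // Free arrays keyed by length; each deque holds arrays of exactly that length.
  private final TreeMap<Integer, Deque<byte[]>> free = new TreeMap<>();

  /** Returns an array whose length is at least minLength. */
  synchronized byte[] allocate(int minLength) {
    // ceilingEntry returns the entry with the smallest key >= minLength,
    // or null when no free array is large enough.
    Map.Entry<Integer, Deque<byte[]>> entry = free.ceilingEntry(minLength);
    if (entry == null) {
      return new byte[minLength];
    }
    Deque<byte[]> arrays = entry.getValue();
    byte[] array = arrays.pop();
    if (arrays.isEmpty()) {
      // Drop empty buckets so ceilingEntry never lands on one.
      free.remove(entry.getKey());
    }
    return array;
  }

  /** Returns an array to the pool for later reuse. */
  synchronized void release(byte[] array) {
    free.computeIfAbsent(array.length, k -> new ArrayDeque<>()).push(array);
  }
}
```

Callers would have to track the logical length themselves, since the returned array may be longer than requested. A real version would probably also want to cap how much larger than the request a reused array may be.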
> Limit the number of byte arrays used by DFSOutputStream
> -------------------------------------------------------
>
> Key: HDFS-7276
> URL: https://issues.apache.org/jira/browse/HDFS-7276
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs-client
> Reporter: Tsz Wo Nicholas Sze
> Assignee: Tsz Wo Nicholas Sze
> Attachments: h7276_20141021.patch, h7276_20141022.patch,
> h7276_20141023.patch, h7276_20141024.patch, h7276_20141027.patch,
> h7276_20141027b.patch
>
>
> When there are a lot of DFSOutputStreams writing concurrently, the number of
> outstanding packets could be large. The byte arrays created by those packets
> could occupy a lot of memory.