[
https://issues.apache.org/jira/browse/HBASE-29123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Charles Connell updated HBASE-29123:
------------------------------------
Description:
I look at many profile flamegraphs of my company's RegionServers. I sometimes
see memory allocation inside of {{org.apache.hadoop.io.compress.CodecPool}}
taking up roughly 1% of my CPU time. The point of a CodecPool is to avoid
allocating short-lived objects, so this is not good. Luckily, these allocations
can be avoided. Attached are three flamegraphs showing the allocations I'm
talking about.
I plan this ticket as the first of a series relating to decompression
performance. In the context of the overall series, it makes sense to fork
CodecPool out of hadoop-common and start a new copy of it in HBase. I'll do
that in this ticket and include my improvements:
Change the pool data structure from {{HashMap<Class<Compressor>,
HashSet<Compressor>>}} to {{ConcurrentHashMap<Class<Compressor>,
ConcurrentSkipListSet<Compressor>>}}. This allows the "borrow" code:
{code}
T codec = null;
Set<T> codecSet;
synchronized (pool) {
codecSet = pool.get(codecClass);
}
if (codecSet != null) {
synchronized (codecSet) {
if (!codecSet.isEmpty()) {
codec = codecSet.iterator().next();
codecSet.remove(codec);
}
}
}
{code}
to be rewritten as:
{code}
if (codecClass == null) {
return null;
}
NavigableSet<T> codecSet = pool.get(codecClass);
if (codecSet != null) {
return codecSet.pollFirst();
} else {
return null;
}
{code}
thus avoiding the allocation of an iterator and the necessity of locking.
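The pattern above can be sketched as a standalone class. This is a minimal illustration of the described design, not HBase's actual code: the class and method names here are hypothetical, and the identity-hash ordering is an assumed choice made only because {{ConcurrentSkipListSet}} requires a total order and pooled instances are interchangeable.

```java
import java.util.Comparator;
import java.util.NavigableSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical sketch of the lock-free pool pattern described above.
public class ObjectPool<T> {
  // One concurrent sorted set of pooled instances per class.
  private final ConcurrentMap<Class<? extends T>, NavigableSet<T>> pool =
      new ConcurrentHashMap<>();

  // ConcurrentSkipListSet needs an ordering; identity-hash order is an
  // arbitrary (assumed) choice since pooled instances are interchangeable.
  private static final Comparator<Object> IDENTITY_ORDER =
      Comparator.comparingInt(System::identityHashCode);

  // Borrow: pollFirst() atomically removes and returns an element, so no
  // explicit locking and no iterator allocation are needed.
  public T borrow(Class<? extends T> clazz) {
    if (clazz == null) {
      return null;
    }
    NavigableSet<T> set = pool.get(clazz);
    return set != null ? set.pollFirst() : null;
  }

  // Return: computeIfAbsent lazily creates the per-class set on first use.
  public void giveBack(T instance) {
    if (instance == null) {
      return;
    }
    @SuppressWarnings("unchecked")
    Class<? extends T> clazz = (Class<? extends T>) instance.getClass();
    pool.computeIfAbsent(clazz, c -> new ConcurrentSkipListSet<>(IDENTITY_ORDER))
        .add(instance);
  }
}
```

Note that {{pollFirst()}} on an empty set simply returns null, so the borrow path needs no emptiness check or synchronized block at all.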
The lease counters are only read in unit tests, so I'll stop updating those
outside of testing.
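One way to stop updating the lease counters outside of testing is to gate them behind a test-only flag, so the production borrow/return paths skip the atomic updates entirely. This is only a hedged sketch of that idea; the flag and method names are hypothetical, not the actual implementation:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: lease counters updated only when a test flag is set.
public class LeaseCounting {
  // Flipped on only by unit tests; volatile so the change is visible to
  // all threads without further synchronization.
  static volatile boolean leaseCountingEnabled = false;

  private static final AtomicInteger outstandingLeases = new AtomicInteger();

  static void recordBorrow() {
    if (leaseCountingEnabled) { // no-op on production paths
      outstandingLeases.incrementAndGet();
    }
  }

  static void recordReturn() {
    if (leaseCountingEnabled) {
      outstandingLeases.decrementAndGet();
    }
  }

  static int outstandingLeases() {
    return outstandingLeases.get();
  }
}
```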
was:
I look at many profile flamegraphs of my company's RegionServers. I sometimes
see memory allocation inside of {{org.apache.hadoop.io.compress.CodecPool}}
taking up roughly 1% of my CPU time. The point of a CodecPool is to avoid
allocating short-lived objects, so this is not good. Luckily, these allocations
can be avoided. Attached are three flamegraphs showing the allocations I'm
talking about.
I plan this ticket as the first of a series relating to decompression
performance. In the context of the overall series, it makes sense to fork
CodecPool out of hadoop-common and start a new copy of it in HBase. I'll do
that in this ticket and include my improvements:
Change the pool data structure from {{HashMap<Class<Compressor>,
HashSet<Compressor>>}} to {{ConcurrentMap<Class<Compressor>,
ConcurrentSkipListSet<Compressor>>}}. This allows the "borrow" code:
{code}
T codec = null;
Set<T> codecSet;
synchronized (pool) {
codecSet = pool.get(codecClass);
}
if (codecSet != null) {
synchronized (codecSet) {
if (!codecSet.isEmpty()) {
codec = codecSet.iterator().next();
codecSet.remove(codec);
}
}
}
{code}
to be rewritten as:
{code}
if (codecClass == null) {
return null;
}
NavigableSet<T> codecSet = pool.get(codecClass);
if (codecSet != null) {
return codecSet.pollFirst();
} else {
return null;
}
{code}
thus avoiding the allocation of an iterator and the necessity of locking.
The lease counters are only read in unit tests, so I'll stop updating those
outside of testing.
> CodecPool has performance issues
> --------------------------------
>
> Key: HBASE-29123
> URL: https://issues.apache.org/jira/browse/HBASE-29123
> Project: HBase
> Issue Type: Improvement
> Reporter: Charles Connell
> Assignee: Charles Connell
> Priority: Minor
> Attachments: borrow-decompressor.html, lease-counting.html,
> return-decompressor.html
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)