[
https://issues.apache.org/jira/browse/CASSANDRA-16201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266226#comment-17266226
]
Benjamin Lerer edited comment on CASSANDRA-16201 at 1/15/21, 5:37 PM:
----------------------------------------------------------------------
[~marcuse] I have not had the time to go through all the branches yet.
For the 4.0 branch I have a few comments:
* Would it not make sense to use a {{HashMultiset<ByteBuffer>}} rather than a
{{Map<ByteBuffer, Integer>}}? According to the [Guava
documentation|https://github.com/google/guava/wiki/NewCollectionTypesExplained#multiset],
it seems to have been developed with that scenario in mind.
* In {{BatchStatement.getMutations}}:
{code}
partitionCounts.computeIfAbsent(stmt.metadata.id, k -> new HashMap<>());
Map<ByteBuffer, Integer> perKeyCounts = partitionCounts.get(stmt.metadata.id);
{code}
Should be:
{code}
Map<ByteBuffer, Integer> perKeyCounts =
partitionCounts.computeIfAbsent(stmt.metadata.id, k -> new HashMap<>());
{code}
Would it make sense to extract {{k -> new HashMap<>()}} into a variable
initialized before the loop?
* Regarding the single table update detection, I wonder whether it would not be
more efficient to compare the {{TableId}} rather than the metadata.
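To make the second comment concrete, here is a minimal, self-contained sketch of the single-lookup pattern with the mapping function hoisted out of the loop. It is not the patch itself: the class name, the {{String}} table ids and the {{String[][]}} input are hypothetical stand-ins for {{stmt.metadata.id}} and the statements of the batch. Counting uses the JDK's {{Map.merge}}; with a Guava {{HashMultiset<ByteBuffer>}} the inner map would go away and the increment would become a plain {{add(key)}}.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class PerKeyCountSketch {
    // Hypothetical helper: each update is {tableId, partitionKey}.
    // The real code keys the outer map on stmt.metadata.id, not on a String.
    static Map<String, Map<ByteBuffer, Integer>> count(String[][] updates) {
        // Mapping function declared once, before the loop.
        Function<String, Map<ByteBuffer, Integer>> newCounts = k -> new HashMap<>();
        Map<String, Map<ByteBuffer, Integer>> partitionCounts = new HashMap<>();

        for (String[] update : updates) {
            ByteBuffer key = ByteBuffer.wrap(update[1].getBytes(StandardCharsets.UTF_8));
            // One lookup: computeIfAbsent returns the existing or the newly created map,
            // so no separate get() is needed afterwards.
            Map<ByteBuffer, Integer> perKeyCounts = partitionCounts.computeIfAbsent(update[0], newCounts);
            // Increment the per-key counter, starting at 1 for a new key.
            perKeyCounts.merge(key, 1, Integer::sum);
        }
        return partitionCounts;
    }

    public static void main(String[] args) {
        String[][] updates = { { "t1", "a" }, { "t1", "a" }, { "t1", "b" }, { "t2", "a" } };
        System.out.println(count(updates).keySet());
    }
}
```

Since the lambda captures nothing, hoisting it is probably more about readability than about allocations, but it keeps the loop body short.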
was (Author: blerer):
[~marcuse] I have not had the time to go through all the branches yet.
For the 4.0 branch I have a few comments:
* Would it not make sense to use a {{HashMultiset<ByteBuffer>}} rather than a
{{Map<ByteBuffer, Integer>}}? According to the [Guava
documentation|https://github.com/google/guava/wiki/NewCollectionTypesExplained#multiset],
it seems to have been developed with that scenario in mind.
* In {{BatchStatement.getMutations}}:
{code}
partitionCounts.computeIfAbsent(stmt.metadata.id, k -> new HashMap<>());
Map<ByteBuffer, Integer> perKeyCounts = partitionCounts.get(stmt.metadata.id);
{code}
Should be:
{code}
Map<ByteBuffer, Integer> perKeyCounts =
partitionCounts.computeIfAbsent(stmt.metadata.id, k -> new HashMap<>());
{code}
Would it make sense to extract {{k -> new HashMap<>()}} into a variable
initialized before the loop?
* Regarding the single table update detection, would it not make sense to do
the comparison on the {{TableId}} rather than the metadata?
> Reduce amount of allocations during batch statement execution
> -------------------------------------------------------------
>
> Key: CASSANDRA-16201
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16201
> Project: Cassandra
> Issue Type: Bug
> Components: Local/Other
> Reporter: Thomas Steinmaurer
> Assignee: Marcus Eriksson
> Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-beta
>
> Attachments: 16201_jfr_3023_alloc.png, 16201_jfr_3023_obj.png,
> 16201_jfr_3118_alloc.png, 16201_jfr_3118_obj.png, 16201_jfr_40b3_alloc.png,
> 16201_jfr_40b3_obj.png, screenshot-1.png, screenshot-2.png, screenshot-3.png,
> screenshot-4.png
>
>
> In a Cassandra 2.1 / 3.0 / 3.11 / 4.0b2 comparison test with the same load
> profile, we see 4.0b2 going OOM from time to time. According to a heap dump,
> we have multiple NTR threads in a 3-digit MB range.
> This is likely related to object array pre-allocations at the size of
> {{BatchUpdatesCollector.updatedRows}} per {{BTree}} although there is always
> only 1 {{BTreeRow}} in the {{BTree}}.
> !screenshot-1.png|width=100%!
> So it seems we have many, many pre-allocated object arrays of 20K elements,
> each with a shallow heap of 80K, although there is only one element
> in the array.
> This sort of pre-allocation is causing a lot of memory pressure.
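The 80K figure in the description is consistent with simple arithmetic. The sketch below assumes a 64-bit HotSpot JVM with compressed references (4 bytes per reference), a 16-byte array header and 8-byte object alignment; these values are assumptions for illustration and are not taken from the heap dump.

```java
class ArrayFootprintSketch {
    // Approximate shallow size of an Object[] of the given length.
    // referenceBytes and headerBytes are assumed JVM layout parameters.
    static long shallowSizeBytes(int length, int referenceBytes, int headerBytes) {
        long raw = headerBytes + (long) length * referenceBytes;
        // Round up to the assumed 8-byte object alignment.
        return (raw + 7) / 8 * 8;
    }

    public static void main(String[] args) {
        // 20K slots pre-allocated, although only one slot is ever used.
        long preallocated = shallowSizeBytes(20_000, 4, 16);
        // An array sized for the single element that is actually stored.
        long rightSized = shallowSizeBytes(1, 4, 16);
        System.out.println("pre-allocated: " + preallocated + " bytes");
        System.out.println("right-sized:   " + rightSized + " bytes");
    }
}
```

Under these assumptions nearly all of each array is unused slots, which matches the memory pressure described above.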