[jira] [Comment Edited] (CASSANDRA-16201) Reduce amount of allocations during batch statement execution

Benjamin Lerer (Jira) Fri, 15 Jan 2021 09:38:08 -0800


    [ https://issues.apache.org/jira/browse/CASSANDRA-16201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266226#comment-17266226 ]


Benjamin Lerer edited comment on CASSANDRA-16201 at 1/15/21, 5:37 PM:
----------------------------------------------------------------------

[~marcuse] I have not had time to go through all the branches yet.
For the 4.0 branch I have a few comments:
* Would it not make sense to use a {{HashMultiset<ByteBuffer>}} rather than a 
{{Map<ByteBuffer, Integer>}}? According to the [guava 
documentation|https://github.com/google/guava/wiki/NewCollectionTypesExplained#multiset],
 it seems to have been developed with that scenario in mind.
* In {{BatchStatement.getMutations}}:
    {code}
            partitionCounts.computeIfAbsent(stmt.metadata.id, k -> new HashMap<>());
            Map<ByteBuffer, Integer> perKeyCounts = partitionCounts.get(stmt.metadata.id);
    {code}
   Should be:
    {code}
            Map<ByteBuffer, Integer> perKeyCounts = partitionCounts.computeIfAbsent(stmt.metadata.id, k -> new HashMap<>());
    {code}
    Would it make sense to extract {{k -> new HashMap<>()}} into a variable 
initialized before the loop?
* Regarding the single-table update detection, I wonder if it would not be 
more efficient to do the comparison on the {{TableId}} rather than the metadata.
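
The {{computeIfAbsent}} suggestion above can be sketched with JDK classes only. This is a minimal illustration, not the Cassandra code: the class name, the {{NEW_MAP}} constant, the {{String}} table ids and the {{key}} helper are all hypothetical stand-ins. The factory is hoisted into a constant as proposed; on HotSpot a non-capturing lambda is typically already a cached instance, so the gain may be mostly readability and would need measuring.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class PerKeyCountSketch {
    // Hypothetical hoisted factory: created once instead of being spelled out inside the loop.
    private static final Function<String, Map<ByteBuffer, Integer>> NEW_MAP = k -> new HashMap<>();

    // tableIds[i] and keys[i] describe the i-th statement of an imaginary batch.
    static Map<String, Map<ByteBuffer, Integer>> count(String[] tableIds, ByteBuffer[] keys) {
        Map<String, Map<ByteBuffer, Integer>> partitionCounts = new HashMap<>();
        for (int i = 0; i < tableIds.length; i++) {
            // computeIfAbsent returns the existing or the newly created map,
            // so no second lookup with get() is needed.
            Map<ByteBuffer, Integer> perKeyCounts = partitionCounts.computeIfAbsent(tableIds[i], NEW_MAP);
            // Increment the per-key counter, starting at 1 for an unseen key.
            perKeyCounts.merge(keys[i], 1, Integer::sum);
        }
        return partitionCounts;
    }

    // Helper for building keys; ByteBuffer equality is content-based.
    static ByteBuffer key(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    }
}
```

A Guava {{HashMultiset<ByteBuffer>}} would replace the inner {{Map<ByteBuffer, Integer>}} and the {{merge}} call with a single {{add}}; it is left out here to keep the sketch free of external dependencies.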


was (Author: blerer):
[~marcuse] I have not had time to go through all the branches yet.
For the 4.0 branch I have a few comments:
* Would it not make sense to use a {{HashMultiset<ByteBuffer>}} rather than a 
{{Map<ByteBuffer, Integer>}}? According to the [guava 
documentation|https://github.com/google/guava/wiki/NewCollectionTypesExplained#multiset],
 it seems to have been developed with that scenario in mind.
* In {{BatchStatement.getMutations}}:
    {code}
            partitionCounts.computeIfAbsent(stmt.metadata.id, k -> new HashMap<>());
            Map<ByteBuffer, Integer> perKeyCounts = partitionCounts.get(stmt.metadata.id);
    {code}
   Should be:
    {code}
            Map<ByteBuffer, Integer> perKeyCounts = partitionCounts.computeIfAbsent(stmt.metadata.id, k -> new HashMap<>());
    {code}
    Would it make sense to extract {{k -> new HashMap<>()}} into a variable 
initialized before the loop?
* Regarding the single-table update detection, would it not make sense to do 
the comparison on the {{TableId}} rather than the metadata?

> Reduce amount of allocations during batch statement execution
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-16201
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16201
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Other
>            Reporter: Thomas Steinmaurer
>            Assignee: Marcus Eriksson
>            Priority: Normal
>             Fix For: 3.0.x, 3.11.x, 4.0-beta
>
>         Attachments: 16201_jfr_3023_alloc.png, 16201_jfr_3023_obj.png, 
> 16201_jfr_3118_alloc.png, 16201_jfr_3118_obj.png, 16201_jfr_40b3_alloc.png, 
> 16201_jfr_40b3_obj.png, screenshot-1.png, screenshot-2.png, screenshot-3.png, 
> screenshot-4.png
>
>
> In a Cassandra 2.1 / 3.0 / 3.11 / 4.0b2 comparison test with the same load profile, 
> we see 4.0b2 going OOM from time to time. According to a heap dump, we have 
> multiple NTR threads in a 3-digit MB range.
> This is likely related to object array pre-allocations at the size of 
> {{BatchUpdatesCollector.updatedRows}} per {{BTree}} although there is always 
> only 1 {{BTreeRow}} in the {{BTree}}.
>  !screenshot-1.png|width=100%! 
> So it seems we have many, many pre-allocated 20K-element object arrays 
> resulting in a shallow heap of 80K each, although there is only one element 
> in the array.
> This sort of pre-allocation is causing a lot of memory pressure.
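
As a rough check of the figures quoted above, the shallow size of an object array can be estimated from its length. The sketch below assumes a 16-byte array header, 4-byte references (compressed oops) and 8-byte object alignment; these are typical 64-bit HotSpot defaults, not values taken from the heap dump, and the class and method names are hypothetical.

```java
class ArrayFootprintSketch {
    // Estimated shallow size in bytes of an Object[] of the given length.
    // headerBytes and refBytes are assumptions about the JVM layout.
    static long shallowBytes(int length, int headerBytes, int refBytes) {
        long raw = headerBytes + (long) length * refBytes;
        // Round up to the next multiple of 8 (assumed object alignment).
        return (raw + 7) / 8 * 8;
    }
}
```

Under these assumptions, {{shallowBytes(20000, 16, 4)}} gives 80016 bytes, in line with the reported 80K per array, while an array sized for the single element actually stored would come to 24 bytes.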



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

